mbreuss committed on
Commit 5e07210 · verified · 1 Parent(s): a6abe3b

Update README.md

Files changed (1):
  1. README.md +69 -32
README.md CHANGED
@@ -1,46 +1,83 @@
- # FlowerVLA - Vision-Language-Action Flow Model for {dataset_name}
-
- This is a pretrained FlowerVLA model for robotic manipulation trained on the {dataset_name} dataset. FlowerVLA is an efficient Vision-Language-Action Flow policy for robot learning.
-
- ## Model Description
-
- FlowerVLA is a novel architecture that:
- - Uses Florence-2 for multi-modal vision-language encoding
- - Employs a transformer-based flow matching architecture
- - Provides an efficient policy with ~1B parameters
- - Operates on action chunks for better long-horizon planning
-
- ## Usage
-
- ```python
- from huggingface_hub import snapshot_download
- import torch
- import hydra
- from omegaconf import OmegaConf
- import json
- import os
-
- model_path = snapshot_download(repo_id="{repo_id}")
-
- with open(os.path.join(model_path, "config.json")) as f:
-     config = json.load(f)
-
- model_cfg = OmegaConf.create(config["model_config"])
- model_cfg["_target_"] = "flower.models.flower.FLOWERVLA"
-
- model = hydra.utils.instantiate(model_cfg)
-
- state_dict = torch.load(os.path.join(model_path, "model.pt"))
- model.load_state_dict(state_dict)
-
- model.eval()
-
- # obs = {...}  # Your observation dict
- # goal = {"lang_text": "push the blue block to the right"}
- # action = model.step(obs, goal)
- ```
-
- @inproceedings{
- reuss2024multimodal,
- # Add citation when available
- }
-
+ ---
+ license: mit
+ language:
+ - en
+ base_model:
+ - microsoft/Florence-2-large
+ pipeline_tag: robotics
+ tags:
+ - VLA
+ - LIBERO
+ - Robotics
+ - Flow
+ ---
+ # FlowerVLA - Vision-Language-Action Flow Model finetuned on LIBERO Spatial
+
+ This is a pretrained FlowerVLA model for robotic manipulation, finetuned on the LIBERO Spatial dataset.
+ FlowerVLA is an efficient Vision-Language-Action Flow policy for robot learning with only ~1B parameters.
+
+ ## Model Description
+
+ FlowerVLA is a novel architecture that:
+ - Uses half of Florence-2 for multi-modal vision-language encoding
+ - Employs a transformer-based flow matching architecture (see the sketch below)
+ - Provides an efficient, versatile VLA policy with only ~1B parameters
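+
+ For intuition, here is a minimal, generic flow-matching sampler. This is a sketch only; `velocity_fn` is a hypothetical stand-in for the model's denoising transformer, not FlowerVLA's actual API:
+
+ ```python
+ import torch
+
+ def flow_matching_sample(velocity_fn, shape, steps=10):
+     """Integrate a learned velocity field from noise (t=0) toward data (t=1)."""
+     x = torch.randn(shape)  # start from Gaussian noise
+     dt = 1.0 / steps
+     for i in range(steps):
+         t = torch.full((shape[0],), i * dt)  # current flow time per batch element
+         x = x + dt * velocity_fn(x, t)  # Euler step along the learned flow
+     return x
+ ```
+
+ For an action policy, `shape` would be an action chunk such as `(B, T, 7)`.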
 
+
+ ## Model Performance
+
+ This checkpoint contains the weights for the LIBERO Spatial challenge and achieves the following results:
+
+ | Task | Success Rate |
+ |------|--------------|
+ | Average (all tasks) | 0.9681 |
+ | pick_up_the_black_bowl_between_the_plate_and_the_ramekin_and_place_it_on_the_plate | 0.9792 |
+ | pick_up_the_black_bowl_next_to_the_ramekin_and_place_it_on_the_plate | 0.9808 |
+ | pick_up_the_black_bowl_from_table_center_and_place_it_on_the_plate | 0.9808 |
+ | pick_up_the_black_bowl_on_the_cookie_box_and_place_it_on_the_plate | 1.0000 |
+ | pick_up_the_black_bowl_in_the_top_drawer_of_the_wooden_cabinet_and_place_it_on_the_plate | 1.0000 |
+ | pick_up_the_black_bowl_on_the_ramekin_and_place_it_on_the_plate | 0.8622 |
+ | pick_up_the_black_bowl_next_to_the_cookie_box_and_place_it_on_the_plate | 1.0000 |
+ | pick_up_the_black_bowl_on_the_stove_and_place_it_on_the_plate | 1.0000 |
+ | pick_up_the_black_bowl_next_to_the_plate_and_place_it_on_the_plate | 0.9167 |
+ | pick_up_the_black_bowl_on_the_wooden_cabinet_and_place_it_on_the_plate | 0.9615 |
+
+ ### Input/Output Specifications
+
+ #### Inputs
+ - RGB Static Camera: `(B, T, 3, H, W)` tensor
+ - RGB Gripper Camera: `(B, T, 3, H, W)` tensor
+ - Language Instructions: text strings
+
+ #### Outputs
+ - Action Space: `(B, T, 7)` tensor representing delta EEF actions
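+
+ For illustration, a minimal sketch of dummy observations matching these shapes; the batch size, horizon, and 224x224 resolution here are assumptions, not values required by the model:
+
+ ```python
+ import torch
+
+ # Hypothetical dummy inputs shaped (B, T, 3, H, W)
+ static_image = torch.zeros(1, 1, 3, 224, 224)
+ gripper_image = torch.zeros(1, 1, 3, 224, 224)
+ ```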
+
+ ## Usage
+
+ Check out our full model implementation on GitHub [todo]() and follow the instructions in the README to test the model in one of the environments.
+
+ ```python
+ # static_image / gripper_image: RGB tensors shaped (B, T, 3, H, W); see above
+ obs = {
+     "rgb_obs": {
+         "rgb_static": static_image,
+         "rgb_gripper": gripper_image,
+     }
+ }
+ goal = {"lang_text": "pick up the blue cube"}
+ action = model.step(obs, goal)
+ ```
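+
+ A minimal loading sketch, adapted from an earlier revision of this README; the file names and the `flower.models.flower.FLOWERVLA` target are carried over from that revision and may have changed:
+
+ ```python
+ import json
+ import os
+
+ import hydra
+ import torch
+ from huggingface_hub import snapshot_download
+ from omegaconf import OmegaConf
+
+ # Download the checkpoint and rebuild the model from its stored config
+ model_path = snapshot_download(repo_id="{repo_id}")  # fill in this repo's id
+
+ with open(os.path.join(model_path, "config.json")) as f:
+     config = json.load(f)
+
+ model_cfg = OmegaConf.create(config["model_config"])
+ model_cfg["_target_"] = "flower.models.flower.FLOWERVLA"
+
+ model = hydra.utils.instantiate(model_cfg)
+ model.load_state_dict(torch.load(os.path.join(model_path, "model.pt")))
+ model.eval()
+ ```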
+
+ ## Training Details
+
+ ### Configuration
+ - **Optimizer**: AdamW
+ - **Learning Rate**: 2e-5
+ - **Weight Decay**: 0.05
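+
+ In PyTorch terms this corresponds roughly to the following; the remaining AdamW arguments are assumed to be the defaults:
+
+ ```python
+ import torch
+
+ # Hyperparameters from the configuration above; betas/eps assumed default
+ optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.05)
+ ```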
+
+ ## Citation
+
+ ```bibtex
+ @inproceedings{
+ reuss2025flower,
+ % Add citation when available
+ }
+ ```
+
+ ## License
+
+ This model is released under the MIT license.