---
license: mit
language:
- en
base_model:
- microsoft/Florence-2-large
pipeline_tag: robotics
tags:
- VLA
- LIBERO
- Robotics
- Flow
---
# FlowerVLA - Vision-Language-Action Flow Model finetuned on LIBERO Spatial

This is a pretrained FlowerVLA model for robotic manipulation, trained on the LIBERO Spatial dataset. Flower is an efficient Vision-Language-Action Flow policy for robot learning that contains only ~1B parameters.

## Model Description

FlowerVLA is a novel architecture that:
- Uses half of Florence-2 for multi-modal vision-language encoding
- Employs a novel transformer-based flow matching architecture
- Provides an efficient, versatile VLA policy with only ~1B parameters
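At inference time, a flow matching policy produces an action chunk by integrating a learned velocity field from noise toward actions. The sketch below is a minimal, hedged illustration of that idea only: `toy_velocity` and the Euler step count are placeholders, not FlowerVLA's actual network or solver settings.

```python
def euler_flow_integration(velocity_fn, x0, steps=100):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (action chunk)."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Toy stand-in for the learned velocity network: pull toward a fixed chunk.
target_chunk = [0.5, -0.2, 0.1]

def toy_velocity(x, t):
    return [ti - xi for ti, xi in zip(target_chunk, x)]

action_chunk = euler_flow_integration(toy_velocity, [0.0, 0.0, 0.0])
```

With more integration steps, the result approaches the target chunk; the real model replaces `toy_velocity` with its trained transformer.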
## Model Performance

This checkpoint contains weights for the LIBERO Spatial challenge and achieves the following results:

| Task | Success Rate |
|------|--------------|
| **Average** | **0.968** |
| pick up the black bowl between the plate and the ramekin and place it on the plate | 0.979 |
| pick up the black bowl next to the ramekin and place it on the plate | 0.981 |
| pick up the black bowl from table center and place it on the plate | 0.981 |
| pick up the black bowl on the cookie box and place it on the plate | 1.000 |
| pick up the black bowl in the top drawer of the wooden cabinet and place it on the plate | 1.000 |
| pick up the black bowl on the ramekin and place it on the plate | 0.862 |
| pick up the black bowl next to the cookie box and place it on the plate | 1.000 |
| pick up the black bowl on the stove and place it on the plate | 1.000 |
| pick up the black bowl next to the plate and place it on the plate | 0.917 |
| pick up the black bowl on the wooden cabinet and place it on the plate | 0.962 |
### Input/Output Specifications

#### Inputs
- RGB Static Camera: `(B, T, 3, H, W)` tensor
- RGB Gripper Camera: `(B, T, 3, H, W)` tensor
- Language Instructions: text strings

#### Outputs
- Action Space: `(B, T, 7)` tensor representing delta EEF actions
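A common convention for 7-dim delta EEF actions is a 3-dim end-effector position delta, a 3-dim orientation delta, and a scalar gripper command. The sketch below assumes that split (and the gripper sign convention), which this card does not itself confirm; check the repository for the actual layout.

```python
def apply_delta_action(pos, rot_euler, action):
    """Apply one 7-dim delta EEF action, assumed layout:
    [dx, dy, dz, droll, dpitch, dyaw, gripper]."""
    assert len(action) == 7
    new_pos = [p + d for p, d in zip(pos, action[0:3])]
    new_rot = [r + d for r, d in zip(rot_euler, action[3:6])]
    gripper_open = action[6] > 0  # assumed sign convention
    return new_pos, new_rot, gripper_open

pos, rot, grip = apply_delta_action([0.4, 0.0, 0.2], [0.0, 0.0, 0.0],
                                    [0.01, 0.0, -0.02, 0.0, 0.0, 0.1, 1.0])
```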
## Usage

Check out our full model implementation on GitHub [todo]() and follow the instructions in the README to test the model in one of the environments.

```python
obs = {
    "rgb_obs": {
        "rgb_static": static_image,
        "rgb_gripper": gripper_image
    }
}
goal = {"lang_text": "pick up the blue cube"}
action = model.step(obs, goal)
```
## Training Details

### Configuration
- **Optimizer**: AdamW
- **Learning Rate**: 2e-5
- **Weight Decay**: 0.05
## Citation

```bibtex
@inproceedings{
reuss2025flower,
# Add citation when available
}
```
## License

This model is released under the MIT license.