---
license: mit
language:
- en
base_model:
- microsoft/Florence-2-large
pipeline_tag: robotics
tags:
- VLA
- LIBERO
- Robotics
- Flow
---

# FlowerVLA - Vision-Language-Action Flow Model finetuned on LIBERO Spatial

This is a pretrained FlowerVLA model for robotic manipulation, trained on the LIBERO Spatial dataset. Flower is an efficient Vision-Language-Action flow policy for robot learning that contains only ~1B parameters.

## Model Description

FlowerVLA is a novel architecture that:

- Uses half of Florence-2 for multi-modal vision-language encoding
- Employs a novel transformer-based flow-matching architecture
- Provides an efficient, versatile VLA policy with only ~1B parameters

## Model Performance

This checkpoint contains the weights for the LIBERO Spatial challenge and achieves the following results:

| Task | Success rate |
|------|--------------|
| pick_up_the_black_bowl_between_the_plate_and_the_ramekin_and_place_it_on_the_plate | 0.9791666666666666 |
| pick_up_the_black_bowl_next_to_the_ramekin_and_place_it_on_the_plate | 0.9807692307692308 |
| pick_up_the_black_bowl_from_table_center_and_place_it_on_the_plate | 0.9807692307692308 |
| pick_up_the_black_bowl_on_the_cookie_box_and_place_it_on_the_plate | 1.0 |
| pick_up_the_black_bowl_in_the_top_drawer_of_the_wooden_cabinet_and_place_it_on_the_plate | 1.0 |
| pick_up_the_black_bowl_on_the_ramekin_and_place_it_on_the_plate | 0.8621794871794872 |
| pick_up_the_black_bowl_next_to_the_cookie_box_and_place_it_on_the_plate | 1.0 |
| pick_up_the_black_bowl_on_the_stove_and_place_it_on_the_plate | 1.0 |
| pick_up_the_black_bowl_next_to_the_plate_and_place_it_on_the_plate | 0.9166666666666666 |
| pick_up_the_black_bowl_on_the_wooden_cabinet_and_place_it_on_the_plate | 0.9615384615384616 |
| **Average** | **0.9681089520454407** |

### Input/Output Specifications

#### Inputs

- RGB Static Camera: `(B, T, 3, H, W)` tensor
- RGB Gripper Camera: `(B, T, 3, H, W)` tensor
- Language Instructions: text strings

#### Outputs

- Action Space: `(B, T, 7)` tensor representing delta EEF actions

## Usage

Check out our full model implementation on GitHub [todo]() and follow the instructions in the readme to test the model in one of the environments.

```python
# Build the observation and goal dictionaries expected by the policy
# (see the Input/Output Specifications above for tensor shapes).
obs = {
    "rgb_obs": {
        "rgb_static": static_image,    # (B, T, 3, H, W) static-camera frames
        "rgb_gripper": gripper_image,  # (B, T, 3, H, W) gripper-camera frames
    }
}
goal = {"lang_text": "pick up the blue cube"}

# Predict a chunk of delta EEF actions with shape (B, T, 7)
action = model.step(obs, goal)
```

## Training Details

### Configuration

- **Optimizer**: AdamW
- **Learning Rate**: 2e-5
- **Weight Decay**: 0.05

## Citation

```bibtex
@inproceedings{
  reuss2025flower,
  % Add citation when available
}
```

## License

This model is released under the MIT license.
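## Appendix: Inference Sketch

For reference, here is a minimal, self-contained sketch of the observation/goal interface documented above. The image resolution (`H = W = 224`), the zero-filled tensors, and the instruction string are illustrative assumptions; model loading follows the repository readme rather than any call shown here.

```python
# Minimal sketch of the documented FlowerVLA step interface.
# Shapes follow the Input/Output Specifications section;
# H = W = 224 is an assumption, not a confirmed value.
import torch

B, T, H, W = 1, 1, 224, 224  # batch, time horizon, image height/width

obs = {
    "rgb_obs": {
        "rgb_static": torch.zeros(B, T, 3, H, W),   # static-camera frames
        "rgb_gripper": torch.zeros(B, T, 3, H, W),  # gripper-camera frames
    }
}
goal = {"lang_text": "pick up the black bowl and place it on the plate"}

# model = ...                     # load FlowerVLA per the repository readme
# action = model.step(obs, goal)  # expected output: (B, T, 7) delta EEF actions
```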
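The reported training configuration maps directly onto PyTorch's `AdamW`; below is a sketch of that mapping, assuming a single parameter group (the actual fine-tuning code may group or freeze parameters differently):

```python
import torch

# Stand-in module so the snippet runs; substitute the real FlowerVLA policy.
model = torch.nn.Linear(8, 8)

# Hypothetical re-creation of the reported fine-tuning optimizer;
# the single parameter group is an assumption.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=2e-5,             # learning rate from Training Details
    weight_decay=0.05,   # weight decay from Training Details
)
```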