gan-yang-zuzhu committed
Commit · 7186f2d
1 Parent(s): 8810cfa
update README.md
README.md CHANGED
@@ -10,8 +10,8 @@ tags:
 - preference model
 ---
 
-####
-
+#### Robust Visual Reward Model
+Robust visual reward model (RoVRM) is developed through a three-phase progressive training (i.e., pre-training with textual preference data→fine-tuning with image caption-based preference data→fine-tuning with visual preference data), and optimal transport-based selective preference data.
 These approaches effectively transfer preferences from auxiliary textual data to enhance the model's robustness.
 The repository hosts the RoVRM built on the LLaVA-1.5-7B model.
 We employed RoVRM for best-of-$n$ sampling and RL training, demonstrating its capability to significantly improve performance and reduce hallucination in large vision-language models.
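
The README mentions using the reward model for best-of-$n$ sampling. As a rough illustration of that idea only, the sketch below picks the highest-scoring response out of $n$ candidates. It does not load RoVRM or use its actual interface: `best_of_n` and `toy_reward` are hypothetical names, and `toy_reward` is a placeholder that scores by text length, standing in for a real reward model that would score a response given the image and prompt.

```python
from typing import Callable, Sequence


def best_of_n(candidates: Sequence[str], reward_fn: Callable[[str], float]) -> str:
    """Return the candidate response with the highest reward score."""
    if not candidates:
        raise ValueError("need at least one candidate response")
    # max() keeps the first candidate in case of a tie.
    return max(candidates, key=reward_fn)


def toy_reward(response: str) -> float:
    """Hypothetical placeholder reward: longer responses score higher.

    A real visual reward model would instead score the response
    conditioned on the image and the prompt.
    """
    return float(len(response))


# n = 3 candidate responses sampled for the same (image, prompt) pair.
responses = ["A cat.", "A cat sitting on a red sofa.", "A dog."]
best = best_of_n(responses, toy_reward)
print(best)
```

In practice the candidates would come from sampling the vision-language model $n$ times, and only the selection step shown here stays the same.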