RefalMachine committed
Commit 25033eb · verified · 1 Parent(s): c79fbcd

Update README.md

Files changed (1): README.md (+2 −3)
README.md CHANGED
@@ -16,9 +16,8 @@ In addition to developing the methodology itself, we also employ it to adapt exi
 One of the unique features of our approach to adaptation is that, thanks to the LEP method (Learned Embedding Propagation, see the paper below), we adapt the base version of a model just once and can then very affordably adapt any instruction-tuned version derived from that base. For instance, after adapting Qwen2.5-32B, we obtained RuadaptQwen2.5 versions not only for Qwen2.5-32B-Instruct but also for QwQ-32B-Preview, QwQ-32B, FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview (while preserving reasoning capabilities), and T-pro-it-1.0.
 
 An intriguing aspect of adapting T-pro-it-1.0 is that this model was obtained through continued pretraining on over 100 billion tokens of Russian-language data with full fine-tuning. Despite this extensive prior training, our methodology still worked effectively (note: it was the original base model, Qwen2.5-32B, that was adapted!), and the resulting adapted version matched or outperformed T-pro-it-1.0 on several benchmarks. Moreover, it demonstrated higher efficiency in Russian-language tokenization.
-<div style="text-align: center">
-<img src="https://cdn-uploads.huggingface.co/production/uploads/652cedbdf120598322ae358a/sKwHvA9ztd7rHx37Ca2ey.png" style="max-width: 50%; height: auto;">
-</div>
+
+<img src="https://cdn-uploads.huggingface.co/production/uploads/652cedbdf120598322ae358a/sKwHvA9ztd7rHx37Ca2ey.png" style="display: block; margin: 0 auto; max-width: 50%; height: auto;">
 
 ## Papers
 Tikhomirov M., Chernyshov D. Facilitating Large Language Model Russian Adaptation with Learned Embedding Propagation // Journal of Language and Education. 2024. Vol. 10, No. 4. pp. 130–145. (Preprint: https://arxiv.org/abs/2412.21140)
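
To make the "adapt the base once, propagate to derivatives" workflow from the README text above concrete, here is a minimal sketch using the Hugging Face transformers API. The model paths are placeholders, and the copy shown is a deliberate simplification: the actual LEP procedure in the paper (arXiv:2412.21140) calibrates the propagated embeddings rather than transplanting them verbatim.

```python
# Minimal sketch of embedding propagation from an adapted base model into an
# instruct derivative. Paths are hypothetical; the paper's LEP method refines
# this naive embedding swap with a learned propagation step.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Base model adapted once with the extended Russian tokenizer (placeholder path).
adapted_base = AutoModelForCausalLM.from_pretrained("path/to/adapted-base")
new_tokenizer = AutoTokenizer.from_pretrained("path/to/adapted-base")

# Any instruct/reasoning derivative of the same original base.
instruct = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-32B-Instruct")

with torch.no_grad():
    # Resize the derivative's vocabulary to match the adapted tokenizer...
    instruct.resize_token_embeddings(len(new_tokenizer))
    # ...then transplant the embeddings learned during base adaptation,
    # leaving the derivative's transformer blocks untouched.
    instruct.get_input_embeddings().weight.copy_(
        adapted_base.get_input_embeddings().weight)
    instruct.get_output_embeddings().weight.copy_(
        adapted_base.get_output_embeddings().weight)

instruct.save_pretrained("path/to/adapted-instruct")
new_tokenizer.save_pretrained("path/to/adapted-instruct")
```

Because only the embedding matrices are touched, the cost of adapting each additional derivative is negligible compared to re-running continued pretraining, which is why a single base adaptation could cover Qwen2.5-32B-Instruct, the QwQ variants, and T-pro-it-1.0.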