Malaysian Qwen 2.5 72B Instruct
Continued finetuning of https://huggingface.co/Qwen/Qwen2.5-72B-Instruct on a highly curated 1.5B-token Malaysian instruction dataset.
Improvement
- Able to respond in Mandarin, Tamil, Jawi, Manglish, and local dialects such as Johor, Kedah, Kelantan, Pahang, Perak, Sabah, Sarawak, Selangor, Negeri Sembilan and Terengganu.
- Able to code in Mandarin, Tamil, Jawi, Manglish, and the same local dialects.
- Multi-turn conversations in a Malaysian context, such as Malaysian legislation, politics, religions and languages.
Training session
Finetuned on mesolitica/Malaysian-SFT to make the model understand Malaysian context.
How we train
- LoRA on ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "embed_tokens", "lm_head"] (see the config sketch after this list).
- Rank 128 with alpha 256, i.e. an alpha-to-rank ratio of 2.0.
- Multipacking at 8192 context length with proper SDPA causal masking to prevent cross-document contamination, and with position ids kept correct per document (see the packing sketch after this list).
- Chunked CCE loss for LoRA (see the loss sketch after this list).
- WandB run at https://wandb.ai/huseinzol05/lora-embedding-128-qwen2.5-72b-malaysian-8k?nw=nwuserhuseinzol05
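
The LoRA setup above maps naturally onto the Hugging Face `peft` API. The sketch below is a minimal illustration only: the rank, alpha, and target modules come from this card, while everything else (dtype, dropout, loading options) is an assumption, not the exact training script.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the base model (sketch only; the real run used an 8x H100 node).
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-72B-Instruct",
    torch_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=128,            # rank 128
    lora_alpha=256,   # alpha = 2.0 * rank
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "embed_tokens", "lm_head",
    ],
    lora_dropout=0.0,          # assumption
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```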
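
The multipacking bullet can be pictured with a short, self-contained sketch (function and variable names are illustrative, not from the repository): documents are concatenated up to 8192 tokens, position ids restart at 0 for every document, and the attention mask passed to SDPA is block-diagonal causal so packed documents cannot attend to one another.

```python
import torch

def pack_documents(docs, max_len=8192, pad_id=0):
    """Pack tokenised documents into one sequence with per-document position ids
    and a block-diagonal causal attention mask (hypothetical helper)."""
    input_ids, position_ids, doc_ids = [], [], []
    for doc_idx, doc in enumerate(docs):
        room = max_len - len(input_ids)
        if room <= 0:
            break
        chunk = doc[:room]
        input_ids += chunk
        position_ids += list(range(len(chunk)))  # positions restart per document
        doc_ids += [doc_idx] * len(chunk)

    pad = max_len - len(input_ids)
    input_ids += [pad_id] * pad
    position_ids += [0] * pad
    doc_ids += [-1] * pad                        # padding belongs to no document

    ids = torch.tensor(doc_ids)
    same_doc = (ids[:, None] == ids[None, :]) & (ids[:, None] != -1)
    causal = torch.tril(torch.ones(max_len, max_len, dtype=torch.bool))
    # Boolean (1, 1, seq, seq) mask, True where attention is allowed: tokens may
    # only attend causally within their own document.
    attention_mask = (same_doc & causal)[None, None]

    return (
        torch.tensor(input_ids)[None],      # (1, seq)
        torch.tensor(position_ids)[None],   # (1, seq)
        attention_mask,                     # (1, 1, seq, seq)
    )

# Example: two short documents packed into one 16-token sequence.
input_ids, position_ids, attention_mask = pack_documents(
    [[101, 7, 8, 9], [101, 42, 43]], max_len=16
)
```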
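
The chunked CCE loss bullet refers to computing the cross-entropy without materialising the full (sequence length x vocabulary) logits tensor at once, which matters for a 72B model with a large vocabulary. The sketch below shows the general idea only; it is not the fused kernel used in the actual run, and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def chunked_ce_loss(hidden, lm_head_weight, labels, chunk_size=1024):
    """Cross-entropy computed chunk by chunk over the sequence, so only a
    (chunk_size, vocab) slice of logits exists in memory at any time."""
    # hidden: (seq, dim); lm_head_weight: (vocab, dim); labels: (seq,)
    num_valid = (labels != -100).sum().clamp(min=1)
    total = hidden.new_zeros(())
    for start in range(0, hidden.size(0), chunk_size):
        h = hidden[start:start + chunk_size]
        y = labels[start:start + chunk_size]
        logits = h @ lm_head_weight.T            # only a chunk of logits in memory
        total = total + F.cross_entropy(
            logits.float(), y, ignore_index=-100, reduction="sum"
        )
    return total / num_valid
```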
Source code at https://github.com/mesolitica/malaya/tree/master/session/qwen2.5
Acknowledgement
Special thanks to https://www.sns.com.my for the 8x H100 node!