Malaysian Qwen 2.5 72B Instruct
Continued finetuning of https://huggingface.co/Qwen/Qwen2.5-72B-Instruct on a highly curated 1.5B-token Malaysian instruction dataset.
Improvement
- Able to respond in Mandarin, Tamil, Jawi, Manglish, and local dialects such as Johor, Kedah, Kelantan, Pahang, Perak, Sabah, Sarawak, Selangor, Negeri Sembilan and Terengganu.
- Able to code in Mandarin, Tamil, Jawi, Manglish, and the same local dialects.
- Multi-turn conversations in a Malaysian context, such as Malaysian legislation, politics, religions and languages.
Training session
Finetuned on mesolitica/Malaysian-SFT to make the model understand Malaysian context.
How we train
- LoRA on ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "embed_tokens", "lm_head"] (see the config sketch after this list).
- Rank 128 with alpha 256, i.e. an alpha-to-rank ratio of 2.0.
- Multipacking at 8192 context length with proper SDPA causal masking to prevent cross-document contamination, and with position ids kept correct per document (see the packing sketch after this list).
- Chunked CCE loss for LoRA (see the loss sketch after this list).
- WandB run at https://wandb.ai/huseinzol05/lora-embedding-128-qwen2.5-72b-malaysian-8k?nw=nwuserhuseinzol05
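
The LoRA setup above maps naturally onto the Hugging Face `peft` API. The sketch below is a minimal illustration only: the rank, alpha, and target modules come from this card, while everything else (dtype, dropout, loading options) is an assumption, not the exact training script.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the base model (sketch only; the real run used an 8x H100 node).
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-72B-Instruct",
    torch_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=128,            # rank 128
    lora_alpha=256,   # alpha = 2.0 * rank
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "embed_tokens", "lm_head",
    ],
    lora_dropout=0.0,          # assumption
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```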
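
The multipacking bullet can be pictured with a short, self-contained sketch (function and variable names are illustrative, not from the repository): documents are concatenated up to 8192 tokens, position ids restart at 0 for every document, and the attention mask passed to SDPA is block-diagonal causal so packed documents cannot attend to one another.

```python
import torch

def pack_documents(docs, max_len=8192, pad_id=0):
    """Pack tokenised documents into one sequence with per-document position ids
    and a block-diagonal causal attention mask (hypothetical helper)."""
    input_ids, position_ids, doc_ids = [], [], []
    for doc_idx, doc in enumerate(docs):
        room = max_len - len(input_ids)
        if room <= 0:
            break
        chunk = doc[:room]
        input_ids += chunk
        position_ids += list(range(len(chunk)))  # positions restart per document
        doc_ids += [doc_idx] * len(chunk)

    pad = max_len - len(input_ids)
    input_ids += [pad_id] * pad
    position_ids += [0] * pad
    doc_ids += [-1] * pad                        # padding belongs to no document

    ids = torch.tensor(doc_ids)
    same_doc = (ids[:, None] == ids[None, :]) & (ids[:, None] != -1)
    causal = torch.tril(torch.ones(max_len, max_len, dtype=torch.bool))
    # Boolean (1, 1, seq, seq) mask, True where attention is allowed: tokens may
    # only attend causally within their own document.
    attention_mask = (same_doc & causal)[None, None]

    return (
        torch.tensor(input_ids)[None],      # (1, seq)
        torch.tensor(position_ids)[None],   # (1, seq)
        attention_mask,                     # (1, 1, seq, seq)
    )

# Example: two short documents packed into one 16-token sequence.
input_ids, position_ids, attention_mask = pack_documents(
    [[101, 7, 8, 9], [101, 42, 43]], max_len=16
)
```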
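
The chunked CCE loss bullet refers to computing the cross-entropy without materialising the full (sequence length x vocabulary) logits tensor at once, which matters for a 72B model with a large vocabulary. The sketch below shows the general idea only; it is not the fused kernel used in the actual run, and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def chunked_ce_loss(hidden, lm_head_weight, labels, chunk_size=1024):
    """Cross-entropy computed chunk by chunk over the sequence, so only a
    (chunk_size, vocab) slice of logits exists in memory at any time."""
    # hidden: (seq, dim); lm_head_weight: (vocab, dim); labels: (seq,)
    num_valid = (labels != -100).sum().clamp(min=1)
    total = hidden.new_zeros(())
    for start in range(0, hidden.size(0), chunk_size):
        h = hidden[start:start + chunk_size]
        y = labels[start:start + chunk_size]
        logits = h @ lm_head_weight.T            # only a chunk of logits in memory
        total = total + F.cross_entropy(
            logits.float(), y, ignore_index=-100, reduction="sum"
        )
    return total / num_valid
```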
Source code at https://github.com/mesolitica/malaya/tree/master/session/qwen2.5
Acknowledgement
Special thanks to https://www.sns.com.my for the 8x H100 node!