Huang Liang Hsun (lianghsun)

AI & ML interests

Founder of Twinkle AI. Focused on applying deep learning in legal and scientific domains, with expertise in NLP and model fine-tuning.

Recent Activity

liked a dataset about 9 hours ago: trendmicro-ailab/Primus-FineWeb
updated a model about 10 hours ago: lianghsun/Llama-3.3-70B-Taiwan-Cyber-Instruct
published a model about 10 hours ago: lianghsun/Llama-3.3-70B-Taiwan-Cyber-Instruct

Organizations

shareAI, Open-Source AI Meetup, Hugging Face for Legal, Model Collapse, Taiwan Llama, Twinkle AI

lianghsun's activity

replied to their post 24 days ago

lol thanks! I've always wondered why HF posts don't support markdown.

posted an update 24 days ago

With the arrival of Twinkle April, Twinkle AI's annual open-source celebration held every April, our community is excited to unveil its very first project:

📊 Twinkle Eval (https://github.com/ai-twinkle/Eval), a next-generation evaluation tool led by our contributor @tedslin.

Unlike traditional evaluation tools such as iKala's ievals (https://github.com/ikala-ai/ievals), which can only evaluate language models (LMs) one sample at a time, Twinkle Eval is designed with Large Reasoning Models (LRMs) in mind. As reasoning time increases with more complex models, traditional tools become increasingly inefficient 😲; for example, evaluating LRMs on the ikala/tmmluplus benchmark could run for half a day without finishing.

One question we were especially curious about:
Does shuffling the order of multiple-choice answers impact model accuracy? 🤔
→ See: "Changing Answer Order Can Decrease MMLU Accuracy" (arXiv:2406.19470v1)
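
To make the question concrete, here is a minimal sketch (illustrative helper and data, not Twinkle Eval's actual code) of what randomizing answer order involves: permute the choices, then remap the gold label so the scored answer stays correct.

```python
import random

def shuffle_choices(choices: list[str], answer_idx: int,
                    rng: random.Random) -> tuple[list[str], int]:
    """Permute the choices and remap the gold index accordingly."""
    order = list(range(len(choices)))
    rng.shuffle(order)
    shuffled = [choices[i] for i in order]
    return shuffled, order.index(answer_idx)  # where the gold choice landed

rng = random.Random(42)  # fixed seed so shuffles are reproducible
opts, gold = shuffle_choices(["Paris", "London", "Rome", "Bern"], 0, rng)
print(opts, "gold is now option", chr(ord("A") + gold))
```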

To address these challenges, Twinkle Eval brings three key innovations to the table (sketched in code after the list):

1๏ธโƒฃ Parallelized evaluation of samples
2๏ธโƒฃ Multi-round testing for stability
3๏ธโƒฃ Randomized answer order to test robustness

After running experiments, we observed that Twinkle Eval can speed up evaluation by up to 15× 🚀🚀. Interestingly, most models scored slightly lower under the 2️⃣ and 3️⃣ test settings than their claimed performance, suggesting further benchmarking is needed.

This framework also comes with additional tunable parameters and detailed logging of LM behavior per question, perfect for those who want to dive deeper. 😆

If you find Twinkle Eval useful, please ⭐ the project and help spread the word 🤗
posted an update 3 months ago
🖖 Let me introduce the work I've done over the past three months: Llama-3.2-Taiwan-3B and Llama-3.2-Taiwan-3B-Instruct, now open-sourced on 🤗 Hugging Face.

๐—น๐—ถ๐—ฎ๐—ป๐—ด๐—ต๐˜€๐˜‚๐—ป/๐—Ÿ๐—น๐—ฎ๐—บ๐—ฎ-๐Ÿฏ.๐Ÿฎ-๐—ง๐—ฎ๐—ถ๐˜„๐—ฎ๐—ป-๐Ÿฏ๐—•: This model is built on top of ๐—บ๐—ฒ๐˜๐—ฎ-๐—น๐—น๐—ฎ๐—บ๐—ฎ/๐—Ÿ๐—น๐—ฎ๐—บ๐—ฎ-๐Ÿฏ.๐Ÿฎ-๐Ÿฏ๐—• with continual pretraining. The training dataset consists of a mixture of Traditional Chinese and multilingual texts in specific proportions, including 20B tokens of Traditional Chinese text.

๐—น๐—ถ๐—ฎ๐—ป๐—ด๐—ต๐˜€๐˜‚๐—ป/๐—Ÿ๐—น๐—ฎ๐—บ๐—ฎ-๐Ÿฏ.๐Ÿฎ-๐—ง๐—ฎ๐—ถ๐˜„๐—ฎ๐—ป-๐Ÿฏ๐—•-๐—œ๐—ป๐˜€๐˜๐—ฟ๐˜‚๐—ฐ๐˜: This is a fine-tuned conversational model based on the foundation model.

This Llama-3.2-Taiwan open-source project is currently a one-person effort (yes, I did everything from text preparation onward, so exhausting!). If you're interested, feel free to join the Discord server for discussions.

🅱🅴🅽🅲🅷🅼🅰🆁🅺🅸🅽🅶

The evaluation was conducted using ikala/tmmluplus, though the README page does not yet reflect the latest results. The performance is close to that of the previous versions, indicating that further improvements might require adding more specialized knowledge to the training datasets.

🅰 🅲🅰🅻🅻 🅵🅾🆁 🆂🆄🅿🅿🅾🆁🆃

If anyone is willing to provide compute resources, your support would be greatly appreciated and would help this project continue to grow. 💪

---
๐Ÿ”๏ธ Foundation model: lianghsun/Llama-3.2-Taiwan-3B
๐Ÿค– Instruction model: lianghsun/Llama-3.2-Taiwan-3B-Instruct
โšก GGUF: lianghsun/Llama-3.2-Taiwan-3B-Instruct-GGUF
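
For the GGUF build, a hypothetical loading sketch via llama-cpp-python; the quantization filename pattern below is an assumption, so check the repo's file list first:

```python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="lianghsun/Llama-3.2-Taiwan-3B-Instruct-GGUF",
    filename="*Q4_K_M.gguf",  # assumed quant; pick a file that actually exists
)
resp = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp["choices"][0]["message"]["content"])
```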