Thinh Le

thinhlpg

AI & ML interests

anime stuff

Recent Activity

liked a Space about 4 hours ago
k-mktr/gpu-poor-llm-arena
upvoted a collection about 4 hours ago
Leaderboards and benchmarks ✨
liked a dataset about 4 hours ago
basicv8vc/SimpleQA

Organizations

The Waifu Research Department, AIvent Technology, Jan, Nón Lá, Menlo Research, Try Wibu Stuffs

thinhlpg's activity

reacted to merterbak's post with πŸš€πŸ”₯ about 9 hours ago
OpenAI has released BrowseComp, an open-source benchmark designed to evaluate the web-browsing capabilities of AI agents. The dataset comprises 1,266 questions that challenge AI models to navigate the web and uncover complex, obscure information. Crafted by human trainers, the questions are intentionally difficult: unsolvable by another person in under ten minutes, and beyond the reach of existing models such as ChatGPT (with and without browsing) and an early version of OpenAI's Deep Research tool.

Blog Post: https://openai.com/index/browsecomp/
Paper: https://cdn.openai.com/pdf/5e10f4ab-d6f7-442e-9508-59515c65e35d/browsecomp.pdf
Code in the simple-evals repo: https://github.com/openai/simple-evals
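BrowseComp's official scoring in the simple-evals repo grades free-form answers rather than exact strings. As a toy illustration of the benchmark's question/answer record structure only, here is a naive normalized exact-match scorer; the records below are hypothetical, not real benchmark items:

```python
# Toy scorer over hypothetical BrowseComp-style records (question/answer pairs).
# The real benchmark uses model-based grading of free-form answers; this naive
# normalized exact match is just an illustration of the data shape.

def normalize(text):
    """Lowercase, trim, and collapse whitespace before comparing."""
    return " ".join(text.lower().strip().split())

def exact_match_accuracy(records, predictions):
    """records: list of {'question', 'answer'} dicts; predictions: list of strings."""
    correct = sum(
        normalize(pred) == normalize(rec["answer"])
        for rec, pred in zip(records, predictions)
    )
    return correct / len(records)

# Hypothetical example records, not drawn from the benchmark:
records = [
    {"question": "Which city hosted the event described?", "answer": "Paris"},
    {"question": "Who authored the obscure report?", "answer": "Jane Doe"},
]
print(exact_match_accuracy(records, ["  PARIS ", "John Doe"]))  # → 0.5
```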
reacted to burtenshaw's post with πŸ§ πŸ‘ 3 days ago
Qwen 3 fine-tuning >> MoE. Updated the experiment thread to include the config and script for fine-tuning the Qwen3-30B-A3B model.

The goal is to make a low-latency, non-thinking model as a daily driver for coding, so 3 billion active parameters should be perfect.

βœ”οΈ training running
βœ”οΈ evals running
⏭️ improve dataset

The MoE isn't going to fit into Colab's A100 even with quantization (🙏 @UnslothAI ). So I've been working on HF Spaces' H100s for this. Everything is available in the thread and I'll share more tomorrow.
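The sizing intuition can be sketched with a back-of-envelope estimate (all numbers below are rough assumptions, not measured figures): even 4-bit quantized, a 30B-parameter base model occupies roughly 15 GB of weights alone, and trainable adapters plus optimizer state push a QLoRA-style run well toward a 40 GB A100's limit before activations are counted.

```python
# Back-of-envelope GPU memory estimate for QLoRA-style fine-tuning of a
# 30B-parameter MoE such as Qwen3-30B-A3B. All figures are rough assumptions.

def finetune_memory_gb(total_params_b, quant_bits, lora_params_m=100.0):
    """Estimate memory (GB) for frozen quantized weights plus LoRA adapters."""
    # Frozen base weights, quantized to `quant_bits` bits per parameter.
    weights_gb = total_params_b * 1e9 * quant_bits / 8 / 1e9
    # Trainable LoRA adapters in bf16 (2 B/param) plus Adam states (~8 B/param).
    adapters_gb = lora_params_m * 1e6 * (2 + 8) / 1e9
    return weights_gb + adapters_gb

print(finetune_memory_gb(30, 4))  # ~16 GB before activations and KV cache
```

Activations, gradients, and framework overhead come on top of this, which is why a 40 GB A100 gets tight and an 80 GB H100 is more comfortable.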

burtenshaw/Qwen3-Code-Lite#1