Patel

Kimish
Recent Activity

updated a model about 18 hours ago
Kimish/Qwen3-4B-untied-8da4w
published a model about 18 hours ago
Kimish/Qwen3-4B-untied-8da4w
updated a model about 18 hours ago
Kimish/Qwen3-4B-untied-weights

Organizations

pytorch
Hugging Face Discord Community
Meta Llama
AI at Meta
ExecuTorch Community

Kimish's activity

reacted to wassemgtk's post with 😎 22 days ago
I’ve been diving into the iRoPE architecture from Llama 4, a game-changer for long-context models! It interleaves local attention layers (with RoPE) for short contexts with global attention layers (with inference-time temperature scaling) for long-range reasoning, with the goal of effectively unbounded context length. I’m going to try writing iRoPE. Who wants to help?

Code: https://github.com/wassemgtk/iRoPE-try/blob/main/iRoPE.ipynb
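
For a rough feel of the idea before opening the notebook, here is a minimal pure-Python sketch of interleaved local and global attention. It is not taken from the notebook or from Llama 4's code. Everything specific in it is an assumption made for illustration: the names `layer_kinds`, `rope`, `query_scale`, and `attend`, the one-global-layer-in-four schedule, the chunk size, the choice to skip positional encoding in global layers, and the logarithmic form and constants of the temperature scale.

```python
import math

def layer_kinds(num_layers, global_every=4):
    # Assumed schedule: every `global_every`-th layer is global, the rest are local.
    return ["global" if (i + 1) % global_every == 0 else "local"
            for i in range(num_layers)]

def rope(vec, pos, base=10000.0):
    # Standard RoPE: rotate each consecutive pair (x, y) by an angle that
    # grows with the token position. `vec` must have even length.
    d, out = len(vec), []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out += [vec[i] * c - vec[i + 1] * s, vec[i] * s + vec[i + 1] * c]
    return out

def query_scale(pos, floor_scale=8192, attn_scale=0.1):
    # Hypothetical inference-time temperature: equals 1.0 for short contexts
    # and grows logarithmically once the position exceeds `floor_scale`.
    return 1.0 + attn_scale * math.log(math.floor((pos + 1) / floor_scale) + 1.0)

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    z = sum(es)
    return [e / z for e in es]

def attend(qs, ks, vs, kind, chunk=4):
    # Single-head causal attention over lists of vectors.
    # "local": RoPE on queries and keys, attention restricted to the query's chunk.
    # "global": no positional encoding (an assumption), full causal attention,
    #           queries multiplied by the temperature scale.
    d, out = len(qs[0]), []
    for i, q in enumerate(qs):
        if kind == "local":
            q_i = rope(q, i)
            keys = [(j, rope(ks[j], j)) for j in range(i + 1)
                    if j // chunk == i // chunk]
        else:
            scale = query_scale(i)
            q_i = [scale * x for x in q]
            keys = [(j, ks[j]) for j in range(i + 1)]
        scores = [sum(a * b for a, b in zip(q_i, k)) / math.sqrt(d)
                  for _, k in keys]
        weights = softmax(scores)
        out.append([sum(w * vs[j][c] for w, (j, _) in zip(weights, keys))
                    for c in range(len(vs[0]))])
    return out

# Print the assumed layer schedule for an 8-layer stack.
print(layer_kinds(8))
```

In this sketch the local layers see only tokens inside their own chunk, so their cost does not grow with total sequence length, while the occasional global layer carries information across chunks. A real implementation would use batched tensors, multiple heads, and a KV cache.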