
Sthenno

sthenno

AI & ML interests

To contact me: [email protected]

Recent Activity

Organizations

MLX Community, Hugging Face Discord Community, sthenno-com

sthenno's activity

reacted to sometimesanotion's post with 👍 about 5 hours ago
The capabilities of the new Qwen 3 models are fascinating, and I am watching that space!

My experience, however, is that context management is vastly more important with them. If you use a client with a typical session log with rolling compression, a Qwen 3 model will start to generate the same messages over and over. I don't think that detracts from them. They're optimized for a more advanced MCP environment. I honestly think the 8B is optimal for home use, given proper RAG/CAG.

In typical session chats, Lamarck and Chocolatine are still my daily drivers. I worked hard to give Lamarck v0.7 a sprinkling of CoT from both DRT and Deepseek R1. While those models have since been surpassed on the leaderboards, in practice I still really enjoy their output.

My projects are focusing on application and context management, because that's where the payoff in improved quality is right now. But should a new mix of finetunes be needed to strike just the right balance, my recipes are standing by.
New activity in sthenno-com/miscii-14b-0218 about 6 hours ago
reacted to sometimesanotion's post with 👀❤️ 6 days ago
New activity in sthenno-com/miscii-14b-1028 8 days ago

Improve language tag
#5 opened 11 days ago by lbourdois