
shisa-ai/shisa-v2-qwen2.5-7b
Text Generation
•
Updated
•
81
•
3
Note 2025-03 This is a 3B model, but they claim it is comparable to a 7B class perf, so let's see: https://www.sbintuitions.co.jp/blog/entry/2025/03/07/093143
Note 2025-02 ; 4096 context window, test with `VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 vllm serve llm-jp/llm-jp-3-7.2b-instruct3 --max-model-len 8192 --rope-scaling '{"rope_type":"dynamic","factor":2.0}`
Note Still has Chinese cross-lingual-token leakage
Note No system prompt support breaks many evals
Note shisa-v1 SFT only
Note Requires chat template or BOS token for proper responses