Model Card for Zero-Mistral-24B

Zero-Mistral-24B is an improved text-only version of mistralai/Mistral-Small-3.1-24B-Instruct-2503, primarily adapted for Russian and English. The vision features of the original Mistral model have been removed. Training consisted of an SFT stage, primarily on the Big Russian Dataset and a proprietary dataset from Shkolkovo.online.

The model has good math skills and some reasoning abilities.

The model retains the original Mistral long-context capability of up to 128k tokens.

Model Details


Model Description

πŸ“š Model versions

  • Merged 16-bit - original 16-bit merged version for transformers.
  • GGUF - different GGUF versions: BF16, F16, Q8_0, Q6_K, Q4_K_M, IQ4_XS, etc.

πŸ“Š Benchmarks for the main 16-bit merged version

MERA

MERA score: 0.623

| Task | Result | Metric |
|---|---|---|
| LCS | 0.194 | Accuracy |
| RCB | 0.607 / 0.592 | Avg. F1 / Accuracy |
| USE | 0.452 | Grade Norm |
| RWSD | 0.55 | Accuracy |
| PARus | 0.942 | Accuracy |
| ruTiE | 0.868 | Accuracy |
| MultiQ | 0.781 / 0.629 | F1-score / EM |
| CheGeKa | 0.397 / 0.322 | F1 / EM |
| ruModAr | 0.971 | EM |
| MaMuRAMu | 0.832 | Accuracy |
| ruMultiAr | 0.354 | EM |
| ruCodeEval | 0 / 0 / 0 | pass@k Β―\_(ツ)_/Β― |
| MathLogicQA | 0.613 | Accuracy |
| ruWorldTree | 0.987 / 0.987 | Avg. F1 / Accuracy |
| ruOpenBookQA | 0.913 / 0.913 | Avg. F1 / Accuracy |

Evaluation on open tasks:

| Task | Result | Metric |
|---|---|---|
| BPS | 0.981 | Accuracy |
| ruMMLU | 0.778 | Accuracy |
| SimpleAr | 0.997 | EM |
| ruHumanEval | 0.006 / 0.006 / 0.006 | pass@k Β―\_(ツ)_/Β― |
| ruHHH | 0.916 | Accuracy |
| ruHateSpeech | 0.834 | Accuracy |
| ruDetox | 0.341 / 0.843 / 0.624 / 0.66 | Overall average score (J) / Meaning preservation (SIM) / Fluency (FL) / Style transfer accuracy (STA) |
| ruEthics | [[0.386, 0.399, 0.41, 0.333, 0.327], [0.421, 0.427, 0.452, 0.375, 0.363], [0.653, 0.65, 0.697, 0.596, 0.573]] | 5 MCC |

Usage

The model can be used with the following frameworks: vLLM, Transformers, and llama.cpp (llama-server).

Recommended system prompts

prompts = {
    "generic": "Π’Ρ‹ Π²ΠΈΡ€Ρ‚ΡƒΠ°Π»ΡŒΠ½Ρ‹ΠΉ ассистСнт. Π’Ρ‹ ΠΎΡ‚Π²Π΅Ρ‡Π°Π΅ΡˆΡŒ Π½Π° вопросы людСй, помогаСшь ΠΈΠΌ ΠΈ ΠΏΠΎΠ΄Π΄Π΅Ρ€ΠΆΠΈΠ²Π°Π΅ΡˆΡŒ. Π’Ρ‹ создан, Ρ‡Ρ‚ΠΎΠ±Ρ‹ Π±Ρ‹Ρ‚ΡŒ ΠΏΠΎΠ»Π΅Π·Π½Ρ‹ΠΌ, Π±Π΅Π·ΠΎΠ±ΠΈΠ΄Π½Ρ‹ΠΌ ΠΈ чСстным. Π’Ρ‹ ΠΎΡ‚Π²Π΅Ρ‡Π°Π΅ΡˆΡŒ Π½Π° Ρ‚ΠΎΠΌ языкС, Π½Π° ΠΊΠΎΡ‚ΠΎΡ€ΠΎΠΌ Π±Ρ‹Π» Π·Π°Π΄Π°Π½ вопрос ΠΈΠ»ΠΈ попросил ΠΏΠΎΠ»ΡŒΠ·ΠΎΠ²Π°Ρ‚Π΅Π»ΡŒ.",
    "think": """Π’Ρ‹ Π²ΠΈΡ€Ρ‚ΡƒΠ°Π»ΡŒΠ½Ρ‹ΠΉ ассистСнт. Π’Ρ‹ ΠΎΡ‚Π²Π΅Ρ‡Π°Π΅ΡˆΡŒ Π½Π° вопросы людСй, помогаСшь ΠΈΠΌ ΠΈ ΠΏΠΎΠ΄Π΄Π΅Ρ€ΠΆΠΈΠ²Π°Π΅ΡˆΡŒ. Π’Ρ‹ создан, Ρ‡Ρ‚ΠΎΠ±Ρ‹ Π±Ρ‹Ρ‚ΡŒ ΠΏΠΎΠ»Π΅Π·Π½Ρ‹ΠΌ, Π±Π΅Π·ΠΎΠ±ΠΈΠ΄Π½Ρ‹ΠΌ ΠΈ чСстным. Π’Ρ‹ ΠΎΡ‚Π²Π΅Ρ‡Π°Π΅ΡˆΡŒ Π½Π° Ρ‚ΠΎΠΌ языкС, Π½Π° ΠΊΠΎΡ‚ΠΎΡ€ΠΎΠΌ Π±Ρ‹Π» Π·Π°Π΄Π°Π½ вопрос ΠΈΠ»ΠΈ попросил ΠΏΠΎΠ»ΡŒΠ·ΠΎΠ²Π°Ρ‚Π΅Π»ΡŒ.

Answer in the following format:
<think>Reasoning: ...</think>
...""",
    "task": "Π’Ρ‹ Π²ΠΈΡ€Ρ‚ΡƒΠ°Π»ΡŒΠ½Ρ‹ΠΉ ассистСнт. Π’Ρ‹ ΠΎΡ‚Π²Π΅Ρ‡Π°Π΅ΡˆΡŒ Π½Π° вопросы людСй, помогаСшь ΠΈΠΌ ΠΈ ΠΏΠΎΠ΄Π΄Π΅Ρ€ΠΆΠΈΠ²Π°Π΅ΡˆΡŒ. Π’Ρ‹ создан, Ρ‡Ρ‚ΠΎΠ±Ρ‹ Π±Ρ‹Ρ‚ΡŒ ΠΏΠΎΠ»Π΅Π·Π½Ρ‹ΠΌ, Π±Π΅Π·ΠΎΠ±ΠΈΠ΄Π½Ρ‹ΠΌ ΠΈ чСстным. Π’Ρ‹ ΠΎΡ‚Π²Π΅Ρ‡Π°Π΅ΡˆΡŒ Π½Π° Ρ‚ΠΎΠΌ языкС, Π½Π° ΠΊΠΎΡ‚ΠΎΡ€ΠΎΠΌ Π±Ρ‹Π» Π·Π°Π΄Π°Π½ вопрос ΠΈΠ»ΠΈ попросил ΠΏΠΎΠ»ΡŒΠ·ΠΎΠ²Π°Ρ‚Π΅Π»ΡŒ. РСши Π·Π°Π΄Π°Ρ‡Ρƒ ΠΏΠΎ инструкции Π½ΠΈΠΆΠ΅. НС извиняйся, Π½Π΅ строй Π΄ΠΈΠ°Π»ΠΎΠ³.",
    "task_think": """Π’Ρ‹ Π²ΠΈΡ€Ρ‚ΡƒΠ°Π»ΡŒΠ½Ρ‹ΠΉ ассистСнт. Π’Ρ‹ ΠΎΡ‚Π²Π΅Ρ‡Π°Π΅ΡˆΡŒ Π½Π° вопросы людСй, помогаСшь ΠΈΠΌ ΠΈ ΠΏΠΎΠ΄Π΄Π΅Ρ€ΠΆΠΈΠ²Π°Π΅ΡˆΡŒ. Π’Ρ‹ создан, Ρ‡Ρ‚ΠΎΠ±Ρ‹ Π±Ρ‹Ρ‚ΡŒ ΠΏΠΎΠ»Π΅Π·Π½Ρ‹ΠΌ, Π±Π΅Π·ΠΎΠ±ΠΈΠ΄Π½Ρ‹ΠΌ ΠΈ чСстным. Π’Ρ‹ ΠΎΡ‚Π²Π΅Ρ‡Π°Π΅ΡˆΡŒ Π½Π° Ρ‚ΠΎΠΌ языкС, Π½Π° ΠΊΠΎΡ‚ΠΎΡ€ΠΎΠΌ Π±Ρ‹Π» Π·Π°Π΄Π°Π½ вопрос ΠΈΠ»ΠΈ попросил ΠΏΠΎΠ»ΡŒΠ·ΠΎΠ²Π°Ρ‚Π΅Π»ΡŒ. РСши Π·Π°Π΄Π°Ρ‡Ρƒ ΠΏΠΎ инструкции Π½ΠΈΠΆΠ΅. НС извиняйся, Π½Π΅ строй Π΄ΠΈΠ°Π»ΠΎΠ³.

Answer in the following format:
<think>Reasoning: ...</think>
...""",
     "english_generic": """You are Mistral Small 3, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris.
Your knowledge base was last updated on 2023-10-01. The current date is 2025-01-30.
When you're not sure about some information, you say that you don't have the information and don't make up anything.
If the user's question is not clear, ambiguous, or does not provide enough context for you to accurately answer the question, you do not try to answer it right away and you rather ask the user to clarify their request (e.g. \"What are some good restaurants around me?\" => \"Where are you?\" or \"When is the next flight to Tokyo\" => \"Where do you travel from?\")
""",
     "english_think": """You are Mistral Small 3, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris.
Your knowledge base was last updated on 2023-10-01. The current date is 2025-01-30.
When you're not sure about some information, you say that you don't have the information and don't make up anything.
If the user's question is not clear, ambiguous, or does not provide enough context for you to accurately answer the question, you do not try to answer it right away and you rather ask the user to clarify their request (e.g. \"What are some good restaurants around me?\" => \"Where are you?\" or \"When is the next flight to Tokyo\" => \"Where do you travel from?\")

Answer in the following format:
<think>Reasoning: ...</think>
""",
}
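
For reference, here is a minimal sketch of plugging one of these prompts into the model's chat template with Transformers (it assumes the prompts dict above is in scope and that the repository ships a chat template):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ZeroAgency/Zero-Mistral-24B")

messages = [
    {"role": "system", "content": prompts["generic"]},
    {"role": "user", "content": "ΠŸΡ€ΠΈΠ²Π΅Ρ‚! Кто Ρ‚Ρ‹?"},
]

# Render the conversation as a single prompt string using the model's chat template.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(text)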

vLLM

We recommend using this model with the vLLM library to implement production-ready inference pipelines.

Note 1: We recommend using a relatively low temperature, such as temperature=0.15.
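
With vLLM's offline API (shown further below), this is a single argument to SamplingParams; a minimal sketch with an arbitrary max_tokens value:

from vllm.sampling_params import SamplingParams

# Low temperature as recommended above; max_tokens is just an example.
sampling_params = SamplingParams(temperature=0.15, max_tokens=512)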

Note 2: Make sure to add a system prompt to the model to best tailor it to your needs. If you want to use the model as a general assistant, we recommend the following system prompt:

system_prompt = """You are Mistral Small 3, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris.
Your knowledge base was last updated on 2023-10-01. The current date is 2025-01-30.
When you're not sure about some information, you say that you don't have the information and don't make up anything.
If the user's question is not clear, ambiguous, or does not provide enough context for you to accurately answer the question, you do not try to answer it right away and you rather ask the user to clarify their request (e.g. \"What are some good restaurants around me?\" => \"Where are you?\" or \"When is the next flight to Tokyo\" => \"Where do you travel from?\")
"""

Note 3: flash_attn or flashinfer-python is recommended for better performance.
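
For example, one of the following (exact package names and build requirements depend on your CUDA setup; this is only a sketch):

pip install flash-attn --no-build-isolation
# or
pip install flashinfer-python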

Installation

Make sure you install vLLM >= 0.8.4:

pip install --upgrade vllm

Also make sure you have mistral_common >= 1.5.4 installed:

pip install --upgrade mistral_common

You can also make use of a ready-to-go Docker image from Docker Hub.
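
A minimal sketch, assuming the official vllm/vllm-openai image (the flags mirror the server command below):

docker run --gpus all -p 8000:8000 vllm/vllm-openai:latest \
    --model ZeroAgency/Zero-Mistral-24B \
    --enable-prefix-caching --dtype bfloat16 --max-model-len 32768 \
    --tool-call-parser mistral --enable-auto-tool-choice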

Server

We recommend that you use ZeroAgency/Zero-Mistral-24B in a server/client setting.

  1. Spin up a server:
vllm serve ZeroAgency/Zero-Mistral-24B --enable-prefix-caching --dtype bfloat16 --max-model-len 32768 --tool-call-parser mistral --enable-auto-tool-choice

Note: Running Zero-Mistral-24B on GPU requires ~55 GB of GPU RAM in bf16 or fp16.
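
If a single GPU does not have that much memory, vLLM can shard the model across several GPUs with tensor parallelism; a sketch (adjust the size to your hardware):

vllm serve ZeroAgency/Zero-Mistral-24B --tensor-parallel-size 2 --enable-prefix-caching --dtype bfloat16 --max-model-len 32768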

  2. To query the server, you can use a simple Python snippet:
import requests
import json
from datetime import datetime, timedelta

url = "http://<your-server>:8000/v1/chat/completions"
headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}

model = "ZeroAgency/Zero-Mistral-24B"

messages = [
    {
        "role": "system",
        "content": """Π’Ρ‹ Π²ΠΈΡ€Ρ‚ΡƒΠ°Π»ΡŒΠ½Ρ‹ΠΉ ассистСнт. Π’Ρ‹ ΠΎΡ‚Π²Π΅Ρ‡Π°Π΅ΡˆΡŒ Π½Π° вопросы людСй, помогаСшь ΠΈΠΌ ΠΈ ΠΏΠΎΠ΄Π΄Π΅Ρ€ΠΆΠΈΠ²Π°Π΅ΡˆΡŒ. Π’Ρ‹ создан, Ρ‡Ρ‚ΠΎΠ±Ρ‹ Π±Ρ‹Ρ‚ΡŒ ΠΏΠΎΠ»Π΅Π·Π½Ρ‹ΠΌ, Π±Π΅Π·ΠΎΠ±ΠΈΠ΄Π½Ρ‹ΠΌ ΠΈ чСстным. Π’Ρ‹ ΠΎΡ‚Π²Π΅Ρ‡Π°Π΅ΡˆΡŒ Π½Π° Ρ‚ΠΎΠΌ языкС, Π½Π° ΠΊΠΎΡ‚ΠΎΡ€ΠΎΠΌ Π±Ρ‹Π» Π·Π°Π΄Π°Π½ вопрос ΠΈΠ»ΠΈ попросил ΠΏΠΎΠ»ΡŒΠ·ΠΎΠ²Π°Ρ‚Π΅Π»ΡŒ. РСши Π·Π°Π΄Π°Ρ‡Ρƒ ΠΏΠΎ инструкции Π½ΠΈΠΆΠ΅. НС извиняйся, Π½Π΅ строй Π΄ΠΈΠ°Π»ΠΎΠ³.

Answer in the following format:
<think>Reasoning: ...</think>
..."""
    },
    { # Task from https://3.shkolkovo.online/catalog/2552/93150
        "role": "user",
        "content": """ΠŸΠ΅Ρ€Π²Ρ‹ΠΉ Ρ€Π°Π±ΠΎΡ‡ΠΈΠΉ Π·Π° час Π΄Π΅Π»Π°Π΅Ρ‚ Π½Π° 9 Π΄Π΅Ρ‚Π°Π»Π΅ΠΉ большС, Ρ‡Π΅ΠΌ Π²Ρ‚ΠΎΡ€ΠΎΠΉ, ΠΈ выполняСт Π·Π°ΠΊΠ°Π·, состоящий ΠΈΠ· 216 Π΄Π΅Ρ‚Π°Π»Π΅ΠΉ, Π½Π° 4 часа быстрСС, Ρ‡Π΅ΠΌ Π²Ρ‚ΠΎΡ€ΠΎΠΉ Ρ€Π°Π±ΠΎΡ‡ΠΈΠΉ, Π²Ρ‹ΠΏΠΎΠ»Π½ΡΡŽΡ‰ΠΈΠΉ Ρ‚Π°ΠΊΠΎΠΉ ΠΆΠ΅ Π·Π°ΠΊΠ°Π·. Бколько Π΄Π΅Ρ‚Π°Π»Π΅ΠΉ Π² час Π΄Π΅Π»Π°Π΅Ρ‚ ΠΏΠ΅Ρ€Π²Ρ‹ΠΉ Ρ€Π°Π±ΠΎΡ‡ΠΈΠΉ?"""
    },
]

data = {"model": model, "messages": messages}

response = requests.post(url, headers=headers, data=json.dumps(data))
print(response.json()["choices"][0]["message"]["content"])

#<think> ΠŸΡƒΡΡ‚ΡŒ x β€” количСство Π΄Π΅Ρ‚Π°Π»Π΅ΠΉ, ΠΊΠΎΡ‚ΠΎΡ€Ρ‹Π΅ Π΄Π΅Π»Π°Π΅Ρ‚ Π²Ρ‚ΠΎΡ€ΠΎΠΉ Ρ€Π°Π±ΠΎΡ‡ΠΈΠΉ Π·Π° час. Π’ΠΎΠ³Π΄Π° ΠΏΠ΅Ρ€Π²Ρ‹ΠΉ Ρ€Π°Π±ΠΎΡ‡ΠΈΠΉ Π΄Π΅Π»Π°Π΅Ρ‚ x + 9 Π΄Π΅Ρ‚Π°Π»Π΅ΠΉ Π·Π° час. Боставим Ρ‚Π°Π±Π»ΠΈΡ†Ρƒ: ΠŸΠ΅Ρ€Π²Ρ‹ΠΉ Ρ€Π°Π±ΠΎΡ‡ΠΈΠΉ Π’Ρ‚ΠΎΡ€ΠΎΠΉ Ρ€Π°Π±ΠΎΡ‡ΠΈΠΉ ΠšΠΎΠ»ΠΈΡ‡Π΅ΡΡ‚Π²ΠΎ Π΄Π΅Ρ‚Π°Π»Π΅ΠΉ Π² час x + 9 x ΠšΠΎΠ»ΠΈΡ‡Π΅ΡΡ‚Π²ΠΎ часов 216 : (x + 9) 216 : x Π Π°Π·Π½ΠΎΡΡ‚ΡŒ количСства часов 4 216 : (x + 9) βˆ’ 216 : x = 4 216x βˆ’ 216(x + 9) = 4x(x + 9) 216x βˆ’ 216x βˆ’ 1944 = 4x^2 + 36x 1944 = 4x^2 + 36x 4x^2 + 36x βˆ’ 1944 = 0 D = 36^2 + 4 Β· 4 Β· 1944 = 1296 + 31104 = 32400 = 180^2 x1 = βˆ’36 + 180 : 8 = 144 : 8 = 18 x2 = βˆ’36 βˆ’ 180 : 8 < 0 β€” Π½Π΅ ΠΏΠΎΠ΄Ρ…ΠΎΠ΄ΠΈΡ‚ ΠΏΠΎ смыслу Π·Π°Π΄Π°Ρ‡ΠΈ. Π’ΠΎΠ³Π΄Π° ΠΏΠ΅Ρ€Π²Ρ‹ΠΉ Ρ€Π°Π±ΠΎΡ‡ΠΈΠΉ Π΄Π΅Π»Π°Π΅Ρ‚ 18 + 9 = 27 Π΄Π΅Ρ‚Π°Π»Π΅ΠΉ Π² час. </think>
#27

Offline

from vllm import LLM
from vllm.sampling_params import SamplingParams
from datetime import datetime, timedelta


# note that running this model on GPU requires over 60 GB of GPU RAM
llm = LLM(model="ZeroAgency/Zero-Mistral-24B", tokenizer_mode="mistral", tensor_parallel_size=8)

SYSTEM_PROMPT = """Π’Ρ‹ Π²ΠΈΡ€Ρ‚ΡƒΠ°Π»ΡŒΠ½Ρ‹ΠΉ ассистСнт. Π’Ρ‹ ΠΎΡ‚Π²Π΅Ρ‡Π°Π΅ΡˆΡŒ Π½Π° вопросы людСй, помогаСшь ΠΈΠΌ ΠΈ ΠΏΠΎΠ΄Π΄Π΅Ρ€ΠΆΠΈΠ²Π°Π΅ΡˆΡŒ. Π’Ρ‹ создан, Ρ‡Ρ‚ΠΎΠ±Ρ‹ Π±Ρ‹Ρ‚ΡŒ ΠΏΠΎΠ»Π΅Π·Π½Ρ‹ΠΌ, Π±Π΅Π·ΠΎΠ±ΠΈΠ΄Π½Ρ‹ΠΌ ΠΈ чСстным. Π’Ρ‹ ΠΎΡ‚Π²Π΅Ρ‡Π°Π΅ΡˆΡŒ Π½Π° Ρ‚ΠΎΠΌ языкС, Π½Π° ΠΊΠΎΡ‚ΠΎΡ€ΠΎΠΌ Π±Ρ‹Π» Π·Π°Π΄Π°Π½ вопрос ΠΈΠ»ΠΈ попросил ΠΏΠΎΠ»ΡŒΠ·ΠΎΠ²Π°Ρ‚Π΅Π»ΡŒ.

Answer in the following format:
<think>Reasoning: ...</think>
..."""

user_prompt = """Π§Ρ‚ΠΎ большС 9.9 ΠΈΠ»ΠΈ 9.11?"""

messages = [
    {
        "role": "system",
        "content": SYSTEM_PROMPT
    },
    {
        "role": "user",
        "content": user_prompt
    },
]


sampling_params = SamplingParams(max_tokens=512, temperature=0.0, top_p=1, top_k=-1)
outputs = llm.chat(messages, sampling_params=sampling_params)


print(outputs[0].outputs[0].text)
#<think> Π—Π°Π΄Π°Ρ‡Π°: Π‘Ρ€Π°Π²Π½ΠΈΡ‚Π΅ 9.9 ΠΈ 9.11 для опрСдСлСния Ρ‚ΠΎΠ³ΠΎ, ΠΊΠ°ΠΊΠΎΠΉ ΠΈΠ· Π½ΠΈΡ… большС ΠŸΠΎΠ΄Ρ…ΠΎΠ΄: ДСсятичноС сравнСниС с Π²Ρ‹Ρ€Π°Π²Π½ΠΈΠ²Π°Π½ΠΈΠ΅ΠΌ дСсятичных Ρ‚ΠΎΡ‡Π΅ΠΊ Π‘Π»ΠΎΠΆΠ½ΠΎΡΡ‚ΡŒ: Низкий ΠΊ срСднСму Π― Π΄ΠΎΠ»ΠΆΠ΅Π½ Ρ‚Ρ‰Π°Ρ‚Π΅Π»ΡŒΠ½ΠΎ Π²Ρ‹Ρ€ΠΎΠ²Π½ΡΡ‚ΡŒ дСсятичныС Ρ‚ΠΎΡ‡ΠΊΠΈ ΠΈ ΡΡ€Π°Π²Π½ΠΈΡ‚ΡŒ Ρ†ΠΈΡ„Ρ€Ρ‹ ΠΏΠΎ мСсту. 1. Π’Ρ‹Ρ€ΠΎΠ²Π½ΡΡ‚ΡŒ дСсятичныС Ρ‚ΠΎΡ‡ΠΊΠΈ: 9.90 9.11 2. Π‘Ρ€Π°Π²Π½ΠΈΡ‚Π΅ Ρ†Π΅Π»Ρ‹Π΅ числа: ΠΎΠ±Π° ΠΈΠΌΠ΅ΡŽΡ‚ 9, поэтому ΠΎΠ½ΠΈ Ρ€Π°Π²Π½Ρ‹ 3. Π‘Ρ€Π°Π²Π½ΠΈΡ‚Π΅ дСсятыС мСста: 9.90 ΠΈΠΌΠ΅Π΅Ρ‚ 9, 9.11 ΠΈΠΌΠ΅Π΅Ρ‚ 1 9 > 1, поэтому 9.90 большС 4. Π‘Ρ€Π°Π²Π½ΠΈΡ‚Π΅ сотыС мСста: 9.90 ΠΈΠΌΠ΅Π΅Ρ‚ 0, 9.11 ΠΈΠΌΠ΅Π΅Ρ‚ 1 0 < 1, Π½ΠΎ это Π½Π΅ ΠΈΠΌΠ΅Π΅Ρ‚ значСния, ΠΏΠΎΡΠΊΠΎΠ»ΡŒΠΊΡƒ дСсятоС мСсто ΡƒΠΆΠ΅ ΠΎΠΏΡ€Π΅Π΄Π΅Π»ΠΈΠ»ΠΎ большСС число<reflection>Π― ΠΏΡ€Π°Π²ΠΈΠ»ΡŒΠ½ΠΎ выровнял дСсятичныС Ρ‚ΠΎΡ‡ΠΊΠΈ ΠΈ сравнил Ρ†ΠΈΡ„Ρ€Ρ‹ ΠΏΠΎ мСсту. Π― Π·Π°ΠΌΠ΅Ρ‚ΠΈΠ», Ρ‡Ρ‚ΠΎ дСсятоС мСсто (9 ΠΏΡ€ΠΎΡ‚ΠΈΠ² 1) ΠΎΠΏΡ€Π΅Π΄Π΅Π»ΠΈΠ»ΠΎ, Ρ‡Ρ‚ΠΎ 9.9 большС, Ρ‡Π΅ΠΌ 9.11. Π‘ΠΎΡ‚Ρ‹Π΅ мСста Π½Π΅ Π±Ρ‹Π»ΠΈ Π½Π΅ΠΎΠ±Ρ…ΠΎΠ΄ΠΈΠΌΡ‹ для этого сравнСния.</reflection> <self_improvement>Π’ Π±ΡƒΠ΄ΡƒΡ‰ΠΈΡ… сравнСниях я Π±ΡƒΠ΄Ρƒ ΡƒΠ΄Π΅Π»ΡΡ‚ΡŒ ΠΏΠ΅Ρ€Π²ΠΎΠΎΡ‡Π΅Ρ€Π΅Π΄Π½ΠΎΠ΅ Π²Π½ΠΈΠΌΠ°Π½ΠΈΠ΅ самым Π»Π΅Π²Ρ‹ΠΌ Ρ†ΠΈΡ„Ρ€Π°ΠΌ, Π³Π΄Π΅ Π΅ΡΡ‚ΡŒ Ρ€Π°Π·Π½ΠΈΡ†Π°, Ρ‡Ρ‚ΠΎΠ±Ρ‹ ΠΎΠΏΡ‚ΠΈΠΌΠΈΠ·ΠΈΡ€ΠΎΠ²Π°Ρ‚ΡŒ процСсс сравнСния.</self_improvement> </think>  9.9 большС, Ρ‡Π΅ΠΌ 9.11. Когда Π²Ρ‹ сравниваСтС дСсятичныС числа, Π²Ρ‹ Π½Π°Ρ‡ΠΈΠ½Π°Π΅Ρ‚Π΅ с Ρ†Π΅Π»Ρ‹Ρ… чисСл, Π·Π°Ρ‚Π΅ΠΌ ΠΏΠ΅Ρ€Π΅Ρ…ΠΎΠ΄ΠΈΡ‚Π΅ ΠΊ дСсятым мСстам, сотым мСстам ΠΈ Ρ‚Π°ΠΊ Π΄Π°Π»Π΅Π΅. Π’ этом случаС 9.9 ΠΈΠΌΠ΅Π΅Ρ‚ 9 Π² дСсятом мСстС, Π² Ρ‚ΠΎ врСмя ΠΊΠ°ΠΊ 9.11 ΠΈΠΌΠ΅Π΅Ρ‚ 1 Π² дСсятом мСстС. ΠŸΠΎΡΠΊΠΎΠ»ΡŒΠΊΡƒ 9 > 1, 9.9 большС, Ρ‡Π΅ΠΌ 9.11.

Transformers

If you want to use Hugging Face transformers to generate text, you can do something like this.

from transformers import pipeline
import torch

messages = [
    {"role": "user", "content": "Π§Ρ‚ΠΎ большС 9.9 ΠΈΠ»ΠΈ 9.11?"},
]
chatbot = pipeline("text-generation", model="ZeroAgency/Zero-Mistral-24B", max_new_tokens=256, torch_dtype=torch.bfloat16)
response = chatbot(messages, temperature=0.1)
print(response[0]['generated_text'][1]['content'])
# 9.9 большС, Ρ‡Π΅ΠΌ 9.11.

llama-server

You can run llama-server, an OpenAI-compatible server, to serve a GGUF version of the model.

Example of running it in a Docker container:

docker run --gpus all -v `pwd`:/mnt -p8000:8000 ghcr.io/ggml-org/llama.cpp:server-cuda  -fa --port 8000 --host 0.0.0.0 --temp 0.0 --jinja -ngl 100 --api-key DUMMY-API-KEY -m /mnt/Zero-Mistral-24B-Q4_K_M_L.gguf
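
Once the server is up, it can be queried with any OpenAI-compatible client, for example with curl (the host, port and API key match the command above):

curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer DUMMY-API-KEY" \
    -d '{"messages": [{"role": "user", "content": "Π§Ρ‚ΠΎ большС 9.9 ΠΈΠ»ΠΈ 9.11?"}], "temperature": 0.15}'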

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: 8x H200
  • Hours used: 29.5
  • Cloud Provider: Runpod
  • Compute Region: US-DE
  • Carbon Emitted: Β―\_(ツ)_/Β―