VezilkaLLM is a 4-billion-parameter base language model for Macedonian, built on top of the Gemma 3 4B pretrained foundation model. Trained on a high-quality 1.47B-word Macedonian corpus from LVSTCK, it offers strong fluency and performance on Macedonian NLP tasks while remaining lightweight enough to fine-tune or deploy on modest hardware. Training used the Hugging Face Transformers Trainer API on a single NVIDIA H100 GPU with an 8192-token context window, optimizing for long-context coherence.
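The sketch below illustrates this setup. It is a minimal, hypothetical reconstruction, not the released training script: the base checkpoint, the Trainer API, and the 8192-token context come from the description above, while the corpus file name, batch size, learning rate, and other hyperparameters are placeholder assumptions.

```python
# Minimal continued-pretraining sketch. Assumed: the corpus file name and all
# hyperparameters below; from the card: Gemma 3 4B base, Trainer, 8192 ctx.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_id = "google/gemma-3-4b-pt"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Hypothetical corpus: any plain-text dataset with a "text" column works here.
corpus = load_dataset("text", data_files={"train": "mk_corpus.txt"})["train"]

def tokenize(batch):
    # Truncate to the 8192-token context window mentioned above.
    return tokenizer(batch["text"], truncation=True, max_length=8192)

train_ds = corpus.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="vezilka-checkpoints",
        per_device_train_batch_size=1,   # assumption: one H100 at 8192 tokens
        gradient_accumulation_steps=16,  # assumption
        learning_rate=2e-5,              # assumption
        bf16=True,
        num_train_epochs=1,              # assumption
    ),
    train_dataset=train_ds,
    # Causal-LM collator: labels are the input ids (shifted inside the model).
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```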
As the first model in a new series of Macedonian LLMs, VezilkaLLM lays the foundation for future releases focused on instruction tuning, chat capabilities, reasoning, and domain-specific tasks. As shown in Table 1, despite being roughly half the size of typical multilingual 7B–8B models, VezilkaLLM remains competitive with them on Macedonian benchmarks and improves on its Gemma 3 base on most tasks, demonstrating the value of language-specific training for low-resource languages. It is intended as a base model for further fine-tuning and ships without safety mechanisms or chat-specific tuning.
| Model | ARC Challenge | ARC Easy | BoolQ | HellaSwag | OpenBookQA | PIQA | Winogrande | NQ Open |
|---|---|---|---|---|---|---|---|---|
| gemma-3-4b-pt | 0.28 | 0.48 | 0.75 | 0.39 | 0.25 | 0.62 | 0.59 | 0.00 |
| VezilkaLLM | 0.30 | 0.50 | 0.72 | 0.41 | 0.25 | 0.65 | 0.59 | 0.03 |
| domestic-yak-8B | 0.31 | 0.52 | 0.77 | 0.43 | 0.29 | 0.67 | 0.63 | 0.04 |
| MKLLM-7B | 0.32 | 0.54 | 0.71 | 0.43 | 0.28 | 0.62 | 0.62 | 0.03 |
Table 1: Model Performance Comparison Across Evaluation Benchmarks
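The task names in Table 1 match those used by EleutherAI's lm-evaluation-harness. Assuming that harness (the `lm-eval` package) is used, a comparable evaluation run could look roughly like the following; the batch-size setting and the harness's default few-shot configuration are assumptions.

```python
# Hypothetical reproduction of the Table 1 benchmarks with lm-eval
# (pip install lm-eval); few-shot settings are left at harness defaults.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=finki-ukim/VezilkaLLM",
    tasks=[
        "arc_challenge", "arc_easy", "boolq", "hellaswag",
        "openbookqa", "piqa", "winogrande", "nq_open",
    ],
    batch_size="auto",
)

# Print the per-task metric dictionaries (accuracy, stderr, etc.).
for task, metrics in results["results"].items():
    print(task, metrics)
```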
We present the evaluation results visually in Figure 1.
Figure 1: Model Performance Comparison Across Evaluation Benchmarks
You can run inference with VezilkaLLM using the Hugging Face Transformers library as shown below:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_id = "finki-ukim/VezilkaLLM"

# Load the tokenizer and weights, then wrap them in a generation pipeline.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Prompt: 'The word "friendship" means ' -- the model completes the sentence.
prompt = "Зборот „пријателство“ значи "
outputs = generator(prompt, max_new_tokens=128, do_sample=True, temperature=0.7)
print(outputs[0]["generated_text"])
```
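Because VezilkaLLM is a base model without instruction or chat tuning, completion-style prompts such as the one above, where the model continues a partial text, work better than conversational instructions.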