Random-Llama-Small

Model Overview

Random-Llama-Small is a randomly initialized transformer-based language model with approximately 2 billion parameters, built using the LLaMA architecture. It is designed for research purposes, providing a starting point for pretraining or fine-tuning on custom datasets. The model uses the tokenizer from HuggingFaceTB/SmolLM2-1.7B-Instruct and is configured for causal language modeling. As a randomly initialized model, it produces incoherent outputs until trained, making it ideal for researchers studying transformer training dynamics or developing custom language models.


Key Details

  • Architecture: LLaMA (Causal Language Model)
  • Parameters: ~2B
  • Hidden Size: 2304
  • Layers: 22
  • Attention Heads: 36 (with 9 key-value heads for grouped-query attention)
  • Intermediate Size: 9216
  • Vocabulary Size: 128256
  • Tokenizer: Imported from HuggingFaceTB/SmolLM2-1.7B-Instruct
  • Precision: bfloat16
  • Max Context Length: 131,072 tokens (with RoPE scaling)
  • License: MIT
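
The values above can be read straight from the hosted configuration; a minimal sketch, assuming transformers is installed and the Hugging Face Hub is reachable:

from transformers import AutoConfig

# Download only the config file, not the full weights
config = AutoConfig.from_pretrained("reflex-ai/random-llama-small")
print(config.hidden_size, config.num_hidden_layers,
      config.num_attention_heads, config.num_key_value_heads)
print(config.vocab_size, config.max_position_embeddings)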

LLaMA Architecture

The LLaMA architecture, developed by Meta AI, is a family of efficient transformer-based models optimized for research. Random-Llama-Small follows this design, incorporating several key features:

Core Components

  • Decoder-Only Transformer: Predicts the next token in a sequence based on prior tokens, suitable for autoregressive tasks like text generation.
  • Grouped-Query Attention (GQA): 36 attention heads share 9 key-value heads, reducing memory and compute cost while preserving quality (see the shape sketch after this list).
  • Rotary Position Embeddings (RoPE): Embeds positional information with scaling, enabling a context length of up to 131,072 tokens.
  • SwiGLU Activation: The feed-forward network uses a SiLU (Swish) gated linear unit for improved expressiveness.
  • RMSNorm: Root Mean Square Layer Normalization replaces LayerNorm for stability and faster convergence.
  • Tied Embeddings: Input and output embeddings share weights (tie_word_embeddings=True), reducing parameter count by ~295M.
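
To make the grouped-query attention arithmetic concrete, the sketch below (a standalone illustration using this model's head counts, not the transformers implementation) shows how 9 key-value heads are shared across 36 query heads:

import torch

hidden_size, n_heads, n_kv_heads = 2304, 36, 9
head_dim = hidden_size // n_heads        # 64
group_size = n_heads // n_kv_heads       # 4 query heads share each KV head

q = torch.randn(1, n_heads, 16, head_dim)      # (batch, query heads, seq, head_dim)
kv = torch.randn(1, n_kv_heads, 16, head_dim)  # only 9 key/value heads are stored

# Repeat each KV head so every query head has a matching key tensor
k = kv.repeat_interleave(group_size, dim=1)    # (1, 36, 16, 64)
scores = q @ k.transpose(-2, -1) / head_dim ** 0.5
print(scores.shape)                            # torch.Size([1, 36, 16, 16])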

Benefits of LLaMA Architecture

  • Efficiency: High throughput, low memory use.
  • Scalability: Works well across model sizes.
  • Flexibility: Long-context support and task adaptability.
  • Research-Friendly: Great for exploring attention, positional encoding, and training dynamics.

Random-Llama-Small Specifics

This model uses random weights and:

  • Has ~2B parameters across 22 layers (see the back-of-envelope count after this list).
  • Uses a hidden size of 2304 and an intermediate (FFN) size of 9216.
  • Uses a 128,256-token vocabulary and bfloat16 precision.
  • Supports extended context lengths of 131,072 tokens.
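
The ~2B figure follows directly from the configuration; a back-of-envelope count (no bias terms, tied embeddings counted once, RMSNorm weights included) works out as follows:

hidden, layers, heads, kv_heads = 2304, 22, 36, 9
ffn, vocab = 9216, 128256
head_dim = hidden // heads

embed = vocab * hidden                                          # ~295.5M, shared with the output head
attn = 2 * hidden * hidden + 2 * hidden * kv_heads * head_dim   # q/o plus smaller k/v projections
mlp = 3 * hidden * ffn                                          # gate, up, down projections
norms = 2 * hidden                                              # two RMSNorms per layer
total = embed + layers * (attn + mlp + norms) + hidden          # + final norm
print(f"{total / 1e9:.2f}B parameters")                         # ≈ 1.99B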

Intended Use

  • Research on transformer dynamics, optimization, or architectural changes.
  • Baseline for pretraining or task-specific fine-tuning.
  • Experimentation with scaling laws or custom architectures.

Out-of-Scope Use

  • Not for direct production deployment.
  • Not suitable for tasks needing coherence or accuracy without training.

Usage

Requirements

  • transformers >= 4.45.0
  • torch >= 2.0
  • GPU with ≥ 6GB VRAM (24GB+ for training)
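
A quick sanity check that the environment meets these requirements (a minimal sketch, nothing model-specific):

import torch, transformers

print("transformers", transformers.__version__)
print("torch", torch.__version__)
print("CUDA available:", torch.cuda.is_available())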

Inference Example

# Use a pipeline as a high-level helper
from transformers import pipeline

messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe = pipeline("text-generation", model="reflex-ai/random-llama-small")
print(pipe(messages))

Note: Outputs will be random and incoherent due to the model’s untrained state.
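
For more control over precision and generation settings, the model can also be loaded directly; a sketch assuming a CUDA device is available (it falls back to CPU otherwise):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("reflex-ai/random-llama-small")
model = AutoModelForCausalLM.from_pretrained(
    "reflex-ai/random-llama-small",
    torch_dtype=torch.bfloat16,   # matches the checkpoint precision
)
model = model.to("cuda" if torch.cuda.is_available() else "cpu")

inputs = tokenizer("Who are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))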


Training Example

from transformers import (
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    LlamaForCausalLM,
    Trainer,
    TrainingArguments,
)

# Load the randomly initialized model and its tokenizer
model = LlamaForCausalLM.from_pretrained("reflex-ai/random-llama-small")
tokenizer = AutoTokenizer.from_pretrained("reflex-ai/random-llama-small")

training_args = TrainingArguments(
    output_dir="./random_llama_small_finetuned",
    per_device_train_batch_size=4,
    num_train_epochs=3,
    bf16=True,  # match the checkpoint's bfloat16 precision
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=your_dataset,  # replace with your tokenized dataset
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),  # causal LM objective
)

trainer.train()
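
After training finishes, the checkpoint can be persisted and reloaded like any other transformers model; a minimal follow-up using the output_dir above:

trainer.save_model("./random_llama_small_finetuned")
tokenizer.save_pretrained("./random_llama_small_finetuned")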

Limitations

  • Random Initialization: Needs significant training to be useful.
  • Resource Intensive: High computational cost.
  • No Pretraining Data: Users must provide their own.
  • Tokenizer Constraint: The inherited SmolLM2 tokenizer may not suit every domain or language.

Benefits and Potential

  • Customizability: A blank slate for full control of objectives and data.
  • Research Insights: Ideal for understanding early-stage LLM behavior.
  • Scalable Baseline: Balances size and research feasibility.
  • Extended Context: Useful for long-form tasks post-training.

Model Configuration

{
  "architectures": ["LlamaForCausalLM"],
  "hidden_size": 2304,
  "num_hidden_layers": 22,
  "num_attention_heads": 36,
  "num_key_value_heads": 9,
  "intermediate_size": 9216,
  "vocab_size": 128256,
  "max_position_embeddings": 131072,
  "rope_scaling": {
    "factor": 32.0,
    "high_freq_factor": 4.0,
    "low_freq_factor": 1.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3"
  },
  "torch_dtype": "bfloat16",
  "tie_word_embeddings": true
}
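
This configuration can also be used to recreate a comparable randomly initialized model from scratch (fresh random weights each time, so not byte-identical to the hosted checkpoint); a sketch using the fields above:

from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    hidden_size=2304,
    num_hidden_layers=22,
    num_attention_heads=36,
    num_key_value_heads=9,
    intermediate_size=9216,
    vocab_size=128256,
    max_position_embeddings=131072,
    rope_scaling={
        "factor": 32.0,
        "high_freq_factor": 4.0,
        "low_freq_factor": 1.0,
        "original_max_position_embeddings": 8192,
        "rope_type": "llama3",
    },
    tie_word_embeddings=True,
)
model = LlamaForCausalLM(config)                           # random initialization, no pretrained weights
print(sum(p.numel() for p in model.parameters()) / 1e9)    # ≈ 1.99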

Ethical Considerations

  • Untrained Safety: The untrained model produces no targeted harmful outputs, but standard data and safety ethics apply once training begins.
  • Environmental Impact: Large-scale training consumes energy; optimize and use green compute.
  • Accessibility: Resource requirements may limit use by smaller research teams.

Contact

For questions or issues, please open an issue on the Hugging Face repository.

Model card created on April 20, 2025.
