Random-Llama-Small

Model Overview

Random-Llama-Small is a randomly initialized transformer-based language model with approximately 2 billion parameters, built using the LLaMA architecture. It is designed for research purposes, providing a starting point for pretraining or fine-tuning on custom datasets. The model uses the tokenizer from HuggingFaceTB/SmolLM2-1.7B-Instruct and is configured for causal language modeling. As a randomly initialized model, it produces incoherent outputs until trained, making it ideal for researchers studying transformer training dynamics or developing custom language models.


Key Details

  • Architecture: LLaMA (Causal Language Model)
  • Parameters: ~2B
  • Hidden Size: 2304
  • Layers: 22
  • Attention Heads: 36 (with 9 key-value heads for grouped-query attention)
  • Intermediate Size: 9216
  • Vocabulary Size: 128256
  • Tokenizer: Imported from HuggingFaceTB/SmolLM2-1.7B-Instruct
  • Precision: bfloat16
  • Max Context Length: 131,072 tokens (with RoPE scaling)
  • License: MIT
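
The values above can be read straight from the hosted configuration; a minimal sketch, assuming transformers is installed and the Hugging Face Hub is reachable:

from transformers import AutoConfig

# Download only the config file, not the full weights
config = AutoConfig.from_pretrained("reflex-ai/random-llama-small")
print(config.hidden_size, config.num_hidden_layers,
      config.num_attention_heads, config.num_key_value_heads)
print(config.vocab_size, config.max_position_embeddings)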

LLaMA Architecture

The LLaMA architecture, developed by Meta AI, is a family of efficient transformer-based models optimized for research. Random-Llama-Small follows this design, incorporating several key features:

Core Components

  • Decoder-Only Transformer: Predicts the next token in a sequence based on prior tokens, suitable for autoregressive tasks like text generation.
  • Grouped-Query Attention (GQA): 36 attention heads share 9 key-value heads, reducing memory and compute cost while preserving quality (see the shape sketch after this list).
  • Rotary Position Embeddings (RoPE): Embeds positional information with scaling, enabling a context length of up to 131,072 tokens.
  • SwiGLU Activation: The feed-forward network uses a SiLU (Swish) gated linear unit for improved expressiveness.
  • RMSNorm: Root Mean Square Layer Normalization replaces LayerNorm for stability and faster convergence.
  • Tied Embeddings: Input and output embeddings share weights (tie_word_embeddings=True), reducing parameter count by ~295M.
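
To make the grouped-query attention arithmetic concrete, the sketch below (a standalone illustration using this model's head counts, not the transformers implementation) shows how 9 key-value heads are shared across 36 query heads:

import torch

hidden_size, n_heads, n_kv_heads = 2304, 36, 9
head_dim = hidden_size // n_heads        # 64
group_size = n_heads // n_kv_heads       # 4 query heads share each KV head

q = torch.randn(1, n_heads, 16, head_dim)      # (batch, query heads, seq, head_dim)
kv = torch.randn(1, n_kv_heads, 16, head_dim)  # only 9 key/value heads are stored

# Repeat each KV head so every query head has a matching key tensor
k = kv.repeat_interleave(group_size, dim=1)    # (1, 36, 16, 64)
scores = q @ k.transpose(-2, -1) / head_dim ** 0.5
print(scores.shape)                            # torch.Size([1, 36, 16, 16])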

Benefits of LLaMA Architecture

  • Efficiency: High throughput, low memory use.
  • Scalability: Works well across model sizes.
  • Flexibility: Long-context support and task adaptability.
  • Research-Friendly: Great for exploring attention, positional encoding, and training dynamics.

Random-Llama-Small Specifics

This model uses random weights and:

  • Has ~2B parameters across 22 layers (see the back-of-envelope count after this list).
  • Uses a hidden size of 2304 and an intermediate (FFN) size of 9216.
  • Uses a 128,256-token vocabulary and bfloat16 precision.
  • Supports extended context lengths of 131,072 tokens.
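
The ~2B figure follows directly from the configuration; a back-of-envelope count (no bias terms, tied embeddings counted once, RMSNorm weights included) works out as follows:

hidden, layers, heads, kv_heads = 2304, 22, 36, 9
ffn, vocab = 9216, 128256
head_dim = hidden // heads

embed = vocab * hidden                                          # ~295.5M, shared with the output head
attn = 2 * hidden * hidden + 2 * hidden * kv_heads * head_dim   # q/o plus smaller k/v projections
mlp = 3 * hidden * ffn                                          # gate, up, down projections
norms = 2 * hidden                                              # two RMSNorms per layer
total = embed + layers * (attn + mlp + norms) + hidden          # + final norm
print(f"{total / 1e9:.2f}B parameters")                         # ≈ 1.99B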

Intended Use

  • Research on transformer dynamics, optimization, or architectural changes.
  • Baseline for pretraining or task-specific fine-tuning.
  • Experimentation with scaling laws or custom architectures.

Out-of-Scope Use

  • Not for direct production deployment.
  • Not suitable for tasks needing coherence or accuracy without training.

Usage

Requirements

  • transformers >= 4.45.0
  • torch >= 2.0
  • GPU with ≥ 6GB VRAM (24GB+ for training)
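
A quick sanity check that the environment meets these requirements (a minimal sketch, nothing model-specific):

import torch, transformers

print("transformers", transformers.__version__)
print("torch", torch.__version__)
print("CUDA available:", torch.cuda.is_available())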

Inference Example

# Use a pipeline as a high-level helper
from transformers import pipeline

messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe = pipeline("text-generation", model="reflex-ai/random-llama-small")
print(pipe(messages))

Note: Outputs will be random and incoherent due to the model’s untrained state.
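
For more control over precision and generation settings, the model can also be loaded directly; a sketch assuming a CUDA device is available (it falls back to CPU otherwise):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("reflex-ai/random-llama-small")
model = AutoModelForCausalLM.from_pretrained(
    "reflex-ai/random-llama-small",
    torch_dtype=torch.bfloat16,   # matches the checkpoint precision
)
model = model.to("cuda" if torch.cuda.is_available() else "cpu")

inputs = tokenizer("Who are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))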


Training Example

from transformers import (
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    LlamaForCausalLM,
    Trainer,
    TrainingArguments,
)

# Load the randomly initialized model and its tokenizer
model = LlamaForCausalLM.from_pretrained("reflex-ai/random-llama-small")
tokenizer = AutoTokenizer.from_pretrained("reflex-ai/random-llama-small")

training_args = TrainingArguments(
    output_dir="./random_llama_small_finetuned",
    per_device_train_batch_size=4,
    num_train_epochs=3,
    bf16=True,  # match the checkpoint's bfloat16 precision
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=your_dataset,  # replace with your tokenized dataset
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),  # causal LM objective
)

trainer.train()
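
After training finishes, the checkpoint can be persisted and reloaded like any other transformers model; a minimal follow-up using the output_dir above:

trainer.save_model("./random_llama_small_finetuned")
tokenizer.save_pretrained("./random_llama_small_finetuned")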

Limitations

  • Random Initialization: Needs significant training to be useful.
  • Resource Intensive: High computational cost.
  • No Pretraining Data: Users must provide their own.
  • Tokenizer Constraint: The inherited SmolLM2 tokenizer may not suit every domain or language.

Benefits and Potential

  • Customizability: A blank slate for full control of objectives and data.
  • Research Insights: Ideal for understanding early-stage LLM behavior.
  • Scalable Baseline: Balances size and research feasibility.
  • Extended Context: Useful for long-form tasks post-training.

Model Configuration

{
  "architectures": ["LlamaForCausalLM"],
  "hidden_size": 2304,
  "num_hidden_layers": 22,
  "num_attention_heads": 36,
  "num_key_value_heads": 9,
  "intermediate_size": 9216,
  "vocab_size": 128256,
  "max_position_embeddings": 131072,
  "rope_scaling": {
    "factor": 32.0,
    "high_freq_factor": 4.0,
    "low_freq_factor": 1.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3"
  },
  "torch_dtype": "bfloat16",
  "tie_word_embeddings": true
}
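
This configuration can also be used to recreate a comparable randomly initialized model from scratch (fresh random weights each time, so not byte-identical to the hosted checkpoint); a sketch using the fields above:

from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    hidden_size=2304,
    num_hidden_layers=22,
    num_attention_heads=36,
    num_key_value_heads=9,
    intermediate_size=9216,
    vocab_size=128256,
    max_position_embeddings=131072,
    rope_scaling={
        "factor": 32.0,
        "high_freq_factor": 4.0,
        "low_freq_factor": 1.0,
        "original_max_position_embeddings": 8192,
        "rope_type": "llama3",
    },
    tie_word_embeddings=True,
)
model = LlamaForCausalLM(config)                           # random initialization, no pretrained weights
print(sum(p.numel() for p in model.parameters()) / 1e9)    # ≈ 1.99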

Ethical Considerations

  • Untrained Safety: The untrained model produces no targeted harmful outputs, but standard data and safety ethics apply once training begins.
  • Environmental Impact: Large-scale training consumes energy; optimize and use green compute.
  • Accessibility: Resource requirements may limit use by smaller research teams.

Contact

For questions or issues, please open an issue on the Hugging Face repository.

Model card created on April 20, 2025.
