TinyLlama-1.1B Offline Practical Skills QA Adapter (QLoRA)

Adapter created by: Cahlen Humphreys

This repository contains a LoRA (Low-Rank Adaptation) adapter fine-tuned for Question Answering on practical knowledge topics (e.g., survival, first aid, basic maintenance), using the TinyLlama/TinyLlama-1.1B-Chat-v1.0 base model. The goal is to provide practical information that remains accessible in offline or edge AI scenarios.

This adapter was trained using QLoRA (Quantized Low-Rank Adaptation), allowing efficient fine-tuning on consumer hardware.

Note: This adapter was created as part of a tutorial demonstrating QLoRA fine-tuning. It is intended for educational and demonstrative purposes.

Model Description

  • Base Model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
  • Adapter: LoRA weights trained using the peft library.
  • Fine-tuning Method: QLoRA (4-bit nf4 quantization of the base model via bitsandbytes, training only the added LoRA weights).
  • Training Data: A synthetically generated QA dataset covering topics like Wilderness Survival, Basic First Aid, Simple Car Maintenance, etc. (See accompanying dataset card for details). This data has not been human-verified.

Intended Uses & Limitations

Intended Use:

  • To enhance the QA capabilities of the TinyLlama-1.1B-Chat model specifically for the practical knowledge topics covered in the training data.
  • As an educational example of QLoRA fine-tuning and adapter usage.
  • For qualitative comparison against the base model's performance on the target domains.
  • Potentially suitable for integration into offline/edge AI systems where access to practical information is needed without internet connectivity (given the small base model size and focused dataset); see the merging sketch after this list.
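
For such offline deployments, one option is to merge the LoRA weights into the base model and ship a single standalone checkpoint. The snippet below is a minimal sketch, not part of the original tutorial: it assumes the adapter's adapter_config.json records the base model ID, and that merging is done in half precision (merging into a 4-bit quantized base is generally not supported). The output directory name is illustrative.

import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

adapter_path = "cahlen/tinyllama-offline-practical-skills-qa-qlora"  # or a local adapter directory

# Load the base model (half precision) with the adapter applied, then fold the LoRA weights in
model = AutoPeftModelForCausalLM.from_pretrained(adapter_path, torch_dtype=torch.float16)
merged = model.merge_and_unload()

# Save a standalone model plus tokenizer; inference no longer requires the peft library
merged.save_pretrained("tinyllama-practical-skills-merged")
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
tokenizer.save_pretrained("tinyllama-practical-skills-merged")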

Limitations:

  • Domain Specificity: Performance is expected to be best on questions closely related to the training topics. It may not perform well on out-of-domain questions.
  • Based on Synthetic Data: The adapter's knowledge is derived entirely from AI-generated data, which inherently contains potential inaccuracies, biases, or artifacts. Answers should not be trusted without independent verification, especially for critical information. Fact-checking is essential.
  • Not for Production: This adapter is a result of a tutorial process and has not undergone rigorous testing or optimization for production deployment. It is provided as-is for educational purposes. Performance and safety in real-world offline scenarios are not guaranteed.
  • Base Model Limitations: Inherits limitations and potential biases of the base TinyLlama model.

How to Get Started

You can load the base model and apply this LoRA adapter using the transformers and peft libraries.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Define paths (replace with your Hub repo ID if uploaded)
base_model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
adapter_path = "path/to/your/results_tinyllama_adapter" # Or the Hub ID "cahlen/tinyllama-offline-practical-skills-qa-qlora"

# Load the base model with the same 4-bit nf4 quantization used during training
# Ensure you have bitsandbytes installed
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # Or torch.bfloat16 if supported
    bnb_4bit_use_double_quant=False,
)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load the LoRA adapter
model = PeftModel.from_pretrained(base_model, adapter_path)

# --- Now you can use the 'model' for inference --- 

# Example: Prepare prompt (adapt based on training format)
topic = "Wilderness Survival Basics"
question = "How do you signal for help using a mirror?"
system_prompt = f"You are a helpful assistant knowledgeable about {topic}."

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": f"Question: {question}"}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,
        temperature=0.6,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)

# Post-process response to extract answer if needed (logic depends on template)
print(response)
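
A simple way to post-process is to decode only the tokens generated after the prompt instead of the full sequence, which avoids parsing the chat-template markers by hand. This is a minimal sketch of one approach, not the only option:

# Decode only the newly generated tokens (everything after the prompt)
prompt_length = inputs["input_ids"].shape[1]
answer = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(answer.strip())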

Training Details

  • Training Script: src/train.py (from accompanying repository)
  • Dataset: data/final_qa_unique.jsonl (~2845 examples)
  • Epochs: 3
  • Learning Rate: 2e-4
  • Batch Size (effective): 8 (per_device_train_batch_size=4, gradient_accumulation_steps=2)
  • Optimizer: paged_adamw_32bit
  • Precision: fp16 mixed precision
  • QLoRA Config: 4-bit nf4 quantization, compute_dtype=float16, double_quant=False.
  • LoRA Config: r=64, alpha=16, dropout=0.1, target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
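
For reference, a configuration matching the hyperparameters above might look roughly like the following. This is a hedged reconstruction, not a copy of src/train.py; the bias and task_type settings are assumptions, and argument names follow the standard BitsAndBytesConfig, LoraConfig, and TrainingArguments APIs.

import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit QLoRA quantization of the base model (matches the QLoRA Config above)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)

# LoRA adapter configuration (matches the LoRA Config above)
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    bias="none",            # assumption; not stated in the card
    task_type="CAUSAL_LM",  # assumption; not stated in the card
)

# Trainer settings (effective batch size 8 = 4 per device x 2 accumulation steps)
training_args = TrainingArguments(
    output_dir="results_tinyllama_adapter",
    num_train_epochs=3,
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    optim="paged_adamw_32bit",
    fp16=True,
)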

Evaluation Results

Qualitative testing was performed by comparing the adapter's responses to the base model's responses on questions from the training domains.

Findings:

  • The adapter generally produced more focused, concise, and relevant answers for the target topics (e.g., mirror signaling, first aid steps) compared to the base model.
  • The base model was more prone to hallucination or providing nonsensical/repetitive answers on these specific topics.
  • For some factual recall questions (e.g., precise definition of the Rule of 3s), neither model performed perfectly, highlighting the dependence on the quality and coverage of the synthetic training data.
  • In some cases (e.g., frequency of checking engine oil), the base model provided a more direct answer, while the adapter answered a related concept (replacement frequency).

Overall, the adapter shows clear specialization towards the trained QA domains, but its accuracy is tied to the underlying training data.

Disclaimer

This LoRA adapter is provided strictly for educational and research demonstration purposes. It was trained on synthetically generated data and has not undergone rigorous safety testing or evaluation for production use. The creator, Cahlen Humphreys, assumes no responsibility or liability for any consequences, damages, or issues arising from the use, interpretation, or application of this model adapter. This includes, but is not limited to, use in production systems, decision-making processes, safety-critical applications, or any situation where incorrect information could cause harm. Use this adapter entirely at your own risk and be aware of potential inaccuracies or biases inherited from the base model and the unverified synthetic training data.
