---
license: apache-2.0 # Base model license
language: en
library_name: peft
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
tags:
- question-answering
- lora
- qlora
- tinyllama
- generated-data
---
# TinyLlama-1.1B Offline Practical Skills QA Adapter (QLoRA)
**Adapter created by:** Cahlen Humphreys
This repository contains a LoRA (Low-Rank Adaptation) adapter fine-tuned for question answering on practical knowledge topics (e.g., survival, first aid, basic maintenance), using the `TinyLlama/TinyLlama-1.1B-Chat-v1.0` base model. The goal is to provide practical information that remains useful in offline or edge AI scenarios.
This adapter was trained using QLoRA (Quantized Low-Rank Adaptation), allowing efficient fine-tuning on consumer hardware.
**Note:** This adapter was created as part of a tutorial demonstrating QLoRA fine-tuning. It is intended for educational and demonstrative purposes.
## Model Description
* **Base Model:** [`TinyLlama/TinyLlama-1.1B-Chat-v1.0`](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)
* **Adapter:** LoRA weights trained using the `peft` library.
* **Fine-tuning Method:** QLoRA (4-bit `nf4` quantization of the base model via `bitsandbytes`, training only the added LoRA weights).
* **Training Data:** A **synthetically generated** QA dataset covering topics like Wilderness Survival, Basic First Aid, Simple Car Maintenance, etc. (See accompanying dataset card for details). **This data has not been human-verified.**
## Intended Uses & Limitations
**Intended Use:**
* To enhance the QA capabilities of the TinyLlama-1.1B-Chat model specifically for the practical knowledge topics covered in the training data.
* As an educational example of QLoRA fine-tuning and adapter usage.
* For qualitative comparison against the base model's performance on the target domains.
* Potentially suitable for integration into offline/edge AI systems where access to practical information is needed without internet connectivity (given the small base model size and focused dataset).
**Limitations:**
* **Domain Specificity:** Performance is expected to be best on questions closely related to the training topics. It may not perform well on out-of-domain questions.
* **Based on Synthetic Data:** The adapter's knowledge is **derived entirely from AI-generated data**, which inherently contains potential inaccuracies, biases, or artifacts. **Answers should not be trusted without independent verification, especially for critical information.** Fact-checking is essential.
* **Not for Production:** This adapter is a result of a tutorial process and **has not undergone rigorous testing or optimization for production deployment.** It is provided as-is for educational purposes. Performance and safety in real-world offline scenarios are not guaranteed.
* **Base Model Limitations:** Inherits limitations and potential biases of the base TinyLlama model.
## How to Get Started
You can load the base model and apply this LoRA adapter using the `transformers` and `peft` libraries.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Define paths (replace with your Hub repo ID if uploaded)
base_model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
adapter_path = "path/to/your/results_tinyllama_adapter"  # Or Hub ID like "YourUsername/YourRepoName"

# Load the base model quantized in 4-bit nf4, matching the training setup
# (requires bitsandbytes to be installed)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # Or torch.bfloat16 if supported
)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load the LoRA adapter on top of the quantized base model
model = PeftModel.from_pretrained(base_model, adapter_path)
model.eval()

# --- Now you can use the 'model' for inference ---
# Example: prepare a prompt (adapt based on the training format)
topic = "Wilderness Survival Basics"
question = "How do you signal for help using a mirror?"
system_prompt = f"You are a helpful assistant knowledgeable about {topic}."
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": f"Question: {question}"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,
        temperature=0.6,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
# Post-process the response to extract the answer if needed (logic depends on the chat template)
print(response)
```
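Because the intended uses above mention offline/edge deployment, here is a minimal, hypothetical sketch of how the adapter could be merged into the base model and saved as a single self-contained checkpoint. The paths and output directory are placeholders, and note that merging requires loading the base model in fp16/fp32 rather than in 4-bit.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
adapter_path = "path/to/your/results_tinyllama_adapter"  # placeholder local path or Hub ID
output_dir = "tinyllama-practical-qa-merged"             # illustrative output directory

# Load the base model in half precision (merging is not done on the 4-bit weights)
base_model = AutoModelForCausalLM.from_pretrained(base_model_name, torch_dtype=torch.float16)

# Apply the LoRA adapter, then bake its weights into the base model
merged = PeftModel.from_pretrained(base_model, adapter_path).merge_and_unload()

# Save the merged model and tokenizer for offline use
merged.save_pretrained(output_dir)
AutoTokenizer.from_pretrained(base_model_name).save_pretrained(output_dir)
```
The merged checkpoint can then be loaded with `AutoModelForCausalLM.from_pretrained(output_dir)` without `peft` installed, which simplifies deployment on devices without internet access.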
## Training Details
* **Training Script:** `src/train.py` (from accompanying repository)
* **Dataset:** `data/final_qa_unique.jsonl` (~2845 examples)
* **Epochs:** 3
* **Learning Rate:** 2e-4
* **Batch Size (effective):** 8 (per_device_train_batch_size=4, gradient_accumulation_steps=2)
* **Optimizer:** paged_adamw_32bit
* **Precision:** fp16 mixed precision
* **QLoRA Config:** 4-bit nf4 quantization, compute_dtype=float16, double_quant=False.
* **LoRA Config:** r=64, alpha=16, dropout=0.1, target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
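For reference, the hyperparameters above map roughly onto the following `peft` / `transformers` configuration objects. This is an approximate sketch, not a copy of `src/train.py`; the output directory name is illustrative.
```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# QLoRA quantization config: 4-bit nf4, float16 compute, no double quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)

# LoRA config: r=64, alpha=16, dropout=0.1 on all attention and MLP projections
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# Training arguments: 3 epochs, lr 2e-4, effective batch size 8, paged AdamW, fp16
training_args = TrainingArguments(
    output_dir="results_tinyllama_adapter",  # illustrative
    num_train_epochs=3,
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    optim="paged_adamw_32bit",
    fp16=True,
)
```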
## Evaluation Results
Qualitative testing was performed by comparing the adapter's responses with the base model's responses on questions from the training domains (a minimal comparison sketch appears after the findings below).
**Findings:**
* The adapter generally produced more focused, concise, and relevant answers for the target topics (e.g., mirror signaling, first aid steps) compared to the base model.
* The base model was more prone to hallucination or providing nonsensical/repetitive answers on these specific topics.
* For some factual recall questions (e.g., precise definition of the Rule of 3s), neither model performed perfectly, highlighting the dependence on the quality and coverage of the synthetic training data.
* In some cases (e.g., frequency of checking engine oil), the base model provided a more direct answer, while the adapter answered a related concept (replacement frequency).
Overall, the adapter shows clear specialization towards the trained QA domains, but its accuracy is tied to the underlying training data.
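The sketch below illustrates one way such a side-by-side comparison can be run. It assumes `model` (the adapter-wrapped `PeftModel`) and `tokenizer` from the "How to Get Started" example are already loaded, and uses `peft`'s `disable_adapter()` context manager to get base-model behaviour from the same object; the example question is illustrative only.
```python
import torch

def generate_answer(question, topic):
    # Build the same chat-style prompt used during fine-tuning
    messages = [
        {"role": "system", "content": f"You are a helpful assistant knowledgeable about {topic}."},
        {"role": "user", "content": f"Question: {question}"},
    ]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=100, do_sample=False,
                             pad_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(out[0], skip_special_tokens=True)

question = "How do you signal for help using a mirror?"
topic = "Wilderness Survival Basics"

# Base-model behaviour: temporarily disable the LoRA layers
with model.disable_adapter():
    print("BASE:\n", generate_answer(question, topic))

# Adapter behaviour
print("ADAPTER:\n", generate_answer(question, topic))
```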
## Disclaimer
**This LoRA adapter is provided strictly for educational and research demonstration purposes.** It was trained on synthetically generated data and has not undergone rigorous safety testing or evaluation for production use. **The creator, Cahlen Humphreys, assumes no responsibility or liability for any consequences, damages, or issues arising from the use, interpretation, or application of this model adapter.** This includes, but is not limited to, use in production systems, decision-making processes, safety-critical applications, or any situation where incorrect information could cause harm. **Use this adapter entirely at your own risk** and be aware of potential inaccuracies or biases inherited from the base model and the unverified synthetic training data. |