cahlen committed
Commit ad2cbc5 (verified)
1 Parent(s): f323d56

Upload README.md with huggingface_hub

Files changed (1):
1. README.md (+131, −0)

README.md ADDED

---
license: apache-2.0  # Base model license
language: en
library_name: peft
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
tags:
- question-answering
- lora
- qlora
- tinyllama
- generated-data
---

# TinyLlama-1.1B Offline Practical Skills QA Adapter (QLoRA)

**Adapter created by:** Cahlen Humphreys

This repository contains a LoRA (Low-Rank Adaptation) adapter fine-tuned for question answering on practical knowledge topics (e.g., survival, first aid, basic maintenance), using the `TinyLlama/TinyLlama-1.1B-Chat-v1.0` base model. The goal is to provide practical information for offline or edge AI scenarios.

This adapter was trained with QLoRA (Quantized Low-Rank Adaptation), which enables efficient fine-tuning on consumer hardware.

**Note:** This adapter was created as part of a tutorial demonstrating QLoRA fine-tuning and is intended for educational and demonstrative purposes.

## Model Description

* **Base Model:** [`TinyLlama/TinyLlama-1.1B-Chat-v1.0`](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)
* **Adapter:** LoRA weights trained with the `peft` library.
* **Fine-tuning Method:** QLoRA (4-bit `nf4` quantization of the base model via `bitsandbytes`; only the added LoRA weights are trained).
* **Training Data:** A **synthetically generated** QA dataset covering topics such as wilderness survival, basic first aid, and simple car maintenance (see the accompanying dataset card for details). **This data has not been human-verified.**

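The adapter's hyperparameters travel with it in `adapter_config.json`, so you can sanity-check them with `peft` before loading any weights. A minimal sketch; `adapter_path` is a placeholder for wherever the adapter lives (local directory or Hub repo ID), and for a LoRA adapter the returned object is a `LoraConfig` exposing fields like `r` and `lora_alpha`:

```python
from peft import PeftConfig

# Placeholder location -- substitute your local path or Hub repo ID.
adapter_path = "path/to/your/results_tinyllama_adapter"

# Reads adapter_config.json only; no model weights are loaded.
config = PeftConfig.from_pretrained(adapter_path)
print(config.base_model_name_or_path)  # expected: TinyLlama/TinyLlama-1.1B-Chat-v1.0
print(config.r, config.lora_alpha, config.target_modules)
```
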
## Intended Uses & Limitations

**Intended Use:**
* Enhancing the QA capabilities of the TinyLlama-1.1B-Chat model on the practical knowledge topics covered by the training data.
* Serving as an educational example of QLoRA fine-tuning and adapter usage.
* Qualitative comparison against the base model's performance on the target domains.
* Potential integration into offline/edge AI systems that need practical information without internet connectivity, given the small base model and focused dataset (a sketch for merging the adapter into the base model for such deployments follows the usage example below).

**Limitations:**
* **Domain Specificity:** Performance is expected to be best on questions closely related to the training topics; out-of-domain questions may be answered poorly.
* **Based on Synthetic Data:** The adapter's knowledge is **derived entirely from AI-generated data**, which inherently contains potential inaccuracies, biases, or artifacts. **Answers should not be trusted without independent verification, especially for critical information.** Fact-checking is essential.
* **Not for Production:** This adapter is the result of a tutorial process and **has not undergone rigorous testing or optimization for production deployment.** It is provided as-is for educational purposes; performance and safety in real-world offline scenarios are not guaranteed.
* **Base Model Limitations:** The adapter inherits the limitations and potential biases of the base TinyLlama model.

## How to Get Started

You can load the quantized base model and apply this LoRA adapter with the `transformers` and `peft` libraries. You will also need `bitsandbytes` (for 4-bit loading) and `accelerate` (for `device_map="auto"`).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Define paths (replace with your Hub repo ID if uploaded)
base_model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
adapter_path = "path/to/your/results_tinyllama_adapter"  # Or a Hub ID like "YourUsername/YourRepoName"

# Load the base model quantized the same way as during training:
# 4-bit nf4 with float16 compute. Requires bitsandbytes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # or torch.bfloat16 if supported
)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load the LoRA adapter on top of the quantized base model
model = PeftModel.from_pretrained(base_model, adapter_path)

# --- 'model' is now ready for inference ---

# Example: prepare a prompt (adapt based on the training format)
topic = "Wilderness Survival Basics"
question = "How do you signal for help using a mirror?"
system_prompt = f"You are a helpful assistant knowledgeable about {topic}."

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": f"Question: {question}"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,
        temperature=0.6,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)

# Post-process the response to extract the answer if needed (logic depends on the template)
print(response)
```

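For offline or edge deployment it can be simpler to ship a single merged checkpoint instead of base model plus adapter. The sketch below is not part of the original tutorial and the paths are placeholders; it merges the LoRA weights into an unquantized copy of the base model and saves a standalone model that no longer needs `peft` at inference time.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
adapter_path = "path/to/your/results_tinyllama_adapter"  # placeholder

# Merge against an unquantized (here fp16) base model;
# merging into a 4-bit quantized model is not recommended.
base = AutoModelForCausalLM.from_pretrained(base_model_name, torch_dtype=torch.float16)
merged = PeftModel.from_pretrained(base, adapter_path).merge_and_unload()

# Save a standalone checkpoint (hypothetical output directory).
merged.save_pretrained("tinyllama-practical-qa-merged")
AutoTokenizer.from_pretrained(base_model_name).save_pretrained("tinyllama-practical-qa-merged")
```

The merged model can then be quantized for the target device by whatever runtime you deploy with (for example, 4-bit via `bitsandbytes` again).
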
## Training Details

* **Training Script:** `src/train.py` (from the accompanying repository)
* **Dataset:** `data/final_qa_unique.jsonl` (~2,845 examples)
* **Epochs:** 3
* **Learning Rate:** 2e-4
* **Batch Size (effective):** 8 (`per_device_train_batch_size=4`, `gradient_accumulation_steps=2`)
* **Optimizer:** `paged_adamw_32bit`
* **Precision:** fp16 mixed precision
* **QLoRA Config:** 4-bit `nf4` quantization, `compute_dtype=float16`, `double_quant=False`
* **LoRA Config:** `r=64`, `alpha=16`, `dropout=0.1`, `target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]`

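As a rough guide, the sketch below shows how the hyperparameters listed above map onto `transformers` and `peft` config objects. It is reconstructed from the list rather than copied from `src/train.py`, so details such as `bias`, `task_type`, and the output directory are assumptions.

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# QLoRA quantization settings from the list above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # 4-bit NormalFloat quantization
    bnb_4bit_compute_dtype=torch.float16,  # compute_dtype=float16
    bnb_4bit_use_double_quant=False,       # double_quant=False
)

# LoRA settings from the list above; bias/task_type are assumed defaults
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="results_tinyllama_adapter",  # hypothetical output directory
    num_train_epochs=3,
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,  # effective batch size 8
    optim="paged_adamw_32bit",
    fp16=True,
)
```
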
## Evaluation Results

Qualitative testing was performed by comparing the adapter's responses to the base model's responses on questions from the training domains.

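One convenient way to run such a side-by-side comparison is `peft`'s `disable_adapter()` context manager, which temporarily bypasses the LoRA weights on the already-loaded model. A sketch, assuming `model`, `tokenizer`, and `inputs` from the usage example above; greedy decoding is used so the two runs are directly comparable.

```python
import torch

def answer_with(m):
    # Greedy decoding for a repeatable base-vs-adapter comparison.
    with torch.no_grad():
        out = m.generate(**inputs, max_new_tokens=100, do_sample=False,
                         pad_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(out[0], skip_special_tokens=True)

adapter_answer = answer_with(model)
with model.disable_adapter():  # LoRA weights bypassed inside this block
    base_answer = answer_with(model)

print("ADAPTER:", adapter_answer)
print("BASE:   ", base_answer)
```
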
**Findings:**
* The adapter generally produced more focused, concise, and relevant answers on the target topics (e.g., mirror signaling, first-aid steps) than the base model.
* The base model was more prone to hallucination and to nonsensical or repetitive answers on these specific topics.
* On some factual recall questions (e.g., the precise definition of the Rule of 3s), neither model performed perfectly, highlighting the dependence on the quality and coverage of the synthetic training data.
* In some cases (e.g., how often to check engine oil), the base model gave a more direct answer, while the adapter answered a related concept (replacement frequency).

Overall, the adapter shows clear specialization towards the trained QA domains, but its accuracy is bounded by the underlying training data.

## Disclaimer

**This LoRA adapter is provided strictly for educational and research demonstration purposes.** It was trained on synthetically generated data and has not undergone rigorous safety testing or evaluation for production use. **The creator, Cahlen Humphreys, assumes no responsibility or liability for any consequences, damages, or issues arising from the use, interpretation, or application of this model adapter.** This includes, but is not limited to, use in production systems, decision-making processes, safety-critical applications, or any situation where incorrect information could cause harm. **Use this adapter entirely at your own risk** and be aware of potential inaccuracies or biases inherited from the base model and the unverified synthetic training data.