---
title: Fine-tuning DeepSeek-R1-Distill-Qwen-14B (Research Training)
emoji: 🧪
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 4.13.0
app_file: app.py
pinned: false
license: mit
---

# Model Fine-Tuning Project

## Overview

- **Goal**: Fine-tune `unsloth/DeepSeek-R1-Distill-Qwen-14B-bnb-4bit` on a pre-tokenized JSONL dataset
- **Model**: `unsloth/DeepSeek-R1-Distill-Qwen-14B-bnb-4bit`
- **Important**: The model is already 4-bit quantized; do not quantize it further
- **Dataset**: `phi4-cognitive-dataset`

⚠️ **RESEARCH TRAINING PHASE ONLY**: This Space is used for training only and does not provide interactive model outputs.

### Dataset Specs

- Entries are under 2048 tokens
- Fields: `prompt_number`, `article_id`, `conversations`
- Process entries in ascending `prompt_number` order
- The dataset is pre-tokenized; no additional tokenization is needed (see the sketch below)
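
A minimal sketch of checking and ordering the dataset, assuming a local `dataset.jsonl` file (the filename is an assumption; the field names come from the spec above):

```python
import json

# Load the pre-tokenized JSONL dataset; the filename is illustrative.
with open("dataset.jsonl", encoding="utf-8") as f:
    records = [json.loads(line) for line in f]

# Confirm the documented fields and enforce ascending prompt_number order.
assert {"prompt_number", "article_id", "conversations"} <= set(records[0])
records.sort(key=lambda r: r["prompt_number"])
```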

### Hardware

- GPU: 1x L40S (48 GB VRAM)
- RAM: 62 GB
- CPU: 8 cores

## Environment Variables (.env)

- `HF_TOKEN`: Hugging Face API token
- `HF_USERNAME`: Hugging Face username
- `HF_SPACE_NAME`: Target Space name
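
A minimal sketch of reading these variables at startup (the use of `python-dotenv` is an assumption; only the variable names above come from this project):

```python
import os

from dotenv import load_dotenv  # assumption: python-dotenv is installed

load_dotenv()  # reads .env from the working directory

hf_token = os.environ["HF_TOKEN"]            # Hugging Face API token
hf_username = os.environ["HF_USERNAME"]      # Hugging Face username
hf_space_name = os.environ["HF_SPACE_NAME"]  # target Space name
```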

## Files

### 1. `app.py`

- Training status dashboard
- No interactive model demo (research phase only)
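
Since the Space declares `sdk: gradio`, the dashboard can be a static status page; a minimal sketch (the status text and layout are illustrative, not the actual `app.py`):

```python
import gradio as gr

# Status-only dashboard: no model inference is exposed.
with gr.Blocks(title="DeepSeek-R1-Distill-Qwen-14B fine-tuning") as demo:
    gr.Markdown(
        "# Research Training Phase\n"
        "Training is in progress; this Space does not serve model outputs."
    )

if __name__ == "__main__":
    demo.launch()
```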

### 2. `transformers_config.json`

- Configuration for Hugging Face Transformers
- Contains model parameters, hardware settings, and optimizer details
- Specifies pre-tokenized dataset handling
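
A sketch of consuming the config instead of hardcoding settings (the key names shown are hypothetical; the real file defines its own schema):

```python
import json

# Pull all training settings from the JSON config file.
with open("transformers_config.json", encoding="utf-8") as f:
    config = json.load(f)

# "model_name" and "learning_rate" are hypothetical keys for illustration.
model_name = config.get("model_name", "unsloth/DeepSeek-R1-Distill-Qwen-14B-bnb-4bit")
learning_rate = config.get("learning_rate", 2e-4)
```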

### 3. `run_cloud_training.py`

- Loads the pre-tokenized dataset, sorts it by `prompt_number`, and starts training:
  1. Load and sort the JSONL by `prompt_number`
  2. Use the pre-tokenized `input_ids` directly (no re-tokenization)
  3. Initialize the model with parameters from the config
  4. Run training with metrics, checkpoints, and error handling
- Uses Hugging Face's Trainer API with a custom collator for pre-tokenized data (sketched below)
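
A minimal sketch of the collator-plus-Trainer wiring under stated assumptions: each record carries an `input_ids` list, and the padding value and hyperparameters shown are illustrative rather than the project's actual settings:

```python
import json

import torch
from datasets import Dataset
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

def collate_pretokenized(batch, pad_token_id=0):
    """Pad pre-tokenized input_ids to the batch max; no tokenizer is invoked."""
    max_len = max(len(ex["input_ids"]) for ex in batch)
    input_ids, attention_mask, labels = [], [], []
    for ex in batch:
        ids = ex["input_ids"]
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_token_id] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)
        labels.append(ids + [-100] * n_pad)  # -100 masks padding out of the loss
    return {
        "input_ids": torch.tensor(input_ids),
        "attention_mask": torch.tensor(attention_mask),
        "labels": torch.tensor(labels),
    }

# The checkpoint ships already 4-bit quantized, so it is loaded as-is.
model = AutoModelForCausalLM.from_pretrained(
    "unsloth/DeepSeek-R1-Distill-Qwen-14B-bnb-4bit", device_map="auto"
)

# Same loading and ordering as the Dataset Specs sketch above.
with open("dataset.jsonl", encoding="utf-8") as f:
    records = sorted((json.loads(line) for line in f),
                     key=lambda r: r["prompt_number"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="checkpoints",
        per_device_train_batch_size=1,  # illustrative value
        gradient_checkpointing=True,
        bf16=True,                      # mixed precision
        logging_steps=10,
        save_steps=500,
    ),
    train_dataset=Dataset.from_list(records),
    data_collator=collate_pretokenized,
)
trainer.train()
```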

### 4. `requirements.txt`

- Python dependencies: `transformers`, `datasets`, `torch`, etc.
- Includes `unsloth` for optimized training

### 5. `upload_to_space.py`

- Updates the model and Space directly via the Hugging Face API
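
A minimal sketch using `huggingface_hub` (the file being uploaded is illustrative; the repo id is assembled from the environment variables described above):

```python
import os

from huggingface_hub import HfApi

api = HfApi(token=os.environ["HF_TOKEN"])

# Push an updated file to the target Space; README.md is just an example.
api.upload_file(
    path_or_fileobj="README.md",
    path_in_repo="README.md",
    repo_id=f"{os.environ['HF_USERNAME']}/{os.environ['HF_SPACE_NAME']}",
    repo_type="space",
)
```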

## Implementation Notes

### Best Practices

- The dataset is pre-tokenized and sorted by `prompt_number`
- Settings are stored in the config file rather than hardcoded
- Training parameters are tuned to the hardware listed above
- Gradient checkpointing and mixed precision training
- Complete logging for monitoring progress (see the sketch below)
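
For the logging bullet, a minimal sketch of dual console-and-file progress logging (the format string and filename are illustrative):

```python
import logging

# Log to both the console and a file so progress survives the session.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
    handlers=[logging.StreamHandler(), logging.FileHandler("training.log")],
)
logging.getLogger(__name__).info("Starting fine-tuning run")
```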

### Model Repository

This Space hosts a fine-tuned version of the [unsloth/DeepSeek-R1-Distill-Qwen-14B-bnb-4bit](https://huggingface.co/unsloth/DeepSeek-R1-Distill-Qwen-14B-bnb-4bit) model.

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference.