---
title: Phi-4 Unsloth Training
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.17.0
app_file: app.py
pinned: false
license: mit
---

# Phi-4 Unsloth Optimized Training

This Space is dedicated to training Microsoft's Phi-4 model using Unsloth optimizations for enhanced performance and efficiency. The training process uses 4-bit quantization and advanced memory optimizations.

## Installation

This Hugging Face Space installs its dependencies automatically from requirements.txt. The required packages are described below.

### Installation Process

For clearer dependency management, the installation is split into multiple files:

1. **Base Dependencies (requirements-base.txt)**:
   - Core packages like torch, transformers, accelerate, etc.
   - Install with: `pip install -r requirements-base.txt`

2. **Standard Dependencies (requirements.txt)**:
   - References the base requirements and adds additional packages
   - Install with: `pip install -r requirements.txt`

3. **Flash Attention (requirements-flash.txt)** (optional):
   - For faster attention computation
   - Install with: `pip install -r requirements-flash.txt --no-build-isolation`

Using this staged approach helps prevent dependency conflicts and installation issues.
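
As an illustration of the staged layout, requirements.txt can pull in the base file with pip's `-r` include syntax. The package names below are examples only, not the Space's exact pin list:

```
# requirements.txt (illustrative)
-r requirements-base.txt
einops
sentencepiece
```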

### Essential Dependencies

- **unsloth** (>=2024.3): Required for optimized 4-bit training
- **peft** (>=0.9.0): Required for parameter-efficient fine-tuning
- **transformers** (>=4.36.0): For model architecture and tokenization
- **einops**: Required by Unsloth for tensor manipulation
- **sentencepiece**: Required for tokenization

### Optional Dependencies

- **flash-attn**: Faster attention computation (not included by default because it can cause build issues)

## Features

- 4-bit quantization using Unsloth
- Optimized training pipeline
- Cognitive dataset integration
- Advanced memory management
- Gradient checkpointing
- Sequential data processing

## Configuration Files

- `transformers_config.json`: Model and training parameters
- `hardware_config.json`: Hardware-specific optimizations
- `dataset_config.json`: Dataset processing settings
- `requirements.txt`: Required dependencies

## Training Process

The training uses the following optimizations:

- Unsloth's 4-bit quantization
- Custom chat templates for Phi-4 (see the sketch below)
- Paper-order preservation
- Efficient memory usage
- Gradient accumulation
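
As a minimal sketch of what a Phi-4 chat template produces, the tokenizer's built-in template can be applied to a message list. The model ID and messages here are illustrative; the training script's own template handling may differ:

```python
from transformers import AutoTokenizer

# Load the tokenizer that ships with the pre-quantized Phi-4 model
tokenizer = AutoTokenizer.from_pretrained("unsloth/phi-4-unsloth-bnb-4bit")

messages = [
    {"role": "system", "content": "You are an expert in cognitive science."},
    {"role": "user", "content": "Summarize the key idea of this paper chunk."},
]

# Render the messages into Phi-4's chat format without tokenizing
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(text)
```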

## Dataset

Training uses the cognitive dataset with:

- Maintained paper order
- Proper metadata handling
- Optimized sequence length
- Efficient batching

## Hardware Requirements

- GPU: A10G or better
- VRAM: 24GB minimum
- RAM: 32GB recommended

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Phase 1: Domain Adaptation (Unsupervised)

This directory contains the code and configuration for domain adaptation of the phi-4-unsloth-bnb-4bit model to the cognitive science domain. This phase produces our domain-adapted model: [George-API/phi-4-research-assistant](https://huggingface.co/George-API/phi-4-research-assistant).

## Overview

Domain adaptation is the first phase of our training process, where we expose the model to a large corpus of cognitive science texts to help it learn domain-specific vocabulary, concepts, and patterns. This phase prepares the model for the more focused supervised fine-tuning in Phase 2.

## Files

### Core Training Files

- `run_transformers_training.py`: Main script for domain adaptation
- `transformers_config.json`: Model and training parameters
- `hardware_config.json`: Hardware-specific optimizations
- `dataset_config.json`: Dataset loading and processing settings
- `requirements.txt`: Required Python packages

### Analysis & Utilities

- `check_tokenization.py`: Script to analyze token distributions
- `update_space.py`: Hugging Face Space update utility
- `.env`: Environment variables (API tokens, etc.)

## Setup

1. **Environment Setup**:
   ```bash
   python -m venv venv
   source venv/bin/activate  # or `venv\Scripts\activate` on Windows
   pip install -r requirements.txt
   ```

2. **Environment Variables**:
   Create a `.env` file with (see the loading sketch after this list):
   ```
   HUGGINGFACE_TOKEN=your_token_here
   ```

3. **Verify Setup**:
   ```bash
   python check_tokenization.py  # Ensures the tokenizer works
   ```
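
For step 2, one common way to pick up the token at runtime is python-dotenv. This is a minimal sketch assuming python-dotenv is installed; the training script may read the variable differently:

```python
import os

from dotenv import load_dotenv        # assumes python-dotenv is available
from huggingface_hub import login

load_dotenv()                          # reads HUGGINGFACE_TOKEN from the local .env file
login(token=os.environ["HUGGINGFACE_TOKEN"])  # authenticate for Hub pushes
```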

## How It Works

1. **Data Loading**: Loads pre-tokenized data from the Hugging Face dataset
2. **Sequential Processing**: Processes data in order, maintaining the integrity of research papers
3. **Efficient Training**: Uses the pre-quantized Unsloth 4-bit model for memory-efficient, faster training
4. **Checkpointing**: Saves regular checkpoints and pushes them to the Hub
5. **Monitoring**: Logs detailed metrics and statistics during training
6. **Model Publishing**: Pushes the trained model to the Hugging Face Hub

## Key Features

### Memory-Efficient Training

The training setup is optimized for A10G GPUs:

- Uses the pre-quantized 4-bit model (no additional quantization needed)
- Gradient checkpointing for memory efficiency
- Flash attention for faster training
- bfloat16 mixed-precision training
- Optimized batch sizes for maximum throughput
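
A minimal sketch of loading the pre-quantized model with Unsloth, assuming the standard `FastLanguageModel` API; the exact arguments used by `run_transformers_training.py` may differ:

```python
from unsloth import FastLanguageModel

# Load the already-quantized 4-bit Phi-4 checkpoint
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/phi-4-unsloth-bnb-4bit",
    max_seq_length=2048,   # matches max_seq_length in transformers_config.json
    dtype=None,            # let Unsloth select bfloat16 on supported GPUs
    load_in_4bit=True,     # checkpoint is pre-quantized, so no extra quantization happens
)
```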

### Sequential Processing

The training script ensures that chunks from the same research paper are processed together by:

- Sorting the dataset by ID
- Using a SequentialSampler to maintain order
- Processing chunks sequentially (average 1,673 tokens per chunk)
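
A minimal sketch of this order-preserving setup; the dataset ID and the `"id"` column name are placeholders for illustration, not the project's actual values:

```python
from datasets import load_dataset
from torch.utils.data import DataLoader, SequentialSampler

# Placeholder dataset ID; replace with the project's pre-tokenized dataset
dataset = load_dataset("your-org/cognitive-dataset", split="train")

# Sort by ID so chunks from the same paper stay adjacent
dataset = dataset.sort("id")

loader = DataLoader(
    dataset,
    batch_size=16,
    sampler=SequentialSampler(dataset),  # iterate in index order: paper order preserved
    collate_fn=list,                     # real training uses the SimpleDataCollator below
)
```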

### Data Collator

The `SimpleDataCollator` class:

- Preserves the pre-tokenized data format
- Processes each entry independently
- Provides detailed logging of processing statistics
- Handles errors gracefully
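
A hedged sketch of what such a collator can look like: it keeps the pre-tokenized `input_ids` as-is and pads each batch to its longest entry. Field names and padding behavior are assumptions; the project's `SimpleDataCollator` may differ:

```python
import torch

class SimpleDataCollator:
    """Pads pre-tokenized examples to a common length and builds labels."""

    def __init__(self, pad_token_id: int):
        self.pad_token_id = pad_token_id

    def __call__(self, features):
        max_len = max(len(f["input_ids"]) for f in features)
        input_ids, attention_mask, labels = [], [], []
        for f in features:
            ids = list(f["input_ids"])
            pad_len = max_len - len(ids)
            input_ids.append(ids + [self.pad_token_id] * pad_len)
            attention_mask.append([1] * len(ids) + [0] * pad_len)
            labels.append(ids + [-100] * pad_len)  # ignore padding in the loss
        return {
            "input_ids": torch.tensor(input_ids),
            "attention_mask": torch.tensor(attention_mask),
            "labels": torch.tensor(labels),
        }
```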

### Checkpointing

The training process saves checkpoints:

- Every 200 steps
- Pushes to the Hub on every save
- Maintains up to 5 recent checkpoints
- Automatically resumes from the latest checkpoint if interrupted
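
The checkpointing behavior above maps onto standard `transformers.TrainingArguments` roughly as follows; the output directory name is illustrative and other arguments are omitted:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="phi4-domain-adaptation",  # illustrative path
    save_strategy="steps",
    save_steps=200,             # checkpoint every 200 steps
    save_total_limit=5,         # keep up to 5 recent checkpoints
    push_to_hub=True,
    hub_strategy="every_save",  # push to the Hub on every save
)
```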

## Hardware Requirements

This training setup is optimized for:

- 2x NVIDIA A10G GPUs (24GB VRAM each)
- 92GB system RAM
- CUDA 11.8 or higher

Memory breakdown per GPU:

- Model (4-bit): ~3.5GB
- Optimizer states: ~1GB
- Batch memory: ~2GB
- Peak usage: 18-20GB
- Safe headroom: 4-6GB

## Configuration

Key parameters in `transformers_config.json`:

- `model_name`: unsloth/phi-4-unsloth-bnb-4bit
- `learning_rate`: 2e-5
- `num_train_epochs`: 3
- `per_device_train_batch_size`: 16
- `gradient_accumulation_steps`: 4
- `effective_batch_size`: 128 (16 * 4 * 2 GPUs)
- `max_seq_length`: 2048
- `lr_scheduler_type`: "cosine"
- `warmup_ratio`: 0.03
- `neftune_noise_alpha`: 5

The configuration is optimized for:

- Maximum memory efficiency with the pre-quantized model
- Stable training with a cosine learning rate schedule
- Effective gradient updates with accumulation
- Regular checkpointing and Hub updates
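
For orientation, the key parameters above would appear in `transformers_config.json` roughly as sketched below; the surrounding structure of the file is an assumption, and only the values are taken from this README:

```python
import json

# Illustrative contents; the real file may nest these fields differently
config = {
    "model_name": "unsloth/phi-4-unsloth-bnb-4bit",
    "learning_rate": 2e-5,
    "num_train_epochs": 3,
    "per_device_train_batch_size": 16,
    "gradient_accumulation_steps": 4,   # effective batch = 16 * 4 * 2 GPUs = 128
    "max_seq_length": 2048,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.03,
    "neftune_noise_alpha": 5,
}
print(json.dumps(config, indent=2))
```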

## Running Domain Adaptation

To start domain adaptation:

```bash
python run_transformers_training.py
```

The script will:

1. Load the pre-quantized model and dataset
2. Apply the optimized training parameters
3. Process the data sequentially
4. Train the model for 3 epochs
5. Save and push checkpoints to the Hub regularly

## Using the Model

After training, you can use the domain-adapted model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the domain-adapted model
model_name = "George-API/phi-4-research-assistant"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Generate text
input_text = "The hippocampus is involved in"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Chat Format Example

Phi-4 works best with its native chat template:

```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="George-API/phi-4-research-assistant",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are an expert in cognitive science."},
    {"role": "user", "content": "Explain the role of the hippocampus in memory formation."},
]

outputs = pipe(messages, max_new_tokens=256)
print(outputs[0]["generated_text"])
```

## Expected Outcomes

After domain adaptation, the model should:

- Have a better understanding of cognitive science terminology
- Show improved performance on domain-specific tasks
- Be ready for supervised fine-tuning in Phase 2

## Next Steps

After completing domain adaptation:

1. Evaluate the model's performance on cognitive science texts
2. Proceed to Phase 2 (Supervised Fine-Tuning)
3. Use TensorBoard to analyze training metrics