Phi-4 Training Critical Deployment Checklist
Essential Configuration Requirements
1. Model Configuration
- Model name: `unsloth/phi-4-unsloth-bnb-4bit`
- BF16 precision enabled, FP16 disabled
- Appropriate sequence length (2048)
- LoRA parameters correctly configured (r: 32, alpha: 16)
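The values above can be checked mechanically before a run. The following is a minimal sketch, assuming the training configuration is available as a flat Python dict; the key names (`model_name`, `bf16`, `fp16`, `max_seq_length`, `lora_r`, `lora_alpha`) and the helper `check_model_config` are hypothetical and should be adapted to the actual config file. Only the expected values come from this checklist.

```python
# Expected values taken from the checklist; the key names are hypothetical.
EXPECTED = {
    "model_name": "unsloth/phi-4-unsloth-bnb-4bit",
    "bf16": True,
    "fp16": False,
    "max_seq_length": 2048,
    "lora_r": 32,
    "lora_alpha": 16,
}


def check_model_config(config: dict) -> list[str]:
    """Return a list of mismatches; an empty list means every item matches."""
    problems = []
    for key, expected in EXPECTED.items():
        if key not in config:
            problems.append(f"{key}: missing (expected {expected!r})")
        elif config[key] != expected:
            problems.append(f"{key}: got {config[key]!r}, expected {expected!r}")
    return problems


if __name__ == "__main__":
    # Example: a config that wrongly enables FP16.
    candidate = dict(EXPECTED, fp16=True)
    for problem in check_model_config(candidate):
        print(problem)
```

Returning a list of problems rather than raising on the first mismatch lets a single run report every misconfigured item.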
2. Hardware & Resource Management
- Per-device batch size ≤ 16
- Gradient accumulation steps ≥ 3
- Gradient checkpointing enabled
- Memory usage limits properly set (85% of GPU capacity)
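As a rough illustration of what these limits imply, the sketch below computes the effective batch size and the per-GPU memory budget. It assumes data-parallel training across all 4 GPUs with 24 GB each (the hardware listed in this document); the function names are hypothetical.

```python
NUM_GPUS = 4            # from the "Current Hardware" line
VRAM_GB = 24            # per-GPU memory, from the "Current Hardware" line
MEMORY_FRACTION = 0.85  # memory usage limit from the checklist


def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int = NUM_GPUS) -> int:
    """Samples contributing to one optimizer step (assumes data parallelism)."""
    return per_device * grad_accum * num_gpus


def memory_budget_gb(vram_gb: float = VRAM_GB, fraction: float = MEMORY_FRACTION) -> float:
    """Per-GPU memory budget implied by the usage limit."""
    return vram_gb * fraction


def check_resources(per_device: int, grad_accum: int) -> list[str]:
    """Return violations of the batch-size and accumulation limits."""
    problems = []
    if per_device > 16:
        problems.append(f"per-device batch size {per_device} exceeds 16")
    if grad_accum < 3:
        problems.append(f"gradient accumulation steps {grad_accum} is below 3")
    return problems


if __name__ == "__main__":
    print(effective_batch_size(16, 3))   # 16 * 3 * 4 samples per optimizer step
    print(round(memory_budget_gb(), 1))  # 85% of 24 GB
    print(check_resources(16, 3))
```

At the limits (batch size 16, 3 accumulation steps) this gives 192 samples per optimizer step, and 85% of 24 GB is about 20.4 GB per GPU.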
3. Critical Dataset Handling Rules
- NO REORDERING of dataset entries - original order must be preserved
- NO COMBINING of separate entries - each entry must remain distinct
- SEQUENTIAL PROCESSING required - entries must be processed one after another
- `sort_by_id` and `maintain_paper_order` flags properly set to preserve data sequence
- Sequential sampler used with no shuffling (`"shuffle": false`)
- Dataset sequential integrity verified with validation samples
- Conversation structure preserved (original format maintained)
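One way to verify these rules is to confirm that the batches, once flattened, reproduce the original entry sequence exactly. The sketch below is a framework-independent illustration; it assumes each entry is a dict with an `id` field, which is a hypothetical layout, and the helper names are not part of the actual training code.

```python
def sequential_batches(entries: list, batch_size: int):
    """Yield consecutive slices of entries, with no shuffling or merging."""
    for start in range(0, len(entries), batch_size):
        yield entries[start:start + batch_size]


def verify_order(entries: list, batches, id_key: str = "id") -> bool:
    """True if flattening the batches reproduces the original ID sequence."""
    original_ids = [entry[id_key] for entry in entries]
    seen_ids = [entry[id_key] for batch in batches for entry in batch]
    return seen_ids == original_ids


if __name__ == "__main__":
    # IDs are deliberately not sorted: the original order is what must survive.
    data = [{"id": i} for i in (3, 1, 2, 5, 4)]
    print(verify_order(data, sequential_batches(data, batch_size=2)))
```

The same check fails if entries are dropped, duplicated, or reordered, so it covers more than shuffling alone.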
4. Essential Error Handling
- Clear error catching for dataset loading issues
- Memory tracking at key training points
- Low-verbosity logging for HF Space compatibility
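The dataset-loading and logging items might look like the following sketch. It assumes the dataset is a JSON Lines file, which this checklist does not state; `DatasetLoadError`, `load_jsonl`, and the logger name are hypothetical. Memory tracking is not shown because it depends on the training framework in use.

```python
import json
import logging

# WARNING level keeps output low-verbosity.
logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("phi4_training")  # hypothetical logger name


class DatasetLoadError(RuntimeError):
    """Raised when the dataset cannot be read or parsed."""


def load_jsonl(path: str) -> list:
    """Load entries in file order, failing with a clear message on any problem."""
    entries = []
    try:
        with open(path, encoding="utf-8") as handle:
            for line_no, line in enumerate(handle, start=1):
                if not line.strip():
                    continue  # skip blank lines
                try:
                    entries.append(json.loads(line))
                except json.JSONDecodeError as exc:
                    raise DatasetLoadError(
                        f"{path}:{line_no}: invalid JSON ({exc.msg})"
                    ) from exc
    except OSError as exc:
        raise DatasetLoadError(f"cannot read {path}: {exc}") from exc
    if not entries:
        raise DatasetLoadError(f"{path} contains no entries")
    return entries


if __name__ == "__main__":
    try:
        load_jsonl("missing.jsonl")  # hypothetical path
    except DatasetLoadError as err:
        logger.error("Dataset loading failed: %s", err)
```

Wrapping every failure in one exception type gives the caller a single place to log the error and stop before training starts.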
5. Training Core Requirements
- Appropriate learning rate (2e-5)
- Proper checkpointing frequency
- Hub settings correctly configured for model saving
Pre-Deployment Verification
Requirement | Status | Notes |
---|---|---|
Data sequential integrity | Confirm entries processed in order | |
GPU memory within limits | Check peak memory doesn't exceed 20GB per GPU | |
Training batch verification | Verify first few batches maintain proper order | |
Current Hardware: 4× NVIDIA L4 GPUs (24GB VRAM each)
CRITICAL REMINDER: Data sequence preservation is the highest priority - any shuffling, reordering, or combining of entries will compromise model quality.
Last Updated: 2025-03-09