Phi-4 Training Critical Deployment Checklist

Essential Configuration Requirements

1. Model Configuration

  • Model name: unsloth/phi-4-unsloth-bnb-4bit
  • BF16 precision enabled, FP16 disabled
  • Maximum sequence length set to 2048
  • LoRA parameters configured with r: 32, alpha: 16 (see the sketch below)
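
A minimal sketch of how this configuration could look with Unsloth; the model name, precision, sequence length, and LoRA r/alpha come from the checklist, while the dropout value and target module list are illustrative assumptions rather than the project's actual settings:

```python
# Sketch only: load phi-4 in 4-bit with Unsloth and attach LoRA adapters.
# Checklist values: model name, BF16, max_seq_length=2048, r=32, alpha=16.
import torch
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/phi-4-unsloth-bnb-4bit",
    max_seq_length=2048,      # sequence length from the checklist
    dtype=torch.bfloat16,     # BF16 enabled, FP16 disabled
    load_in_4bit=True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=32,                     # LoRA rank from the checklist
    lora_alpha=16,            # LoRA alpha from the checklist
    lora_dropout=0.0,         # assumed; not specified in the checklist
    target_modules=[          # assumed module list; adjust to the real config
        "q_proj", "k_proj", "v_proj", "o_proj",
    ],
)
```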

2. Hardware & Resource Management

  • Per-device batch size ≤ 16
  • Gradient accumulation steps ≥ 3
  • Gradient checkpointing enabled
  • Memory usage capped at 85% of GPU capacity (see the sketch below)
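
A hedged sketch of how these resource limits could map onto transformers.TrainingArguments plus an explicit per-process GPU memory cap; only the numbers come from the checklist, and output_dir is a placeholder:

```python
# Sketch only: resource limits from the checklist expressed as training arguments,
# plus an 85% per-process GPU memory cap.
import torch
from transformers import TrainingArguments

if torch.cuda.is_available():
    for device in range(torch.cuda.device_count()):
        torch.cuda.set_per_process_memory_fraction(0.85, device)  # 85% cap per GPU

training_args = TrainingArguments(
    output_dir="outputs",            # placeholder path
    per_device_train_batch_size=16,  # keep <= 16
    gradient_accumulation_steps=3,   # keep >= 3
    gradient_checkpointing=True,
    bf16=True,                       # BF16 on, FP16 off (see section 1)
    fp16=False,
)
```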

3. Critical Dataset Handling Rules

  • NO REORDERING of dataset entries - original order must be preserved
  • NO COMBINING of separate entries - each entry must remain distinct
  • SEQUENTIAL PROCESSING required - entries must be processed one after another
  • sort_by_id and maintain_paper_order flags properly set to preserve data sequence
  • Sequential sampler used with no shuffling ("shuffle": false), as sketched below
  • Dataset sequential integrity verified with validation samples
  • Conversation structure preserved (original format maintained)
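
A minimal sketch of a loader that respects these rules, using a sequential sampler and a plain list collate so entries are never merged; the JSONL data path is a placeholder, not the project's real dataset:

```python
# Sketch only: strictly sequential, unshuffled iteration over the dataset.
# The data file path is a placeholder; the ordering rules come from the checklist.
from datasets import load_dataset
from torch.utils.data import DataLoader, SequentialSampler

dataset = load_dataset("json", data_files="train.jsonl", split="train")  # placeholder

loader = DataLoader(
    dataset,
    batch_size=16,
    sampler=SequentialSampler(dataset),  # entries processed strictly in order
    shuffle=False,                       # mirrors "shuffle": false in the config
    collate_fn=list,                     # keep entries as separate dicts; nothing combined
)
```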

4. Essential Error Handling

  • Clear error catching for dataset loading issues
  • Memory tracking at key training points
  • Low-verbosity logging for HF Space compatibility (see the sketch below)
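
A hedged sketch of what this error handling could look like; the logger name, helper functions, and data path are illustrative assumptions, not the project's actual code:

```python
# Sketch only: dataset-loading error handling and GPU memory tracking helpers.
# Logger name, function names, and the data path are assumptions.
import logging
import torch
from datasets import load_dataset

logging.basicConfig(level=logging.INFO)  # INFO rather than DEBUG keeps Space logs compact
logger = logging.getLogger("phi4-training")

def load_training_data(path):
    """Load the dataset and surface a clear error if the file is missing or malformed."""
    try:
        return load_dataset("json", data_files=path, split="train")
    except Exception as exc:
        logger.error("Dataset loading failed for %s: %s", path, exc)
        raise

def log_gpu_memory(stage):
    """Record allocated and peak GPU memory at key training points."""
    if torch.cuda.is_available():
        allocated = torch.cuda.memory_allocated() / 1024**3
        peak = torch.cuda.max_memory_allocated() / 1024**3
        logger.info("[%s] GPU memory: %.1f GiB allocated, %.1f GiB peak", stage, allocated, peak)
```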

5. Training Core Requirements

  • Learning rate set to 2e-5
  • Checkpoint saving frequency configured explicitly
  • Hub settings configured so the trained model is saved to the Hugging Face Hub (see the sketch below)
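
Extending the TrainingArguments sketch from section 2, a hedged example of where the learning rate, checkpointing frequency, and Hub settings would go; only the 2e-5 learning rate comes from the checklist, and the save interval and repo id are placeholders:

```python
# Sketch only: training core settings. learning_rate comes from the checklist;
# the checkpoint interval and Hub repo id are placeholders.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs",
    learning_rate=2e-5,                  # from the checklist
    save_strategy="steps",
    save_steps=100,                      # placeholder checkpoint frequency
    save_total_limit=2,                  # placeholder; bounds checkpoint disk usage
    push_to_hub=True,                    # push the trained model to the Hub
    hub_model_id="your-org/your-model",  # placeholder repo id
    hub_strategy="checkpoint",           # placeholder; push on each checkpoint save
)
```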

Pre-Deployment Verification

| Requirement | Status | Notes |
| --- | --- | --- |
| Data sequential integrity | | Confirm entries are processed in order |
| GPU memory within limits | | Check peak memory doesn't exceed 20GB per GPU |
| Training batch verification | | Verify the first few batches maintain the original order |
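
A hedged sketch of the checks in the table above, assuming the list-collated sequential loader from the section 3 sketch; the 20GB ceiling comes from the table, and the function names are illustrative:

```python
# Sketch only: pre-deployment checks for sequential order and peak GPU memory.
# Assumes `dataset` and the list-collated `loader` from the section 3 sketch.
import torch

def check_sequential_integrity(dataset, loader, num_batches=3):
    """Verify the first few batches appear in the same order as the raw dataset."""
    cursor = 0
    for _, batch in zip(range(num_batches), loader):
        for entry in batch:
            assert entry == dataset[cursor], f"Order mismatch at index {cursor}"
            cursor += 1

def check_gpu_memory(limit_gb=20.0):
    """Verify peak memory on every GPU stays under the 20GB-per-GPU ceiling."""
    for device in range(torch.cuda.device_count()):
        peak_gb = torch.cuda.max_memory_allocated(device) / 1024**3
        assert peak_gb <= limit_gb, f"GPU {device} peaked at {peak_gb:.1f} GiB"
```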

Current Hardware: 4× NVIDIA L4 GPUs (24GB VRAM each)

CRITICAL REMINDER: Data sequence preservation is the highest priority - any shuffling, reordering, or combining of entries will compromise model quality.

Last Updated: 2025-03-09