# AST-AMVD-SAD-v1

## Description
A fine-tuned audio classification model for detecting AI-generated audio content.
## Authors
- Kunyang Huang ([email protected])
- Bin Hu ([email protected])
## Model Details

### Model Description
- Architecture: Based on the Audio Spectrogram Transformer (AST) architecture from MIT/ast-finetuned-audioset-10-10-0.4593
- Input: Audio waveforms converted to mel-spectrogram representations
- Output: Four-class classification for audio authenticity detection
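As a rough illustration of the waveform-to-mel-spectrogram step, here is a minimal NumPy sketch of a log-mel front end. The specific parameters (16 kHz sample rate, 25 ms window / 10 ms hop, 128 mel bins) are typical AST-style defaults and are assumptions here, not values confirmed by this card; in practice the model's own feature extractor should be used.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_spectrogram(wave, sr=16000, n_fft=400, hop=160, n_mels=128):
    # Frame the waveform, window each frame, take the power spectrum,
    # project onto the mel filterbank, and move to log scale.
    frames = [wave[s:s + n_fft] * np.hanning(n_fft)
              for s in range(0, len(wave) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(np.array(frames), n=n_fft)) ** 2
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log(mel + 1e-10)

# One second of noise at 16 kHz -> a (frames, n_mels) log-mel matrix.
wave = np.random.default_rng(0).standard_normal(16000)
spec = log_mel_spectrogram(wave)
```

The resulting matrix (time frames by mel bins) is the kind of 2-D representation that the AST backbone treats as an image of patches.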
## Intended Use

This model is designed to:

- Detect AI-generated audio content
- Identify different types of synthetic audio:
  - Class 0 (H): Real Human Audio
  - Class 1 (C): AI Cloned Audio
  - Class 2 (A): AI Generated Audio
  - Class 3 (Combined): Mixed Human/AI Audio
- Primary use cases include:
  - Content authenticity verification
  - AI-generated content detection systems
  - Audio forensics applications
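The four-class output above can be decoded with a small amount of glue code. The sketch below assumes the classifier head's output order matches the class numbering listed here; a real integration should instead read the `id2label` mapping from the model's config.

```python
import math

# Label map taken from the four classes listed above (assumed head order).
ID2LABEL = {
    0: "H: Real Human Audio",
    1: "C: AI Cloned Audio",
    2: "A: AI Generated Audio",
    3: "Combined: Mixed Human/AI Audio",
}

def softmax(logits):
    # Numerically stable softmax over raw classifier scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode_prediction(logits):
    # Return the top label and its probability.
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]

# Hypothetical logits where class 1 (AI Cloned Audio) dominates.
label, prob = decode_prediction([0.2, 3.1, 0.4, 0.1])
```

For forensics-style use cases, returning the full probability vector rather than only the argmax lets downstream systems set their own decision thresholds.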
## Training Data

- Dataset: AMVD_AS Dataset
- Data Composition:
  - Balanced samples across the four categories
  - Contains both synthetic and genuine human audio samples
## Training Procedure

### Fine-tuning Parameters

- Base Model: MIT/ast-finetuned-audioset-10-10-0.4593
- Learning Rate: 4e-5 → 1e-5 (linear decay)
- Total Training Steps: 25,000
- Batch Size: 32
- Warmup Steps: 5,000
- Weight Decay: 0.01
- Gradient Clip Norm: 1.0
- Training Duration: ~4.5 hours (A100 GPU)
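The schedule implied by these parameters can be sketched as a linear warmup to the peak rate followed by a linear decay to the floor. The warmup-from-zero shape is an assumption; the card only states a 4e-5 → 1e-5 linear decay with 5,000 warmup steps out of 25,000 total.

```python
def learning_rate(step, peak=4e-5, floor=1e-5, warmup=5_000, total=25_000):
    # Linear warmup from 0 to the peak rate (assumed shape),
    # then linear decay from the peak down to the floor.
    if step < warmup:
        return peak * step / warmup
    frac = (step - warmup) / (total - warmup)
    return peak - frac * (peak - floor)

# learning_rate(5_000)  -> 4e-5  (peak, end of warmup)
# learning_rate(15_000) -> 2.5e-5 (midpoint of decay)
# learning_rate(25_000) -> 1e-5  (floor, end of training)
```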
## Evaluation

### Validation Performance

- Final Training Loss: 0.0874
- Final Gradient Norm: 0.000075778
- Final Learning Rate: 1e-5 (stable)
## Model Tree

- Model: AnodHuang/AST-AMVD-SAD-v1
- Base model: MIT/ast-finetuned-audioset-10-10-0.4593