AST-AMVD-SAD-v1

Description

A fine-tuned audio classification model for detecting AI-generated audio content.

Author

AnodHuang
Model Details

Model Description

  • Architecture: Based on the Audio Spectrogram Transformer (AST) architecture from MIT/ast-finetuned-audioset-10-10-0.4593
  • Input: Audio waveforms converted to mel-spectrogram representations
  • Output: Four-class classification for audio authenticity detection

Intended Use

This model is designed to:

  • Detect AI-generated audio content
  • Identify different types of synthetic audio:
    • Class 0 (H): Real Human Audio
    • Class 1 (C): AI Cloned Audio
    • Class 2 (A): AI Generated Audio
    • Class 3 (Combined): Mixed Human/AI Audio
  • Primary use cases include:
    • Content authenticity verification
    • AI-generated content detection systems
    • Audio forensics applications
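The model's four-way output can be decoded into the class labels listed above with a small helper. This is an illustrative sketch, not part of the released model: the `ID2LABEL` map assumes the class ids match the order given in this card, and `classify` simply applies a softmax and argmax to raw logits.

```python
import numpy as np

# Class map from this model card (assumption: ids follow the listed order).
ID2LABEL = {
    0: "H (Real Human Audio)",
    1: "C (AI Cloned Audio)",
    2: "A (AI Generated Audio)",
    3: "Combined (Mixed Human/AI Audio)",
}

def classify(logits):
    """Map raw 4-way logits to (label, softmax confidence)."""
    logits = np.asarray(logits, dtype=np.float64)
    probs = np.exp(logits - logits.max())  # stable softmax
    probs /= probs.sum()
    idx = int(probs.argmax())
    return ID2LABEL[idx], float(probs[idx])
```

In practice the logits would come from running the model on a mel-spectrogram of the input audio; the helper above only handles the final decoding step.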

Training Data

  • Dataset: AMVD_AS Dataset
  • Data Composition:
    • Balanced samples across four categories
    • Contains both synthetic and genuine human audio samples

Training Procedure

Fine-tuning Parameters

  • Base Model: MIT/ast-finetuned-audioset-10-10-0.4593
  • Initial Learning Rate: 4e-5 → 1e-5 (linear decay)
  • Total Training Steps: 25,000
  • Batch Size: 32
  • Warmup Steps: 5,000
  • Weight Decay: 0.01
  • Gradient Clip Norm: 1.0
  • Training Duration: ~4.5 hours (A100 GPU)
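The schedule above (warmup over 5,000 steps to a 4e-5 peak, then linear decay to 1e-5 by step 25,000) can be sketched as a function of the step count. This is an illustrative reconstruction assuming linear warmup from zero, which the card does not specify:

```python
def learning_rate(step, peak_lr=4e-5, final_lr=1e-5,
                  warmup_steps=5_000, total_steps=25_000):
    """Linear warmup to peak_lr, then linear decay to final_lr."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr + progress * (final_lr - peak_lr)
```

For example, the rate peaks at 4e-5 at step 5,000, sits at 2.5e-5 halfway through decay (step 15,000), and ends at 1e-5 at step 25,000.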

Evaluation

Validation Performance

  • Training Loss: 0.0874
  • Gradient Norm: 0.000075778
  • Final Learning Rate: 1e-5 (stable)
Model Size

  • Parameters: 86.2M
  • Tensor type: F32 (Safetensors)
