RobCaamano/Sherlock-Holmes

This model is a fine-tuned version of coqui/XTTS-v2 on a custom audiobook dataset.
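
As a rough usage sketch, the fine-tuned checkpoint can be loaded with the coqui TTS XTTS API as below. The checkpoint directory, config path, reference clip, and sample text are placeholders (assumptions), not files or values taken from this card.

```python
# Hypothetical inference sketch using the coqui TTS XTTS API; all paths below
# are placeholders for the files in this repository's checkpoint folder.
import torch
import torchaudio

from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

# Load the fine-tuned config and weights (placeholder paths).
config = XttsConfig()
config.load_json("checkpoint_dir/config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="checkpoint_dir/", eval=True)
if torch.cuda.is_available():
    model.cuda()

# Clone the voice from a short reference clip and synthesize a sample line.
out = model.synthesize(
    "Elementary, my dear Watson.",
    config,
    speaker_wav="reference.wav",  # placeholder reference audio
    language="en",
)

# XTTS-v2 outputs 24 kHz audio.
wav = torch.as_tensor(out["wav"], dtype=torch.float32).unsqueeze(0)
torchaudio.save("output.wav", wav, 24000)
```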

Training and evaluation data

Audio collected from an audiobook.

Training hyperparameters

The following hyperparameters were used during training:

GPTArgs():

  • max_conditioning_length=143677
  • min_conditioning_length=66150
  • max_wav_length=255995
  • max_text_length=66150
  • gpt_use_masking_gt_prompt_approach=True
  • gpt_use_perceiver_resampler=True

GPTTrainerConfig():

  • BATCH_SIZE=3
  • batch_group_size=48
  • GRAD_ACUMM_STEPS=84
  • optimizer_params={"betas": [0.9, 0.96], "eps": 1e-8, "weight_decay": 1e-2}
  • lr_scheduler_params={"milestones": [50000 * 18, 150000 * 18, 300000 * 18], "gamma": 0.5, "last_epoch": -1}
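
The argument names above match the coqui TTS XTTS-v2 GPT fine-tuning recipe (GPTArgs, GPTTrainerConfig, and the Trainer-level GRAD_ACUMM_STEPS). The sketch below shows how these values would be assembled under that assumption; the base-model checkpoint paths, dataset settings, output path, and optimizer/scheduler names are placeholders or taken from the public recipe, not values reported in this card.

```python
# Sketch of the training setup, assuming the coqui TTS XTTS-v2 GPT fine-tuning
# recipe; all file paths and dataset settings are placeholders, and the
# learning rate is not reported in this card.
from trainer import Trainer, TrainerArgs

from TTS.config.shared_configs import BaseDatasetConfig
from TTS.tts.datasets import load_tts_samples
from TTS.tts.layers.xtts.trainer.gpt_trainer import GPTArgs, GPTTrainer, GPTTrainerConfig

model_args = GPTArgs(
    max_conditioning_length=143677,
    min_conditioning_length=66150,
    max_wav_length=255995,
    max_text_length=66150,
    gpt_use_masking_gt_prompt_approach=True,
    gpt_use_perceiver_resampler=True,
    # Base-model files needed to initialize fine-tuning (placeholder paths).
    xtts_checkpoint="XTTS_v2.0_original_model_files/model.pth",
    dvae_checkpoint="XTTS_v2.0_original_model_files/dvae.pth",
    mel_norm_file="XTTS_v2.0_original_model_files/mel_stats.pth",
    tokenizer_file="XTTS_v2.0_original_model_files/vocab.json",
)

config = GPTTrainerConfig(
    output_path="run/training",  # placeholder
    model_args=model_args,
    batch_size=3,                # BATCH_SIZE
    batch_group_size=48,
    optimizer="AdamW",           # optimizer name from the public recipe, not stated in the card
    optimizer_params={"betas": [0.9, 0.96], "eps": 1e-8, "weight_decay": 1e-2},
    lr_scheduler="MultiStepLR",  # scheduler name from the public recipe, not stated in the card
    lr_scheduler_params={"milestones": [50000 * 18, 150000 * 18, 300000 * 18], "gamma": 0.5, "last_epoch": -1},
)

# Placeholder dataset definition for the audiobook audio.
dataset_config = BaseDatasetConfig(
    formatter="ljspeech",               # placeholder formatter
    dataset_name="sherlock_audiobook",  # placeholder
    path="dataset/",
    meta_file_train="metadata.csv",
    language="en",
)
train_samples, eval_samples = load_tts_samples(dataset_config, eval_split=True)

model = GPTTrainer.init_from_config(config)
trainer = Trainer(
    # GRAD_ACUMM_STEPS is a Trainer argument, not a GPTTrainerConfig field.
    TrainerArgs(grad_accum_steps=84),
    config,
    output_path=config.output_path,
    model=model,
    train_samples=train_samples,
    eval_samples=eval_samples,
)
trainer.fit()
```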

Training results

  • Train avg_loss: 0.05243
  • Train avg_loss_mel_ce: 4.38085
  • Train avg_loss_text_ce: 0.02336
  • Validation avg_loss: 4.1927
  • Validation avg_loss_mel_ce: 4.17117
  • Validation avg_loss_text_ce: 0.02153
  • Epoch: 1

Framework versions

  • Transformers 4.51.3
  • PyTorch 2.6.0+cu126
  • TorchAudio 2.6.0
  • Tokenizers 0.21.1
  • TTS

Model tree for RobCaamano/Sherlock-Holmes

Base model

  • coqui/XTTS-v2