A Step Towards Music Generation Foundation Model
Generate descriptions from images using masks
Generate realistic dialogue from a script, using Dia!
Try Orpheus TTS here