pydantic==2.10.6
gradio==3.44.4
torchaudio
soundfile
tqdm
scipy
numpy
einops
rotary_embedding_torch
torchinfo
packaging
typing
yamlargparse
librosa
pesq
opencv-python==4.10.0.84
python_speech_features
scenedetect
torchvision
pydub