# Resources
---
## Github
- [NeMo](https://github.com/NVIDIA/NeMo)
- [Llama](https://github.com/facebookresearch/llama)
- [Demucs](https://github.com/facebookresearch/demucs)
- [Whisper](https://github.com/openai/whisper)
- [Whisper NeMo Diarization](https://github.com/MahmoudAshraf97/whisper-diarization)
- [Text to speech alignment using CTC forced alignment](https://github.com/MahmoudAshraf97/ctc-forced-aligner)
- [Utilities intended for use with Llama models.](https://github.com/meta-llama/llama-models/)
- [Llama Recipes: Examples to get started using the Llama models from Meta](https://github.com/meta-llama/llama-recipes)
- [timsainb/noisereduce: Noise reduction in python using spectral gating](https://github.com/timsainb/noisereduce/)
- [pyannote/pyannote-audio: Neural building blocks for speaker diarization](https://github.com/pyannote/pyannote-audio)
- [microsoft/DNS-Challenge: This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.](https://github.com/microsoft/DNS-Challenge)
- [WenzheLiu-Speech/awesome-speech-enhancement: speech enhancement / speech separation / sound source localization](https://github.com/WenzheLiu-Speech/awesome-speech-enhancement)
- [nanahou/Awesome-Speech-Enhancement: A tutorial for Speech Enhancement researchers and practitioners. The purpose of this repo is to organize the world’s resources for speech enhancement and make them universally accessible and useful.](https://github.com/nanahou/Awesome-Speech-Enhancement)
- [jonashaag/speech-enhancement: Collection of papers, datasets and tools on the topic of Speech Dereverberation and Speech Enhancement](https://github.com/jonashaag/speech-enhancement)
- [yxlu-0102/MP-SENet: Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement](https://github.com/yxlu-0102/MP-SENet)
- [Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement](https://yxlu-0102.github.io/MP-SENet/)
- [Noisy speech database for training speech enhancement algorithms and TTS models (superseded: this dataset has been replaced)](https://datashare.ed.ac.uk/handle/10283/1942)
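
The Whisper NeMo Diarization and pyannote-audio entries above combine two outputs: word timestamps from speech recognition and speaker turns from diarization. One common way to merge them is to give each word the speaker whose turn overlaps it most. Below is a minimal sketch of that approach. It assumes both outputs have already been converted to the plain `Word`/`Segment` records shown. These types and `assign_speakers` are hypothetical and not either project's API. The rule for words that fall in a gap (take the nearest turn's speaker) is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:  # hypothetical record: one recognized word with timestamps
    text: str
    start: float  # seconds
    end: float


@dataclass
class Segment:  # hypothetical record: one speaker turn from a diarizer
    speaker: str
    start: float  # seconds
    end: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0.0 if they are disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words: List[Word], segments: List[Segment]) -> List[Optional[str]]:
    """Label each word with the speaker whose segment overlaps it the most.

    A word that overlaps no segment gets the speaker of the segment whose
    nearest boundary is closest to the word's midpoint (assumed tie-break).
    Every word is labelled None when there are no segments at all.
    """
    if not segments:
        return [None] * len(words)
    labels: List[Optional[str]] = []
    for w in words:
        best = max(segments, key=lambda s: overlap(w.start, w.end, s.start, s.end))
        if overlap(w.start, w.end, best.start, best.end) == 0.0:
            mid = (w.start + w.end) / 2
            best = min(segments, key=lambda s: min(abs(mid - s.start), abs(mid - s.end)))
        labels.append(best.speaker)
    return labels


if __name__ == "__main__":
    words = [Word("hello", 0.0, 0.4), Word("um", 1.35, 1.45), Word("so", 0.95, 1.9)]
    segments = [Segment("SPEAKER_00", 0.0, 1.0), Segment("SPEAKER_01", 1.5, 3.0)]
    print(assign_speakers(words, segments))  # ['SPEAKER_00', 'SPEAKER_01', 'SPEAKER_01']
```
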
---
## Web
- [Llama](https://www.llama.com/)
- [Download Llama](https://www.llama.com/llama-downloads/)
- [Llama 3.2 Requirements](https://llamaimodel.com/requirements-3-2/)
- [Average handle time (AHT): Formula and tips for improvement](https://www.zendesk.com/blog/average-handle-time/)
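
The AHT article linked above rests on one formula. Average handle time is total talk time plus total hold time plus total after-call work, divided by the number of calls handled. Below is a small sketch of that arithmetic. The `Call` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Call:  # hypothetical per-call record, all durations in seconds
    talk_s: float        # time spent talking with the customer
    hold_s: float        # time the customer spent on hold
    after_call_s: float  # after-call work (notes, follow-up)


def average_handle_time(calls: Iterable[Call]) -> float:
    """AHT = (total talk time + total hold time + total after-call work) / number of calls."""
    calls = list(calls)
    if not calls:
        raise ValueError("AHT is undefined when no calls were handled")
    total = sum(c.talk_s + c.hold_s + c.after_call_s for c in calls)
    return total / len(calls)


if __name__ == "__main__":
    calls = [Call(300, 60, 120), Call(200, 0, 40)]
    print(average_handle_time(calls))  # 360.0 seconds, i.e. 6 minutes
```
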
---
## Notebooks
- [Hybrid Demucs Music Source Separation](https://colab.research.google.com/drive/1dC9nVxk3V_VPjUADsnFu8EiT-xnU1tGH)
---
## PyPI
- [demucs](https://pypi.org/project/demucs/)
- [MPSENet](https://pypi.org/project/MPSENet/)
---
## Errors
- [`The file is already fully retrieved; nothing to do.`](https://github.com/facebookresearch/llama/issues/760)
---
## Paper
- [Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation](https://arxiv.org/abs/2007.13975)
- [MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra](https://arxiv.org/abs/2305.13686)
- [FINALLY: fast and universal speech enhancement with studio-like quality](https://arxiv.org/abs/2410.05920)
- [Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement](https://arxiv.org/abs/2308.08926)
- [A Recurrent Neural Network Approach to the Answering Machine Detection Problem](https://arxiv.org/abs/2410.08235)
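
The MP-SENet papers above estimate magnitude and phase spectra in parallel. The underlying decomposition is `X[k] = |X[k]| * exp(j * phi[k])`, with the magnitude typically power-law compressed before modelling. Below is a pure-Python sketch of that round trip. It rests on a few assumptions:

- The compression exponent is 0.3. This is an assumed value; check the configuration you follow.
- A naive DFT stands in for the STFT a real model uses.
- `to_mag_phase` and `from_mag_phase` are hypothetical helper names.

The point it shows is that splitting into compressed magnitude and phase, then recombining, is lossless up to floating-point round-off.

```python
import cmath
import math
from typing import List, Sequence, Tuple

# Assumed power-law compression exponent for the magnitude.
COMPRESS = 0.3


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(X: Sequence[complex]) -> List[complex]:
    """Inverse of dft(), including the 1/N scaling."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n for t in range(n)]


def to_mag_phase(X: Sequence[complex], c: float = COMPRESS) -> Tuple[List[float], List[float]]:
    """Split a spectrum into a compressed magnitude |X|^c and a phase angle in (-pi, pi]."""
    return [abs(v) ** c for v in X], [cmath.phase(v) for v in X]


def from_mag_phase(mag: Sequence[float], pha: Sequence[float], c: float = COMPRESS) -> List[complex]:
    """Undo the compression and recombine: X = |X| * exp(j * phase)."""
    return [(m ** (1.0 / c)) * cmath.exp(1j * p) for m, p in zip(mag, pha)]


if __name__ == "__main__":
    n = 32
    x = [math.sin(2 * math.pi * 3 * t / n) + 0.5 * math.cos(2 * math.pi * 5 * t / n) for t in range(n)]
    mag, pha = to_mag_phase(dft(x))
    y = [v.real for v in idft(from_mag_phase(mag, pha))]
    # Maximum reconstruction error: floating-point round-off only.
    print(max(abs(a - b) for a, b in zip(x, y)))
```
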
---
## Youtube
- [A Course on Speech Enhancement](https://www.youtube.com/playlist?list=PLO9nFIQB53_DU8o0fToNdNFdZuDxD9fAN)
- [COMS 4995 Final on Speech Enhancement](https://www.youtube.com/watch?v=uRwlSh1FMzc&t=74s)
- [Achieving Studio-Quality Speech with Generative AI](https://www.youtube.com/watch?v=UxbEjpLMU8s)
- [How to Fix Bad Podcast Audio](https://www.youtube.com/watch?v=0mPkPQNHsZc)
- [Speech Enhancement for Cochlear Implant Recipients Using Deep Complex Convolution Transformer With F](https://www.youtube.com/watch?v=i1qTgjMtS2Y)
- [Vibravox: A Dataset of French Speech Captured with Body-conduction Audio Sensors](https://www.youtube.com/watch?v=4jiQdotz6qY)
- [2024 Capstone Design Team 3, 2nd Round: Neural Network for Speech Enhancement](https://www.youtube.com/watch?v=yOfTYuc9FEQ)
- [MIAI Deeptails Seminar : Generative Models as Data-driven Priors for Speech Enhancement](https://www.youtube.com/watch?v=XSLgUsgyzUA)
- [Hardware Efficient Speech Enhancement With Noise Aware Multi Target Deep Learning](https://www.youtube.com/watch?v=qO6JqDUQlsI)
- [Diffusion Models for Speech Enhancement | Julius Richter](https://www.youtube.com/watch?v=HMrs6YWDl5M)
- [Speech Enhancement: Basics & Key Details](https://www.youtube.com/watch?v=5kItH2pq_3E)
- [Guided Speech Enhancement Network (ICASSP 2023)](https://www.youtube.com/watch?v=JoDqXkAjlh4)
- [VSANet: Real-time Speech Enhancement Based on Voice Activity Detection and Causal Spatial Attention](https://www.youtube.com/watch?v=GP39vFA2E48)
- [Research intern talk: Unified speech enhancement approach for speech degradation & noise suppression](https://www.youtube.com/watch?v=_ggfv6eMIJs)
- [Magnitude and phase spectrum with example](https://www.youtube.com/watch?v=MFOjUgafq0k)
- [Deep Learning In Audio for Absolute Beginners: From No Experience & No Datasets to a Deployed Model](https://www.youtube.com/watch?v=sqrah49GUkI)
- [Look Once to Hear: Target Speech Hearing with Noisy Examples](https://www.youtube.com/watch?v=V-XCfnjfQmM)
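
Several of the introductory talks above start from classical noise suppression. Spectral gating, the technique named by the noisereduce package in the Github list, works as follows:

1. Estimate a per-frequency noise level from a noise-only clip.
2. Suppress every time-frequency bin that does not rise above that level.

Below is a deliberately simplified pure-Python sketch of the idea, not noisereduce's implementation. It uses non-overlapping rectangular frames, a naive DFT and a hard 0/1 mask. noisereduce, by contrast, works on an overlapping STFT and smooths its mask. `spectral_gate`, `frame_len` and `n_std` are hypothetical names.

```python
import cmath
import math
import random
from typing import List, Sequence


def _dft(frame: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Naive O(N^2) DFT; inverse=True computes the inverse DFT (scaled by 1/N)."""
    n = len(frame)
    sign = 1.0 if inverse else -1.0
    out = [sum(frame[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def _frames(x: Sequence[float], n: int) -> List[List[float]]:
    """Split x into non-overlapping frames of length n, zero-padding the last one."""
    padded = list(x) + [0.0] * (-len(x) % n)
    return [padded[i:i + n] for i in range(0, len(padded), n)]


def _db(v: complex, eps: float = 1e-10) -> float:
    """Magnitude of a DFT bin in decibels."""
    return 20.0 * math.log10(abs(v) + eps)


def spectral_gate(signal: Sequence[float], noise_clip: Sequence[float],
                  frame_len: int = 32, n_std: float = 1.5) -> List[float]:
    """Zero every DFT bin whose level does not exceed a per-frequency noise threshold.

    threshold[k] = mean_dB(noise bin k) + n_std * std_dB(noise bin k),
    with statistics taken over all frames of the noise-only clip.
    """
    if len(noise_clip) == 0:
        raise ValueError("a non-empty noise-only clip is required")
    noise_specs = [_dft(f) for f in _frames(noise_clip, frame_len)]
    thresholds = []
    for k in range(frame_len):
        levels = [_db(spec[k]) for spec in noise_specs]
        mean = sum(levels) / len(levels)
        std = math.sqrt(sum((v - mean) ** 2 for v in levels) / len(levels))
        thresholds.append(mean + n_std * std)

    out: List[float] = []
    for frame in _frames(signal, frame_len):
        spec = _dft(frame)
        # Hard 0/1 mask: keep a bin only if it rises above the noise threshold.
        gated = [v if _db(v) > thr else 0j for v, thr in zip(spec, thresholds)]
        out.extend(v.real for v in _dft(gated, inverse=True))
    return out[:len(signal)]


if __name__ == "__main__":
    rng = random.Random(0)
    n = 32 * 40
    clean = [math.sin(2 * math.pi * 4 * t / 32) for t in range(n)]
    noisy = [c + rng.gauss(0.0, 0.1) for c in clean]
    noise_only = [rng.gauss(0.0, 0.1) for _ in range(n)]
    denoised = spectral_gate(noisy, noise_only)
    err_before = sum((a - b) ** 2 for a, b in zip(noisy, clean))
    err_after = sum((a - b) ** 2 for a, b in zip(denoised, clean))
    print(err_before, err_after)  # err_after should be well below err_before
```
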
---
## Wikipedia
- [Speech enhancement](https://en.m.wikipedia.org/wiki/Speech_enhancement)
---
## Hugging Face
- [Models (asteroid)](https://huggingface.co/models?library=asteroid)
- [cankeles/DPTNet_WHAMR_enhsingle_16k](https://huggingface.co/cankeles/DPTNet_WHAMR_enhsingle_16k)
- [JacobLinCool/MP-SENet-VB](https://huggingface.co/JacobLinCool/MP-SENet-VB)
- [JacobLinCool/MP-SENet-DNS](https://huggingface.co/JacobLinCool/MP-SENet-DNS)
- [ENOT-AutoDL/MP-SENet](https://huggingface.co/ENOT-AutoDL/MP-SENet)
---
## Web
- [Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation](https://paperswithcode.com/paper/dual-path-transformer-network-direct-context-1)
- [The Audio Developer Conference - ADC is an annual event celebrating all audio development technologies, from music applications and game audio to audio processing and embedded systems.](https://audio.dev/)
- [Look Once to Hear: Target Speech Hearing with Noisy Examples - CHI '24](https://programs.sigchi.org/chi/2024/program/content/147319)
- [Reinforcement Learning Based Speech Enhancement for Robust Speech Recognition > Introduction | Class Central Classroom](https://www.classcentral.com/classroom/youtube-reinforcement-learning-based-speech-enhancement-for-robust-speech-recognition-131999)
- [Sound classification with YAMNet TensorFlow Hub](https://www.tensorflow.org/hub/tutorials/yamnet)
- [DEEP-VOICE: DeepFake Voice Recognition Dataset | Papers With Code](https://paperswithcode.com/dataset/deep-voice-deepfake-voice-recognition)
---
## Dataset
- [VoiceBank+DEMAND (Edinburgh DataShare)](https://datashare.ed.ac.uk/handle/10283/1942)
- [VoiceBank+DEMAND (Google Drive)](https://drive.google.com/drive/folders/19I_thf6F396y5gZxLTxYIojZXC0Ywm8l)
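
VoiceBank+DEMAND provides parallel noisy and clean recordings. The sketch below assumes each noisy file has a clean counterpart with the same file name in a sibling folder. Folder names such as `clean_trainset_28spk_wav` and `noisy_trainset_28spk_wav` are assumptions to check against your download. The sketch pairs the files by name and reports any file without a partner. `pair_by_name` is a hypothetical helper.

```python
from pathlib import Path
from typing import List, Tuple


def pair_by_name(clean_dir: Path, noisy_dir: Path, suffix: str = ".wav"
                 ) -> Tuple[List[Tuple[Path, Path]], List[str]]:
    """Pair clean and noisy recordings that share a file name.

    Returns (pairs sorted by file name, file names found in only one folder).
    """
    clean = {p.name: p for p in Path(clean_dir).glob(f"*{suffix}")}
    noisy = {p.name: p for p in Path(noisy_dir).glob(f"*{suffix}")}
    pairs = [(clean[name], noisy[name]) for name in sorted(clean.keys() & noisy.keys())]
    unmatched = sorted(clean.keys() ^ noisy.keys())
    return pairs, unmatched


if __name__ == "__main__":
    # Hypothetical local paths; adjust to wherever the archives were extracted.
    pairs, unmatched = pair_by_name(Path("clean_trainset_28spk_wav"), Path("noisy_trainset_28spk_wav"))
    print(f"{len(pairs)} pairs, {len(unmatched)} files without a partner")
```
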
---