# Resources

---

## GitHub

- [NeMo](https://github.com/NVIDIA/NeMo)
- [Llama](https://github.com/facebookresearch/llama)
- [Demucs](https://github.com/facebookresearch/demucs)
- [Whisper](https://github.com/openai/whisper)
- [Whisper NeMo Diarization](https://github.com/MahmoudAshraf97/whisper-diarization)
- [Text to speech alignment using CTC forced alignment](https://github.com/MahmoudAshraf97/ctc-forced-aligner)
- [Utilities intended for use with Llama models.](https://github.com/meta-llama/llama-models/)
- [Llama Recipes: Examples to get started using the Llama models from Meta](https://github.com/meta-llama/llama-recipes)
- [timsainb/noisereduce: Noise reduction in Python using spectral gating](https://github.com/timsainb/noisereduce/)
- [pyannote/pyannote-audio: Neural building blocks for speaker diarization](https://github.com/pyannote/pyannote-audio)
- [microsoft/DNS-Challenge: This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.](https://github.com/microsoft/DNS-Challenge)
- [WenzheLiu-Speech/awesome-speech-enhancement: speech enhancement / speech separation / sound source localization](https://github.com/WenzheLiu-Speech/awesome-speech-enhancement)
- [nanahou/Awesome-Speech-Enhancement: A tutorial for Speech Enhancement researchers and practitioners. The purpose of this repo is to organize the world’s resources for speech enhancement and make them universally accessible and useful.](https://github.com/nanahou/Awesome-Speech-Enhancement)
- [jonashaag/speech-enhancement: Collection of papers, datasets and tools on the topic of Speech Dereverberation and Speech Enhancement](https://github.com/jonashaag/speech-enhancement)
- [yxlu-0102/MP-SENet: Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement](https://github.com/yxlu-0102/MP-SENet)
- [Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement](https://yxlu-0102.github.io/MP-SENet/)
- [Noisy speech database for training speech enhancement algorithms and TTS models (superseded: this dataset has been replaced)](https://datashare.ed.ac.uk/handle/10283/1942)

---

## Web

- [Llama](https://www.llama.com/)
- [Download Llama](https://www.llama.com/llama-downloads/)
- [Llama 3.2 Requirements](https://llamaimodel.com/requirements-3-2/)
- [Average handle time (AHT): Formula and tips for improvement](https://www.zendesk.com/blog/average-handle-time/)

---

## Notebooks

- [Hybrid Demucs Music Source Separation](https://colab.research.google.com/drive/1dC9nVxk3V_VPjUADsnFu8EiT-xnU1tGH)

---

## PyPI

- [demucs](https://pypi.org/project/demucs/)
- [MPSENet](https://pypi.org/project/MPSENet/)

---

## Errors

- [`The file is already fully retrieved; nothing to do.`](https://github.com/facebookresearch/llama/issues/760)

---

## Paper

- [Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation](https://arxiv.org/abs/2007.13975)
- [MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra](https://arxiv.org/abs/2305.13686)
- [FINALLY: fast and universal speech enhancement with studio-like quality](https://arxiv.org/abs/2410.05920)
- [Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement](https://arxiv.org/abs/2308.08926)
- [A Recurrent Neural Network Approach to the Answering Machine Detection Problem](https://arxiv.org/abs/2410.08235)

---

## YouTube

- [A Course on Speech Enhancement](https://www.youtube.com/playlist?list=PLO9nFIQB53_DU8o0fToNdNFdZuDxD9fAN)
- [COMS 4995 Final on Speech Enhancement](https://www.youtube.com/watch?v=uRwlSh1FMzc&t=74s)
- [Achieving Studio-Quality Speech with Generative AI](https://www.youtube.com/watch?v=UxbEjpLMU8s)
- [How to Fix Bad Podcast Audio](https://www.youtube.com/watch?v=0mPkPQNHsZc)
- [Speech Enhancement for Cochlear Implant Recipients Using Deep Complex Convolution Transformer With F](https://www.youtube.com/watch?v=i1qTgjMtS2Y)
- [Vibravox: A Dataset of French Speech Captured with Body-conduction Audio Sensors](https://www.youtube.com/watch?v=4jiQdotz6qY)
- [2024 Capstone Design, Team 3, Round 2: Neural Network for Speech Enhancement](https://www.youtube.com/watch?v=yOfTYuc9FEQ)
- [MIAI Deeptails Seminar : Generative Models as Data-driven Priors for Speech Enhancement](https://www.youtube.com/watch?v=XSLgUsgyzUA)
- [Hardware Efficient Speech Enhancement With Noise Aware Multi Target Deep Learning](https://www.youtube.com/watch?v=qO6JqDUQlsI)
- [Diffusion Models for Speech Enhancement | Julius Richter](https://www.youtube.com/watch?v=HMrs6YWDl5M)
- [Speech Enhancement: Basics & Key Details](https://www.youtube.com/watch?v=5kItH2pq_3E)
- [Guided Speech Enhancement Network (ICASSP 2023)](https://www.youtube.com/watch?v=JoDqXkAjlh4)
- [VSANet: Real-time Speech Enhancement Based on Voice Activity Detection and Causal Spatial Attention](https://www.youtube.com/watch?v=GP39vFA2E48)
- [Research intern talk: Unified speech enhancement approach for speech degradation & noise suppression](https://www.youtube.com/watch?v=_ggfv6eMIJs)
- [Magnitude and phase spectrum with example](https://www.youtube.com/watch?v=MFOjUgafq0k)
- [Deep Learning In Audio for Absolute Beginners: From No Experience & No Datasets to a Deployed Model](https://www.youtube.com/watch?v=sqrah49GUkI)
- [Look Once to Hear: Target Speech Hearing with Noisy Examples](https://www.youtube.com/watch?v=V-XCfnjfQmM)

---

## Wikipedia

- [Speech enhancement](https://en.m.wikipedia.org/wiki/Speech_enhancement)

---

## Hugging Face

- [Models(asteroid)](https://huggingface.co/models?library=asteroid)
- [cankeles/DPTNet_WHAMR_enhsingle_16k](https://huggingface.co/cankeles/DPTNet_WHAMR_enhsingle_16k)
- [JacobLinCool/MP-SENet-VB](https://huggingface.co/JacobLinCool/MP-SENet-VB)
- [JacobLinCool/MP-SENet-DNS](https://huggingface.co/JacobLinCool/MP-SENet-DNS)
- [ENOT-AutoDL/MP-SENet](https://huggingface.co/ENOT-AutoDL/MP-SENet)

---

## Web

- [Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation](https://paperswithcode.com/paper/dual-path-transformer-network-direct-context-1)
- [The Audio Developer Conference - ADC is an annual event celebrating all audio development technologies, from music applications and game audio to audio processing and embedded systems.](https://audio.dev/)
- [Look Once to Hear: Target Speech Hearing with Noisy Examples - CHI '24](https://programs.sigchi.org/chi/2024/program/content/147319)
- [Reinforcement Learning Based Speech Enhancement for Robust Speech Recognition (Class Central Classroom)](https://www.classcentral.com/classroom/youtube-reinforcement-learning-based-speech-enhancement-for-robust-speech-recognition-131999)
- [Sound classification with YAMNet TensorFlow Hub](https://www.tensorflow.org/hub/tutorials/yamnet)
- [DEEP-VOICE: DeepFake Voice Recognition Dataset | Papers With Code](https://paperswithcode.com/dataset/deep-voice-deepfake-voice-recognition)

---

## Dataset

- [VoiceBank+DEMAND](https://datashare.ed.ac.uk/handle/10283/1942)
- [VoiceBank+DEMAND](https://drive.google.com/drive/folders/19I_thf6F396y5gZxLTxYIojZXC0Ywm8l)

---