Spaces: DeepLearning101/Speech-Quality-Inspection_Meta-Denoiser

DeepLearning101 committed 20 days ago

Commit 06a3e4d (verified) · 1 parent: 6c96ec5

Update app.py

app.py +35 -88

@@ -79,6 +79,39 @@ transcribe.__annotations__ = {
     "return": str
 }
 demo = gr.Interface(
     fn=transcribe,
     inputs=[
@@ -90,93 +123,7 @@ demo = gr.Interface(
     live=True,
     allow_flagging="never",
     title="<h1>語音質檢/噪音去除 (語音增強)</h1>",
-    description="""<h2><a href='https://www.twman.org' target='_blank'>TonTon Huang Ph.D.</a> | <a href='https://blog.twman.org/p/deeplearning101.html' target='_blank'>手把手帶你一起踩AI坑</a><br></h2><br>
-                為了提升語音識別的效果，可以在識別前先進行噪音去除<br>
-                    <a href='https://github.com/Deep-Learning-101' target='_blank'>Deep Learning 101 Github</a> | <a href='http://deeplearning101.twman.org' target='_blank'>Deep Learning 101</a> | <a href='https://www.facebook.com/groups/525579498272187/' target='_blank'>台灣人工智慧社團 FB</a> | <a href='https://www.youtube.com/c/DeepLearning101' target='_blank'>YouTube</a><br>
-                    <a href='https://blog.twman.org/2025/03/AIAgent.html' target='_blank'>那些 AI Agent 要踩的坑</a>：探討多種 AI 代理人工具的應用經驗與挑戰，分享實用經驗與工具推薦。<br>
-                    <a href='https://blog.twman.org/2024/08/LLM.html' target='_blank'>白話文手把手帶你科普 GenAI</a>：淺顯介紹生成式人工智慧核心概念，強調硬體資源和數據的重要性。<br>
-                    <a href='https://blog.twman.org/2024/09/LLM.html' target='_blank'>大型語言模型直接就打完收工？</a>：回顧 LLM 領域探索歷程，討論硬體升級對 AI 開發的重要性。<br>
-                    <a href='https://blog.twman.org/2024/07/RAG.html' target='_blank'>那些檢索增強生成要踩的坑</a>：探討 RAG 技術應用與挑戰，提供實用經驗分享和工具建議。<br>
-                    <a href='https://blog.twman.org/2024/02/LLM.html' target='_blank'>那些大型語言模型要踩的坑</a>：探討多種 LLM 工具的應用與挑戰，強調硬體資源的重要性。<br>
-                    <a href='https://blog.twman.org/2023/04/GPT.html' target='_blank'>Large Language Model，LLM</a>：探討 LLM 的發展與應用，強調硬體資源在開發中的關鍵作用。。<br>
-                    <a href='https://blog.twman.org/2024/11/diffusion.html' target='_blank'>ComfyUI + Stable Diffuision</a>：深入探討影像生成與分割技術的應用，強調硬體資源的重要性。<br>
-                    <a href='https://blog.twman.org/2024/02/asr-tts.html' target='_blank'>那些ASR和TTS可能會踩的坑</a>：探討 ASR 和 TTS 技術應用中的問題，強調數據質量的重要性。<br>
-                    <a href='https://blog.twman.org/2021/04/NLP.html' target='_blank'>那些自然語言處理 (Natural Language Processing, NLP) 踩的坑</a>：分享 NLP 領域的實踐經驗，強調數據質量對模型效果的影響。<br>
-                    <a href='https://blog.twman.org/2021/04/ASR.html' target='_blank'>那些語音處理 (Speech Processing) 踩的坑</a>：分享語音處理領域的實務經驗，強調資料品質對模型效果的影響。<br>
-                    <a href='https://blog.twman.org/2023/07/wsl.html' target='_blank'>用PPOCRLabel來幫PaddleOCR做OCR的微調和標註</a><br>
-                    <a href='https://blog.twman.org/2023/07/HugIE.html' target='_blank'>基於機器閱讀理解和指令微調的統一信息抽取框架之診斷書醫囑資訊擷取分析</a><br>
-                <a href='https://github.com/facebookresearch/denoiser' target='_blank'> Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)</a>""",
 )
-demo.launch(debug=True, share=True)
-# import os
-# import time
-# import json
-# import gradio as gr
-# import torch
-# import torchaudio
-# import numpy as np
-# from denoiser.demucs import Demucs
-# from pydub import AudioSegment
-# import soundfile as sf
-# import librosa
-# modelpath = './denoiser/master64.th'
-# def transcribe(file_upload, microphone):
-#     file = microphone if microphone is not None else file_upload
-#     # Added audio preprocessing → unify the format
-#     def preprocess_audio(path):
-#         data, sr = sf.read(path)
-#         # If stereo → downmix to mono
-#         if len(data.shape) > 1:
-#             data = data.mean(axis=1)
-#         # If not 16 kHz → resample
-#         if sr != 16000:
-#             data = librosa.resample(data, orig_sr=sr, target_sr=16000)
-#             sr = 16000
-#         # Save as WAV for the model to use
-#         sf.write("enhanced.wav", data, sr)
-#         return "enhanced.wav"
-#     # If the input is MP3, convert it to WAV before processing
-#     if file.lower().endswith(".mp3"):
-#         audio = AudioSegment.from_file(file)
-#         audio = audio.set_frame_rate(16000).set_channels(1)  # convert to mono + 16 kHz
-#         audio.export("enhanced.wav", format="wav")
-#         file = "enhanced.wav"
-#     else:
-#         file = preprocess_audio(file)
-#     model = Demucs(hidden=64)
-#     state_dict = torch.load(modelpath, map_location='cpu')
-#     model.load_state_dict(state_dict)
-#     demucs = model.eval()
-#     x, sr = torchaudio.load(file)
-#     x = x[0:1]  # force the first channel (ensures mono)
-#     with torch.no_grad():
-#         out = demucs(x[None])[0]
-#     out = out / max(out.abs().max().item(), 1)
-#     torchaudio.save('enhanced_final.wav', out, sr)
-#     # Export the enhanced WAV as MP3 for playback in the front end
-#     enhanced = AudioSegment.from_wav('enhanced_final.wav')
-#     enhanced.export('enhanced_final.mp3', format="mp3", bitrate="256k")
-#     return "enhanced_final.mp3"  # return MP3 to save space
-# # 👇 Add this assignment to work around Gradio's schema inference error
-# transcribe.__annotations__ = {
-#     "file_upload": str,
-#     "microphone": str,
-#     "return": str
-# }
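
Note on the removed block: the commented-out code downmixes multi-channel audio with `data.mean(axis=1)` and rescales the model output with `out / max(out.abs().max().item(), 1)`, so the signal is only attenuated when its peak exceeds 1. The sketch below illustrates both rules on plain Python lists rather than NumPy arrays or torch tensors. The function names `downmix_to_mono` and `peak_normalize` are hypothetical and do not appear in app.py.

```python
def downmix_to_mono(frames):
    """Average each multi-channel frame into one mono sample.

    `frames` is a list of per-frame channel tuples, e.g. [(left, right), ...],
    which mirrors the (frames, channels) layout that `data.mean(axis=1)` reduces.
    """
    return [sum(frame) / len(frame) for frame in frames]


def peak_normalize(samples):
    """Divide by the peak magnitude, but never by less than 1.0.

    Quiet signals pass through unchanged; signals whose peak exceeds 1.0
    are scaled down so the loudest sample has magnitude 1.0.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    divisor = max(peak, 1.0)
    return [s / divisor for s in samples]


if __name__ == "__main__":
    stereo = [(1.0, 0.0), (0.5, 0.5)]
    print(downmix_to_mono(stereo))
    print(peak_normalize([2.0, -4.0]))
    print(peak_normalize([0.5, -0.25]))
```

Clamping the divisor at 1 avoids amplifying quiet recordings (and any residual noise in them) while still preventing clipping when the output is written to a file.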

     "return": str
 }
+# 🎯 The description content you provided (converted to HTML)
+description_html = """
+<h1 align='center'><a href='https://www.twman.org/AI/ASR/SpeechEnhancement' target='_blank'>中文語音增強(去噪)</a></h1>
+<p align='center'><b>上傳一段音檔 （支援 `.mp3`, `.wav`），為了提升語音識別的效果，可以在識別前先進行噪音去除</b></p>
+<div align='center'>
+  <a href='https://www.twman.org' target='_blank'>TonTon Huang Ph.D.</a> |
+  <a href='https://www.twman.org/AI' target='_blank'> AI </a> |
+  <a href='https://blog.twman.org/p/deeplearning101.html' target='_blank'>手把手帶你一起踩AI坑</a> |
+  <a href='https://github.com/Deep-Learning-101' target='_blank'>GitHub</a> |
+  <a href='http://deeplearning101.twman.org' target='_blank'>Deep Learning 101</a> |
+  <a href='https://www.youtube.com/c/DeepLearning101' target='_blank'>YouTube</a>
+</div>
+<br>
+### 📘 相關技術文章：
+<ul>
+  <li><a href='https://blog.twman.org/2025/03/AIAgent.html' target='_blank'>避開 AI Agent 開發陷阱：常見問題、挑戰與解決方案 (那些 AI Agent 實戰踩過的坑)</a>：探討多種 AI Agent 工具的應用經驗與挑戰</li>
+  <li><a href='https://blog.twman.org/2024/08/LLM.html' target='_blank'>白話文手把手帶你科普 GenAI</a>：淺顯介紹生成式人工智慧核心概念</li>
+  <li><a href='https://blog.twman.org/2024/09/LLM.html' target='_blank'>大型語言模型直接就打完收工？</a>：回顧 LLM 領域探索歷程</li>
+  <li><a href='https://blog.twman.org/2024/07/RAG.html' target='_blank'>檢索增強生成 (Retrieval-Augmented Generation, RAG) 不是萬靈丹之優化挑戰技巧</a>：探討 RAG 技術應用與挑戰</li>
+  <li><a href='https://blog.twman.org/2024/02/LLM.html' target='_blank'>大型語言模型 (LLM) 入門完整指南：原理、應用與未來</a>：探討多種 LLM 工具的應用與挑戰</li>
+  <li><a href='https://blog.twman.org/2023/04/GPT.html' target='_blank'>什麼是大語言模型，它是什麼？想要嗎？(Large Language Model，LLM)</a>：探討 LLM 的發展與應用</li>
+  <li><a href='https://blog.twman.org/2024/11/diffusion.html' target='_blank'>ComfyUI + Stable Diffuision</a>：深入探討影像生成與分割技術的應用</li>
+  <li><a href='https://blog.twman.org/2024/02/asr-tts.html' target='_blank'>ASR/TTS 開發避坑指南：語音辨識與合成的常見挑戰與對策</a>：探討 ASR 和 TTS 技術應用中的問題</li>
+  <li><a href='https://blog.twman.org/2021/04/NLP.html' target='_blank'>那些自然語言處理 (NLP) 踩的坑</a>：分享 NLP 領域的實踐經驗</li>
+  <li><a href='https://blog.twman.org/2021/04/ASR.html' target='_blank'>那些語音處理 (Speech Processing) 踩的坑</a>：分享語音處理領域的實務經驗</li>
+  <li><a href='https://blog.twman.org/2023/07/wsl.html' target='_blank'>用PPOCRLabel來幫PaddleOCR做OCR的微調和標註</a></li>
+  <li><a href='https://blog.twman.org/2023/07/HugIE.html' target='_blank'>基於機器閱讀理解和指令微調的統一信息抽取框架之診斷書醫囑資訊擷取分析</a></li>
+  <li><a href='https://github.com/shibing624/pycorrector' target='_blank'>Masked Language Model (MLM) as correction BERT</a></li>
+</ul>
+<br>
+"""
 demo = gr.Interface(
     fn=transcribe,
     inputs=[
     live=True,
     allow_flagging="never",
     title="<h1>語音質檢/噪音去除 (語音增強)</h1>",
+    description=description_html
 )
+demo.launch(debug=True, share=True)
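
Note on the new `description_html`: the commit moves the inline description into a module-level string with a hand-written `<ul>` of article links. One possible alternative, sketched below, is to generate that list from data so each entry is escaped consistently. This is only an illustration under stated assumptions: `build_article_list` is a hypothetical helper that is not part of app.py, and the URLs and titles in the usage example are placeholders.

```python
from html import escape


def build_article_list(articles):
    """Render (url, title, summary) tuples as an HTML <ul> block.

    `summary` may be None, in which case only the link is emitted,
    as with the last entries of the list in the diff.
    """
    items = []
    for url, title, summary in articles:
        # Escape attribute and text content so titles containing & or quotes
        # cannot break the markup.
        link = "<a href='{}' target='_blank'>{}</a>".format(
            escape(url, quote=True), escape(title)
        )
        suffix = ": " + escape(summary) if summary else ""
        items.append("  <li>{}{}</li>".format(link, suffix))
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    # Placeholder entries, not the real article list.
    sample = [
        ("https://example.com/a", "A & B", "note"),
        ("https://example.com/b", "C", None),
    ]
    print(build_article_list(sample))
```

The resulting string could then be concatenated into the value passed as `description=` to `gr.Interface`, in the same way `description_html` is passed in the diff.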