---
title: gradio-transcript-mcp - Gradio MCP Server for Transcription
emoji: 💬
colorFrom: green
colorTo: green
sdk: gradio
sdk_version: 5.29.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: Gradio MCP server to transcribe audio & video from URLs
---

# gradio-transcript-mcp: A Gradio MCP Server for Audio/Video Transcription from URLs

## Overview

`gradio-transcript-mcp` is a Gradio application configured to function as an MCP (Model Context Protocol) server that transcribes audio and video from URLs into text. It uses OpenAI's Whisper for transcription and `ffmpeg` (via `yt-dlp`) to download and convert the media, so MCP clients (like Cline) can process multimedia inputs from a given URL. The server converts downloaded content to WAV before transcription and selects the compute device (CPU or GPU) dynamically.

The repository contains the following main components:
- **`app.py`**: The main Gradio application file that runs the MCP server.
- **`transcription_tool.py`**: The core logic for handling file conversion and calling the transcription function.
- **`transcription.py`**: Contains the implementation for Whisper transcription using the `transformers` library.
- **`requirements.txt`**: Lists the necessary Python dependencies.
- **`ffmpeg_setup.py`**: Script to ensure ffmpeg is available.
- **`logging_config.py`**: Configuration for logging.
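
The dynamic device selection mentioned in the overview can be sketched as follows. This is a minimal illustration, not the code from `transcription.py`: the function name `pick_device` is hypothetical, and the fallback to CPU when `torch` is missing is an assumption made so the sketch runs anywhere.

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA GPU is usable, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the sketch also runs without torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

The returned string can be passed wherever a `torch` or `transformers` API accepts a `device` argument.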

---

## Installation

1. Clone this repository:
   ```bash
   git clone https://huggingface.co/spaces/bismay/gradio-transcript-mcp
   cd gradio-transcript-mcp
   ```
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
   This will install the necessary libraries, including `gradio[mcp]`, `yt-dlp`, `transformers`, and `torch`.

---
## Usage

### Running the Gradio App / MCP Server

To run the Gradio application, which also starts the MCP server, execute:

```bash
python app.py
```

This will launch a local Gradio web interface and start the MCP server. The server will expose the `transcribe_url` function as an MCP tool.
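
For orientation, here is a minimal sketch of how a Gradio app can expose a function as an MCP tool. It is not the contents of `app.py`: the body of `transcribe_url` is a hypothetical placeholder, and the `main` wrapper and the labels are assumptions. The `mcp_server=True` launch option is provided by Gradio when installed with the `mcp` extra, and Gradio derives the tool description from the function's docstring and type hints.

```python
from urllib.parse import urlparse


def transcribe_url(url: str) -> str:
    """Transcribe audio or video from a URL (placeholder body).

    Args:
        url: The URL of the audio or video file.

    Returns:
        The transcription text, or an error message.
    """
    if urlparse(url).scheme not in ("http", "https"):
        return f"Error: unsupported URL: {url}"
    # Hypothetical placeholder: a real app would download, convert,
    # and transcribe the media here.
    return f"(transcript of {url} would appear here)"


def main() -> None:
    import gradio as gr  # requires: pip install "gradio[mcp]"

    demo = gr.Interface(
        fn=transcribe_url,
        inputs=gr.Textbox(label="Media URL"),
        outputs=gr.Textbox(label="Transcript"),
    )
    # mcp_server=True serves the MCP endpoint alongside the web UI.
    demo.launch(mcp_server=True)
```

Calling `main()` starts the web interface and the MCP endpoint together.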

### Using as an MCP Server

When you run `python app.py`, the application starts an MCP server accessible to MCP clients.

**Exposed Tool:**

The server exposes one tool: `transcribe_url`.

*   **Description:** Transcribes audio or video from a given URL. Downloads the media from the URL, converts it to WAV format, and then uses the TranscriptTool to perform the transcription in English.
*   **Input:**
    *   `url` (string): The URL of the audio or video file.
*   **Output:** (string): The transcription of the audio/video in English, or an error message if download or transcription fails.
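
The WAV conversion step described above could look roughly like this. It is a sketch under assumptions: the helper names are hypothetical, and the choice of 16 kHz mono output reflects what Whisper models expect as input rather than anything confirmed about this repository's `ffmpeg` invocation.

```python
import subprocess


def build_wav_command(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that writes `src` as 16 kHz mono WAV to `dst`."""
    return [
        "ffmpeg",
        "-y",            # overwrite dst if it already exists
        "-i", src,       # input media file
        "-vn",           # drop any video stream
        "-ac", "1",      # mono
        "-ar", "16000",  # 16 kHz sample rate
        dst,
    ]


def convert_to_wav(src: str, dst: str) -> str:
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_wav_command(src, dst), check=True, capture_output=True)
    return dst
```

Keeping command construction separate from execution makes the conversion easy to inspect without `ffmpeg` installed.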

**Connecting an MCP Client:**

The MCP server will typically be accessible at `http://127.0.0.1:7860/gradio_api/mcp/sse` when run locally. You can find the exact URL printed in your console when the Gradio app launches.

To connect an MCP client (like Cline) to this server, you need to add a configuration entry in your client's settings. The exact format depends on your client, but it generally involves specifying a name for the server and its URL.

Example configuration for a client (like Cline) that supports SSE:

```json
{
  "mcpServers": {
    "gradio-transcript": {
      "url": "http://127.0.0.1:7860/gradio_api/mcp/sse"
    }
  }
}
```
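
If the app runs on a different host or port, the same entry can be generated rather than edited by hand. The sketch below is illustrative only: `build_client_config` is a hypothetical helper, and it assumes the `/gradio_api/mcp/sse` path shown above stays the same.

```python
import json

MCP_SSE_PATH = "/gradio_api/mcp/sse"


def build_client_config(name: str, base_url: str) -> dict:
    """Return an `mcpServers` entry pointing at the Gradio SSE endpoint."""
    url = base_url.rstrip("/") + MCP_SSE_PATH
    return {"mcpServers": {name: {"url": url}}}


config = build_client_config("gradio-transcript", "http://127.0.0.1:7860/")
print(json.dumps(config, indent=2))
```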

*Note: Some MCP clients, such as Claude Desktop, do not directly support SSE-based servers.*

In those cases, you can use a tool such as `mcp-remote` as an intermediary. First install Node.js, then add the following to your MCP client config:

```json
{
  "mcpServers": {
    "gradio": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "http://127.0.0.1:7860/gradio_api/mcp/sse"
      ]
    }
  }
}
```

### Connecting to the Hosted Server on Hugging Face Spaces

This application is also hosted on Hugging Face Spaces, providing a publicly accessible MCP server. You can connect to this hosted server using the following URL:

`https://bismay-gradio-transcript-mcp.hf.space/gradio_api/mcp/sse`

To connect your MCP client (like Cline) to this hosted server, add a configuration entry similar to this:

```json
{
  "mcpServers": {
    "gradio-transcript": {
      "url": "https://bismay-gradio-transcript-mcp.hf.space/gradio_api/mcp/sse"
    }
  }
}
```
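
The hosted URL above follows a pattern that appears to combine the Space owner and name into a subdomain. The sketch below derives the endpoint from a Space ID under that assumption; `space_mcp_url` is a hypothetical helper, and the naming rule is inferred from the URL in this README rather than taken from Hugging Face documentation.

```python
import re


def space_mcp_url(space_id: str) -> str:
    """Map an "owner/name" Space ID to its assumed MCP SSE endpoint."""
    # Assumption: non-alphanumeric characters become hyphens in the subdomain.
    subdomain = re.sub(r"[^a-z0-9]+", "-", space_id.lower()).strip("-")
    return f"https://{subdomain}.hf.space/gradio_api/mcp/sse"


print(space_mcp_url("bismay/gradio-transcript-mcp"))
# -> https://bismay-gradio-transcript-mcp.hf.space/gradio_api/mcp/sse
```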

---
## License
This project is licensed under the Apache-2.0 License. See the LICENSE file for more details.

---
## Contributing
Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.