title: Hf Speech2text Tool
emoji: 💻
colorFrom: indigo
colorTo: red
sdk: gradio
sdk_version: 5.13.2
app_file: app.py
pinned: false
tags:
- tool
short_description: Reads an audio file and returns its transcript.
# Speech2text tool for your agent
Uses the Hugging Face API under the hood.
A simple tool for prototyping agents that need to extract text from audio.
This tool:
1. opens and reads an audio file,
2. calls the Hugging Face API with your HF token to get a transcript,
3. returns the transcript as a string.

Useful for implementing voice commands.
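
The three steps listed above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual source: the function name `transcribe`, the `client` parameter, and the default model id `openai/whisper-small.en` are assumptions made for the example. It relies on `InferenceClient.automatic_speech_recognition` from `huggingface_hub`, which in recent versions returns an object with a `text` attribute.

```python
from pathlib import Path


def transcribe(audio_filepath, hf_token, model="openai/whisper-small.en", client=None):
    """Read an audio file and return its transcript as a plain string.

    Hypothetical helper for illustration; `client` exists only so that a
    stub can be injected instead of calling the real API.
    """
    # Step 1: open and read the audio file as raw bytes.
    audio_bytes = Path(audio_filepath).read_bytes()

    # Step 2: call the Hugging Face API with the token.
    if client is None:
        # Imported lazily so the function can also run with a stub client.
        from huggingface_hub import InferenceClient

        client = InferenceClient(token=hf_token)
    result = client.automatic_speech_recognition(audio_bytes, model=model)

    # Step 3: return the transcript string. Older huggingface_hub releases
    # returned a bare string, so fall back to the result itself.
    return getattr(result, "text", result)
```

With a valid token, a call such as `transcribe("command.wav", hf_token="<yourtokenhere>")` should return the transcript of `command.wav`, provided the chosen model is available through the API.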
# Usage
```python
from smolagents import CodeAgent, HfApiModel, Tool

# Placeholders: substitute your own Hugging Face token and audio file path.
hf_token = "<yourtokenhere>"
filepath = "<path/to/your/audio/file>"

# Load the tool from the Hub; trust_remote_code=True is required to run its code.
hf_speech2text_tool = Tool.from_hub(
    "GTimothee/hf_text2speech_tool",
    token=hf_token,
    trust_remote_code=True,
)

model = HfApiModel("Qwen/Qwen2.5-Coder-32B-Instruct", token=hf_token)
agent = CodeAgent(tools=[hf_speech2text_tool], model=model)

# additional_args makes these values available to the agent during the run.
output = agent.run(
    "Use your tools to read the audio file and return the transcription.",
    additional_args={
        "audio_filepath": filepath,
        "hf_token": hf_token,
        "model_for_transcription": "whisper-small.en",
    },
)
```