---
title: Hf Speech2text Tool
emoji: 💻
colorFrom: indigo
colorTo: red
sdk: gradio
sdk_version: 5.13.2
app_file: app.py
pinned: false
tags:
- tool
short_description: Reads an audio file and returns its transcript.
---

# Speech2text tool for your agent

Uses the Hugging Face API under the hood.

A simple tool for prototyping agents that need to extract text from audio.

This tool:
1. opens and reads an audio file
2. calls the Hugging Face API with your HF token to get a transcript
3. returns the transcript as a string

Useful for implementing voice commands.
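
The three steps listed above can be sketched roughly as follows. This is only an illustration, not the tool's actual source: the function name `transcribe_audio` and the optional `client` parameter are hypothetical, and it assumes the transcript is fetched through `huggingface_hub.InferenceClient.automatic_speech_recognition`. The parameter names mirror the `additional_args` keys used in the usage example.

```python
from pathlib import Path


def transcribe_audio(audio_filepath, hf_token, model_for_transcription, client=None):
    """Hypothetical sketch: read an audio file and return its transcript."""
    if client is None:
        # Assumption: the Hugging Face inference API is reached via InferenceClient.
        from huggingface_hub import InferenceClient

        client = InferenceClient(token=hf_token)

    # Step 1: open and read the audio file as raw bytes.
    audio_bytes = Path(audio_filepath).read_bytes()

    # Step 2: ask the API for a transcript of those bytes.
    result = client.automatic_speech_recognition(
        audio_bytes, model=model_for_transcription
    )

    # Step 3: return the transcript as a plain string.
    return result.text
```

Accepting an injectable `client` keeps the sketch testable offline; in normal use you would leave it as `None` and pass only the file path, token, and model name.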

# Usage

```python
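import os

from smolagents import CodeAgent, HfApiModel, Tool

# Assumption: your Hugging Face token is stored in the HF_TOKEN environment variable.
hf_token = os.environ["HF_TOKEN"]
# Placeholder: replace with the path to the audio file you want to transcribe.
filepath = "path/to/audio.wav"

# Load the tool from the Hub. trust_remote_code=True is needed because
# the tool's code is downloaded and executed locally.
hf_speech2text_tool = Tool.from_hub(
    "GTimothee/hf_speech2text_tool",
    token=hf_token,
    trust_remote_code=True,
)

model = HfApiModel("Qwen/Qwen2.5-Coder-32B-Instruct", token=hf_token)
agent = CodeAgent(tools=[hf_speech2text_tool], model=model)

# additional_args are made available to the agent so it can pass them to the tool.
output = agent.run(
    "Use your tools to read the audio file and return the transcription.",
    additional_args={
        "audio_filepath": filepath,
        "hf_token": hf_token,
        "model_for_transcription": "whisper-small.en",
    },
)
print(output)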
```