
	---
	title: Hf Speech2text Tool
	emoji: 💻
	colorFrom: indigo
	colorTo: red
	sdk: gradio
	sdk_version: 5.13.2
	app_file: app.py
	pinned: false
	tags:
	- tool
	short_description: Reads an audio file and returns its transcript.
	---

	# Speech2text tool for your agent

	Uses the Hugging Face API under the hood.

	A simple tool for prototyping agents that can extract text from audio.

	This tool:
	1. opens and reads an audio file,
	2. calls the Hugging Face API with your HF token to get a transcript,
	3. returns the transcript as a string.

	Useful for implementing voice commands.
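For readers who want a feel for what happens inside the tool, the three steps listed above can be sketched roughly as follows. This is a minimal illustration and not the tool's actual source: the names `transcribe_audio`, `_hf_asr_backend`, and `asr_backend` are hypothetical, and the sketch assumes a recent `huggingface_hub` release in which `InferenceClient.automatic_speech_recognition` returns an object with a `.text` field. It also assumes the model id is passed to the API as given; the real tool may map a short name such as `whisper-small.en` to a full Hub id.

```python
from pathlib import Path
from typing import Callable, Optional

# Signature of a transcription backend: (audio bytes, token, model id) -> text.
AsrBackend = Callable[[bytes, str, str], str]


def _hf_asr_backend(audio_bytes: bytes, hf_token: str, model_id: str) -> str:
    """Hypothetical default backend that calls the Hugging Face API."""
    # Imported lazily so the rest of the sketch runs without huggingface_hub.
    from huggingface_hub import InferenceClient

    client = InferenceClient(token=hf_token)
    # Assumption: the result exposes the transcript as `.text`.
    result = client.automatic_speech_recognition(audio_bytes, model=model_id)
    return result.text


def transcribe_audio(
    audio_filepath: str,
    hf_token: str,
    model_for_transcription: str,
    asr_backend: Optional[AsrBackend] = None,
) -> str:
    """Read an audio file and return its transcript as a plain string."""
    # Step 1: open and read the audio file as raw bytes.
    audio_bytes = Path(audio_filepath).read_bytes()

    # Step 2: send the bytes to the speech-to-text backend with the token.
    backend = asr_backend if asr_backend is not None else _hf_asr_backend
    transcript = backend(audio_bytes, hf_token, model_for_transcription)

    # Step 3: return the transcript string.
    return transcript
```

The `asr_backend` parameter exists only so the sketch can be tried offline with a stub function; leaving it unset falls back to the Hugging Face call.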

	# Usage

	```python
	from smolagents import CodeAgent, HfApiModel, Tool

	# Placeholders: replace with your own Hugging Face token and audio file path.
	hf_token = "<yourtokenhere>"
	filepath = "<path/to/your/audio/file>"

	# Load the tool from the Hub; trust_remote_code is required to run its code.
	hf_speech2text_tool = Tool.from_hub(
	    "GTimothee/hf_speech2text_tool",
	    token=hf_token,
	    trust_remote_code=True,
	)

	model = HfApiModel("Qwen/Qwen2.5-Coder-32B-Instruct", token=hf_token)
	agent = CodeAgent(tools=[hf_speech2text_tool], model=model)

	# additional_args are exposed to the agent as variables it can pass to the tool.
	output = agent.run(
	    "Use your tools to read the audio file and return the transcription.",
	    additional_args={
	        "audio_filepath": filepath,
	        "hf_token": hf_token,
	        "model_for_transcription": "whisper-small.en",
	    },
	)
	```