Evaluate and submit answers to questions using an agent
Convert audio to text using the Whisper model
Process images and videos to generate text
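
The three capabilities above suggest that an agent has to decide, for each question, which one applies to any attached file. The following is a minimal sketch of that routing step, not the project's actual implementation. The function name `pick_tool`, the returned tool names, and the extension tables are all hypothetical, since the source does not say which formats are supported or how the tools are named.

```python
from pathlib import Path

# Hypothetical extension tables; the source does not list supported formats.
AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".flac"}
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif"}
VIDEO_EXTS = {".mp4", ".mov", ".mkv"}


def pick_tool(filename: str) -> str:
    """Return the (hypothetical) name of the capability that should handle a file."""
    suffix = Path(filename).suffix.lower()
    if suffix in AUDIO_EXTS:
        # Audio goes to the speech-to-text step (Whisper, per the list above).
        return "transcribe_audio"
    if suffix in IMAGE_EXTS or suffix in VIDEO_EXTS:
        # Images and videos go to the visual-to-text step.
        return "describe_visual"
    # No recognised attachment: the agent answers from the question text alone.
    return "answer_directly"
```

Calling `pick_tool("interview.mp3")` would select the transcription step, while a question with no attachment falls through to the plain answering path. Matching on the file extension is only the simplest possible choice; a real agent might inspect the file contents instead.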