Filter speech from audio and video files
Separate vocals from accompaniment in audio
Convert speech to text using automatic speech recognition (ASR)
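
As a rough illustration of the first item, the sketch below marks which frames of a signal are loud enough to possibly contain speech. It uses simple RMS energy thresholding, which is a stand-in technique and not necessarily what this tool does: it detects loud segments rather than speech specifically. The function names, the 160-sample frame length (10 ms at an assumed 16 kHz sample rate), and the 0.1 threshold are all hypothetical choices made for the example.

```python
import math


def frame_rms(samples, frame_len):
    """Split samples into non-overlapping frames and return each frame's RMS energy.

    Trailing samples that do not fill a whole frame are ignored.
    """
    rms = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return rms


def speech_frames(samples, frame_len=160, threshold=0.1):
    """Return indices of frames whose RMS energy exceeds the threshold.

    frame_len and threshold are illustrative defaults, not tuned values.
    """
    energies = frame_rms(samples, frame_len)
    return [i for i, energy in enumerate(energies) if energy > threshold]


# Synthetic demo: 3 silent frames, 2 frames of a 440 Hz tone, 3 silent frames.
rate = 16000  # assumed sample rate in Hz
silence = [0.0] * (160 * 3)
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(160 * 2)]
signal = silence + tone + silence

print(speech_frames(signal))  # prints [3, 4]
```

The returned frame indices could then be mapped back to sample ranges to keep or discard. A real speech filter would typically replace the energy threshold with a trained voice activity detector, since music and noise also carry energy.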