from typing import Iterator
from langchain_core.documents import Document
from langchain_community.document_loaders.base import BaseBlobParser
from langchain_community.document_loaders.blob_loaders import Blob
import io
import speech_recognition as sr
from pydub import AudioSegment


class VideoParser(BaseBlobParser):
    """Parse video files from a blob."""

    def lazy_parse(self, blob: Blob) -> Iterator[Document]:
        """Parse a video file into a Document iterator.

        Args:
            blob: The blob to parse.

        Returns:
            An iterator of Documents.
        """
        # mimetype may be None when it cannot be inferred, so guard before startswith.
        if not (blob.mimetype or "").startswith("video/"):
            raise ValueError("This blob type is not supported for this parser.")

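        with blob.as_bytes_io() as video_bytes_io:
            # Measure the stream directly rather than relying on a `size`
            # attribute on the blob.
            size = video_bytes_io.seek(0, io.SEEK_END)
            video_bytes_io.seek(0)
            audio_text = self.extract_audio_text(video_bytes_io)
            metadata = {"source": blob.source, "size": size}
            yield Document(page_content=audio_text, metadata=metadata)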

    def extract_audio_text(self, video_bytes_io: io.BytesIO) -> str:
        """Extract text from video audio.

        Args:
            video_bytes_io: The in-memory video bytes.

        Returns:
            A string representing the transcribed audio text.
        """
        try:
            # Decode the video's audio track with pydub, which invokes the
            # ffmpeg binary under the hood and detects the container format.
            audio_segment = AudioSegment.from_file(video_bytes_io)

            # Downmix to 16 kHz, 16-bit mono PCM for the recognizer.
            audio_segment = (
                audio_segment.set_channels(1)
                .set_frame_rate(16000)
                .set_sample_width(2)
            )

            # Convert audio to bytes compatible with the recognizer
            audio_stream = io.BytesIO()
            audio_segment.export(audio_stream, format="wav")
            audio_stream.seek(0)

            # Save the audio stream for debugging
            with open("extracted_audio.wav", "wb") as f:
                f.write(audio_stream.getvalue())

            recognizer = sr.Recognizer()
            audio_file = sr.AudioFile(audio_stream)
            with audio_file as source:
                audio_data = recognizer.record(source)
                audio_text = recognizer.recognize_google(audio_data)
                return audio_text

        except Exception as e:
            return f"Error transcribing audio: {str(e)}"