boto3
torch
streamlit
transformers
sentence-transformers
PyPDF4
docx2txt
scikit-learn
PyPDF2
pdfplumber