streamlit
scikit-learn
pdfplumber
PyPDF4
docx2txt
transformers
torch
huggingface_hub