Bhavnick Minhas

bhavnicksm

AI & ML interests

Machine Translation, ML Efficiency, Retrieval-Augmented Generation

Recent Activity

updated a Space 1 day ago
chonkie-ai/README
reacted to as-cle-bert's post with 👍 4 days ago
One of the biggest challenges I've faced since I started developing [PdfItDown](https://github.com/AstraBert/PdfItDown) was correctly handling the conversion of files like Excel sheets and CSVs: table conversion was bad and messy, almost unusable for downstream tasks 🫣 That's why today I'm excited to introduce readers, the new feature of PdfItDown v1.4.0! 🎉 With readers, you can choose among three (for now 👀) flavors of text extraction and conversion to PDF:
- Docling, which does a fantastic job with presentations, spreadsheets and Word documents 🦆
- LlamaParse by LlamaIndex, suitable for more complex and articulated documents, with a mixture of text, images and tables 🦙
- MarkItDown by Microsoft, not the best at handling highly structured documents, but extremely flexible in terms of input file format (it can even convert XML, JSON and ZIP files!) ✒️
You can use this new feature in your Python scripts (check the attached code snippet! 😉) and in the command line interface as well! 🐍 Have fun and don't forget to star the repo on GitHub ➡️ https://github.com/AstraBert/PdfItDown
reacted to as-cle-bert's post with ❤️ 4 days ago

Organizations

Flax Community, Training Transformers Together, Cohere Labs Community, Chonkie Inc.