The Beetle models are static embedding models created with the Model2Vec library. They serve as good base models for further training or fine-tuning.
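For readers unfamiliar with the term, a static embedding model keeps one fixed vector per token and builds a sentence vector by pooling those token vectors, with no contextual encoder at inference time. The sketch below is a toy illustration of that idea only: the vocabulary, the 3-dimensional vectors, and the helper names `embed` and `cosine` are made up for this example and are not taken from the Beetle models or from Model2Vec itself.

```python
# Toy static embedding model: one fixed vector per token, mean-pooled per text.
# TOY_TABLE and its values are hypothetical, chosen only for illustration.
from math import sqrt

TOY_TABLE = {
    "red": [1.0, 0.0, 0.0],
    "brown": [0.8, 0.2, 0.0],
    "beetle": [0.0, 1.0, 0.0],
    "fairy": [0.0, 0.0, 1.0],
}
DIM = 3


def embed(text):
    """Mean-pool the static vectors of known tokens (whitespace tokenization)."""
    vectors = [TOY_TABLE[tok] for tok in text.lower().split() if tok in TOY_TABLE]
    if not vectors:
        # No known tokens: fall back to a zero vector.
        return [0.0] * DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    query = embed("red beetle")
    print(cosine(query, embed("brown beetle")))
    print(cosine(query, embed("brown fairy")))
```

Because the lookup table is fixed, embedding a text is just a table lookup plus an average, which is why such models are cheap to run. Further training or fine-tuning would amount to updating the table entries.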
Bhavnick Minhas
bhavnicksm
AI & ML interests
Machine Translation, ML Efficiency, Retrieval-Augmented Generation
Recent Activity
updated a Space 1 day ago: chonkie-ai/README
reacted to as-cle-bert's post 4 days ago
One of the biggest challenges I've faced since I started developing [PdfItDown](https://github.com/AstraBert/PdfItDown) has been correctly handling the conversion of files like Excel sheets and CSVs: table conversion was bad and messy, almost unusable for downstream tasks.
That's why today I'm excited to introduce readers, the new feature of PdfItDown v1.4.0!
With readers, you can choose among three (for now) flavors of text extraction and conversion to PDF:
- Docling, which does fantastic work with presentations, spreadsheets and Word documents
- LlamaParse by LlamaIndex, suitable for more complex and articulated documents, with a mixture of text, images and tables
- MarkItDown by Microsoft, not the best at handling highly structured documents, but extremely flexible in terms of input file format (it can even convert XML, JSON and ZIP files!)
You can use this new feature in your Python scripts (check the attached code snippet!) and in the command line interface as well!
Have fun and don't forget to star the repo on GitHub ➡️ https://github.com/AstraBert/PdfItDown
reacted to as-cle-bert's post with ❤️ 4 days ago
Organizations
Models (15)

bhavnicksm/brown-fairy-base-v0 • 7 downloads • 1 like
bhavnicksm/red-beetle-base-v1.1 • 7 downloads • 2 likes
bhavnicksm/red-beetle-small-v1.1 • 5 downloads • 2 likes
bhavnicksm/red-beetle-small-v1 • 8 downloads • 2 likes
bhavnicksm/red-beetle-base-v1 • 5 downloads • 2 likes
bhavnicksm/red-beetle-base-v0 • 6 downloads • 2 likes
bhavnicksm/brown-beetle-tiny-v1.1 • 8 downloads • 1 like
bhavnicksm/brown-beetle-tiny-v1 • 9 downloads • 2 likes
bhavnicksm/brown-beetle-small-v1.1 • 8 downloads • 1 like
bhavnicksm/brown-beetle-base-v1.1 • 8 downloads • 1 like
Datasets (5)

bhavnicksm/fineweb-edu-micro • 394 rows • 424 downloads
bhavnicksm/fineweb-edu-gpt2-tokenized • 44 downloads
bhavnicksm/PokemonCardsPlus • 13.1k rows • 17 downloads • 3 likes
bhavnicksm/free_marco • 559k rows • 31 downloads
bhavnicksm/sentihood • 5.22k rows • 124 downloads • 5 likes