Clelia Astra Bertelli

as-cle-bert

AI & ML interests

Biology + Artificial Intelligence = ❤️ | AI for sustainable development, sustainable development for AI | Researching Machine Learning Enhancement | I love automation for everyday things | Blogger | Open Source

Recent Activity

replied to their post 1 day ago
posted an update 2 days ago

Organizations

Social Post Explorers, Hugging Face Discord Community, GreenFit AI

Posts 44

Ever dreamt of ingesting into a vector DB that pile of CSVs, Word documents and presentations lying in some remote folder on your PC?🗂️
What if I told you that you can do it in three to six lines of code?🤯
Well, with my latest open-source project, ingest-anything (https://github.com/AstraBert/ingest-anything), you can take all your non-PDF files, convert them to PDF, extract their text, chunk, embed and load them into a vector database, all in one go!🚀
How? It's pretty simple!
📁 The input files are converted into PDF by PdfItDown (https://github.com/AstraBert/PdfItDown)
📑 The PDF text is extracted using LlamaIndex readers
🦛 The text is chunked with Chonkie
🧮 The chunks are embedded with Sentence Transformers models
🗄️ The embeddings are loaded into a Qdrant vector database

And you're done!✅
Curious to try it? Install it by running:

pip install ingest-anything

And you can start using it in your Python scripts!🐍
Don't forget to star it on GitHub and let me know if you have any feedback! ➡️ https://github.com/AstraBert/ingest-anything
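
To make the five steps in the post concrete, here is a minimal, library-free sketch of the same flow (extract, chunk, embed, load). It is not the ingest-anything API: every name in it (`extract_text`, `chunk_text`, `embed`, `InMemoryStore`, `ingest`) is a hypothetical stand-in. Whitespace normalisation replaces PdfItDown and the LlamaIndex readers, a word-window splitter replaces Chonkie, a hashed bag-of-words vector replaces a Sentence Transformers model, and a Python list replaces Qdrant.

```python
import hashlib
import math
from dataclasses import dataclass, field


def extract_text(document: str) -> str:
    """Stand-in for PDF conversion + text extraction: normalise whitespace."""
    return " ".join(document.split())


def chunk_text(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Stand-in for a chunker: overlapping windows of `chunk_size` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    # Stop early enough that the last window is not a pure repeat of the overlap.
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + chunk_size]) for i in range(0, stop, step)]


def embed(chunk: str, dim: int = 16) -> list[float]:
    """Stand-in for an embedding model: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    for word in chunk.lower().split():
        # md5 keeps the bucket assignment stable across interpreter runs.
        idx = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class InMemoryStore:
    """Stand-in for a vector database: a list of (vector, text) pairs."""
    points: list = field(default_factory=list)

    def upsert(self, vector: list[float], text: str) -> None:
        self.points.append((vector, text))

    def search(self, query_vector: list[float], top_k: int = 1) -> list[str]:
        # Vectors are unit length, so the dot product is the cosine similarity.
        def score(point):
            return sum(a * b for a, b in zip(query_vector, point[0]))
        ranked = sorted(self.points, key=score, reverse=True)
        return [text for _, text in ranked[:top_k]]


def ingest(documents: list[str], store: InMemoryStore) -> int:
    """Run extract -> chunk -> embed -> load and return the number of chunks stored."""
    count = 0
    for document in documents:
        for chunk in chunk_text(extract_text(document)):
            store.upsert(embed(chunk), chunk)
            count += 1
    return count


if __name__ == "__main__":
    store = InMemoryStore()
    stored = ingest(["Quarterly   sales\nreport for the north region"], store)
    print(stored, store.search(embed("sales report"), top_k=1))
```

In the real project each stand-in is replaced by the corresponding library, which is what lets the whole flow fit in a handful of lines.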

Articles 10

Why we (don't) need export control