Clelia Astra Bertelli
as-cle-bert
AI & ML interests
Biology + Artificial Intelligence = ❤️ | AI for sustainable development, sustainable development for AI | Researching Machine Learning enhancement | I love automation for everyday things | Blogger | Open Source
Recent Activity
replied to their post about 2 hours ago
Ever dreamt of ingesting that pile of CSVs, Word documents and presentations lying in some remote folders on your PC into a vector DB?
What if I told you that you can do it in three to six lines of code?🤯
Well, with my latest open-source project, ingest-anything (https://github.com/AstraBert/ingest-anything), you can take all your non-PDF files, convert them to PDF, extract their text, then chunk it, embed it and load it into a vector database, all in one go!
How? It's pretty simple!
The input files are converted into PDF by PdfItDown (https://github.com/AstraBert/PdfItDown)
The PDF text is extracted using LlamaIndex readers
The text is chunked with Chonkie
The chunks are embedded with Sentence Transformers models
The embeddings are loaded into a Qdrant vector database
And you're done!
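To make the last three steps more concrete, here is a toy, standard-library-only sketch of a chunk, embed and store loop. It assumes the PDF text has already been extracted, and it does NOT use the ingest-anything API or the real Chonkie, Sentence Transformers or Qdrant libraries: `chunk_text`, `embed`, `cosine`, `InMemoryStore` and `ingest` are hypothetical stand-ins (overlapping word windows instead of a Chonkie chunker, sparse term-frequency vectors instead of dense model embeddings, a Python list instead of a Qdrant collection).

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def chunk_text(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this window already reaches the end of the text
            break
    return chunks


def embed(text: str) -> Counter:
    """Hypothetical stand-in for a Sentence Transformers model: a sparse term-frequency vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for a Qdrant collection: (vector, payload) points ranked by cosine."""
    points: list = field(default_factory=list)

    def upsert(self, vector: Counter, payload: dict) -> None:
        self.points.append((vector, payload))

    def search(self, query: Counter, limit: int = 3) -> list:
        scored = [(cosine(query, vector), payload) for vector, payload in self.points]
        scored.sort(key=lambda pair: pair[0], reverse=True)  # best match first
        return scored[:limit]


def ingest(texts: dict, store: InMemoryStore) -> int:
    """Chunk each document's extracted text, embed every chunk and load it into the store."""
    loaded = 0
    for source, text in texts.items():
        for chunk_id, chunk in enumerate(chunk_text(text)):
            store.upsert(embed(chunk), {"source": source, "chunk_id": chunk_id, "text": chunk})
            loaded += 1
    return loaded


# Hypothetical example: text already extracted from two converted PDFs.
store = InMemoryStore()
texts = {
    "report.pdf": "Quarterly sales grew in Europe while costs stayed flat across all regions this year",
    "slides.pdf": "Protein folding models predict structure from amino acid sequences with high accuracy",
}
print(ingest(texts, store))  # 4
score, payload = store.search(embed("protein structure"), limit=1)[0]
print(payload["source"], payload["chunk_id"], score)  # slides.pdf 0 0.5
```

In a real setup the term-frequency vectors would be replaced by dense embeddings and the list by a vector database; the chunk overlap is there so that text straddling a chunk boundary still appears whole in at least one chunk when it fits within a window.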
Curious to try it? Install it by running:
pip install ingest-anything
Then you can start using it in your Python scripts!
Don't forget to star it on GitHub and let me know if you have any feedback! ➡️ https://github.com/AstraBert/ingest-anything
replied to their post 2 days ago
posted an update 3 days ago
as-cle-bert's activity
Update requirements.txt · 1 · #1 opened about 2 months ago by not-lain
Librarian Bot: Add language metadata for dataset · #2 opened 3 months ago by librarian-bot
Ideas! · 2 · #1 opened 5 months ago by davanstrien
why · 1 · #1 opened 9 months ago by YaserDS-777
Librarian Bot: Add language metadata for dataset · #2 opened about 1 year ago by librarian-bot
[bot] Conversion to Parquet · #1 opened about 1 year ago by parquet-converter
[ASSISTANTS] Community thread · 2 · 189 · #356 opened about 1 year ago by victor
[bot] Conversion to Parquet · #1 opened about 1 year ago by parquet-converter
[bot] Conversion to Parquet · #1 opened about 1 year ago by parquet-converter
[bot] Conversion to Parquet · #1 opened about 1 year ago by parquet-converter
[bot] Conversion to Parquet · #1 opened about 1 year ago by parquet-converter
[bot] Conversion to Parquet · #1 opened about 1 year ago by parquet-converter
