Nicolay Rusnachenko

nicolay-r

AI & ML interests

Information RetrievalใƒปMedical Multimodal NLP (๐Ÿ–ผ+๐Ÿ“) Research Fellow @BU_Researchใƒปsoftware developer http://arekit.ioใƒปPhD in NLP

Recent Activity

reacted to as-cle-bert's post with 🔥 1 day ago
Ever dreamt of ingesting into a vector DB that pile of CSVs, Word documents and presentations lying in some remote folders on your PC? 🗂️
What if I told you that you can do it in three to six lines of code? 🤯
Well, with my latest open-source project, ingest-anything (https://github.com/AstraBert/ingest-anything), you can take all your non-PDF files, convert them to PDF, extract their text, chunk, embed and load them into a vector database, all in one go! 🚀

How? It's pretty simple!
📝 The input files are converted into PDF by PdfItDown (https://github.com/AstraBert/PdfItDown)
📑 The PDF text is extracted using LlamaIndex readers
🦛 The text is chunked with Chonkie
🧮 The chunks are embedded with Sentence Transformers models
🗄️ The embeddings are loaded into a Qdrant vector database

And you're done! ✅
Curious to try it? Install it by running:

pip install ingest-anything

And you can start using it in your Python scripts! 🐍
Don't forget to star it on GitHub and let me know if you have any feedback! ➡️ https://github.com/AstraBert/ingest-anything
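
The chunk, embed and load steps listed in the post can be illustrated with a minimal, standard-library-only sketch. This is not the ingest-anything API: `chunk_text`, `embed` and `InMemoryStore` are hypothetical names, and the hashed bag-of-words vector and the in-memory list are toy stand-ins for Chonkie, a Sentence Transformers model and a Qdrant collection respectively.

```python
import hashlib
import math
from dataclasses import dataclass


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words that share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text: str, dim: int = 64) -> list[float]:
    """Toy embedding (assumption): hashed bag of words, L2-normalised."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


@dataclass
class Record:
    text: str
    vector: list[float]


class InMemoryStore:
    """Hypothetical stand-in for a vector database collection."""

    def __init__(self) -> None:
        self.records: list[Record] = []

    def add(self, chunks: list[str]) -> None:
        # Embed each chunk and keep it next to its source text.
        for chunk in chunks:
            self.records.append(Record(chunk, embed(chunk)))

    def search(self, query: str, top_k: int = 3) -> list[tuple[float, str]]:
        # Vectors are unit length, so the dot product is the cosine similarity.
        q = embed(query)
        scored = [
            (sum(a * b for a, b in zip(q, r.vector)), r.text)
            for r in self.records
        ]
        return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]
```

In a real pipeline each stand-in would be replaced by the corresponding library, and the conversion and text-extraction steps would run before `chunk_text`.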
Organizations

None yet

Posts 68

🚀 Delighted to share a major milestone in adapting reasoning techniques for augmenting data collections!
Introducing bulk-chain 1.0.0, the first major release of a no-string API for adapting your LLM to Chain-of-Thought-like reasoning over records with a large number of parameters across large datasets.

โญ Check it out: https://github.com/nicolay-r/bulk-chain

What's new and why it matters:
📦 Fully no-string API for easy client deployment
🔥 Demos are now standalone projects

Demos:
📺 bash / shell (dispatched): https://github.com/nicolay-r/bulk-chain-shell
📺 tksheet: https://github.com/nicolay-r/bulk-chain-tksheet-client

Using nlp-thirdgate to host the supported providers:
🌌 LLM providers: https://github.com/nicolay-r/nlp-thirdgate

Datasets 0

None public yet