Nicolay Rusnachenko's picture

Nicolay Rusnachenko

nicolay-r

AI & ML interests

Information RetrievalใƒปMedical Multimodal NLP (๐Ÿ–ผ+๐Ÿ“) Research Fellow @BU_Researchใƒปsoftware developer http://arekit.ioใƒปPhD in NLP

Recent Activity

reacted to as-cle-bert's post with ๐Ÿ”ฅ 1 day ago
Ever dreamt of ingesting into a vector DB that pile of CSVs, Word documents and presentations laying in some remote folders on your PC?๐Ÿ—‚๏ธ What if I told you that you can do it within three to six lines of code?๐Ÿคฏ Well, with my latest open-source project, ๐ข๐ง๐ ๐ž๐ฌ๐ญ-๐š๐ง๐ฒ๐ญ๐ก๐ข๐ง๐  (https://github.com/AstraBert/ingest-anything), you can take all your non-PDF files, convert them to PDF, extract their text, chunk, embed and load them into a vector database, all in one go!๐Ÿš€ How? It's pretty simple! ๐Ÿ“ The input files are converted into PDF by PdfItDown (https://github.com/AstraBert/PdfItDown) ๐Ÿ“‘ The PDF text is extracted using LlamaIndex readers ๐Ÿฆ› The text is chunked exploiting Chonkie ๐Ÿงฎ The chunks are embedded thanks to Sentence Transformers models ๐Ÿ—„๏ธ The embeddings are loaded into a Qdrant vector database And you're done!โœ… Curious of trying it? Install it by running: ๐˜ฑ๐˜ช๐˜ฑ ๐˜ช๐˜ฏ๐˜ด๐˜ต๐˜ข๐˜ญ๐˜ญ ๐˜ช๐˜ฏ๐˜จ๐˜ฆ๐˜ด๐˜ต-๐˜ข๐˜ฏ๐˜บ๐˜ต๐˜ฉ๐˜ช๐˜ฏ๐˜จ And you can start using it in your python scripts!๐Ÿ Don't forget to star it on GitHub and let me know if you have any feedback! โžก๏ธ https://github.com/AstraBert/ingest-anything
View all activity

Organizations

None yet

nicolay-r's activity

reacted to julien-c's post with ๐Ÿ”ฅ 1 day ago
view post
Post
3431
BOOOOM: Today I'm dropping TINY AGENTS

the 50 lines of code Agent in Javascript ๐Ÿ”ฅ

I spent the last few weeks working on this, so I hope you will like it.

I've been diving into MCP (Model Context Protocol) to understand what the hype was all about.

It is fairly simple, but still quite powerful: MCP is a standard API to expose sets of Tools that can be hooked to LLMs.

But while doing that, came my second realization:

Once you have a MCP Client, an Agent is literally just a while loop on top of it. ๐Ÿคฏ

โžก๏ธ read it exclusively on the official HF blog: https://huggingface.co/blog/tiny-agents
  • 1 reply
ยท
reacted to as-cle-bert's post with ๐Ÿ”ฅ 1 day ago
view post
Post
2713
Ever dreamt of ingesting into a vector DB that pile of CSVs, Word documents and presentations laying in some remote folders on your PC?๐Ÿ—‚๏ธ
What if I told you that you can do it within three to six lines of code?๐Ÿคฏ
Well, with my latest open-source project, ๐ข๐ง๐ ๐ž๐ฌ๐ญ-๐š๐ง๐ฒ๐ญ๐ก๐ข๐ง๐  (https://github.com/AstraBert/ingest-anything), you can take all your non-PDF files, convert them to PDF, extract their text, chunk, embed and load them into a vector database, all in one go!๐Ÿš€
How? It's pretty simple!
๐Ÿ“ The input files are converted into PDF by PdfItDown (https://github.com/AstraBert/PdfItDown)
๐Ÿ“‘ The PDF text is extracted using LlamaIndex readers
๐Ÿฆ› The text is chunked exploiting Chonkie
๐Ÿงฎ The chunks are embedded thanks to Sentence Transformers models
๐Ÿ—„๏ธ The embeddings are loaded into a Qdrant vector database

And you're done!โœ…
Curious of trying it? Install it by running:

๐˜ฑ๐˜ช๐˜ฑ ๐˜ช๐˜ฏ๐˜ด๐˜ต๐˜ข๐˜ญ๐˜ญ ๐˜ช๐˜ฏ๐˜จ๐˜ฆ๐˜ด๐˜ต-๐˜ข๐˜ฏ๐˜บ๐˜ต๐˜ฉ๐˜ช๐˜ฏ๐˜จ

And you can start using it in your python scripts!๐Ÿ
Don't forget to star it on GitHub and let me know if you have any feedback! โžก๏ธ https://github.com/AstraBert/ingest-anything
  • 4 replies
ยท
posted an update 2 days ago
view post
Post
2177
๐Ÿš€ Delighted to share a major milestone in adapting reasoning techniques for data collections augmentation!
Introducing bulk-chain 1.0.0 -- the first major release of a no-string API for adapting your LLM for Chain-of-Thought alike reasoning over records with large amount of parameters across large datasets.

โญ Check it out: https://github.com/nicolay-r/bulk-chain

Whatโ€™s new and why it matters:
๐Ÿ“ฆ Fully no-string API for easy client deployment
๐Ÿ”ฅ Demos are now standalone projects:

Demos:
๐Ÿ“บ bash / shell (dispatched): https://github.com/nicolay-r/bulk-chain-shell
๐Ÿ“บ tksheet: https://github.com/nicolay-r/bulk-chain-tksheet-client

Using nlp-thirdgate to host the supported providers:
๐ŸŒŒ LLM providers: https://github.com/nicolay-r/nlp-thirdgate
reacted to fdaudens's post with ๐Ÿคฏ 17 days ago
view post
Post
4086
๐ŸŽจ Designers, meet OmniSVG! This new model helps you create professional vector graphics from text/images, generate editable SVGs from icons to detailed characters, convert rasters to vectors, maintain style consistency with references, and integrate into your workflow.

@OmniSVG
  • 2 replies
ยท
posted an update 19 days ago
posted an update 24 days ago
view post
Post
1764
๐Ÿ“ข For those who in textual IR and experimenting with quick deployment of CoT / reasoning, the following update might be relevant. I am happy to announce new version of the bulk-chain 0.25.3. It is a no-string framework for quick application of reasoning schema adaptation over your data.

https://github.com/nicolay-r/bulk-chain/releases/tag/0.25.3

The latest release brings huge updates on:
โœ… Reforged mechanism of models inference that work in steraming mode.
- Callbacks support for streaming mode (earlier only in demo)
- Deployment of various clients (shell, tksheet; see attachment)
โœ… Support for batching (earlier in API mode only)
โœ… Optional caching of inferred data in SQlite (always enabled earlier)
- This now makes possible to faster launch small (but mighty) LLMs

๐ŸŒŸ Project: https://github.com/nicolay-r/bulk-chain
๐ŸŒŒ Proviers: https://github.com/nicolay-r/nlp-thirdgate

posted an update about 1 month ago
view post
Post
1666
The Concept behind xLSTM has recently turn into the xLSTM-7B model that showcase the performance in the category of the similar-scale Gemma 7B, LLama2 7B, FlaconMamba 7B but with higher performing Inference Kernel

Model: NX-AI/xLSTM-7b
Paper: https://arxiv.org/abs/2503.13427

  • 1 reply
ยท
posted an update about 1 month ago
view post
Post
672
๐Ÿ“ข Several weeks ago Microsoft announced Phi-4. My most-recent list of LLM models have had only wrapper for Phi-2, so it was time to update! With this post, happy to share that Phi-4 wrapper is now available at nlp-thirdgate for adopting Chain-of-Thought reasoning:

๐Ÿค– https://github.com/nicolay-r/nlp-thirdgate/blob/master/llm/transformers_phi4.py

๐Ÿ“’ https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/llm_phi4.py

Findings on adaptation: I was able to reproduce only the pipeline based model launching. This version is for textual llm only. Microsoft also released multimodal Phi-4 which is out of scope of this wrapper.

๐ŸŒŒ nlp-thirdgate: https://lnkd.in/ef-wBnNn
posted an update about 1 month ago
view post
Post
1132
๐Ÿ“ข Delighted to announce the updated version of the no-string framework for chain-of-thought application over JSONL/CSV data:
https://github.com/nicolay-r/bulk-chain/releases/tag/0.25.2

๐Ÿ”ง Fixes:
- Fixed issues with batching mode
- Fixed problem with parsing and passing args in shell mode

โš ๏ธ Limitation: bathing mode is still available only via API.

๐Ÿ“’ Quick Start with Gemma-3 in batching mode: https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/llm_gemma_3.ipynb
replied to their post about 1 month ago
view reply

The important comment is to use the very latest version of the bulk-chain from github which fixes the bug for double-inference in batching.

posted an update about 2 months ago
view post
Post
1582
๐Ÿ“ข With the recent release of Gemma-3, If you interested to play with textual chain-of-though, the notebook below is a wrapper over the the model (native transformers inference API) for passing the predefined schema of promps in batching mode.
https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/llm_gemma_3.ipynb

Limitation: schema supports texts only (for now), while gemma-3 is a text+image to text.

Model: google/gemma-3-1b-it
Provider: https://github.com/nicolay-r/nlp-thirdgate/blob/master/llm/transformers_gemma3.py
  • 1 reply
ยท