Ever dreamt of ingesting into a vector DB that pile of CSVs, Word documents and presentations lying in some remote folders on your PC? What if I told you that you can do it within three to six lines of code? Well, with my latest open-source project, ingest-anything (https://github.com/AstraBert/ingest-anything), you can take all your non-PDF files, convert them to PDF, extract their text, chunk, embed and load them into a vector database, all in one go!

How? It's pretty simple!
- The input files are converted into PDF by PdfItDown (https://github.com/AstraBert/PdfItDown)
- The PDF text is extracted using LlamaIndex readers
- The text is chunked using Chonkie
- The chunks are embedded with Sentence Transformers models
- The embeddings are loaded into a Qdrant vector database
And you're done! Curious to try it? Install it by running:
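(assuming the package is published on PyPI under the same name as the repo)

```
pip install ingest-anything
```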
And you can start using it in your Python scripts! Don't forget to star it on GitHub and let me know if you have any feedback: https://github.com/AstraBert/ingest-anything
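For the curious, here is a rough, hand-rolled sketch of the steps the library automates, written directly against the underlying tools. This is illustrative only and not ingest-anything's actual API: the file name, chunk size, embedding model and collection name are arbitrary placeholders.

```python
from llama_index.core import SimpleDirectoryReader    # text extraction
from chonkie import TokenChunker                       # chunking
from sentence_transformers import SentenceTransformer  # embeddings
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

# 1. Extract text from the (already converted) PDF
docs = SimpleDirectoryReader(input_files=["report.pdf"]).load_data()
text = "\n".join(d.text for d in docs)

# 2. Chunk the text
chunks = TokenChunker(chunk_size=512).chunk(text)

# 3. Embed the chunks
embedder = SentenceTransformer("all-MiniLM-L6-v2")
vectors = embedder.encode([c.text for c in chunks])

# 4. Load the embeddings into a Qdrant collection
client = QdrantClient(":memory:")
client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=vectors.shape[1], distance=Distance.COSINE),
)
client.upsert(
    collection_name="docs",
    points=[
        PointStruct(id=i, vector=v.tolist(), payload={"text": c.text})
        for i, (v, c) in enumerate(zip(vectors, chunks))
    ],
)
```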
smolagents v1.14.0 is out!
- MCPClient: a sleek new client for connecting to remote MCP servers, making integrations more flexible and scalable.
- Amazon Bedrock: native support for Bedrock-hosted models.
smolagents is now more powerful, flexible, and enterprise-ready.
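For context, connecting an agent to a remote MCP server with the new client should look roughly like this; treat it as a sketch rather than a reference, since the server URL and model are placeholders and the exact constructor arguments may differ:

```python
from smolagents import CodeAgent, HfApiModel, MCPClient

# Sketch only: the dict-with-URL form and the context-manager usage are assumptions
# based on the release notes; check the smolagents docs for the exact signature.
with MCPClient({"url": "http://127.0.0.1:8000/sse"}) as tools:
    agent = CodeAgent(tools=tools, model=HfApiModel())
    agent.run("List the tools this MCP server exposes and what they do.")
```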
Papers Leaderboard: See the Latest AI Research Trends at a Glance!
Hello, AI research community! Today I'm introducing a new tool for exploring research papers. Papers Leaderboard is an open-source dashboard that makes it easy to find and filter the latest AI research papers.
- Date Filtering: view only papers published within a specific timeframe (from May 5, 2023 to present)
- Title Search: quickly find papers containing your keywords of interest
- Abstract Search: explore paper content more deeply by searching for keywords within abstracts
- Automatic Updates: the database is updated with the latest papers every hour
How to Use It?
1. Select a start date and an end date
2. Enter keywords you want to find in titles or abstracts
3. Adjust the maximum number of search results for abstract searches

Results are displayed neatly in table format.
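Behind the dashboard, the search amounts to a date-range filter combined with a case-insensitive keyword match, roughly along these lines; the column names and the two toy rows are made up purely for illustration and are not taken from the actual Space:

```python
import pandas as pd

# Toy stand-in for the papers table; the real dashboard refreshes its data hourly
papers = pd.DataFrame({
    "title": ["Scaling Laws Revisited", "Agents That Plan"],
    "abstract": ["We revisit scaling laws...", "We build agents that plan ahead..."],
    "published": pd.to_datetime(["2024-11-02", "2025-03-18"]),
})

start, end = pd.Timestamp("2025-01-01"), pd.Timestamp("2025-04-30")
keyword = "agents"
max_results = 10

mask = (
    papers["published"].between(start, end)
    & (
        papers["title"].str.contains(keyword, case=False)
        | papers["abstract"].str.contains(keyword, case=False)
    )
)
print(papers[mask].head(max_results))  # shown as a table in the dashboard
```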
reacted to stefan-french's post, 22 days ago
We just released the latest version of our any-agent library, now with support for the Agno agent framework. Check it out: https://github.com/mozilla-ai/any-agent
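If I read the README correctly, any-agent exposes a single factory whose first argument selects the framework, so switching to Agno should be a one-word change. The snippet below is a guess to be checked against the repo, with the model id as a placeholder:

```python
from any_agent import AgentConfig, AnyAgent

# Sketch only: the "agno" framework key and the AgentConfig fields are assumptions
# based on the announcement; see the any-agent README for the exact API.
agent = AnyAgent.create("agno", AgentConfig(model_id="gpt-4.1-mini"))
agent.run("Which agent frameworks does any-agent currently support?")
```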
Llama-4 is out, and I couldn't resist cooking something with it... So I came up with LlamaResearcher (https://llamaresearcher.com), your deep-research AI companion!
The workflow behind LlamaResearcher is simple:
- You submit a query
- Your query is evaluated by a Llama 3 guard model, which deems it safe or unsafe
- If your query is safe, it is routed to the Researcher Agent
- The Researcher Agent expands the query into three sub-queries, with which to search the web
- The web is searched for each of the sub-queries
- The retrieved information is evaluated for relevancy against your original query
- The Researcher Agent produces an essay based on the information it gathered, paying attention to referencing its sources
The agent itself is also built with easy-to-use and intuitive blocks:
- LlamaIndex provides the agentic architecture and the integrations with the language models
- Groq makes Llama-4 available with its lightning-fast inference
- Linkup allows the agent to deep-search the web and provides sourced answers
- FastAPI does the heavy lifting of wrapping everything within an elegant API interface
- Redis is used for API rate limiting
- Gradio creates a simple but powerful user interface
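Stripped of the integrations, the control flow described above fits in a few lines. The sketch below is not the project's actual code: every helper is a hypothetical stand-in for the corresponding building block (guard model, query expansion, Linkup search, relevancy check, essay writing):

```python
def is_safe(query: str) -> bool:
    """Stand-in for the Llama guard safety check."""
    ...

def expand_query(query: str) -> list[str]:
    """Stand-in for the Researcher Agent turning the query into three sub-queries."""
    ...

def search_web(sub_query: str) -> list[str]:
    """Stand-in for the Linkup-powered deep web search."""
    ...

def is_relevant(snippet: str, query: str) -> bool:
    """Stand-in for the relevancy check against the original query."""
    ...

def write_essay(query: str, sources: list[str]) -> str:
    """Stand-in for the final essay generation with source references."""
    ...

def deep_research(query: str) -> str:
    if not is_safe(query):
        return "Sorry, this query was flagged as unsafe."
    snippets = [s for sq in expand_query(query) for s in search_web(sq)]
    sources = [s for s in snippets if is_relevant(s, query)]
    return write_essay(query, sources)
```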
Special mention also to Lovable, which helped me build the first draft of the landing page for LlamaResearcher!
If you're curious and want to try LlamaResearcher, you can do so completely for free and without a subscription for 30 days from now: https://llamaresearcher.com
And if you're like me and like getting your hands into the code and building things on your own machine, I have good news: this is all open-source, fully reproducible locally and Docker-ready. Just head to the GitHub repo: https://github.com/AstraBert/llama-4-researcher and don't forget to star it if you find it useful!
As always, have fun and feel free to leave your feedback!
5 years ago, we launched Gradio as a simple Python library to let researchers at Stanford easily demo computer vision models with a web interface.
Today, Gradio is used by >1 million developers each month to build and share AI web apps. This includes some of the most popular open-source projects of all time, like Automatic1111, Fooocus, Oobabooga's Text WebUI, Dall-E Mini, and LLaMA-Factory.
How did we get here? How did Gradio keep growing in the very crowded field of open-source Python libraries? I get this question a lot from folks who are building their own open-source libraries. This post distills some of the lessons that I have learned over the past few years:
1. Invest in good primitives, not high-level abstractions
2. Embed virality directly into your library
3. Focus on a (growing) niche
4. Your only roadmap should be rapid iteration
5. Maximize ways users can consume your library's outputs
1. Invest in good primitives, not high-level abstractions
When we first launched Gradio, we offered only one high-level class (gr.Interface), which created a complete web app from a single Python function. We quickly realized that developers wanted to create other kinds of apps (e.g. multi-step workflows, chatbots, streaming applications), but as we started listing out the apps users wanted to build, we realized what we needed to do:
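(As a refresher for readers who never used that early version, gr.Interface is the classic "one function in, web app out" primitive; a minimal example in the spirit of the quickstart:)

```python
import gradio as gr

# A single Python function...
def greet(name: str) -> str:
    return f"Hello, {name}!"

# ...becomes a complete, shareable web app.
gr.Interface(fn=greet, inputs="text", outputs="text").launch()
```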
To Meta AI Research: I would like to fold ylacombe/expresso into the training mix of an Apache TTS model series. Can you relax the Expresso dataset license to CC-BY or more permissive?
Barring that, can I have an individual exception to train on the materials and distribute trained Apache models, without direct redistribution of the original files? Thanks!
Before 2020, most of the AI field was open and collaborative. For me, that was the key factor that accelerated scientific progress and made the impossible possible. Just look at the "T" in ChatGPT, which comes from the Transformer architecture openly shared by Google.
Then came the myth that AI was too dangerous to share, and companies started optimizing for short-term revenue. That led many major AI labs and researchers to stop sharing and collaborating.
With OAI and sama now saying they're willing to share open weights again, we have a real chance to return to a golden age of AI progress and democratization, powered by openness and collaboration, in the US and around the world.
This is incredibly exciting. Let's go, open science and open-source AI!
reacted to burtenshaw's post, about 2 months ago
The agents course now includes LlamaIndex, LangChain, and our very own smolagents. We've worked to integrate the three frameworks in distinctive ways so that learners can reflect on when and where to use each.
This also means that you can follow the course if you're already familiar with one of these frameworks, and soak up some of the fundamental knowledge in earlier units.
Hopefully, this makes the agents course accessible to as many people as possible.
reacted to chansung's post, about 2 months ago
Our EMOVA paper has been accepted by CVPR 2025, and we are glad to release all resources, including code (training & inference), datasets (training & evaluation), and checkpoints (EMOVA-3B/7B/72B)!
EMOVA is a novel end-to-end omni-modal LLM that can see, hear and speak. Given omni-modal (i.e., textual, visual and speech) inputs, EMOVA can generate both textual and speech responses with vivid emotional controls by utilizing the speech decoder and a style controller.
EMOVA Highlights:
- State-of-the-art omni-modality: EMOVA achieves SoTA-comparable results on both vision-language and speech benchmarks simultaneously.
- Device adaptation: our codebase supports training/inference on both NVIDIA GPUs (e.g., A800 & H20) and Ascend NPUs (e.g., 910B3)!
- Modular design: we integrate multiple implementations of the vision encoder, vision projector, and language model, even including the most recent DeepSeekMoE-tiny!
vLLM is one of the most popular local inference solutions, and the community had been asking us to integrate it: after a heavy refactoring of our LLM classes, we've just released smolagents 1.11.0, with a brand new VLLMModel class.
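Usage should look something like the sketch below; the model id is only an example and the constructor may accept additional vLLM-specific options, so check the docs for the exact signature:

```python
from smolagents import CodeAgent, VLLMModel

# Sketch only: the model id is a placeholder and additional vLLM-specific
# arguments may be available; see the smolagents docs for the exact signature.
model = VLLMModel(model_id="Qwen/Qwen2.5-7B-Instruct")
agent = CodeAgent(tools=[], model=model)
agent.run("What is the 20th Fibonacci number?")
```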
We just crossed 1,500,000 public models on Hugging Face (and 500k spaces, 330k datasets, 50k papers). One new repository is created every 15 seconds. Congratulations all!
reacted to AdinaY's post, about 2 months ago