Gemini's proprietary license is a deal-breaker. It's not just about performance; it's about freedom. Google's terms actively restrict libre use, while models like QwQ 32B and DeepSeek v3 (when properly licensed) respect user rights. Never conflate ethically licensed AI with corporate traps that forbid modification, redistribution, or independent use.
Jean Louis PRO
JLouisBiz's activity

That's why today I'm excited to introduce readers, the new feature of PdfItDown v1.4.0!
With readers, you can choose among three (for now) flavors of text extraction and conversion to PDF:
- Docling, which does a fantastic job with presentations, spreadsheets and Word documents
- LlamaParse by LlamaIndex, suitable for more complex and articulated documents with a mixture of text, images and tables
- MarkItDown by Microsoft, not the best at handling highly structured documents, but extremely flexible in terms of input file format (it can even convert XML, JSON and ZIP files!)
You can use this new feature in your Python scripts (check the attached code snippet!) and in the command line interface as well!
Have fun and don't forget to star the repo on GitHub ➡️ https://github.com/AstraBert/PdfItDown
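The three readers above suit different kinds of input. Here is a minimal sketch of how a script might pick one by file extension. Note that `pick_reader` and the extension sets are hypothetical, based only on the descriptions in this post; they are not part of PdfItDown's API, and the returned strings are just labels.

```python
from pathlib import Path

# Hypothetical groupings, derived only from the descriptions in the post.
OFFICE_EXTENSIONS = {".pptx", ".xlsx", ".docx"}   # presentations, spreadsheets, Word documents
FLEXIBLE_EXTENSIONS = {".xml", ".json", ".zip"}   # formats the post says MarkItDown can convert


def pick_reader(path: str) -> str:
    """Return a reader label for the given file (illustrative only)."""
    suffix = Path(path).suffix.lower()
    if suffix in OFFICE_EXTENSIONS:
        return "docling"
    if suffix in FLEXIBLE_EXTENSIONS:
        return "markitdown"
    # Assumption: fall back to the reader described as suited to complex, mixed documents.
    return "llamaparse"
```

The fallback choice is an assumption; a real script would probably let the user override it.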

Just tested it with Steve Jobs' Stanford speech and was speechless (pun intended). The video isn't sped up.
3 things that floored me:
- Transcription took just 10 seconds for a 15-min file
- Got a CSV with perfect timestamps, punctuation & capitalization
- Stunning accuracy (correctly captured "Reed College" and other specifics)
NVIDIA also released a demo where you can click any transcribed segment to play it instantly.
The improvement is significant: number 1 on the ASR Leaderboard, 6% error rate (best in class) with complete commercial freedom (cc-by-4.0 license).
Time to update those Whisper pipelines! H/t @Steveeeeeeen for the finding!
Model: nvidia/parakeet-tdt-0.6b-v2
Demo: nvidia/parakeet-tdt-0.6b-v2
ASR Leaderboard: hf-audio/open_asr_leaderboard
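For context on the error-rate figure: ASR accuracy is usually reported as word error rate (WER), the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. A minimal sketch, assuming lowercasing and whitespace tokenization (leaderboards typically apply their own text normalization, which is not reproduced here):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # consumed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, one wrong word in a six-word reference ("reed" transcribed as "read") gives a WER of 1/6, about 16.7%.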

And that's just some of the companies from the Chinese community that released open models in April
zh-ai-community/april-2025-open-releases-from-the-chinese-community-67ea699965f6e4c135cab10f
Video
> MAGI-1 by SandAI
> SkyReels-A2 & SkyReels-V2 by Skywork
> Wan2.1-FLF2V by Alibaba-Wan
Image
> HiDream-I1 by Vivago AI
> Kimi-VL by Moonshot AI
> InstantCharacter by InstantX & Tencent-Hunyuan
> Step1X-Edit by StepFun
> EasyControl by Shanghai Jiao Tong University
Reasoning
> MiMo by Xiaomi
> Skywork-R1V 2.0 by Skywork
> ChatTS by ByteDance
> Kimina by Moonshot AI & Numina
> GLM-Z1 by Zhipu AI
> Skywork OR1 by Skywork
> Kimi-VL-Thinking by Moonshot AI
Audio
> Kimi-Audio by Moonshot AI
> IndexTTS by BiliBili
> MegaTTS3 by ByteDance
> Dolphin by DataOceanAI
Math
> DeepSeek Prover V2 by DeepSeek
LLM
> Qwen by Alibaba-Qwen
> InternVL3 by Shanghai AI Lab
> Ernie4.5 (demo) by Baidu
Dataset
> PHYBench by Eureka-Lab
> ChildMandarin & Seniortalk by BAAI
Please feel free to add if I missed anything!

The dataset simulates the discovery phase of a fictitious VC firm called Reasoned Capital and, once expanded, can be used to create models which are able to make complex, subjective financial decisions based on different criteria.
The generation process uses increasingly complex prompts to encourage recursive problem-solving, pushing models to assess and reevaluate the conclusions and opinions generated by upstream models. Pretty neat stuff, and I'm not aware of this architecture being used in a reasoning context anywhere else.
Check it out: ZennyKenny/synthetic_vc_financial_decisions_reasoning_dataset
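The idea of downstream models reassessing upstream conclusions can be sketched as a simple chain in which each stage's prompt includes every earlier output. This is only an illustration under assumptions: `run_recursive_chain`, the prompt wording and the `generate` callable are hypothetical and do not reproduce the pipeline actually used to build the dataset.

```python
from typing import Callable, List


def run_recursive_chain(stage_prompts: List[str],
                        generate: Callable[[str], str]) -> List[str]:
    """Run stages in order; each stage sees all upstream conclusions."""
    outputs: List[str] = []
    for prompt in stage_prompts:
        # Summarise what earlier stages concluded (empty for the first stage).
        upstream = "\n".join(
            f"Stage {n} conclusion: {text}"
            for n, text in enumerate(outputs, start=1)
        )
        if upstream:
            full_prompt = (
                f"{upstream}\n\nReassess the conclusions above, then answer:\n{prompt}"
            )
        else:
            full_prompt = prompt
        # `generate` stands in for any model call (hypothetical).
        outputs.append(generate(full_prompt))
    return outputs
```

Any text-generation function can be passed as `generate`, so the same chain works with a local or a hosted model.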

moonshotai/Kimi-Audio-7B-Instruct
✨ 7B
✨ 13M+ hours of pretraining data
✨ Novel hybrid input architecture
✨ Universal audio capabilities (ASR, AQA, AAC, SER, SEC/ASC, end-to-end conversation)

Today, we're releasing an expanded version: 32K images annotated with 3.7M responses from over 300K individuals, completed in under two weeks using the Rapidata Python API.
Rapidata/text-2-image-Rich-Human-Feedback-32k
A few months ago, we published one of our most liked datasets, with 13K images based on the @data-is-better-together's dataset, following Google's research on "Rich Human Feedback for Text-to-Image Generation" (https://arxiv.org/abs/2312.10240). It collected over 1.5M responses from 150K+ participants.
Rapidata/text-2-image-Rich-Human-Feedback
In the examples below, users highlighted words from prompts that were not correctly depicted in the generated images. Higher word scores indicate more frequent issues. If an image captured the prompt accurately, users could select [No_mistakes].
We're continuing to work on large-scale human feedback and model evaluation. If you're working on related research and need large, high-quality annotations, feel free to get in touch: [email protected].
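The word scores described above can be thought of as per-word counts of how often annotators flagged a prompt word. A minimal sketch of that aggregation, assuming a hypothetical input format (one list of flagged words per annotator response); the actual dataset schema and scoring method may differ:

```python
from collections import Counter
from typing import Dict, List

NO_MISTAKES = "[No_mistakes]"


def word_scores(prompt: str, responses: List[List[str]]) -> Dict[str, int]:
    """Count, for each prompt word, how many responses flagged it."""
    counts: Counter = Counter()
    for flagged in responses:
        if NO_MISTAKES in flagged:
            continue  # annotator judged the image faithful to the prompt
        counts.update(set(flagged))  # count each word at most once per response
    # Words nobody flagged get a score of 0.
    return {word: counts[word] for word in prompt.split()}
```

With this counting scheme, a higher score simply means that more annotators marked that word as not correctly depicted.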

Check it out!
Demo: onnx-community/model-explorer
Dataset: onnx-community/model-explorer
Source code: https://github.com/xenova/model-explorer

Thanks so much for working on that. It's really good.

So you are talking about a data set, but you can't prevent it. If you put a data set for military usage, then what's wrong with that? I mean if you don't put it, that means that some countries would have a better data set and take advantage of it. But if you open source it, that means there is certain competition in the military market and then they may think twice or create better data sets. If they put it or not, it really doesn't matter because we have to change the consciousness of this society to get peace in the world. So this is not like it's okay. I appreciate your intentions and that's actually the way to go. But forbidding people to put it, it's not going to forbid the world. With forbidding we don't change consciousness of people about the peace in the world. Do you get me?

@JLouisBiz I have read your comment but it does not make any sense. What is the purpose of regulations and law if there are no limitations on what one can do?
I wasn't speaking about law at that level, but about licensing. And I hold the same opinion: leave it to the law to decide.
You can't limit people by telling them to nicely read the license and follow the guidelines on what you think is ethical or moral. Nice people behave nicely anyway; bad people don't listen.
We are not talking about a chair or any ordinary object for innocent use here; we are talking about war tech development. Are a chair and war tools made primarily for destruction and killing the same?!
Yes, and? There are numerous books written on that subject and available from many libraries including online. Information is accessible.
Please think deeper on what I said in my first paragraph in this message.
They are not. A gun is probably a closer comparison: you can have a gun to protect yourself, but as a US citizen you need a permit to possess one, or you get arrested for possessing an unauthorized item. Why? To ensure public safety. But even with such procedures, horrifying incidents still happen. We are not talking about something similar to a chair here; the analogy poorly represents the present situation.
Sure, but it is not related to LLM licensing and free software.
And it is not about open source and free software; it is about not developing war tools.
I speak only related to Free Software.
And yes, it can and IS USED to develop war tools. It is also used for criminal purposes.
As you said it in the first paragraph, it is for the law to decide. Not for author.
Because nice software or LLM author(s) simply cannot prevent any war by placing some kind of "warnings", like please "it is forbidden to make war by using this LLM as tool". That is nonsense. You cannot enforce it. You cannot even know.
Keep it fully free software so that there are no doubts on how it can be used. There is reason for the freedom.
Knives are sold every day and there is no warning on how to use them.
Take as an example someone using their computer for piracy, cyber threats, scams, fraudulent activities, etc. Do you just let them go their way, or do some actions need to be taken to protect people? War tech is far worse than anything mentioned earlier.
You and I are not crime hunters, and if we are, we have our ways.
Crime hunting is not done by limiting how people may use software programs. It is simply not feasible. That approach is rather prone to unjustly accusing people with good intentions.
Remember, no matter how many permissions a citizen may need to carry a gun, the bad guy doesn't care about it, and is going to get it so much easier than the nice guy.

I don't agree with that. Any such limitation makes the model proprietary, which means it will reach fewer places and help fewer people on this planet. A truly free software or open source model such as an LLM should not be limited, because free software places no limitation on how a user may use it. Everybody should be able to use it as they wish. Another issue is that there is no way for the author or anybody else to find out who used it in a way that you or the author consider inappropriate.
Let's just compare it to the everyday objects we use in our lives: say, a chair, a keyboard, a monitor or a desk. When you buy one or get it for free from someone, does it come with some kind of limitation? "Please, this is a chair, but you are not allowed to use it inappropriately, sit inappropriately, or use it for any purpose other than sitting." It doesn't.
So please think about that. People can definitely use a chair to kill somebody or a desk to destroy things, or even use a keyboard or monitor to fight.
And now how is somebody going to use their computer and software? Leave that for them because that's the point of free software. They can use it as they wish.
And just remember there is no way for anybody to control how somebody on the other side of the planet is using it.
This means that appealing to the ethical sense of honest people is unnecessary, because they are already honest and ethical. And appealing to dishonest people is not going to work anyway. So you can't ensure compliance either way.
What is Free Software? - GNU Project - Free Software Foundation:
https://www.gnu.org/philosophy/free-sw.html
So read here and learn about the four free software freedoms.

I am using Nomic embed text and Nomic embed vision from the local API endpoint. In my opinion your package should be more flexible on how to generate embeddings because some people may use remote embeddings as well. What matters here much is that any kind of document can be ingested. Another question, did you maybe think of page numbers?

That sounds like the very needed thing. How can I use my own embedder?

What if I told you that you can do it within three to six lines of code?
Well, with my latest open-source project, ingest-anything (https://github.com/AstraBert/ingest-anything), you can take all your non-PDF files, convert them to PDF, extract their text, chunk, embed and load them into a vector database, all in one go!
How? It's pretty simple!
- The input files are converted into PDF by PdfItDown (https://github.com/AstraBert/PdfItDown)
- The PDF text is extracted using LlamaIndex readers
- The text is chunked using Chonkie
- The chunks are embedded with Sentence Transformers models
- The embeddings are loaded into a Qdrant vector database
And you're done!
Curious to try it? Install it by running:
pip install ingest-anything
And you can start using it in your Python scripts!
Don't forget to star it on GitHub and let me know if you have any feedback! ➡️ https://github.com/AstraBert/ingest-anything
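The chunk, embed and load steps listed above can be illustrated with a small self-contained sketch. It deliberately does not use the ingest-anything, Chonkie, Sentence Transformers or Qdrant APIs: `chunk_text`, `toy_embedder` and `ingest` are hypothetical stand-ins, and the record layout (id, vector, payload) is only an assumption about what a vector store typically expects. The embedder is passed in as a plain callable, so a local or remote embedding endpoint could be swapped in.

```python
import math
from typing import Callable, Dict, List

Embedder = Callable[[str], List[float]]


def chunk_text(text: str, max_words: int = 50) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def toy_embedder(text: str, dim: int = 8) -> List[float]:
    """Stand-in embedder: hashed character counts, L2-normalised.

    A real pipeline would call an embedding model here instead.
    """
    vec = [0.0] * dim
    for ch in text.lower():
        vec[ord(ch) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def ingest(text: str, embed: Embedder, max_words: int = 50) -> List[Dict]:
    """Chunk the text, embed each chunk, and build records for a vector store."""
    return [
        {"id": i, "vector": embed(chunk), "payload": {"text": chunk}}
        for i, chunk in enumerate(chunk_text(text, max_words))
    ]
```

Calling `ingest(extracted_text, toy_embedder)` returns one record per chunk; replacing `toy_embedder` with a function that queries an embedding service changes nothing else in the flow.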

An Android AI agent powered by an open-source ML model, deki, has been fully open-sourced.
It understands what's on your screen and can perform tasks based on your voice or text commands.
Some examples:
* "Write my friend "some_name" in WhatsApp that I'll be 15 minutes late"
* "Open Twitter in the browser and write a post about something"
* "Read my latest notifications"
* "Write a linkedin post about something"
Currently, it works only on Android, but support for other OSes is planned.
The ML and backend code was also fully open-sourced.
Video prompt example:
"Open linkedin, tap post and write: hi, it is deki, and now I am open sourced. But don't send, just return"
License: GPLv3
You can find other AI agent demos and usage examples, such as code generation or object detection, on GitHub.
GitHub: https://github.com/RasulOs/deki

ProCreations/Mistake-To-Meaning
Ollama? It takes more VRAM! It requires GGUF files, which are created by llama.cpp anyway; it is slower than llama.cpp; and using models not published on the Ollama website requires extra thought and configuration from the user, unlike with llama.cpp.
No way.

There is nothing to be proud of. You have based it on a proprietary model, preventing people from using it as they wish and totally disregarding free software principles. Why don't you take a good example from Microsoft, IBM, Mistral, Allen AI, Qwen or DeepSeek, which distribute free software models?
Gemma License (danger) is not Free Software and is not Open Source
https://gnu.support/gnu-emacs/emacs-lisp/Gemma-License-danger-is-not-Free-Software-and-is-not-Open-Source.html
The Gemma Terms of Use and Prohibited Use Policy govern the use, modification, and distribution of Google's Gemma machine learning model and its derivatives. While Gemma is available for public use, it does not conform to Free Software or Open Source principles as defined by the Free Software Foundation (FSF) or Open Source Initiative (OSI). The terms impose significant restrictions, including prohibited use cases (e.g., illegal, harmful, or malicious activities), requirements to enforce Google's use restrictions on downstream users, and limitations on redistribution and derived works. Additionally, the terms do not guarantee access to source code or the freedom to use the software for any purpose, and they include broad disclaimers of warranty and liability. As a result, Gemma is a proprietary model with limited permissions, rather than a truly free or open-source software offering.
What is Free Software? - GNU Project - Free Software Foundation
https://www.gnu.org/philosophy/free-sw.html