JLouisBiz (Jean Louis)

replied to onekq's post 5 days ago

Gemini's proprietary license is a deal-breaker. It's not just about performance—it's about freedom. Google's terms actively restrict libre use, while models like QwQ 32B and DeepSeek v3 (when properly licensed) respect user rights. Never conflate ethically-licensed AI with corporate traps that forbid modification, redistribution, or independent use.

reacted to as-cle-bert's post with 👍 5 days ago

Post

1809

One of the biggest challenges I've been facing since I started developing [𝐏𝐝𝐟𝐈𝐭𝐃𝐨𝐰𝐧](https://github.com/AstraBert/PdfItDown) was correctly handling the conversion of files like Excel sheets and CSVs: table conversion was bad and messy, almost unusable for downstream tasks🫣

That's why today I'm excited to introduce 𝐫𝐞𝐚𝐝𝐞𝐫𝐬, the new feature of PdfItDown v1.4.0!🎉

With 𝘳𝘦𝘢𝘥𝘦𝘳𝘴, you can choose among three (for now👀) flavors of text extraction and conversion to PDF:

- 𝗗𝗼𝗰𝗹𝗶𝗻𝗴, which does fantastic work with presentations, spreadsheets and Word documents🦆

- 𝗟𝗹𝗮𝗺𝗮𝗣𝗮𝗿𝘀𝗲 by LlamaIndex, suitable for more complex and articulated documents, with a mixture of text, images and tables🦙

- 𝗠𝗮𝗿𝗸𝗜𝘁𝗗𝗼𝘄𝗻 by Microsoft, not the best at handling highly structured documents, but extremely flexible in terms of input file format (it can even convert XML, JSON and ZIP files!)✒️

You can use this new feature in your Python scripts (check the attached code snippet!😉) and in the command line interface as well!🐍

Have fun and don't forget to star the repo on GitHub ➡️ https://github.com/AstraBert/PdfItDown

reacted to fdaudens's post with 👍 5 days ago

Post

2889

Forget everything you know about transcription models - NVIDIA's parakeet-tdt-0.6b-v2 changed the game for me!

Just tested it with Steve Jobs' Stanford speech and was speechless (pun intended). The video isn’t sped up.

3 things that floored me:
- Transcription took just 10 seconds for a 15-min file
- Got a CSV with perfect timestamps, punctuation & capitalization
- Stunning accuracy (correctly captured "Reed College" and other specifics)
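To put the first point in perspective, the speed claim can be turned into an inverse real-time factor (audio duration divided by processing time). This is only a back-of-the-envelope sketch using the rounded figures from the post (a 15-minute file, 10 seconds of processing); the function name is made up for illustration and nothing here calls the model itself.

```python
def inverse_real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio transcribed per second of processing time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# Rounded figures taken from the post, not measured here.
audio_seconds = 15 * 60      # a 15-minute recording
processing_seconds = 10      # reported transcription time

speedup = inverse_real_time_factor(audio_seconds, processing_seconds)
print(f"Roughly {speedup:.0f}x faster than real time")
```

On those rounded numbers, the transcription runs about 90 times faster than playback.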

NVIDIA also released a demo where you can click any transcribed segment to play it instantly.

The improvement is significant: number 1 on the ASR Leaderboard, 6% error rate (best in class) with complete commercial freedom (cc-by-4.0 license).

Time to update those Whisper pipelines! H/t @Steveeeeeeen for the finding!

Model: nvidia/parakeet-tdt-0.6b-v2
Demo: nvidia/parakeet-tdt-0.6b-v2
ASR Leaderboard: hf-audio/open_asr_leaderboard

1 reply

reacted to AdinaY's post with 👍 7 days ago

Post

2800

DeepSeek, Alibaba, Skywork, Xiaomi, Bytedance.....
And that’s just some of the companies from the Chinese community that released open models in April 🤯

zh-ai-community/april-2025-open-releases-from-the-chinese-community-67ea699965f6e4c135cab10f

🎬 Video
> MAGI-1 by SandAI
> SkyReels-A2 & SkyReels-V2 by Skywork
> Wan2.1-FLF2V by Alibaba-Wan

🎨 Image
> HiDream-I1 by Vivago AI
> Kimi-VL by Moonshot AI
> InstantCharacter by InstantX & Tencent-Hunyuan
> Step1X-Edit by StepFun
> EasyControl by Shanghai Jiaotong University

🧠 Reasoning
> MiMo by Xiaomi
> Skywork-R1V 2.0 by Skywork
> ChatTS by ByteDance
> Kimina by Moonshot AI & Numina
> GLM-Z1 by Zhipu AI
> Skywork OR1 by Skywork
> Kimi-VL-Thinking by Moonshot AI

🔊 Audio
> Kimi-Audio by Moonshot AI
> IndexTTS by BiliBili
> MegaTTS3 by ByteDance
> Dolphin by DataOceanAI

🔢 Math
> DeepSeek Prover V2 by DeepSeek

🌍 LLM
> Qwen by Alibaba-Qwen
> InternVL3 by Shanghai AI lab
> Ernie4.5 (demo) by Baidu

📊 Dataset
> PHYBench by Eureka-Lab
> ChildMandarin & Seniortalk by BAAI

Please feel free to add if I missed anything!

reacted to ZennyKenny's post with 👍 8 days ago

Post

2706

I've created a new dataset using the Algorithm of Thoughts architecture proposed by Sel et al. (2023) in a reasoning context. (paper: https://arxiv.org/pdf/2308.10379)

The dataset simulates the discovery phase of a fictitious VC firm called Reasoned Capital and, once expanded, can be used to create models which are able to make complex, subjective financial decisions based on different criteria.

The generation process uses recursive problem-solving over increasingly complex prompts to encourage models to assess and reevaluate the conclusions and generated opinions of upstream models. Pretty neat stuff, and I'm not aware of this architecture being used in a reasoning context anywhere else.

Check it out: ZennyKenny/synthetic_vc_financial_decisions_reasoning_dataset

reacted to AdinaY's post with 🔥 9 days ago

Post

5076

Kimi-Audio 🚀🎧 an OPEN audio foundation model released by Moonshot AI
moonshotai/Kimi-Audio-7B-Instruct
✨ 7B
✨ 13M+ hours of pretraining data
✨ Novel hybrid input architecture
✨ Universal audio capabilities (ASR, AQA, AAC, SER, SEC/ASC, end-to-end conversation)

reacted to jasoncorkill's post with 🔥 9 days ago

Post

5485

🚀 Building Better Evaluations: 32K Image Annotations Now Available

Today, we're releasing an expanded version: 32K images annotated with 3.7M responses from over 300K individuals, completed in under two weeks using the Rapidata Python API.

Rapidata/text-2-image-Rich-Human-Feedback-32k

A few months ago, we published one of our most liked datasets, with 13K images based on @data-is-better-together's dataset, following Google's research on "Rich Human Feedback for Text-to-Image Generation" (https://arxiv.org/abs/2312.10240). It collected over 1.5M responses from 150K+ participants.

Rapidata/text-2-image-Rich-Human-Feedback

In the examples below, users highlighted words from prompts that were not correctly depicted in the generated images. Higher word scores indicate more frequent issues. If an image captured the prompt accurately, users could select [No_mistakes].

We're continuing to work on large-scale human feedback and model evaluation. If you're working on related research and need large, high-quality annotations, feel free to get in touch: [email protected].

reacted to Xenova's post with 🔥 9 days ago

Post

5326

Introducing the ONNX model explorer: Browse, search, and visualize neural networks directly in your browser. 🤯 A great tool for anyone studying Machine Learning! We're also releasing the entire dataset of graphs so you can use them in your own projects! 🤗

Check it out! 👇
Demo: onnx-community/model-explorer
Dataset: onnx-community/model-explorer
Source code: https://github.com/xenova/model-explorer

replied to as-cle-bert's post 9 days ago

Thanks so much for working on that. It's really good.

replied to hassenhamdi's post 9 days ago

So you are talking about a dataset, but you can't prevent its use. If someone publishes a dataset for military usage, what's wrong with that? If they don't publish it, some countries will have better datasets and take advantage of that. But if it is open-sourced, there is some competition in the military market, and they may think twice or create better datasets. Whether they publish it or not really doesn't matter, because we have to change the consciousness of this society to get peace in the world. So I'm not saying it's okay. I appreciate your intentions, and that's actually the way to go. But forbidding people from publishing it is not going to stop the world. By forbidding, we don't change people's consciousness about peace in the world. Do you get me?

replied to hassenhamdi's post 10 days ago

@JLouisBiz I have read your comment but it does not make any sense. What is the purpose of regulations and law if there are no limitations on what one can do?

I didn't speak about law on that level, but about licensing. And that is the same opinion and notion I have: leave it to the law to decide.

You can't limit people by telling them to nicely read the license and follow the guidelines on what you think is ethical or moral. Nice people behave nicely anyway; bad people don't listen.

We are not talking about a chair here or any ordinary object for innocent usage, we are talking about war tech development. Are a chair and war tools, made primarily for destruction and killing, the same?!

Yes, and? There are numerous books written on that subject and available from many libraries including online. Information is accessible.

Please think deeper on what I said in my first paragraph in this message.

They are not. Comparing it with a gun is probably more apt: you can have a gun to protect yourself, but as a US citizen you need a permit to possess one, or you get arrested for possessing an unauthorized item. You ask why? To ensure public safety. But even with such a procedure, horrifying incidents still happen. We are not talking about something similar to a chair here; the analogy is poorly representative of the present situation.

Sure, but it is not related to LLM licensing and free software.

And it is not about open source and free software, it is about not developing war tools.

I speak only related to Free Software.

And yes, it can and IS USED to develop war tools. It is also used for criminal purposes.

As you said in the first paragraph, it is for the law to decide, not for the author.

Because well-meaning software or LLM authors simply cannot prevent any war by placing some kind of "warning", like "it is forbidden to make war by using this LLM as a tool". That is nonsense. You cannot enforce it. You cannot even know.

Keep it fully free software so that there are no doubts about how it can be used. There is a reason for the freedom.

Knives are sold every day and there is no warning on how to use them.

Take the example of someone using their computer for piracy, cyber threats, scams, fraudulent activities, etc. Do you just let them go their way, or do some actions need to be taken to protect people? War tech is far worse than anything mentioned earlier.

You and I are not crime hunters, and if we are, we have our ways.

Crime hunting is not done by limiting how people use software programs. It is simply not feasible. That approach is rather prone to unjustly accusing people with good intentions.

Remember, no matter how many permissions a citizen may need to carry a gun, the bad guy doesn't care about it, and is going to get it so much easier than the nice guy.

replied to hassenhamdi's post 11 days ago

I don't agree with that. Any such limitation makes the model proprietary, which means it will reach fewer places and help fewer people on this planet. A truly free software or open source model like an LLM should not be limited, because free software places no limitation on how a user is supposed to use it. Everybody should be able to use it however they wish. Another issue is that there is no way for the author or anybody else to find out who used it in a way that you or the author think was inappropriate.

Let's just compare it to the everyday objects we use in our life. Let's say a chair and a keyboard, monitor or desk. When you are buying it or getting it for free from someone, do you get some kind of limitation? "Please, this is the chair, but you are not allowed to use it inappropriately or sit inappropriately or use the chair for purposes other than sitting." You don't.

So please think about that. People can definitely use a chair to kill somebody or a desk to destroy things, or even use a keyboard or monitor to fight.

And now how is somebody going to use their computer and software? Leave that for them because that's the point of free software. They can use it as they wish.

And just remember there is no way for anybody to control how somebody on the other side of the planet is using it.

This means that appealing to the ethical sense of honest and ethical people is not needed, because they are already honest and ethical. And appealing to the ethical sense of dishonest people is not going to work anyway. So you can't ensure that anyway.

What is Free Software? - GNU Project - Free Software Foundation:
https://www.gnu.org/philosophy/free-sw.html

So read here and learn about the four free software freedoms.

replied to as-cle-bert's post 11 days ago

I am using Nomic embed text and Nomic embed vision from a local API endpoint. In my opinion your package should be more flexible about how embeddings are generated, because some people may use remote embeddings as well. What matters most here is that any kind of document can be ingested. Another question: have you thought about page numbers?

replied to as-cle-bert's post 12 days ago

That sounds like a much-needed thing. How can I use my own embedder?

reacted to as-cle-bert's post with 🔥 12 days ago

Post

2883

Ever dreamt of ingesting into a vector DB that pile of CSVs, Word documents and presentations lying in some remote folders on your PC?🗂️
What if I told you that you can do it within three to six lines of code?🤯
Well, with my latest open-source project, 𝐢𝐧𝐠𝐞𝐬𝐭-𝐚𝐧𝐲𝐭𝐡𝐢𝐧𝐠 (https://github.com/AstraBert/ingest-anything), you can take all your non-PDF files, convert them to PDF, extract their text, chunk, embed and load them into a vector database, all in one go!🚀
How? It's pretty simple!
📁 The input files are converted into PDF by PdfItDown (https://github.com/AstraBert/PdfItDown)
📑 The PDF text is extracted using LlamaIndex readers
🦛 The text is chunked exploiting Chonkie
🧮 The chunks are embedded thanks to Sentence Transformers models
🗄️ The embeddings are loaded into a Qdrant vector database

And you're done!✅
Curious to try it? Install it by running:

𝘱𝘪𝘱 𝘪𝘯𝘴𝘵𝘢𝘭𝘭 𝘪𝘯𝘨𝘦𝘴𝘵-𝘢𝘯𝘺𝘵𝘩𝘪𝘯𝘨

And you can start using it in your python scripts!🐍
Don't forget to star it on GitHub and let me know if you have any feedback! ➡️ https://github.com/AstraBert/ingest-anything
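To make the chunk, embed and load steps above more concrete, here is a minimal, standard-library-only sketch of that flow. It is not the ingest-anything API and it does not use Chonkie, Sentence Transformers or Qdrant: `chunk_text`, `toy_embed` and `InMemoryVectorStore` are hypothetical stand-ins written for illustration, and the hashed bag-of-words "embedding" is only a toy replacement for a real embedding model.

```python
import hashlib
import math
from dataclasses import dataclass, field


def chunk_text(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into overlapping word windows (stand-in for a real chunker)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Hash words into a fixed-size vector and L2-normalise it (toy embedding)."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


@dataclass
class InMemoryVectorStore:
    """Hypothetical in-memory stand-in for a vector database."""
    vectors: list[list[float]] = field(default_factory=list)
    payloads: list[str] = field(default_factory=list)

    def add(self, vector: list[float], payload: str) -> None:
        self.vectors.append(vector)
        self.payloads.append(payload)

    def search(self, query: list[float], top_k: int = 1) -> list[tuple[float, str]]:
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = [
            (sum(a * b for a, b in zip(query, vec)), payload)
            for vec, payload in zip(self.vectors, self.payloads)
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:top_k]


# Text that would normally come out of the PDF extraction step.
text = (
    "PdfItDown converts files to PDF. LlamaIndex readers extract the text. "
    "Chonkie chunks it. Sentence Transformers embed the chunks. "
    "Qdrant stores the embeddings."
)

chunks = chunk_text(text)
store = InMemoryVectorStore()
for chunk in chunks:
    store.add(toy_embed(chunk), chunk)

# Querying with the text of a stored chunk should rank that chunk first.
results = store.search(toy_embed(chunks[0]), top_k=1)
print(results[0][1])
```

In a real setup each stand-in would be swapped for the corresponding library, which is also where a custom or remote embedder would plug in.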

5 replies

reacted to orasul's post with 🔥 12 days ago

Post

2127

hi, it is deki, and now I am open sourced.

An Android AI agent powered by an open-source ML model, 𝗱𝗲𝗸𝗶, has been fully open-sourced.

It understands what’s on your screen and can perform tasks based on your voice or text commands.

Some examples:
* "Write my friend "some_name" in WhatsApp that I'll be 15 minutes late"
* "Open Twitter in the browser and write a post about something"
* "Read my latest notifications"
* "Write a linkedin post about something"

Currently, it works only on Android, but support for other operating systems is planned.

The ML and backend code has also been fully open-sourced.

Video prompt example:

"Open linkedin, tap post and write: hi, it is deki, and now I am open sourced. But don't send, just return"

License: GPLv3

You can find other AI agent demos and usage examples, such as code generation or object detection, on GitHub.

Github: https://github.com/RasulOs/deki

2 replies

reacted to as-cle-bert's post with 🤗 12 days ago

reacted to ProCreations's post with 🚀 14 days ago

Post

2117

Come check out my new dataset, Mistake to Meaning, an attempt to help smaller models understand user typos better! Hope you guys enjoy it

ProCreations/Mistake-To-Meaning

replied to onekq's post 14 days ago

Ollama? It takes more VRAM! It requires GGUF files, which are created by llama.cpp anyway; it is slower than llama.cpp; and using models not published on the Ollama website requires the user to think about it and configure it, unlike llama.cpp.

No way.

replied to hannayukhymenko's post 14 days ago

There is nothing to be proud of. You have based it on a proprietary model, preventing people from using it however they wish and totally disregarding free software principles. Why don't you follow the good example of Microsoft, IBM, Mistral, Allen AI, Qwen or DeepSeek, which are distributing free software models?

Gemma License (danger) is not Free Software and is not Open Source
https://gnu.support/gnu-emacs/emacs-lisp/Gemma-License-danger-is-not-Free-Software-and-is-not-Open-Source.html

The Gemma Terms of Use and Prohibited Use Policy govern the use, modification, and distribution of Google's Gemma machine learning model and its derivatives. While Gemma is available for public use, it does not conform to Free Software or Open Source principles as defined by the Free Software Foundation (FSF) or Open Source Initiative (OSI). The terms impose significant restrictions, including prohibited use cases (e.g., illegal, harmful, or malicious activities), requirements to enforce Google's use restrictions on downstream users, and limitations on redistribution and derived works. Additionally, the terms do not guarantee access to source code or the freedom to use the software for any purpose, and they include broad disclaimers of warranty and liability. As a result, Gemma is a proprietary model with limited permissions, rather than a truly free or open-source software offering.

What is Free Software? - GNU Project - Free Software Foundation
https://www.gnu.org/philosophy/free-sw.html
