Cรฉlina Hanouti's picture

Cรฉlina Hanouti PRO

celinah

AI & ML interests

inference, on-device and image generation

Recent Activity

Organizations

Hugging Face's profile picture Hugging Face OSS Metrics's profile picture Blog-explorers's profile picture Hugging Face for Computer Vision's profile picture MLX Community's profile picture Social Post Explorers's profile picture open/ acc's profile picture DDUF's profile picture hf-inference's profile picture Inference Endpoints Images's profile picture

celinah's activity

reacted to lysandre's post with โค๏ธ 2 months ago
view post
Post
6565
SmolVLM-2 and SigLIP-2 are now part of transformers in dedicated releases!

They're added on top of the v4.49.0 release, and can be installed from the following tags: v4.49.0-SmolVLM-2 and v4.49.0-SigLIP-2.

This marks a new beginning for the release process of transformers. For the past five years, we've been doing monthly releases featuring many models (v4.49.0, the latest release, features 9 new architectures).

Starting with SmolVLM-2 & SigLIP2, we'll now additionally release tags supporting new models on a stable branch. These models are therefore directly available for use by installing from the tag itself. These tags will continue to be updated with fixes applied to these models.

Going forward, continue expecting software releases following semantic versioning: v4.50.0 will have ~10 new architectures compared to v4.49.0, as well as a myriad of new features, improvements and bug fixes. Accompanying these software releases, we'll release tags offering brand new models as fast as possible, to make them accessible to all immediately.
  • 1 reply
ยท
reacted to AdinaY's post with ๐Ÿ”ฅ 4 months ago
view post
Post
3205
MiniCPM-o2.6 ๐Ÿ”ฅ an end-side multimodal LLMs released by OpenBMB from the Chinese community
Model: openbmb/MiniCPM-o-2_6
โœจ Real-time English/Chinese conversation, emotion control and ASR/STT
โœจ Real-time video/audio understanding
โœจ Processes up to 1.8M pixels, leads OCRBench & supports 30+ languages
reacted to merve's post with โค๏ธ 4 months ago
view post
Post
3693
What a beginning to this year in open ML ๐Ÿค 
Let's unwrap! merve/jan-10-releases-677fe34177759de0edfc9714

Multimodal ๐Ÿ–ผ๏ธ
> ByteDance released SA2VA: a family of vision LMs that can take image, video, text and visual prompts
> moondream2 is out with new capabilities like outputting structured data and gaze detection!
> Dataset: Alibaba DAMO lab released multimodal textbook โ€” 22k hours worth of samples from instruction videos ๐Ÿคฏ
> Dataset: SciCap captioning on scientific documents benchmark dataset is released along with the challenge!

LLMs ๐Ÿ’ฌ
> Microsoft released Phi-4, sota open-source 14B language model ๐Ÿ”ฅ
> Dolphin is back with Dolphin 3.0 Llama 3.1 8B ๐Ÿฌ๐Ÿฌ
> Prime-RL released Eurus-2-7B-PRIME a new language model trained using PRIME alignment
> SmallThinker-3B is a new small reasoning LM based on Owen2.5-3B-Instruct ๐Ÿ’ญ
> Dataset: QWQ-LONGCOT-500K is the dataset used to train SmallThinker, generated using QwQ-32B-preview ๐Ÿ“•
> Dataset: @cfahlgren1 released React Code Instructions: a dataset of code instruction-code pairs ๐Ÿ“•
> Dataset: Qwen team is on the roll, they just released CodeElo, a dataset of code preferences ๐Ÿ‘ฉ๐Ÿปโ€๐Ÿ’ป

Embeddings ๐Ÿ”–
> @MoritzLaurer released zero-shot version of ModernBERT large ๐Ÿ‘
> KaLM is a new family of performant multilingual embedding models with MIT license built using Qwen2-0.5B

Image/Video Generation โฏ๏ธ
> NVIDIA released Cosmos, a new family of diffusion/autoregressive World Foundation Models generating worlds from images, videos and texts ๐Ÿ”ฅ
> Adobe released TransPixar: a new text-to-video model that can generate assets with transparent backgrounds (a first!)
> Dataset: fal released cosmos-openvid-1m Cosmos-tokenized OpenVid-1M with samples from OpenVid-1M

Others
> Prior Labs released TabPFNv2, the best tabular transformer is out for classification and regression
> Metagene-1 is a new RNA language model that can be used for pathogen detection, zero-shot embedding and genome understanding
reacted to cfahlgren1's post with โค๏ธ 4 months ago
view post
Post
2257
You'll notice the AI in the SQL Console is much better at working with chatml conversations:

Here's example of unnesting the cfahlgren1/react-code-instructions in less than 10 seconds by asking it. Check it out here: cfahlgren1/react-code-instructions

- "show me the average assistant response length"
- "extract user, system, and assistant messages into separate columns"

It's super easy to work with conversational datasets now with natural language ๐Ÿ—ฃ๏ธ





  • 2 replies
ยท
reacted to clem's post with โค๏ธ 4 months ago
view post
Post
4392
Cool to see @ylecun joining the top 10 of most followed on HF!

(and leaderboard by @mvaloatto is here: mvaloatto/TCTF)
  • 2 replies
ยท
reacted to m-ric's post with ๐Ÿ”ฅ 5 months ago
view post
Post
2419
After 6 years, BERT, the workhorse of encoder models, finally gets a replacement: ๐—ช๐—ฒ๐—น๐—ฐ๐—ผ๐—บ๐—ฒ ๐— ๐—ผ๐—ฑ๐—ฒ๐—ฟ๐—ป๐—•๐—˜๐—ฅ๐—ง! ๐Ÿค—

We talk a lot about โœจGenerative AIโœจ, meaning "Decoder version of the Transformers architecture", but this is only one of the ways to build LLMs: encoder models, that turn a sentence in a vector, are maybe even more widely used in industry than generative models.

The workhorse for this category has been BERT since its release in 2018 (that's prehistory for LLMs).

It's not a fancy 100B parameters supermodel (just a few hundred millions), but it's an excellent workhorse, kind of a Honda Civic for LLMs.

Many applications use BERT-family models - the top models in this category cumulate millions of downloads on the Hub.

โžก๏ธ Now a collaboration between Answer.AI and LightOn just introduced BERT's replacement: ModernBERT.

๐—ง๐—Ÿ;๐——๐—ฅ:
๐Ÿ›๏ธ Architecture changes:
โ‡’ First, standard modernizations:
- Rotary positional embeddings (RoPE)
- Replace GeLU with GeGLU,
- Use Flash Attention 2
โœจ The team also introduced innovative techniques like alternating attention instead of full attention, and sequence packing to get rid of padding overhead.

๐Ÿฅ‡ As a result, the model tops the game of encoder models:
It beats previous standard DeBERTaV3 for 1/5th the memory footprint, and runs 4x faster!

Read the blog post ๐Ÿ‘‰ https://huggingface.co/blog/modernbert
  • 1 reply
ยท
reacted to FranckAbgrall's post with ๐Ÿ”ฅ 5 months ago
view post
Post
1275
๐Ÿ†• It should now be easier to identify discussions or pull requests where repository owners are participating on HF, let us know it that helps ๐Ÿ’ฌ๐Ÿค—
  • 1 reply
ยท
posted an update 5 months ago
view post
Post
759
๐Ÿš€ We've just dropped a new release v0.27.0 of the ๐š‘๐šž๐š๐š๐š’๐š—๐š๐š๐šŠ๐šŒ๐šŽ_๐š‘๐šž๐š‹ Python library!

This release includes:
- ๐Ÿ’พ New torch model loading utilities in the serialization module โ€” providing a standardized way to save and load torch models with built-in support for sharding and safe serialization.
- ๐Ÿ“ฆ Tooling for something exciting โ€” if you like single-file formats for models like GGUF, you'll love what we're cooking up ๐Ÿ‘€ More coming soon!
- ๐Ÿ› ๏ธ Loads of quality-of-life improvements and bug fixes!

release notes and full details here ๐Ÿ‘‡
Wauplin/huggingface_hub#10

$ pip install -U huggingface_hub
reacted to burtenshaw's post with โค๏ธ๐Ÿ”ฅ 5 months ago
view post
Post
2529
Quick update from week 1 of smol course. The community is taking the driving seat and using the material for their own projects. If you want to do the same, join in!

- we have ongoing translation projects in Korean, Vietnamese, Portuguese, and Spanish
- 3 chapters are ready for students. On topics like, instruction tuning, preference alignment, and parameter efficient fine tuning
- 3 chapters are in progress on evaluation, vision language models, and synthetic data.
- around 780 people have forked the repo to use it for learning, teaching, sharing.

โญ๏ธ Next step is to support people that want to use the course for teaching, content creation, internal knowledge sharing, or anything. If you're into this. Drop an issue or PR

REPO: https://buff.ly/3ZCMKX2
discord channel: https://buff.ly/4f9F8jA
reacted to julien-c's post with ๐Ÿค—โค๏ธ๐Ÿ”ฅ 5 months ago
view post
Post
10685
After some heated discussion ๐Ÿ”ฅ, we clarify our intent re. storage limits on the Hub

TL;DR:
- public storage is free, and (unless blatant abuse) unlimited. We do ask that you consider upgrading to PRO and/or Enterprise Hub if possible
- private storage is paid above a significant free tier (1TB if you have a paid account, 100GB otherwise)

docs: https://huggingface.co/docs/hub/storage-limits

We optimize our infrastructure continuously to scale our storage for the coming years of growth in Machine learning, to the benefit of the community ๐Ÿ”ฅ

cc: @reach-vb @pierric @victor and the HF team
ยท
reacted to merve's post with โค๏ธ 5 months ago
view post
Post
5664
This week in open-source AI was insane ๐Ÿค  A small recap๐Ÿ•บ๐Ÿป merve/dec-6-releases-67545caebe9fc4776faac0a3

Multimodal ๐Ÿ–ผ๏ธ
> Google shipped a PaliGemma 2, new iteration of PaliGemma with more sizes: 3B, 10B and 28B, with pre-trained and captioning variants ๐Ÿ‘
> OpenGVLab released InternVL2, seven new vision LMs in different sizes, with sota checkpoint with MIT license โœจ
> Qwen team at Alibaba released the base models of Qwen2VL models with 2B, 7B and 72B ckpts

LLMs ๐Ÿ’ฌ
> Meta released a new iteration of Llama 70B, Llama3.2-70B trained further
> EuroLLM-9B-Instruct is a new multilingual LLM for European languages with Apache 2.0 license ๐Ÿ”ฅ
> Dataset: CohereForAI released GlobalMMLU, multilingual version of MMLU with 42 languages with Apache 2.0 license
> Dataset: QwQ-LongCoT-130K is a new dataset to train reasoning models
> Dataset: FineWeb2 just landed with multilinguality update! ๐Ÿ”ฅ nearly 8TB pretraining data in many languages!

Image/Video Generation ๐Ÿ–ผ๏ธ
> Tencent released HunyuanVideo, a new photorealistic video generation model
> OminiControl is a new editing/control framework for image generation models like Flux

Audio ๐Ÿ”Š
> Indic-Parler-TTS is a new text2speech model made by community
reacted to burtenshaw's post with โค๏ธ 5 months ago
view post
Post
2811
For anyone looking to boost their LLM fine-tuning and alignment skills this decemeber. We're running this free and open course called smol course. Itโ€™s not big like Li Yin and @mlabonne , itโ€™s just smol.

๐Ÿ‘ท It focuses on practical use cases, so if youโ€™re working on something, bring it along.

๐Ÿ‘ฏโ€โ™€๏ธ Itโ€™s peer reviewed and open so you can discuss and get feedback.

๐Ÿค˜ If youโ€™re already a smol pro, feel free to drop a star or issue.

> > Part 1 starts now, and itโ€™s on instruction tuning!

https://github.com/huggingface/smol-course
reacted to m-ric's post with ๐Ÿš€๐Ÿ”ฅ 5 months ago
view post
Post
1307
๐Ÿค– ๐—”๐—ฑ๐—ผ๐—ฏ๐—ฒ'๐˜€ ๐—ฐ๐—ผ๐—ฑ๐—ฒ-๐—ด๐—ฒ๐—ป๐—ฒ๐—ฟ๐—ฎ๐˜๐—ถ๐—ป๐—ด ๐—ฎ๐—ด๐—ฒ๐—ป๐˜ ๐—ฟ๐—ฒ๐—ฎ๐—ฐ๐—ต๐—ฒ๐˜€ ๐˜๐—ต๐—ฒ ๐˜๐—ผ๐—ฝ ๐—ผ๐—ณ ๐—š๐—”๐—œ๐—” ๐—น๐—ฒ๐—ฎ๐—ฑ๐—ฒ๐—ฟ๐—ฏ๐—ผ๐—ฎ๐—ฟ๐—ฑ - and their paper cites my work!

๐Ÿ’ก Reminder:ย In short, Agentic systems are a vehicle in which you put your LLM to allow it access to the outside world.

โžก๏ธ The team of researchers at Adobe started from the idea that current agentic systems lack the ability to define their own tools. So they decided to make an agent that writes actions as code, thus allowing it to write python functions that can be re-used later as tools!

Here's what the LLM generations can look like with the proper prompt:

Thought: I need to access the excel file using a different method.
Action:
def access_excel_file(file_path)
	... # rest of the code (the agent does writes it, but I don't have room in this post)
	return rows


Then your system executes this and appends the observation to the agent's memory.

Why is this code formulation better than classical tool use formulation as JSON? The paper explains:

"Most existing work uses text or JSON as the representation of actions, which significantly lacks the two criteria mentioned earlier: generality and composability. In contrast, DynaSaur can utilize available actions or create new ones if necessary, using code as a unified representation. In principle, acting with code enables agents to solve any Turing-complete problem."

The idea of using code is not new: in fact, we do it in transformers.agents (thus the citation that I got). They implementation adds further refinements, like using RAG to retrieve relevant functions before generating an action, which increases performance further.

And they observe that code agents perform much better, reaching the top of GAIA leaderboard! ๐Ÿฅ‡

Go take a look, it's really clear and informative!

Paper added to my agents collection ๐Ÿ‘‰ m-ric/agents-65ba776fbd9e29f771c07d4e
reacted to andito's post with ๐Ÿ”ฅ 5 months ago
view post
Post
1971
SmolVLM speeding locally on a laptop thanks to mlx-vlm and
@Gradio ! Try it with two lines:
pip install git+https://github.com/andimarafioti/mlx-vlm.git@stream-generate-fix
python -m mlx_vlm.chat_ui --model mlx-community/SmolVLM-Instruct-8bit

Gotta love the MLX community! Big thanks to @pcuenq and @prince_canuma !
reacted to jsulz's post with ๐Ÿš€ 6 months ago
view post
Post
2132
In August, the XetHub team joined Hugging Face
- https://huggingface.co/blog/xethub-joins-hf - and weโ€™ve been rolling up our sleeves to bring the best of both worlds together. We started with a deep dive into the current state of files stored with Git LFS on the Hub.

Getting this information was no small feat. We had to:
* Analyze a complete database dump of all repositories and files stored in Git LFS across Hugging Face.
* Parse through metadata on file sizes and types to accurately map the storage breakdown across Spaces, Models, and Datasets.

You can read more about the findings (with some jaw-dropping stats + charts) here https://www.linkedin.com/feed/update/urn:li:activity:7244486280351285248
reacted to merve's post with ๐Ÿš€ 6 months ago