gg-hf

Enterprise

Recent Activity

sohamde authored a paper 10 days ago

Resurrecting Recurrent Neural Networks for Long Sequences

sohamde authored a paper 10 days ago

ConvNets Match Vision Transformers at Scale

sohamde authored a paper 10 days ago

Characterizing signal propagation to close the performance gap in unnormalized ResNets

gg-hf's activity

philschmid

posted an update 20 days ago

Gemini 2.5 Flash is here! We are excited to launch our first hybrid reasoning Gemini model. In 2.5 Flash, developers can turn thinking off.

**TL;DR:**
- 🧠 Controllable "Thinking" with a thinking budget of up to 24k tokens
- 🌌 1 million token multimodal input context for text, image, video, audio, and PDF
- 🛠️ Function calling, structured output, Google Search & code execution
- 🏦 $0.15 per 1M input tokens; $0.60 (thinking off) or $3.50 (thinking on) per 1M output tokens (thinking tokens are billed as output tokens)
- 💡 Knowledge cutoff of January 2025
- 🚀 Rate limits - Free tier: 10 RPM, 500 req/day
- 🏅 Outperforms 2.0 Flash on every benchmark
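To make the pricing bullet concrete, here is a minimal cost-estimation sketch in Python. The three prices are copied from the post. The function name `estimate_cost`, its parameters, and the reading that with thinking on *all* output tokens (thinking plus answer) are billed at the $3.50 rate are assumptions made for illustration, not an official billing formula.

```python
# Prices in USD per 1M tokens, as quoted in the post.
INPUT_PER_M = 0.15
OUTPUT_PER_M_THINKING_OFF = 0.60
OUTPUT_PER_M_THINKING_ON = 3.50


def estimate_cost(input_tokens, output_tokens, thinking_tokens=0, thinking_on=False):
    """Rough cost estimate in USD for one request (hypothetical helper).

    Assumption: when thinking is on, thinking tokens are added to the
    output tokens and the whole sum is billed at the thinking-on rate.
    """
    if thinking_tokens and not thinking_on:
        raise ValueError("thinking_tokens given but thinking_on is False")
    rate = OUTPUT_PER_M_THINKING_ON if thinking_on else OUTPUT_PER_M_THINKING_OFF
    billed_output = output_tokens + thinking_tokens
    return (input_tokens * INPUT_PER_M + billed_output * rate) / 1_000_000


if __name__ == "__main__":
    # Same prompt and answer length, with and without a thinking phase.
    print(estimate_cost(10_000, 500))
    print(estimate_cost(10_000, 500, thinking_tokens=2_000, thinking_on=True))
```

Under these assumptions, the thinking phase can account for most of a request's cost, which is why the thinking budget is worth tuning per task.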

Try it ⬇️
https://aistudio.google.com/prompts/new_chat?model=gemini-2.5-flash-preview-04-17

pcuenq

authored a paper about 1 month ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 180

lvwerra

authored a paper about 1 month ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 180

reach-vb

authored a paper about 1 month ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 180

loubnabnl

authored a paper about 1 month ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 180

RyanMullins

authored a paper about 1 month ago

ShieldGemma 2: Robust and Tractable Image Content Moderation

Paper • 2504.01081 • Published Apr 1 • 3

bebechien

authored a paper about 1 month ago

Gemma 3 Technical Report

Paper • 2503.19786 • Published Mar 25 • 50

jethac

authored a paper about 1 month ago

Gemma 3 Technical Report

Paper • 2503.19786 • Published Mar 25 • 50

PhilCulliton

authored a paper about 1 month ago

Gemma 3 Technical Report

Paper • 2503.19786 • Published Mar 25 • 50

philschmid

authored a paper about 1 month ago

Gemma 3 Technical Report

Paper • 2503.19786 • Published Mar 25 • 50

ddsh

authored a paper about 1 month ago

Gemma 3 Technical Report

Paper • 2503.19786 • Published Mar 25 • 50

giffmana

authored a paper about 1 month ago

Gemma 3 Technical Report

Paper • 2503.19786 • Published Mar 25 • 50

osanseviero

authored a paper about 1 month ago

Gemma 3 Technical Report

Paper • 2503.19786 • Published Mar 25 • 50

RyanMullins

authored 3 papers about 1 month ago

Gemma: Open Models Based on Gemini Research and Technology

Paper • 2403.08295 • Published Mar 13, 2024 • 49

Gemma 2: Improving Open Language Models at a Practical Size

Paper • 2408.00118 • Published Jul 31, 2024 • 77

Gemma 3 Technical Report

Paper • 2503.19786 • Published Mar 25 • 50

philschmid

posted an update about 1 month ago

Gemini 2.5 Pro, thinking by default! We are excited to launch our best Gemini model yet for reasoning, multimodal, and coding! #1 on LMSYS, Humanity’s Last Exam, AIME, GPQA, and more!

TL;DR:
- 💻 Best Gemini coding model yet, particularly for web development (excels on LiveCodeBench).
- 🧠 Default "Thinking" with up to 64k tokens of output
- 🌌 1 million token multimodal input context for text, image, video, audio, and PDF
- 🛠️ Function calling, structured output, Google Search & code execution
- 🏆 #1 on LMArena & SOTA on AIME, GPQA, Humanity's Last Exam
- 💡 Knowledge cutoff of January 2025
- 🤗 Available for free as Experimental in AI Studio, the Gemini API & the Gemini app
- 🚀 Rate limits - Free tier: 2 RPM, 50 req/day

Try it ⬇️

https://aistudio.google.com/?model=gemini-2.5-pro-exp-03-25

clefourrier

posted an update about 2 months ago

The Gemma 3 family is out! I've been reading the tech report, and this section was really interesting to me from a methods/scientific fairness POV.

Instead of doing over-hyped comparisons, they clearly state that **results are reported in a setup which is advantageous to their models**.
(Which everybody does, but people usually don't say)

For a tech report, it makes a lot of sense to report model performance when used optimally!
On leaderboards, on the other hand, comparisons will be apples to apples, but potentially in a suboptimal setup for a given model family (just as some users interact suboptimally with models).

The report also contains a cool section (6) on the training data memorization rate! It's important to see whether your model will output the training data it has seen verbatim: always an issue for privacy/copyright/... but also very much for evaluation!

Because if your model knows its evals by heart, you're not testing for generalization.
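As a rough illustration of what a memorization rate measures, here is a minimal Python sketch: the fraction of training prefixes for which a model reproduces the known continuation verbatim. This is not the methodology of the Gemma 3 report. The function `memorization_rate`, the `generate` callable, and the toy data are all hypothetical, and exact string matching is a simplifying assumption.

```python
from typing import Callable, Iterable, Tuple


def memorization_rate(
    samples: Iterable[Tuple[str, str]],
    generate: Callable[[str], str],
) -> float:
    """Fraction of (prefix, continuation) pairs reproduced verbatim.

    `generate` is a hypothetical stand-in for a model call that maps a
    prefix to generated text. A sample counts as memorized when the
    generated text starts with the known training continuation.
    """
    total = 0
    hits = 0
    for prefix, continuation in samples:
        total += 1
        if generate(prefix).startswith(continuation):
            hits += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Toy "model" that has memorized exactly one training sample.
    memorized = {"The quick brown": " fox jumps over the lazy dog"}
    toy_generate = lambda prefix: memorized.get(prefix, " something new")

    data = [
        ("The quick brown", " fox jumps"),
        ("Once upon a", " time there was"),
    ]
    print(memorization_rate(data, toy_generate))
```

The same check applied to benchmark items instead of training text gives a crude contamination signal: a model that completes eval questions verbatim is being tested on recall, not generalization.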

alvarobartt

posted an update 2 months ago

🔥 Agents can do anything! @microsoft Research just announced the release of Magma 8B!

Magma is a new Visual Language Model (VLM) with 8B parameters for multi-modal agents designed to handle complex interactions across virtual and real environments; and it's MIT licensed!

Magma comes with exciting new features such as:
- Introduces the Set-of-Mark and Trace-of-Mark techniques for fine-tuning
- Leverages a large amount of unlabeled video data to learn the spatial-temporal grounding and planning
- Strong generalization and the ability to be fine-tuned for other agentic tasks
- SOTA in different multi-modal benchmarks spanning across UI navigation, robotics manipulation, image / video understanding and spatial understanding and reasoning
- Generates goal-driven visual plans and actions for agentic use cases

Model: microsoft/Magma-8B
Technical Report: Magma: A Foundation Model for Multimodal AI Agents (2502.13130)

giffmana

authored a paper 3 months ago

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published Feb 20 • 144