I trained a Language Model to schedule events with GRPO!
CircleGuardBench: New Standard for Evaluating AI Moderation Models
Introducing HalluMix: A Task-Agnostic, Multi-Domain Benchmark for Detecting Hallucinations in Real-World Scenarios
🦸🏻#14: What Is MCP, and Why Is Everyone – Suddenly!– Talking About It?
Creating your custom Ghibli Text-to-Image model
Uncensor any LLM with abliteration
AI Personas: The Impact of Design Choices
Good answers are not necessarily factual answers: an analysis of hallucination in leading LLMs
DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge
Reduce, Reuse, Recycle: Why Open Source is a Win for Sustainability
ColPali: Efficient Document Retrieval with Vision Language Models 👀
DeepWiki: Best AI Documentation Generator for Any Github Repo
Building Multimodal RAG Systems: Supercharging Retrieval with MultiModal Embeddings and LLMs
Introduction to State Space Models (SSM)
KV Caching Explained: Optimizing Transformer Inference Efficiency
Mixture of Tunable Experts - Behavior Modification of DeepSeek-R1 at Inference Time
Code a simple RAG from scratch
What is test-time compute and how to scale it?
Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment
What is MoE 2.0? Update Your Knowledge about Mixture-of-experts