I trained a Language Model to schedule events with GRPO!
CircleGuardBench: New Standard for Evaluating AI Moderation Models
Introducing HalluMix: A Task-Agnostic, Multi-Domain Benchmark for Detecting Hallucinations in Real-World Scenarios
🦸🏻#14: What Is MCP, and Why Is Everyone – Suddenly!– Talking About It?
Creating your custom Ghibli Text-to-Image model
Uncensor any LLM with abliteration
Good answers are not necessarily factual answers: an analysis of hallucination in leading LLMs
DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge
AI Personas: The Impact of Design Choices
ColPali: Efficient Document Retrieval with Vision Language Models 👀
Mixture of Tunable Experts - Behavior Modification of DeepSeek-R1 at Inference Time
DeepWiki: Best AI Documentation Generator for Any Github Repo
Building Multimodal RAG Systems: Supercharging Retrieval with MultiModal Embeddings and LLMs
Introduction to State Space Models (SSM)
A Guide to Running Qwen 3 Locally with Ollama and vLLM
Reduce, Reuse, Recycle: Why Open Source is a Win for Sustainability
Code a simple RAG from scratch
KV Caching Explained: Optimizing Transformer Inference Efficiency
What is test-time compute and how to scale it?
Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment