TuringPost (Turing Post)

Kseniase

posted an update 4 days ago

Post

3954

10 new Chain-of-Thoughts (CoT) methods

CoT has long been one of the hottest techniques in AI thanks to its effectiveness and compelling core idea: encouraging models to solve complex problems through explicit intermediate reasoning steps. But usually researchers modify original CoT approach, finding tips that further improve LLMs' reasoning. That's what we're going to talk about today.

Here's a list of 10 latest enhanced CoT approaches:

1. Chain-of-Defensive-Thought -> Chain-of-Defensive-Thought: Structured Reasoning Elicits Robustness in Large Language Models against Reference Corruption (2504.20769)
Provides a few structured, defensive reasoning exemplars to improve the robustness of LLMs

2. Hybrid-CoT -> AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization (2504.21659)
Proposes using Adaptive Hybrid Reasoning Model (AdaR1) that combines Long- and Short-CoT, and applying bi-level preference training to select effective reasoning styles

3. Semantic-level and token-level CoT -> T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT (2505.00703)
Introduces T2I-R1 text-to-image gen model, that uses semantic-level CoT for prompt planning and token-level CoT for pixel-level generation, while BiCoT-GRPO coordinates them both

4. Speculative CoT (SCoT) -> Efficient Reasoning for LLMs through Speculative Chain-of-Thought (2504.19095)
SCoT drafts multiple reasoning paths with a lightweight draft, selects the best, and uses the target model for correction - all this to reduce latency by 48–66%

5. Collaborative CoT (Co-CoT) -> Co-CoT: A Prompt-Based Framework for Collaborative Chain-of-Thought Reasoning (2504.17091)
Breaks reasoning into blocks that users can inspect, modify and re-run, promoting active engagement. An adaptation mechanism aligns outputs with diverse cognitive styles and user goals

6. XS-CoT -> Enhancing Non-Core Language Instruction-Following in Speech LLMs via Semi-Implicit Cross-Lingual CoT Reasoning (2504.20835)
It's a cross-lingual framework that integrates speech-to-text translation into reasoning, using a semi-implicit CoT approach to compress intermediate tokens. This improves non-core language responses by up to 45%

Read further in the comments 👇

If you liked this, also subscribe to the Turing Post -> https://www.turingpost.com/subscribe

1 reply

·

Kseniase

posted an update 11 days ago

Post

6410

6 Free resources on Reinforcement Learning (RL)

RL now is where the real action is, it's the engine behind autonomous tech, robots, and the next wave of AI that thinks, moves and solves problems on its own. To stay up to date with what’s happening in RL, we offer some fresh materials on it:

1. "Reinforcement Learning from Human Feedback" by Nathan Lambert -> https://rlhfbook.com/
It's a short introduction to RLHF, explaining instruction tuning, reward modeling, alignment methods, synthetic data, evaluation, and more

2. "A Course in Reinforcement Learning (2nd Edition)" by Dimitri P. Bertsekas -> https://www.mit.edu/~dimitrib/RLbook.html
Explains dynamic programming (DP) and RL, diving into rollout algorithms, neural networks, policy learning, etc. It’s packed with solved exercises and real-world examples

3. "Mathematical Foundations of Reinforcement Learning" video course by Shiyu Zhao -> https://www.youtube.com/playlist?list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8
Offers a mathematical yet friendly introduction to RL, covering Bellman Equation, value iteration, Monte Carlo learning, approximation, policy gradient, actor-critic methods, etc.
+ Check out the repo for more: https://github.com/MathFoundationRL/Book-Mathematical-Foundation-of-Reinforcement-Learning

4. "Multi-Agent Reinforcement Learning" by Stefano V. Albrecht, Filippos Christianos, and Lukas Schäfer -> https://www.marl-book.com/
Covers models, core ideas of multi-agent RL (MARL) and modern approaches to combining it with deep learning

5. "Reinforcement Learning: A Comprehensive Overview" by Kevin P. Murphy -> https://arxiv.org/pdf/2412.05265
Explains RL and sequential decision making, covering value-based, policy-gradient, model-based, multi-agent RL methods, RL+LLMs, and RL+inference and other topics

6. Our collection of free courses and books on RL -> https://huggingface.co/posts/Kseniase/884818121094439

If you liked this, also subscribe to The Turing Post: https://www.turingpost.com/subscribe

Kseniase

posted an update 18 days ago

Post

7015

11 new types of RAG

RAG is evolving fast, keeping pace with cutting-edge AI trends. Today it becomes more agentic and smarter at navigating complex structures like hypergraphs.

Here are 11 latest RAG types:

1. InstructRAG -> InstructRAG: Leveraging Retrieval-Augmented Generation on Instruction Graphs for LLM-Based Task Planning (2504.13032)
Combines RAG with a multi-agent framework, using a graph-based structure, an RL agent to expand task coverage, and a meta-learning agent for better generalization

2. CoRAG (Collaborative RAG) -> CoRAG: Collaborative Retrieval-Augmented Generation (2504.01883)
A collaborative framework that extends RAG to settings where clients train a shared model using a joint passage store

3. ReaRAG -> ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation (2503.21729)
It uses a Thought-Action-Observation loop to decide at each step whether to retrieve information or finalize an answer, reducing unnecessary reasoning and errors

4. MCTS-RAG -> MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search (2503.20757)
Combines RAG with Monte Carlo Tree Search (MCTS) to help small LMs handle complex, knowledge-heavy tasks

5. Typed-RAG - > Typed-RAG: Type-aware Multi-Aspect Decomposition for Non-Factoid Question Answering (2503.15879)
Improves answers on open-ended questions by identifying question types (a debate, personal experience, or comparison) and breaking it down into simpler parts

6. MADAM-RAG -> Retrieval-Augmented Generation with Conflicting Evidence (2504.13079)
A multi-agent system where models debate answers over multiple rounds and an aggregator filters noise and misinformation

7. HM-RAG -> HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation (2504.12330)
A hierarchical multi-agent RAG framework that uses 3 agents: one to split queries, one to retrieve across multiple data types (text, graphs and web), and one to merge and refine answers

8. CDF-RAG -> CDF-RAG: Causal Dynamic Feedback for Adaptive Retrieval-Augmented Generation (2504.12560)
Works with causal graphs and enables multi-hop causal reasoning, refining queries. It validates responses against causal pathways

To explore what is Causal AI, read our article: https://www.turingpost.com/p/causalai

Subscribe to the Turing Post: https://www.turingpost.com/subscribe

Read further 👇

1 reply

·

Kseniase

posted an update 25 days ago

Post

5529

16 new research on inference-time scaling:

For the last couple of weeks a large amount of studies on inference-time scaling has emerged. And it's so cool, because each new paper adds a trick to the toolbox, making LLMs more capable without needing to scale parameter count of the models.

So here are 13 new methods + 3 comprehensive studies on test-time scaling:

1. Inference-Time Scaling for Generalist Reward Modeling (2504.02495)
Probably, the most popular study. It proposes to boost inference-time scalability by improving reward modeling. To enhance performance, DeepSeek-GRM uses adaptive critiques, parallel sampling, pointwise generative RM, and Self-Principled Critique Tuning (SPCT)

2. T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models (2504.04718)
Allows small models to use external tools, like code interpreters and calculator, to enhance self-verification

3. Z1: Efficient Test-time Scaling with Code (2504.00810)
Proposes to train LLMs on code-based reasoning paths to make test-time scaling more efficient, limiting unnecessary tokens with a special dataset and a Shifted Thinking Window

4. GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning (2504.00891)
Introduces GenPRM, a generative PRM, that uses CoT reasoning and code verification for step-by-step judgment. With only 23K training examples, GenPRM outperforms prior PRMs and larger models

5. Can Test-Time Scaling Improve World Foundation Model? (2503.24320)
SWIFT test-time scaling framework improves World Models' performance without retraining, using strategies like fast tokenization, Top-K pruning, and efficient beam search

6. Relevance Isn't All You Need: Scaling RAG Systems With Inference-Time Compute Via Multi-Criteria Reranking (2504.07104)
Proposes REBEL for RAG systems scaling, which uses multi-criteria optimization with CoT prompting for better performance-speed tradeoffs as inference compute increases

7. $φ$-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation (2503.13288)
Proposes a φ-Decoding strategy that uses foresight sampling, clustering and adaptive pruning to estimate and select optimal reasoning steps

Read further below 👇

Also, subscribe to the Turing Post https://www.turingpost.com/subscribe

2 replies

·

Kseniase

updated a Space about 1 month ago

README

🚀

Kseniase

posted an update about 1 month ago

Post

2694

9 Types of AI inference

AI inference refers to the process when AI models generate predictions, classifications, or decisions based on input data and pre-trained models. It encompasses a wide range of approaches with different computational methods and deployment.

Firstly, here are 5 inference types, based on how the model reasons:

1. Probabilistic inference -> https://arxiv.org/pdf/2502.05244
Uses probability theory to reason under uncertainty. The system maintains degrees of belief over hypotheses and updates them as evidence comes in.

2. Rule-based inference -> Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference (2407.00075)
Draws conclusions by applying explicit if-then rules encoded in a knowledge base. Mostly used in neurosymbolic AI.

3. Logical inference -> https://arxiv.org/abs/2009.03393
Uses formal logic to draw conclusions that are guaranteed true if the premises are. It supports theorem proving, logic programming, and tasks needing correctness, like software verification.

4. Abductive inference -> Can ChatGPT Make Explanatory Inferences? Benchmarks for Abductive Reasoning (2404.18982)
Involves forming hypotheses that would best explain a given set of observations - among multiple possible explanations, the goal is to choose the most plausible. Abduction is inherently creative and uncertain.

5. Fuzzy inference -> DCNFIS: Deep Convolutional Neuro-Fuzzy Inference System (2308.06378)
Applies fuzzy logic – reasoning with degrees of truth rather than binary true/false. Inputs are mapped to fuzzy sets with membership grades between 0 and 1.

Secondly, here are 4 inference types based on its execution contexts:

1. Batch inference -> BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching (2412.03594)
Involves generating model predictions on large sets of data in bulk, often on a scheduled basis or as needed for analysis rather than immediate use.

2. Real-time inference -> Real-time Inference and Extrapolation via a Diffusion-inspired Temporal Transformer Operator (DiTTO) (2307.09072)
Produces outputs on-demand with minimal latency, so results are available immediately when needed.

Read further in the comments 👇

2 replies

·

Kseniase

posted an update about 1 month ago

Post

2011

9 Multimodal Chain-of-Thought methods

How Chain-of-Thought (CoT) prompting can unlock models' full potential across images, video, audio and more? Finding special multimodal CoT techniques is the answer.

Here are 9 methods of Multimodal Chain-of-Thought (MCoT). Most of them are open-source:

1. KAM-CoT -> KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning (2401.12863)
This lightweight framework combines CoT prompting with knowledge graphs (KGs) and achieves 93.87% accuracy

2. Multimodal Visualization-of-Thought (MVoT) -> Imagine while Reasoning in Space: Multimodal Visualization-of-Thought (2501.07542)
Lets models generate visual reasoning traces, using a token discrepancy loss to improve visual quality

3. Compositional CoT (CCoT) -> Compositional Chain-of-Thought Prompting for Large Multimodal Models (2311.17076)
Uses scene graph (SG) representations generated by the LMM itself to improve performance on compositional and general multimodal benchmarks

4. URSA -> URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics (2501.04686)
Brings System 2-style thinking to multimodal math reasoning, using a 3-module CoT data synthesis process with CoT distillation, trajectory-format rewriting and format unification

5. MM-Verify -> MM-Verify: Enhancing Multimodal Reasoning with Chain-of-Thought Verification (2502.13383)
Introduces a verification mechanism with MM-Verifier and MM-Reasoner that implements synthesized high-quality CoT data for multimodal reasoning

6. Duty-Distinct CoT (DDCoT) -> DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models (2310.16436)
Divides the reasoning responsibilities between LMs and visual models, integrating the visual recognition capabilities into the joint reasoning process

7. Multimodal-CoT from Amazon Web Services -> Multimodal Chain-of-Thought Reasoning in Language Models (2302.00923)
A two-stage framework separates rationale generation from answer prediction, allowing the model to reason more effectively using multimodal inputs

8. Graph-of-Thought (GoT) -> Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models (2305.16582)
This two-stage framework models reasoning as a graph of interconnected ideas, improving performance on text-only and multimodal tasks

More in the comments👇

1 reply

·

Kseniase

posted an update about 2 months ago

Post

5129

8 types of RoPE

As we always use Transformers, it's helpful to understand RoPE—Rotary Position Embedding. Since token order matters, RoPE encodes it by rotating token embeddings based on their position, so the model knows how to interpret which token comes first, second, and so on.

Here are 8 types of RoPE that can be implemented in different cases:

1. Original RoPE -> RoFormer: Enhanced Transformer with Rotary Position Embedding (2104.09864)
Encodes token positions by rotating token embeddings in the complex plane via a position-based rotation matrix, thereby providing the self-attention mechanism with relative positional info.

2. LongRoPE -> LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens (2402.13753)
Extends the context window of pre-trained LLMs to 2048k tokens, leveraging non-uniformities in positional interpolation with an efficient search.

3. LongRoPE2 -> LongRoPE2: Near-Lossless LLM Context Window Scaling (2502.20082)
Extends the effective context window of pre-trained LLMs to the target! length, rescaling RoPE guided by “needle-driven” perplexity.

4. Multimodal RoPE (MRoPE) -> Qwen2.5-VL Technical Report (2502.13923)
Decomposes positional embedding into 3 components: temporal, height and width, so that positional features are aligned across modalities: text, images and videos.

5. Directional RoPE (DRoPE) -> DRoPE: Directional Rotary Position Embedding for Efficient Agent Interaction Modeling (2503.15029)
Adds an identity scalar, improving how angles are handled without extra complexity. It helps balance accuracy, speed, and memory usage.

6. VideoRoPE -> VideoRoPE: What Makes for Good Video Rotary Position Embedding? (2502.05173)
Adapts RoPE for video, featuring 3D structure, low-frequency temporal allocation, diagonal layout, and adjustable spacing.

7. VRoPE -> VRoPE: Rotary Position Embedding for Video Large Language Models (2502.11664)
An another RoPE for video, which restructures positional indices and balances encoding for uniform spatial focus.

8. XPos (Extrapolatable Position Embedding) -> https://huggingface.co/papers/2212.10
Introduces an exponential decay factor into the rotation matrix, improving stability on long sequences.

1 reply

·

Kseniase

posted an update about 2 months ago

Post

7867

15 types of attention mechanisms

Attention mechanisms allow models to dynamically focus on specific parts of their input when performing tasks. In our recent article, we discussed Multi-Head Latent Attention (MLA) in detail and now it's time to summarize other existing types of attention.

Here is a list of 15 types of attention mechanisms used in AI models:

1. Soft attention (Deterministic attention) -> Neural Machine Translation by Jointly Learning to Align and Translate (1409.0473)
Assigns a continuous weight distribution over all parts of the input. It produces a weighted sum of the input using attention weights that sum to 1.

2. Hard attention (Stochastic attention) -> Effective Approaches to Attention-based Neural Machine Translation (1508.04025)
Makes a discrete selection of some part of the input to focus on at each step, rather than attending to everything.

3. Self-attention -> Attention Is All You Need (1706.03762)
Each element in the sequence "looks" at other elements and "decides" how much to borrow from each of them for its new representation.

4. Cross-Attention (Encoder-Decoder attention) -> Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation (2104.08771)
The queries come from one sequence and the keys/values come from another sequence. It allows a model to combine information from two different sources.

5. Multi-Head Attention (MHA) -> Attention Is All You Need (1706.03762)
Multiple attention “heads” are run in parallel. The model computes several attention distributions (heads), each with its own set of learned projections of queries, keys, and values.

6. Multi-Head Latent Attention (MLA) -> DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model (2405.04434)
Extends MHA by incorporating a latent space where attention heads can dynamically learn different latent factors or representations.

7. Memory-Based attention -> End-To-End Memory Networks (1503.08895)
Involves an external memory and uses attention to read from and write to this memory.

See other types in the comments 👇

1 reply

·

Kseniase

posted an update about 2 months ago

Post

4157

5 New implementations of Diffusion Models

Diffusion models are widely used for image and video generation but remain underexplored in text generation, where autoregressive models (ARMs) dominate. Unlike ARMs, which produce tokens sequentially, diffusion models iteratively refine noise through denoising steps, offering greater flexibility and speed.
Recent advancements show a shift toward using diffusion models in place of, or alongside, ARMs. Researchers also combine strengths from both methods and integrate autoregressive concepts into diffusion.

Here are 5 new implementations of diffusion models:

1. Mercury family of diffusion LLMs (dLLMs) by Inception Labs -> https://www.inceptionlabs.ai/news
It applies diffusion to text and code data, enabling sequence generation 10x faster than today's top LLMs. Now available Mercury Coder can run at over 1,000 tokens/sec on NVIDIA H100s.

2. Diffusion of Thoughts (DoT) -> Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models (2402.07754)
Integrates diffusion models with Chain-of-Thought. DoT allows reasoning steps to diffuse gradually over time. This flexibility enables balancing between reasoning quality and computational cost.

3. LLaDA -> Large Language Diffusion Models (2502.09992)
Shows diffusion models' potential in replacing ARMs. Trained with pre-training and SFT, LLaDA masks tokens, predicts them via a Transformer, and optimizes a likelihood bound. LLaDA matches key LLM skills, and surpasses GPT-4o in reversal poetry.

4. LanDiff -> The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation (2503.04606)
This hybrid text-to-video model combines autoregressive and diffusion paradigms, introducing a semantic tokenizer, an LM for token generation, and a streaming diffusion model. LanDiff outperforms models like Sora.

5. General Interpolating Discrete Diffusion (GIDD) -> Generalized Interpolating Discrete Diffusion (2503.04482)
A flexible noising process with a novel diffusion ELBO enables combining masking and uniform noise, allowing diffusion models to correct mistakes, where ARMs struggle.

3 replies

·

Kseniase

posted an update 2 months ago

Post

6205

9 types of "Chain-of-..." approaches:

Chain-of-Thought (CoT) prompting enhances reasoning in AI models by breaking down complex problems into step-by-step logical sequences. It continues proving its effectiveness, especially in top-performing reasoning models. However, there are other similar methods, that expand CoT and can be used for different purposes. Here are 9 of them:

1. Chain-of-Action-Thought (COAT) -> Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search (2502.02508)
Helps model decide when to keep thinking, double-check their work, or try a different approach, using special guiding tokens.

2. Chain of Draft (CoD) -> Chain of Draft: Thinking Faster by Writing Less (2502.18600)
It helps model generate short but meaningful reasoning steps, cutting costs and making processing faster

3. Chain-of-Agents -> Chain of Agents: Large Language Models Collaborating on Long-Context Tasks (2406.02818)
Uses multi-agent collaboration: Worker agents process text parts in a structured chain, and manager agent summarizes the results

4. Chain-of-RAG ->https://huggingface.co/papers/2501.14342
Creates retrieval chains, instead of retrieving all info at once. It can dynamically adjust its search process and its parameters like step number

5. Chain-of-Shot Prompting (CoS) -> CoS: Chain-of-Shot Prompting for Long Video Understanding (2502.06428)
Helps models pick frames crucial for understanding a video, using a binary video summary and video co-reasoning module.

6. Chain of Hindsight (CoH) -> Chain of Hindsight Aligns Language Models with Feedback (2302.02676)
Converts all feedback into sequences to fine-tune the model and refine outputs

7. Chain-of-Note (CoN) -> Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models (2311.09210)
Generates sequential reading notes for each retrieved document to assess relevance before integrating info into the final answer

8. Chain of Diagnosis (CoD) -> CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis (2407.13301)
Transforms the diagnostic process into a diagnostic chain

9. Chain(s)-of-Knowledge -> https://www.turingpost.com/p/cok
Enhance LLMs by dynamically pulling in external knowledge to improve accuracy and reduce errors

Kseniase

posted an update 2 months ago

Post

9730

8 Free Sources about AI Agents:

Agents seem to be everywhere and this collection is for a deep dive into the theory and practice:

1. "Agents" Google's whitepaper by Julia Wiesinger, Patrick Marlow and Vladimir Vuskovic -> https://www.kaggle.com/whitepaper-agents
Covers agents, their functions, tool use and how they differ from models

2. "Agents in the Long Game of AI. Computational Cognitive Modeling for Trustworthy, Hybrid AI" book by Marjorie McShane, Sergei Nirenburg, and Jesse English -> https://direct.mit.edu/books/oa-monograph/5833/Agents-in-the-Long-Game-of-AIComputational
Explores building AI agents, using Hybrid AI, that combines ML with knowledge-based reasoning

3. "AI Engineer Summit 2025: Agent Engineering" 8-hour video -> https://www.youtube.com/watch?v=D7BzTxVVMuw
Experts' talks that share insights on the freshest Agent Engineering advancements, such as Google Deep Research, scaling tips and more

4. AI Agents Course from Hugging Face -> https://huggingface.co/learn/agents-course/en/unit0/introduction
Agents' theory and practice to learn how to build them using top libraries and tools

5. "Artificial Intelligence: Foundations of Computational Agents", 3rd Edition, book by David L. Poole and Alan K. Mackworth -> https://artint.info/3e/html/ArtInt3e.html
Agents' architectures, how they learn, reason, plan and act with certainty and uncertainty

6. "Intelligent Agents: Theory and Practice" book by Michael Wooldridge -> https://www.cs.ox.ac.uk/people/michael.wooldridge/pubs/ker95/ker95-html.html
A fascinating option to dive into how agents were seen in 1995 and explore their theory, architectures and agent languages

7. The Turing Post articles "AI Agents and Agentic Workflows" on Hugging Face -> @Kseniase
We explore agentic workflows in detail and agents' building blocks, such as memory and knowledge

8. Our collection "8 Free Sources to Master Building AI Agents" -> https://www.turingpost.com/p/building-ai-agents-sources

5 replies

·

Kseniase

posted an update 3 months ago

Post

3300

8 New Applications of Test-Time Scaling

We've noticed a huge interest in test-time scaling (TTS), so we decided to explore this concept further. Test-time compute (TTC) refers to the amount of computational power used by an AI model when generating a response. Many researchers are now focused on scaling TTC, as it enables slow, deep "thinking" and step-by-step reasoning, which improves overall models' performance.

Here are 8 fresh studies on test-time scaling:

1. Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach (2502.05171)
Introduces an LM that scales TTC by reasoning in latent space instead of generating more tokens with no special training. Here, a recurrent block to processes information iteratively.

2. Generating Symbolic World Models via Test-time Scaling of Large Language Models (2502.04728)
Shows how TTS is applied to enhance model's Planning Domain Definition Language (PDDL) reasoning capabilities, which can be used to generate a symbolic world model.

3. Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling (2502.06703)
Analyzes optimal TTS strategies and shows how small models can outperform much larger ones.

4. Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis (2502.04128)
Shows how TTS improves expressiveness, timbre consistency and accuracy in speech synthesis with Llasa framework. It also dives into benefits of scaling train-time compute.

5. Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning (2502.07154)
Suggests a modified training loss for better reasoning of LLMs when scaling TTC.

6. Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures (2502.05078)
Unifies the strengths of chain, tree, and graph paradigms into one framework that expands reasoning only on necessary subproblems.

7. Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification (2502.01839)
Explores scaling trends of self-verification and how to improve its capabilities with TTC.

8. CodeMonkeys: Scaling Test-Time Compute for Software Engineering (2501.14723)
Explores how scaling serial compute (iterations) and parallel compute (trajectories), can improve accuracy in real-world software engineering issues.

Also, explore our article about TTS for more -> https://huggingface.co/blog/Kseniase/testtimecompute

1 reply

·

Kseniase

posted an update 3 months ago

Post

7893

8 New Types of RAG

RAG techniques continuously evolve to enhance LLM response accuracy by retrieving relevant external data during generation. To keep up with current AI trends, new RAG types incorporate deep step-by-step reasoning, tree search, citations, multimodality and other effective techniques.

Here's a list of 8 latest RAG advancements:

1. DeepRAG -> DeepRAG: Thinking to Retrieval Step by Step for Large Language Models (2502.01142)
Models retrieval-augmented reasoning as a Markov Decision Process, enabling strategic retrieval. It dynamically decides when to retrieve external knowledge and when rely on parametric reasoning.

2. RealRAG -> RealRAG: Retrieval-augmented Realistic Image Generation via Self-reflective Contrastive Learning (2502.00848)
Enhances novel object generation by retrieving real-world images and using self-reflective contrastive learning to fill knowledge gap, improve realism and reduce distortions.

3. Chain-of-Retrieval Augmented Generation (CoRAG) -> Chain-of-Retrieval Augmented Generation (2501.14342)
Retrieves information step-by-step and adjusts it, also deciding how much compute power to use at test time. If needed it reformulates queries.

4. VideoRAG -> VideoRAG: Retrieval-Augmented Generation over Video Corpus (2501.05874)
Enables unlimited-length video processing, using dual-channel architecture that integrates graph-based textual grounding and multi-modal context encoding.

5. CFT-RAG -> CFT-RAG: An Entity Tree Based Retrieval Augmented Generation Algorithm With Cuckoo Filter (2501.15098)
A tree-RAG acceleration method uses an improved Cuckoo Filter to optimize entity localization, enabling faster retrieval.

6. Contextualized Graph RAG (CG-RAG) -> CG-RAG: Research Question Answering by Citation Graph Retrieval-Augmented LLMs (2501.15067)
Uses Lexical-Semantic Graph Retrieval (LeSeGR) to integrate sparse and dense signals within graph structure and capture citation relationships

7. GFM-RAG -> GFM-RAG: Graph Foundation Model for Retrieval Augmented Generation (2502.01113)
A graph foundation model that uses a graph neural network to refine query-knowledge connections

8. URAG -> URAG: Implementing a Unified Hybrid RAG for Precise Answers in University Admission Chatbots -- A Case Study at HCMUT (2501.16276)
A hybrid system combining rule-based and RAG methods to improve lightweight LLMs for educational chatbots

1 reply

·

Kseniase

posted an update 3 months ago

Post

5011

8 Free Sources on Reinforcement Learning

With the phenomenon of DeepSeek-R1's top reasoning capabilities, we all saw the true power of RL. At its core, RL is a type of machine learning where a model/agent learns to make decisions by interacting with an environment to maximize a reward. RL learns through trial and error, receiving feedback in the form of rewards or penalties.

Here's a list of free sources that will help you dive into RL and how to use it:

1. "Reinforcement Learning: An Introduction" book by Richard S. Sutton and Andrew G. Barto -> https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf

2. Hugging Face Deep Reinforcement Learning Course -> https://huggingface.co/learn/deep-rl-course/unit0/introduction
You'll learn how to train agents in unique environments, using best libraries, share your results, compete in challenges, and earn a certificate.

3. OpenAI Spinning Up in Deep RL -> https://spinningup.openai.com/en/latest/index.html
A comprehensive overview of RL with many useful resources

4. "Reinforcement Learning and Optimal Control" books, video lectures and course material by Dimitri P. Bertsekas from ASU -> https://web.mit.edu/dimitrib/www/RLbook.html
Explores approximate Dynamic Programming (DP) and RL with key concepts and methods like rollout, tree search, and neural network training for RL and more.

5. RL Course by David Silver (Google DeepMind) -> https://www.youtube.com/watch?v=2pWv7GOvuf0&list=PLqYmG7hTraZDM-OYHWgPeb
Many recommend these video lectures as a good foundation

6. RL theory seminars -> https://sites.google.com/view/rltheoryseminars/home?authuser=0
Provides virtual seminars from different experts about RL advancements

7. "Reinforcement Learning Specialization" (a 4-course series on Coursera) -> https://www.coursera.org/learn/fundament

8. Concepts: RLHF, RLAIF, RLEF, RLCF -> https://www.turingpost.com/p/rl-f
Our flashcards easily explain what are these four RL approaches with different feedback

Kseniase

posted an update 3 months ago

Post

3127

7 Open-source Methods to Improve Video Generation and Understanding

AI community is making great strides toward achieving the full potential of multimodality in video generation and understanding. Last week studies showed that working with videos is now one of the main focuses for improving AI models. Another highlight of the week is that open source, once again, proves its value. For those who were impressed by DeepSeek-R1, we’re with you!

Today, we’re combining these two key focuses and bringing you a list of open-source methods for better video generation and understanding:

1. VideoLLaMA 3 model: Excels in various video and image tasks thanks to vision-centric training approach. VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding (2501.13106)

2. FILMAGENT framework assigns roles to multiple AI agents, like a director, screenwriter, actor, and cinematographer, to automate the filmmaking process in 3D virtual environments. FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces (2501.12909)

3. Improving Video Generation with Human Feedback (2501.13918) proposes a new VideoReward Model and approach that uses human feedback to refine video generation models.

4. DiffuEraser video inpainting model, based on stable diffusion, is designed to fill in missing areas with detailed, realistic content and to ensure consistent structures across frames. DiffuEraser: A Diffusion Model for Video Inpainting (2501.10018)

5. MAGI is a hybrid video gen model that combines masked and casual modeling. Its key innovation, Complete Teacher Forcing (CTF), conditions masked frames on fully visible frames. Taming Teacher Forcing for Masked Autoregressive Video Generation (2501.12389)

6. Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise (2501.08331) proposes motion control, allowing users to guide how objects or the camera move in generated videos. Its noise warping algorithm replaces random noise in videos with structured noise based on motion info.

7. Video Depth Anything model estimates depth consistently in super-long videos (several minutes or more) without sacrificing quality or speed. Video Depth Anything: Consistent Depth Estimation for Super-Long Videos (2501.12375)

Kseniase

posted an update 4 months ago

Post

2075

10 Recent Advancements in Math Reasoning

Over the last few weeks, we have witnessed a surge in AI models' math reasoning capabilities. Top companies like Microsoft, NVIDIA, and Alibaba Qwen have already joined this race to make models "smarter" in mathematics. But why is this shift happening now?

Complex math calculations require advanced multi-step reasoning, making mathematics an ideal domain for demonstrating a model's strong "thinking" capabilities. Additionally, as AI continues to evolve and is applied in math-intensive fields such as machine learning and quantum computing (which is predicted to see significant growth in 2025), it must meet the demands of complex reasoning.
Moreover, AI models can be integrated with external tools like symbolic solvers or computational engines to tackle large-scale math problems, which also needs high-quality math reasoning.

So here’s a list of 10 recent advancements in math reasoning of AI models:

1. NVIDIA: AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling (2412.15084)

2. Qwen, Alibaba: Qwen2.5-Math-PRM The Lessons of Developing Process Reward Models in Mathematical Reasoning (2501.07301) and PROCESSBENCH evaluation ProcessBench: Identifying Process Errors in Mathematical Reasoning (2412.06559)

3. Microsoft Research: rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking (2501.04519)

4. BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning (2501.03226)

5. URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics (2501.04686)

6. U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs (2412.03205)

7. Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs (2501.06430)

8. End-to-End Bangla AI for Solving Math Olympiad Problem Benchmark: Leveraging Large Language Model Using Integrated Approach (2501.04425)

9. Quantization Meets Reasoning: Exploring LLM Low-Bit Quantization Degradation for Mathematical Reasoning (2501.03035)

10. System-2 Mathematical Reasoning via Enriched Instruction Tuning (2412.16964)

Turing Post

AI & ML interests

Recent Activity

Articles

FOD#93: When AI meant Ambient Intelligence

🎙️🧩 TP/Inference: Sharon Zhou on AI Hallucinations, Agents Hype, and Giving Developers the Keys to GenAI

🌁#86: Four Freedoms of truly open AI

TuringPost's activity

README

AI & ML interests

Recent Activity

Articles

FOD#93: When AI meant Ambient Intelligence

🎙️🧩 TP/Inference: Sharon Zhou on AI Hallucinations, Agents Hype, and Giving Developers the Keys to GenAI

🌁#86: Four Freedoms of truly open AI

Team members 3

TuringPost's activity

README