LLM Reasoning & Agentic RL
Verifiable reasoning, agentic reinforcement learning, policy optimization, path pruning, code-integrated thinking, and multimodal R1-style training.
This project organizes papers where reasoning is trained, optimized, pruned, or grounded in executable feedback. The scope covers policy optimization, role-aware agent RL, adaptive compute, code-integrated thinking, and multimodal R1-style learning.
Research Storyline
OnePO and QFFT study how to adapt models more directly, reducing dependence on heavy multi-stage pipelines while preserving exploration.
CRPO turns GRPO-style optimization toward role-playing agents, balancing task utility with persona fidelity and style consistency.
STOP learns to prune doomed reasoning paths early, making parallel reasoning more accurate under a fixed compute budget.
CoRT and Video-R1 extend reasoning feedback to executable computation and temporal multimodal understanding.
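As a rough illustration of the GRPO-style idea mentioned above, the sketch below samples several responses for one prompt and normalizes each reward against the mean and standard deviation of its own group. How CRPO actually balances task utility against persona fidelity is not described on this page, so the weighted sum, the `persona_weight` parameter, the sample scores, and both function names are illustrative assumptions rather than the paper's method.

```python
from statistics import mean, pstdev


def combined_reward(task_score, persona_score, persona_weight=0.5):
    """Hypothetical blend of task utility and persona fidelity (assumed weighted sum)."""
    return (1.0 - persona_weight) * task_score + persona_weight * persona_score


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group (GRPO-style).

    Uses the population standard deviation; real implementations may differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses for one prompt: (task_score, persona_score), made-up values.
samples = [(1.0, 0.2), (1.0, 0.9), (0.0, 0.8), (0.0, 0.1)]
rewards = [combined_reward(t, p) for t, p in samples]
advantages = group_relative_advantages(rewards)
```

Because advantages are centered within the group, a response is reinforced only to the extent that it beats its siblings for the same prompt, which is why no separate value model is needed in this style of update. In this toy group, the response that is both correct and in persona (index 1) receives the largest advantage.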
Paper Trail
Directly optimizes policies for domain adaptation without a separate SFT stage. (Publication list)
Adapts group-relative policy optimization to role-aware reasoning agents. (Paper)
Studies efficient and adaptive reasoning fine-tuning for large language models. (Paper)
Learns internal signals that prune low-value reasoning paths early during parallel reasoning. (Paper)
Lets models use executable computation as part of the reasoning process. (Paper)
Applies R1-style reinforcement learning to temporal video reasoning in multimodal LLMs. (Paper)
Project Clusters
OnePO, CRPO, and QFFT organize RL-style post-training around direct optimization, role fidelity, and efficient adaptation.
STOP and UPFT focus on spending less compute without losing reasoning quality, especially for long or parallel reasoning traces.
CoRT connects reasoning to code execution, making intermediate calculations easier to verify and debug.
Video-R1 extends reinforcement learning for reasoning beyond text-only math into temporal video understanding.
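The compute argument behind early pruning can be sketched with simple budget arithmetic. The example below assumes every parallel path first generates a short prefix, a scorer rates each partial path, and only the top few continue with the remaining token budget. The scores, the budget numbers, and both function names are hypothetical; STOP's actual learned signal and pruning rule are not specified on this page.

```python
def prune_paths(prefix_scores, keep):
    """Return indices of the `keep` highest-scoring partial paths, in original order."""
    ranked = sorted(
        range(len(prefix_scores)),
        key=lambda i: prefix_scores[i],
        reverse=True,
    )
    return sorted(ranked[:keep])


def tokens_per_survivor(total_budget, num_paths, prefix_len, keep):
    """Tokens each surviving path may still generate after the shared prefix phase."""
    remaining = total_budget - num_paths * prefix_len
    if remaining < 0:
        raise ValueError("budget too small for the prefix phase")
    return remaining // keep


# Hypothetical scorer outputs for 8 parallel reasoning paths after a 250-token prefix.
scores = [0.12, 0.81, 0.45, 0.77, 0.05, 0.30, 0.66, 0.20]
survivors = prune_paths(scores, keep=3)
per_path = tokens_per_survivor(total_budget=8000, num_paths=8, prefix_len=250, keep=3)
```

With these made-up numbers, running all 8 paths to completion would allow 1000 tokens each, while pruning to 3 survivors after the prefix leaves each survivor 250 + 2000 = 2250 tokens under the same 8000-token budget.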
Resource Map
Project page for early pruning in efficient parallel reasoning. (Project page)
Code and datasets for R1-style video reasoning. (Repository)
Code-integrated reasoning resources. (Repository)