Projects | Freedom AI

Reasoning, RL, and adaptive test-time compute

LLM Reasoning & Agentic RL

This project organizes the lab's recent work on verifiable reasoning, policy optimization, path pruning, code-integrated thinking, and multimodal R1-style training. The through-line is simple: make LLMs reason with feedback signals that are inspectable, efficient, and useful for downstream agents.

Open LLM Reasoning & Agentic RL · Open Math and Optimization · Papers from Benyou Wang


Paper organization

OnePO: Direct One-stage Policy Optimization for SFT-free Domain Adaptation - direct policy optimization without a separate SFT stage.
CRPO: Character-centric Group Relative Policy Optimization for Role-aware Reasoning in Role-playing Agents - RL objectives for role-aware reasoning agents.
Question-Free Fine-Tuning - efficient and adaptive reasoning fine-tuning.
Cut Your Losses! Learning to Prune Paths Early for Efficient Parallel Reasoning - learnable path pruning for large reasoning models.
Video-R1: Reinforcing Video Reasoning in MLLMs - R1-style reinforcement learning for multimodal video reasoning.
CoRT: Code-integrated Reasoning within Thinking - executable computation inside the reasoning process.

Project stack: Reasoning papers, code, and datasets

STOP / Cut Your Losses (Project) - early path pruning for efficient parallel reasoning.
Video-R1 (GitHub) - Video-R1-CoT and RL training resources for video reasoning.
CoRT (GitHub) - code-integrated reasoning resources.
UPFT (GitHub) - unsupervised prefix fine-tuning for efficient reasoning models.

Agents, simulators, tools, and applied environments

LLM Agents and Applications

This project groups papers where LLMs become agents: tool planners, user simulators, standardized patients, role-playing agents, market participants, and micro-world actors. The goal is to organize agent papers by what the agent does, what environment it acts in, and how the interaction is evaluated.

Open LLM Agents and Applications · Open Human-Agent Interaction · Papers from Benyou Wang

Paper organization

Smurfs: Multi-Agent System using Context-Efficient DFSDT for Tool Planning - multi-agent tool planning with context-efficient search.
Large Language Model as a User Simulator - LLM users for dialogue training and evaluation.
PlatoLM: Teaching LLMs via a Socratic Questioning User Simulator - Socratic interaction as a training signal.
Human or LLM as Standardized Patients? - AI patients for medical education and evaluation.
TwinMarket: A Scalable Behavioral and Social Simulation for Financial Markets - LLM investor agents in market simulation.
MicroVerse - agentic micro-world simulation for scientific processes.

Project stack: Agent applications and environments

Smurfs (Paper) - multi-agent tool planning through context-efficient DFSDT.
EasyMED (GitHub) - AI standardized patient framework and evaluation resources.
TwinMarket (Project) - financial market simulation with LLM agents.
MicroVerse (Project) - micro-world simulation with hidden mechanisms and evolving states.

Human-agent interaction and simulation

Human-Agent Interaction

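This line of work focuses on how agents interact with real users, learners, patients, market participants, and simulated worlds. It covers LLM user simulators, AI standardized patients, speech-to-speech human-likeness evaluation, interactive scientific simulation in MicroVerse, and multi-agent social and financial simulations such as TwinMarket.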

Open Human-Agent Interaction · Open Economic World Models · Open MicroVerse

Multimodal LLMs

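The multimodal LLM direction places text, images, video, audio, and medical vision on a single capability map: from long-context visual understanding in LongLLaVA/MileBench, to visual token compression in TRIM and open image generation in ShareGPT-4o-Image/Janus-4o, to models and datasets such as Video-R1, HuatuoGPT-Vision, and FusionAudio that target reasoning, medical, and audio scenarios.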

Open Multimodal LLMs · Open LongLLaVA and MileBench · Open ShareGPT-4o and Janus-4o