LLM Efficiency and AI Infrastructure

Efficient training, inference, retrieval, data, and multimodal context infrastructure for open LLM builders.

Efficiency and infrastructure
Topics: long context, efficient reasoning, RAG data, token reduction, open infrastructure
[Figure: LongLLaVA architecture for long-context multimodal models]

This project gathers the infrastructure work that makes open models more usable: longer multimodal context, cheaper visual tokens, stronger retrieval data, editable memory, efficient fine-tuning, and reasoning systems that spend computation where it matters.

Background and Motivation

Capability is constrained by infrastructure

A stronger model is not enough on its own: training data, retrieval pipelines, memory, and long-context support must scale with real tasks, and inference cost must stay affordable as they do.

Multimodal inputs are expensive

Images, videos, and long documents quickly overload context windows, making token selection and long-context design central to deployment.
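
To make the overload concrete, here is a back-of-the-envelope sketch. It assumes an encoder that emits 576 tokens per image (a 24 x 24 patch grid, which is a common setting but not a figure taken from this page). The function names and default values are illustrative only.

```python
# Rough context-budget arithmetic for multi-image inputs.
# ASSUMPTION: 576 visual tokens per image (24 x 24 patches); real models vary.

def visual_token_count(num_images: int, tokens_per_image: int = 576) -> int:
    """Total visual tokens produced by `num_images` images or video frames."""
    return num_images * tokens_per_image


def fits_in_context(num_images: int, context_window: int,
                    text_tokens: int = 0, tokens_per_image: int = 576) -> bool:
    """True if the visual tokens plus the text tokens fit in the context window."""
    total = visual_token_count(num_images, tokens_per_image) + text_tokens
    return total <= context_window


def max_images(context_window: int, text_tokens: int = 0,
               tokens_per_image: int = 576) -> int:
    """Largest number of images that still leaves room for the text tokens."""
    return max(0, (context_window - text_tokens) // tokens_per_image)


if __name__ == "__main__":
    # 64 video frames already exceed a 32,768-token window under this assumption.
    print(visual_token_count(64))      # 36864
    print(fits_in_context(64, 32768))  # False
```

Under these assumed numbers, a 64-frame clip does not fit in a 32K window even before any text is added, which is why token selection and long-context design matter.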

Reasoning needs adaptive compute

Efficient reasoning systems should prune weak paths, tune with minimal supervision, and update knowledge without expensive retraining.

Core Ideas

Context
Scale multimodal context length

LongLLaVA and MileBench study how models reason over many images and long multimodal inputs without losing global structure.

Tokens
Spend visual tokens only where they help

TRIM reduces the number of visual tokens a multimodal LLM has to process, cutting inference cost while preserving the visual information that reasoning depends on.
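
As a rough illustration of query-aware token reduction in general, the sketch below keeps only the visual tokens most similar to a text query embedding. It is a generic top-k selection under stated assumptions, not the TRIM method itself; `select_visual_tokens`, its parameters, and the toy vectors are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_visual_tokens(visual_tokens: Sequence[Sequence[float]],
                         query: Sequence[float], keep: int) -> List[int]:
    """Return indices of the `keep` tokens most similar to `query`.

    Indices are returned in their original order so that the spatial or
    temporal layout of the surviving tokens is preserved.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    scores = [cosine(token, query) for token in visual_tokens]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


if __name__ == "__main__":
    # Toy 2-D embeddings; a real system would use encoder outputs.
    tokens = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
    print(select_visual_tokens(tokens, query=[1.0, 0.0], keep=2))  # [0, 2]
```

A fixed `keep` is the simplest possible budget. An adaptive threshold on the similarity scores would let the number of retained tokens vary per image.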

Data
Build instruction and retrieval infrastructure

RAG-Instruct, LLMZoo, and related resources package data, models, and tasks so downstream builders can reproduce and extend systems.

Update
Make memory and reasoning editable

E2-RAG, prefix tuning, question-free fine-tuning, and early path pruning all aim to update or accelerate a system without retraining it from scratch.
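
Of the ideas listed above, early path pruning is the easiest to sketch in a few lines. The example below expands candidate reasoning paths step by step and discards the lowest-scoring ones after each step. It is a minimal, generic illustration and not the lab's algorithm; `prune_paths`, `search`, `keep_ratio`, and the toy scorer are all hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

Path = Tuple[str, ...]  # a reasoning path is a tuple of step labels


def prune_paths(paths: List[Path], score: Callable[[Path], float],
                keep_ratio: float = 0.5, min_keep: int = 1) -> List[Path]:
    """Keep the best-scoring fraction of paths (at least `min_keep`)."""
    if not paths:
        return []
    ranked = sorted(paths, key=score, reverse=True)
    n_keep = max(min_keep, int(len(ranked) * keep_ratio))
    return ranked[:n_keep]


def search(start: Path, expand: Callable[[Path], Iterable[str]],
           score: Callable[[Path], float], steps: int,
           keep_ratio: float = 0.5) -> List[Path]:
    """Expand every surviving path by one step, then prune, for `steps` rounds."""
    frontier = [start]
    for _ in range(steps):
        candidates = [path + (step,) for path in frontier for step in expand(path)]
        if not candidates:
            break
        frontier = prune_paths(candidates, score, keep_ratio)
    return frontier


if __name__ == "__main__":
    # Toy problem: each step is "a" or "b"; the scorer prefers paths with more "a".
    def expand(path: Path) -> Tuple[str, str]:
        return ("a", "b")

    def score(path: Path) -> float:
        return float(path.count("a"))

    print(search((), expand, score, steps=3))  # [('a', 'a', 'a')]
```

In practice the scorer would be a learned or heuristic estimate of how promising a partial path is, and pruning too aggressively risks discarding a path that only looks weak early on.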

Typical Work

Long
LongLLaVA and MileBench

Long-context multimodal model and benchmark work for reasoning over many images, videos, and long visual contexts.

Project page
Efficient
TRIM: Less is More for Efficient Multi-modal LLMs

Reduces visual tokens so multimodal inference is cheaper while retaining task-relevant evidence.

Paper
RAG
E2-RAG and RAG-Instruct

Studies editable, efficient retrieval and the data infrastructure behind retrieval-augmented instruction following.

Project page
Reason
Question-Free Fine-Tuning and path pruning

Improves adaptive reasoning by fine-tuning with less supervision and by pruning low-value reasoning branches early.

QFFT

Resource Map

LongLLaVA and MileBench

Long-context multimodal project page covering models, benchmarks, and visual context scaling.

Project page
RAG and Instruction Data

Data and retrieval infrastructure for instruction tuning and knowledge-grounded model building.

Project page
LLMZoo

Open model and training resource collection that supports the lab's broader multilingual and infrastructure stack.

Repository

Why It Matters

  • Open AI progress depends on infrastructure that smaller teams can run, inspect, and extend.
  • Long-context and multimodal tasks need cost-aware model design, not only larger context windows.
  • Editable retrieval and efficient adaptation make models more practical in domains where knowledge changes quickly.