LLM Efficiency and AI Infrastructure

Long context, efficient reasoning, RAG data, token reduction, open infrastructure

LongLLaVA architecture for long-context multimodal models

This project gathers the infrastructure work that makes open models more usable: longer multimodal context, cheaper visual tokens, stronger retrieval data, editable memory, efficient fine-tuning, and reasoning systems that spend computation where it matters.

Background and Motivation

Capability is constrained by infrastructure

Better models are not enough if training data, retrieval pipelines, memory, long context, and inference cost do not scale with real tasks.

Multimodal inputs are expensive

Images, videos, and long documents quickly overload context windows, making token selection and long-context design central to deployment.

Reasoning needs adaptive compute

Efficient reasoning systems should prune weak paths, tune with minimal supervision, and update knowledge without expensive retraining.

Core Ideas

Context

Scale multimodal context length

LongLLaVA and MileBench study how models reason over many images and long multimodal inputs without losing global structure.

Tokens

Spend visual tokens only where they help

TRIM studies token reduction for multimodal LLMs, reducing cost while preserving reasoning-critical visual information.
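To make the idea of token reduction concrete, here is a minimal sketch that keeps only the visual tokens most similar to a text query. It is a generic illustration, not TRIM's published procedure: the cosine-similarity score, the `keep_ratio` parameter, and the function names are all assumptions made for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_visual_tokens(visual_tokens, query, keep_ratio=0.25):
    """Return indices of the visual tokens most similar to the query.

    Hypothetical helper: `keep_ratio` is an illustrative knob, not a
    parameter taken from any specific paper or library.
    """
    if not visual_tokens:
        return []
    keep = max(1, math.ceil(len(visual_tokens) * keep_ratio))
    scores = [cosine(tok, query) for tok in visual_tokens]
    top = sorted(range(len(visual_tokens)), key=lambda i: scores[i], reverse=True)[:keep]
    # Restore the original order so positional structure is preserved.
    return sorted(top)


# Toy 2-dimensional embeddings stand in for real patch and text features.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
query = [1.0, 0.0]
kept = select_visual_tokens(tokens, query, keep_ratio=0.5)
```

In this toy case half of the tokens are dropped before they would reach the language model, which is where the cost saving comes from. A real system would score patch embeddings from a vision encoder rather than hand-written vectors.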

Data

Build instruction and retrieval infrastructure

RAG-Instruct, LLMZoo, and related resources package data, models, and tasks so downstream builders can reproduce and extend systems.

Update

Make memory and reasoning editable

E2-RAG, prefix tuning, question-free fine-tuning, and early path pruning all aim to update or accelerate systems without rebuilding everything.
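As a rough illustration of what "editable" means for retrieval memory, the sketch below shows a store whose entries can be replaced or removed in place, so the next query sees the new fact without any retraining. It is not the E2-RAG method: the `EditableMemory` class, its method names, and the word-overlap scoring are assumptions chosen to keep the example self-contained.

```python
class EditableMemory:
    """Toy retrieval store whose entries can be added, replaced, or removed.

    Hypothetical class for illustration only; scoring uses Jaccard word
    overlap instead of learned embeddings.
    """

    def __init__(self):
        self._docs = {}

    def upsert(self, doc_id, text):
        """Add a new entry or overwrite an existing one."""
        self._docs[doc_id] = text

    def delete(self, doc_id):
        """Remove an entry; unknown ids are ignored."""
        self._docs.pop(doc_id, None)

    def search(self, query, k=1):
        """Return up to k (doc_id, text) pairs ranked by word overlap."""
        q = set(query.lower().split())

        def overlap(text):
            t = set(text.lower().split())
            union = q | t
            return len(q & t) / len(union) if union else 0.0

        ranked = sorted(self._docs.items(), key=lambda kv: overlap(kv[1]), reverse=True)
        return ranked[:k]


memory = EditableMemory()
memory.upsert("policy", "the refund window is 14 days")
memory.upsert("hours", "support is open on weekdays")
before = memory.search("refund window days")

# Editing the stored fact changes what the next retrieval returns.
memory.upsert("policy", "the refund window is 30 days")
after = memory.search("refund window days")
```

The design point is that knowledge lives in the store rather than in model weights, so an update is a dictionary write instead of a training run.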

Typical Work

Long

LongLLaVA and MileBench

Long-context multimodal model and benchmark work for reasoning over many images, videos, and long visual contexts.

Project page

Efficient

TRIM: Less is More for Efficient Multi-modal LLMs

Reduces visual tokens so multimodal inference is cheaper while retaining task-relevant evidence.

Paper

RAG

E2-RAG and RAG-Instruct

Studies editable, efficient retrieval and the data infrastructure behind retrieval-augmented instruction following.

Project page

Reason

Question-Free Fine-Tuning and path pruning

Improves adaptive reasoning by training with less supervision and by pruning low-value reasoning branches early.

QFFT
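Early path pruning can be sketched as a filter over partial reasoning paths: score each one, then drop those that trail the current best by more than a margin. The example below is a generic illustration under stated assumptions, not the method from the work listed above. The `prune_paths` function, the `margin` and `min_keep` parameters, and the mean-confidence scorer are all hypothetical.

```python
def prune_paths(paths, score_fn, margin=0.2, min_keep=1):
    """Keep partial reasoning paths scoring within `margin` of the best one.

    Hypothetical helper: `score_fn` maps a path to a float where higher is
    better. At least `min_keep` paths always survive.
    """
    if not paths:
        return []
    scored = sorted(((score_fn(p), p) for p in paths), key=lambda sp: sp[0], reverse=True)
    best = scored[0][0]
    return [p for i, (s, p) in enumerate(scored) if i < min_keep or best - s <= margin]


def mean_confidence(path):
    """Illustrative scorer: average of the per-step confidences in a path."""
    _, confidences = path
    return sum(confidences) / len(confidences)


# Each path is (name, per-step confidences); the values are made up.
candidates = [
    ("a", [0.9, 0.8]),
    ("b", [0.4, 0.3]),
    ("c", [0.85, 0.7]),
]
survivors = prune_paths(candidates, mean_confidence, margin=0.2)
survivor_names = [name for name, _ in survivors]
```

Path "b" is dropped after two steps, so no further computation is spent extending it. In practice the scorer would come from the model itself (for example a verifier or token-level likelihood), which is an assumption beyond what this page specifies.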

Display Figures

LongLLaVA architecture: Long-context multimodal infrastructure lets models reason across many images rather than isolated visual snippets.

LongLLaVA training data and pipeline: Efficient training pipelines connect data design, context scaling, and downstream evaluation.

Open-source repositories and model resources: Open repositories, datasets, and model cards turn infrastructure work into reusable building blocks.

Resource Map

LongLLaVA and MileBench

Long-context multimodal project page covering models, benchmarks, and visual context scaling.

Project page

RAG and Instruction Data

Data and retrieval infrastructure for instruction tuning and knowledge-grounded model building.

Project page

LLMZoo

Open model and training resource collection that supports the lab's broader multilingual and infrastructure stack.

Repository

Why It Matters

Open AI progress depends on infrastructure that smaller teams can run, inspect, and extend.
Long-context and multimodal tasks need cost-aware model design, not only larger context windows.
Editable retrieval and efficient adaptation make models more practical in domains where knowledge changes quickly.