ShareGPT-4o-Image and Janus-4o

Image generation alignment

ShareGPT-4o-Image, Janus-4o, Text-to-image, Image editing, Multimodal alignment

ShareGPT-4o-Image distills GPT-4o-style image generation interactions into an open dataset, and Janus-4o turns that data into a practical multimodal model for text-to-image and text-plus-image-to-image generation.

Research Storyline

Data

Open the interaction format

ShareGPT-4o-Image captures text-to-image and image-conditioned editing instructions so researchers can study generation behavior with open data.

Model

Train a unified multimodal generator

Janus-4o is trained on this data so that image understanding and image generation are supported within one compact model family.

Edit

Move beyond one-shot generation

Image-conditioned generation and editing matter because real users more often refine, transform, or locally edit existing visual content than generate an image from scratch.

Reuse

Make visual alignment reproducible

The dataset, model, and paper give the community an open reference point for GPT-4o-like image generation interactions.

What The Project Contributes

Open generation data

The dataset includes text-to-image and image-conditioned generation examples, making GPT-4o-like generation behavior easier to study and reproduce.

Unified multimodal model

Janus-4o adapts a unified multimodal architecture so image understanding and image generation can live in one model family.

Practical image editing

The data and model support not only generation from text but also edits and transformations conditioned on existing images.

Display Figures

Open multimodal generation resources — The project follows the lab's open-release pattern: dataset, model, repository, and paper linked as one reproducible package.

Multimodal model architecture context — ShareGPT-4o-Image and Janus-4o sit alongside the lab's broader multimodal line, from understanding to generation.

Paper Trail

Dataset

ShareGPT-4o-Image

Open image generation and editing instruction data for studying GPT-4o-style multimodal generation behavior.


Model

Janus-4o-7B

A unified multimodal model checkpoint for text-to-image and image-conditioned generation.


Context

LongLLaVA and multimodal infrastructure

Connects the generation project to the lab's larger multimodal program on visual context, data quality, and open model behavior.

Long-context multimodal AI

Why It Matters

Open image generation research needs high-quality instruction data, not just model weights.
Image editing is an important multimodal interaction pattern because users often refine existing visual content rather than generate from scratch.
The project gives the community a compact way to study GPT-4o-level image generation behavior in open models.

Resource Map

ShareGPT-4o-Image dataset

Open image generation and editing instruction data.

Dataset

Janus-4o-7B

Unified multimodal model checkpoint for text-to-image and image-conditioned generation.

Model

Project repository

Code, examples, and release documentation for the ShareGPT-4o-Image project.

Repository