ShareGPT-4o-Image and Janus-4o

GPT-4o-level image generation data and a unified multimodal image generation model.

Image generation alignment
ShareGPT-4o-Image, Janus-4o, Text-to-image, Image editing, Multimodal alignment
FreedomIntelligence open source impact

ShareGPT-4o-Image distills GPT-4o-style image generation interactions into an open dataset, and Janus-4o turns that data into a practical multimodal model for text-to-image and text-plus-image-to-image generation.

Research Storyline

Data
Open the interaction format

ShareGPT-4o-Image captures text-to-image and image-conditioned editing instructions so researchers can study generation behavior with open data.

Model
Train a unified multimodal generator

Janus-4o uses the data to support image understanding and generation in one compact model family.

Edit
Move beyond one-shot generation

Image-conditioned generation and editing matter because real users usually refine, transform, or make localized changes to existing visual content rather than generate every image from scratch.

Reuse
Make visual alignment reproducible

The dataset, model, and paper give the community an open reference point for GPT-4o-like image generation interactions.

What The Project Contributes

Open generation data

The dataset includes text-to-image and image-conditioned generation examples, making GPT-4o-like generation behavior easier to study and reproduce.
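To make the two example types concrete, here is a minimal sketch of how such records could be represented and separated by task. The class name, field names, and file paths are hypothetical and chosen only for illustration; the released dataset may use a different layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenerationExample:
    """Hypothetical record layout, not the dataset's actual schema."""
    instruction: str                    # text prompt or editing instruction
    output_image: str                   # path to the generated image
    input_image: Optional[str] = None   # present only for image-conditioned examples

    @property
    def task(self) -> str:
        # An example without an input image is plain text-to-image generation.
        if self.input_image is None:
            return "text-to-image"
        return "text-plus-image-to-image"


def split_by_task(examples):
    """Group examples by task type so each kind can be studied separately."""
    groups = {"text-to-image": [], "text-plus-image-to-image": []}
    for example in examples:
        groups[example.task].append(example)
    return groups


# Illustrative records only; these are not samples from the dataset.
examples = [
    GenerationExample("A watercolor fox in a snowy forest", "out/fox.png"),
    GenerationExample("Make the sky purple", "out/city_purple.png",
                      input_image="in/city.png"),
]
groups = split_by_task(examples)
```

Keeping the input image optional lets one record type cover both cases, which mirrors how the project treats generation and editing as variants of a single interaction format.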

Unified multimodal model

Janus-4o adapts a unified multimodal architecture so image understanding and image generation can live in one model family.

Practical image editing

The data and model support not only generation from text but also edits and transformations conditioned on existing images.

Paper Trail

Dataset
ShareGPT-4o-Image

Open image generation and editing instruction data for studying GPT-4o-style multimodal generation behavior.

Model
Janus-4o-7B

A unified multimodal model checkpoint for text-to-image and image-conditioned generation.

Context
LongLLaVA and multimodal infrastructure

Connects the generation project to the lab's larger multimodal program on visual context, data quality, and open model behavior.

Long-context multimodal AI

Why It Matters

  • Open image generation research needs high-quality instruction data, not just model weights.
  • Image editing is an important multimodal interaction pattern because users often refine existing visual content rather than generate from scratch.
  • The project gives the community an open dataset and a 7B model checkpoint for studying GPT-4o-level image generation behavior in open models.

Resource Map

ShareGPT-4o-Image dataset

Open image generation and editing instruction data.

Dataset
Janus-4o-7B

Unified multimodal model checkpoint for text-to-image and image-conditioned generation.

Model
Project repository

Code, examples, and release documentation for the ShareGPT-4o-Image project.

Repository