Multilingual and Localized LLMs

Open multilingual, Arabic, and medical language models for localized AI access.

Localized language models
Apollo, AceGPT, Phoenix, Arabic LLMs, Multilingual medicine
[Figure: Apollo multilingual medical map]

The multilingual model line aims to make open LLMs useful beyond the English-centric default. Phoenix brought open chat models to many languages, AceGPT localized LLMs for Arabic, and Apollo extends medical LLMs across many languages with released models, corpora, and benchmarks.

Research Storyline

General
Phoenix starts from multilingual chat access

The early model line asks how open chat models can serve users across many languages rather than only in English-heavy settings.

Localize
AceGPT focuses on Arabic alignment

Arabic LLM localization shows that alignment data, vocabulary, culture, and evaluation must be designed for the target language.

Medicine
Apollo moves localization into healthcare

Apollo combines multilingual medical corpora, models, and benchmarks so medical AI can cover many more language communities.

Scale
ApolloMoE makes language-family specialization efficient

Mixture-of-experts-style multilingual medical modeling preserves per-language-family specialization while reducing the cost of serving many languages.

Project Families

Phoenix and LLMZoo

An early open multilingual chat direction that combined multilingual instruction data, open checkpoints, and evaluation resources.

AceGPT and Arabic alignment

A localized Arabic LLM family, followed by work on progressive Arabic vocabulary expansion and native Arabic alignment.

Apollo medical LLMs

A multilingual medical model, dataset, and benchmark stack that pushes medical AI toward broader global access.

ApolloMoE

A mixture-of-language-family-experts direction for efficient multilingual medical modeling.
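
To make the language-family idea concrete, here is a minimal, illustrative sketch of hard routing by language family. It is not the ApolloMoE implementation: the language-to-family table, the `Expert` and `FamilyRoutedLayer` names, and the scalar "weights" are all hypothetical stand-ins, and a real mixture-of-experts model may well use learned gating rather than a lookup table. The sketch only shows why grouping languages into families keeps the number of experts small while each input still touches a single expert.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Hypothetical language -> language-family table (illustrative only,
# not taken from the Apollo or ApolloMoE releases).
LANGUAGE_FAMILY: Dict[str, str] = {
    "en": "germanic",
    "de": "germanic",
    "fr": "romance",
    "es": "romance",
    "ar": "semitic",
    "zh": "sino_tibetan",
}

Vector = List[float]


@dataclass
class Expert:
    """Toy expert: a single scalar stands in for learned parameters."""
    name: str
    scale: float

    def __call__(self, hidden: Vector) -> Vector:
        return [self.scale * h for h in hidden]


class FamilyRoutedLayer:
    """One expert per language family, plus a shared fallback expert."""

    def __init__(self, families: Iterable[str]) -> None:
        self.shared = Expert("shared", 1.0)
        self.experts: Dict[str, Expert] = {
            fam: Expert(fam, 1.0 + 0.1 * i)
            for i, fam in enumerate(sorted(set(families)))
        }

    def route(self, lang: str) -> str:
        # Languages in the same family share an expert; unknown
        # languages fall back to the shared expert.
        family = LANGUAGE_FAMILY.get(lang)
        return family if family in self.experts else "shared"

    def forward(self, hidden: Vector, lang: str) -> Vector:
        # Only one expert runs per input, so per-input compute does not
        # grow with the number of supported languages.
        expert = self.experts.get(self.route(lang), self.shared)
        return expert(hidden)


layer = FamilyRoutedLayer(LANGUAGE_FAMILY.values())
```

In this toy setup six languages map onto four experts, and adding another language from an existing family adds no new expert. That is the efficiency argument in miniature, under the assumptions stated above.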

Paper Trail

Phoenix
Phoenix: Democratizing ChatGPT across Languages

Introduces the lab's early multilingual chat direction and open multilingual model resources.

Model

AceGPT
AceGPT, Localizing Large Language Models in Arabic

Shows how localized data and alignment produce stronger Arabic LLM behavior than generic multilingual tuning.

Repository

Apollo
Apollo and ApolloMoE

Extends localization to multilingual medical AI with corpora, benchmarks, models, and language-family expert routing.

Why It Matters

  • Medical and public-interest AI cannot be accessible if it works only in a small set of high-resource languages.
  • Localization is more than translation: model vocabulary, alignment data, evaluation, and cultural/domain assumptions all matter.
  • The project connects general multilingual chat, Arabic-specific alignment, and multilingual medical AI into a single access agenda.

Resource Map

Apollo

Multilingual medical models, datasets, benchmark code, and public checkpoints.

Repository

ApolloCorpus

Large multilingual medical corpus released on Hugging Face.

Dataset

AceGPT

Arabic LLM resources and checkpoints for localized open language modeling.

Model