Xiaomi: MiMo-V2-Omni

MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability - visual grounding, multi-step...

Context: 262,144
Max output: 65,536
Prompt price: $0.40 / 1M tokens
Created: 2026-03-18

01 / Snapshot

Pricing, context, modalities, and parameters.

Provider: Xiaomi
Input modalities: text, audio, image, video
Output modalities: text
Prompt price: $0.40 / 1M tokens
Completion price: $2.00 / 1M tokens
Request price: N/A
Context length: 262,144
Max completion tokens: 65,536
Supported parameters: frequency_penalty, include_reasoning, max_tokens, presence_penalty, reasoning, response_format, stop, temperature, tool_choice, tools, top_p
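
The listed prices and limits can be turned into a rough per-request cost estimate. The sketch below is illustrative only: `estimate_cost` is a hypothetical helper, not part of any provider SDK, and it assumes that prompt and completion tokens share the 262,144-token context window and that prices apply per token as listed. How image, audio, and video inputs are counted as tokens is not stated on this page, so the caller must supply token counts.

```python
from decimal import Decimal

# Figures copied from the snapshot on this page.
PROMPT_PRICE_PER_M = Decimal("0.40")      # USD per 1M prompt tokens
COMPLETION_PRICE_PER_M = Decimal("2.00")  # USD per 1M completion tokens
CONTEXT_LENGTH = 262_144
MAX_COMPLETION_TOKENS = 65_536


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper)."""
    if prompt_tokens < 0 or completion_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if completion_tokens > MAX_COMPLETION_TOKENS:
        raise ValueError("completion exceeds the listed max completion tokens")
    # Assumption: prompt and completion tokens share one context window.
    if prompt_tokens + completion_tokens > CONTEXT_LENGTH:
        raise ValueError("request exceeds the listed context length")
    million = Decimal(1_000_000)
    total = (prompt_tokens * PROMPT_PRICE_PER_M
             + completion_tokens * COMPLETION_PRICE_PER_M)
    return total / million
```

Under these assumptions, a request with 100,000 prompt tokens and 2,000 completion tokens comes to $0.044. `Decimal` is used instead of `float` so that small per-token amounts do not pick up binary rounding error.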
Best for xiaomi/mimo-v2-omni

Cross-modal work, Voice and audio, Long context, High-volume usage
Tags: new, vision, tool-capable, long-context, low-cost