Xiaomi / xiaomi/mimo-v2-omni
Xiaomi: MiMo-V2-Omni
MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability - visual grounding, multi-step...
Context: 262,144 tokens
Max output: 65,536 tokens
Prompt price: $0.40 / 1M tokens
Created: 2026-03-18
01 / Snapshot
Pricing, context, modalities, and parameters.
Model Radar detail pages stay neutral and operator-readable: core metadata first, then workflow fit.
| Field | Value |
|---|---|
| Provider | Xiaomi |
| Input modalities | text, audio, image, video |
| Output modalities | text |
| Prompt price | $0.40 / 1M tokens |
| Completion price | $2.00 / 1M tokens |
| Request price | N/A |
| Context length | 262,144 tokens |
| Max completion tokens | 65,536 |
| Supported parameters | frequency_penalty, include_reasoning, max_tokens, presence_penalty, reasoning, response_format, stop, temperature, tool_choice, tools, top_p |
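
To make the pricing and limits above concrete, here is a minimal Python sketch that estimates the cost of a single request. The prices and limits are copied from the table. Everything else is an assumption for illustration: the helper name `estimate_cost` is hypothetical, the context length is assumed to cover prompt and completion tokens combined, and billing is assumed to be a flat per-token rate with no caching or modality-specific adjustments.

```python
from decimal import Decimal

# Values taken from the snapshot table.
PROMPT_PRICE_PER_M = Decimal("0.40")      # USD per 1M prompt tokens
COMPLETION_PRICE_PER_M = Decimal("2.00")  # USD per 1M completion tokens
CONTEXT_LENGTH = 262_144                  # tokens
MAX_COMPLETION_TOKENS = 65_536            # tokens

ONE_MILLION = Decimal(1_000_000)


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Return the estimated USD cost of one request (hypothetical helper).

    Assumes flat per-token billing and that prompt plus completion
    tokens must fit inside the context length.
    """
    if prompt_tokens < 0 or completion_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if completion_tokens > MAX_COMPLETION_TOKENS:
        raise ValueError("completion exceeds max completion tokens")
    if prompt_tokens + completion_tokens > CONTEXT_LENGTH:
        raise ValueError("request exceeds context length")

    prompt_cost = Decimal(prompt_tokens) * PROMPT_PRICE_PER_M / ONE_MILLION
    completion_cost = (
        Decimal(completion_tokens) * COMPLETION_PRICE_PER_M / ONE_MILLION
    )
    return prompt_cost + completion_cost


if __name__ == "__main__":
    # A long-context request: 100,000 prompt tokens, 4,000 completion tokens.
    print(f"${estimate_cost(100_000, 4_000)}")
```

Under these assumptions, a request with 100,000 prompt tokens and 4,000 completion tokens comes to $0.048: $0.04 for the prompt and $0.008 for the completion. `Decimal` is used instead of floats so that small per-request amounts add up exactly across many requests.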
Best for
Cross-modal work
Voice and audio
Long context
High-volume usage
new
vision
tool-capable
long-context
low-cost