MAI-UI: Real-World Centric Foundation GUI Agents
MAI-UI is a family of foundation GUI agent models from Tongyi-MAI Lab, ranging in size from 2B to 235B parameters.
Overall Performance.
Technical Highlights
MAI-UI is the first to natively integrate three core capabilities (user interaction, MCP tool calling, and device-cloud collaboration) into a unified architecture, using autonomously evolving data pipelines and large-scale online reinforcement learning. The 2B and 8B models are currently open-sourced.
MCP Tool Usage
Model Context Protocol tools for enhanced functionality.
User Interaction
Advanced user interaction capabilities in real-world scenarios.
Online Reinforcement Learning
Large-scale online RL for continuous model improvement and adaptation.
Device-Cloud Collaboration
Efficient collaboration between device and cloud for balanced performance.
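To make the MCP tool-calling capability listed above more concrete, here is a minimal sketch of the request shape that the Model Context Protocol defines for invoking a tool: a JSON-RPC 2.0 message with method `tools/call`. It shows only the generic protocol envelope; how MAI-UI itself formats or emits tool calls is not described on this page. The helper `build_tool_call`, the tool name `search_route`, and its arguments are hypothetical and chosen purely for illustration.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request for MCP's `tools/call` method.

    `tool_name` and `arguments` are whatever the MCP server advertises;
    the values used below are hypothetical.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool and arguments, for illustration only.
request = build_tool_call(
    1, "search_route", {"origin": "Hotel", "destination": "Airport"}
)
print(json.dumps(request, indent=2))
```

An agent that supports MCP can choose between emitting such a request and performing an on-screen GUI action at each step.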
Real-World Demos
Watch MAI-UI in action across different real-world scenarios.
Demo 1: Office Scenario
Demo 2: Daily Life Scenario
Demo 3: Shopping Scenario
Demo 4: Travel Scenario
Device-Cloud Collaboration
Device-Cloud Collaboration: Simple Tasks
Device-Cloud Collaboration: Complex Tasks
Evaluating in Real-World MobileWorld
We also introduce the MobileWorld benchmark. While maintaining the same level of rigorous, reproducible evaluation as AndroidWorld, MobileWorld is a more challenging online mobile-use benchmark that adds four features to better capture real-world agent behavior.
GUI Grounding Performance
| Model | Avg |
|---|---|
| Gemini-3-Pro | 72.7 |
| Seed1.8 | 73.1 |
| GTA1-7B | 50.1 |
| UI-Venus-7B | 50.8 |
| GUI-Owl-7B | 54.9 |
| GUI-Owl-32B | 58.0 |
| GTA1-32B | 63.6 |
| UI-Venus-72B | 61.9 |
| MAI-UI-2B | 57.4 |
| + Zoom-In | 62.8 |
| MAI-UI-8B | 65.8 |
| + Zoom-In | 70.9 |
| MAI-UI-32B | 67.9 |
| + Zoom-In | 73.5 |
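The "+ Zoom-In" rows in the table above are easiest to read as gains over the corresponding base model. The short sketch below computes those gains from the averages in the table. It is plain arithmetic on the reported numbers; what the Zoom-In step does internally is not described here, and the helper name `zoom_in_gain` is an illustrative assumption.

```python
# Average scores copied from the table above: (base, with Zoom-In).
scores = {
    "MAI-UI-2B": (57.4, 62.8),
    "MAI-UI-8B": (65.8, 70.9),
    "MAI-UI-32B": (67.9, 73.5),
}


def zoom_in_gain(base, zoomed):
    """Absolute improvement in average score, rounded to one decimal."""
    return round(zoomed - base, 1)


gains = {model: zoom_in_gain(*pair) for model, pair in scores.items()}
for model, gain in gains.items():
    print(f"{model}: +{gain}")
```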
| Model | Avg |
|---|---|
| InfiGUI-G1-3B | 22.0 |
| OS-Atlas-7B | 9.0 |
| UI-TARS-1.5-7B | 22.3 |
| UI-Venus-7B | 26.5 |
| InfiGUI-G1-7B | 26.1 |
| Phi-Ground | 27.2 |
| UI-TARS-72B | 25.5 |
| UI-Venus-72B | 36.8 |
| MAI-UI-2B | 30.3 |
| + Zoom-In | 31.9 |
| MAI-UI-8B | 40.7 |
| + Zoom-In | 42.4 |
| MAI-UI-32B | 47.1 |
| + Zoom-In | 49.2 |
| Model | Avg |
|---|---|
| InfiGUI-G1-3B | 73.4 |
| OS-Atlas-7B | 41.4 |
| UI-TARS-1.5-7B | 64.3 |
| UGround-V1-7B | 65.7 |
| GTA1-7B | 78.5 |
| GUI-Owl-7B | 80.5 |
| InfiGUI-G1-7B | 80.8 |
| GUI-Owl-32B | 83.0 |
| GTA1-32B | 83.4 |
| UI-TARS-DPO-72B | 74.3 |
| InternVL3-78B | 72.2 |
| MAI-UI-2B | 82.6 |
| MAI-UI-8B | 88.8 |
| MAI-UI-32B | 91.3 |
| Agent Model | Avg |
|---|---|
| UI-TARS-1.5-7B | 52.8 |
| GTA1-7B | 55.1 |
| GUI-Owl-7B | 55.9 |
| UI-Venus-7B | 58.8 |
| OpenCUA-32B | 59.6 |
| GUI-Owl-32B | 58.0 |
| GTA1-32B | 65.2 |
| UI-Venus-72B | 70.4 |
| MAI-UI-2B | 52.0 |
| + Zoom-In | 55.9 |
| MAI-UI-8B | 60.1 |
| + Zoom-In | 64.2 |
| MAI-UI-32B | 67.6 |
| + Zoom-In | 70.9 |
| Agent Model | Avg |
|---|---|
| Operator | 57.8 |
| Jedi-3B | 61.0 |
| Jedi-7B | 63.8 |
| UI-TARS-1.5-7B | 64.2 |
| GTA1-7B | 67.7 |
| Qwen2.5-VL-32B | 59.6 |
| OpenCUA-32B | 70.2 |
| GTA1-32B | 72.2 |
| MAI-UI-2B | 63.5 |
| + Zoom-In | 66.3 |
| MAI-UI-8B | 68.6 |
| + Zoom-In | 72.9 |
| MAI-UI-32B | 73.9 |
| + Zoom-In | 75.0 |
| Model | Avg |
|---|---|
| Phi-Ground | 83.8 |
| OS-Atlas-7B | 85.1 |
| UI-TARS-1.5-7B | 89.0 |
| OpenCUA-7B | 92.3 |
| GTA1-7B | 92.4 |
| GUI-Owl-7B | 92.8 |
| UI-Venus-7B | 94.1 |
| GUI-Owl-32B | 93.2 |
| OpenCUA-32B | 93.4 |
| GTA1-32B | 95.2 |
| UI-Venus-72B | 95.3 |
| MAI-UI-2B | 92.5 |
| MAI-UI-8B | 95.2 |
| MAI-UI-32B | 96.5 |
Citation
If you find MAI-UI useful in your research, please cite our papers:
@article{zhou2025mai,
title={MAI-UI Technical Report: Real-World Centric Foundation GUI Agents},
author={Zhou, Hanzhang and Zhang, Xu and Tong, Panrong and Zhang, Jianan and Chen, Liangyu and Kong, Quyu and Cai, Chenglin and Liu, Chen and Wang, Yue and Zhou, Jingren and others},
journal={arXiv preprint arXiv:2512.22047},
year={2025}
}
@article{kong2025mobileworld,
title={MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive and MCP-Augmented Environments},
author={Kong, Quyu and Zhang, Xu and Yang, Zhenyu and Gao, Nolan and Liu, Chen and Tong, Panrong and Cai, Chenglin and Zhou, Hanzhang and Zhang, Jianan and Chen, Liangyu and others},
journal={arXiv preprint arXiv:2512.19432},
year={2025}
}
@article{chen2025ui,
title={UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning},
author={Chen, Liangyu and Zhou, Hanzhang and Cai, Chenglin and Zhang, Jianan and Tong, Panrong and Kong, Quyu and Zhang, Xu and Liu, Chen and Liu, Yuqi and Wang, Wenxuan and others},
journal={arXiv preprint arXiv:2510.20286},
year={2025}
}