MAI-UI: Real-World Centric Foundation GUI Agents
MAI-UI is a family of foundation GUI agent models from Tongyi-MAI Lab, ranging in size from 2B to 235B parameters.
Overall Performance
Technical Highlights
For the first time, MAI-UI natively integrates three core capabilities (user interaction, MCP tool calling, and device-cloud collaboration) into a unified architecture, built with a self-evolving data pipeline and large-scale online reinforcement learning. (Currently, the 2B and 8B models are open-sourced.)
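One way to picture a single architecture that covers all three capabilities is a single action space in which every model step emits either a GUI operation, a question for the user, or an MCP tool call. The sketch below assumes that framing only for illustration; the class names (`GuiAction`, `AskUser`, `McpCall`), the `dispatch` helper, and the tool name are hypothetical and do not describe MAI-UI's actual output format.

```python
from dataclasses import dataclass, field
from typing import Any, Union


@dataclass
class GuiAction:
    """Hypothetical on-screen operation, e.g. a tap at pixel coordinates."""
    kind: str                                   # e.g. "tap", "swipe", "type"
    args: dict[str, Any] = field(default_factory=dict)


@dataclass
class AskUser:
    """Hypothetical request for information only the user can provide."""
    question: str


@dataclass
class McpCall:
    """Hypothetical call to an external tool exposed over MCP."""
    tool: str
    arguments: dict[str, Any] = field(default_factory=dict)


# In this sketch, every model step produces exactly one of these action types.
Action = Union[GuiAction, AskUser, McpCall]


def dispatch(action: Action) -> str:
    """Route one action to the channel that executes it and return a log line."""
    if isinstance(action, GuiAction):
        return f"device: {action.kind} {action.args}"
    if isinstance(action, AskUser):
        return f"user: {action.question}"
    if isinstance(action, McpCall):
        return f"mcp: {action.tool} {action.arguments}"
    raise TypeError(f"unsupported action: {action!r}")


steps = [
    GuiAction("tap", {"x": 540, "y": 1200}),
    AskUser("Which payment method should I use?"),
    McpCall("search_flights", {"to": "Hangzhou"}),  # hypothetical tool name
]
for step in steps:
    print(dispatch(step))
```

Keeping the three kinds of action in one typed union means a single policy can decide, step by step, whether to touch the screen, talk to the user, or call a tool.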
MCP Tool Usage
Calls external tools through the Model Context Protocol (MCP), extending what the agent can do beyond on-screen UI operations.
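MCP messages are JSON-RPC 2.0, and a tool is invoked with a `tools/call` request whose `params` carry the tool `name` and its `arguments`. The minimal sketch below builds such a request and reads the text items out of the result. The tool name `get_weather` and its arguments are hypothetical, and a real client would exchange these messages over an MCP transport (for example stdio) after the protocol's initialization handshake.

```python
import json
from itertools import count
from typing import Any

_request_ids = count(1)  # JSON-RPC request ids must be unique per session


def make_tool_call(name: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


def extract_text(response: str) -> str:
    """Join the text content items of a `tools/call` result; raise on errors."""
    msg = json.loads(response)
    if "error" in msg:  # protocol-level JSON-RPC error
        raise RuntimeError(msg["error"].get("message", "JSON-RPC error"))
    result = msg["result"]
    texts = [c["text"] for c in result.get("content", []) if c.get("type") == "text"]
    if result.get("isError"):  # tool-level failure reported inside the result
        raise RuntimeError(" ".join(texts) or "tool reported an error")
    return "\n".join(texts)


print(make_tool_call("get_weather", {"city": "Hangzhou"}))  # hypothetical tool
```

Tool failures are reported inside `result` with `isError` set, separately from JSON-RPC protocol errors, which is why the sketch checks both.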
User Interaction
Interacts with the user during task execution in real-world scenarios, for example to ask for information the agent cannot obtain on its own.
Online Reinforcement Learning
Large-scale online RL for continuous model improvement and adaptation.
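"Online" here means the policy is trained on rollouts it collects itself in a live environment, rather than on a fixed offline dataset. The toy REINFORCE loop below illustrates that loop on a three-action bandit in which each action succeeds with a fixed, hidden probability. It is a generic textbook illustration, not MAI-UI's training algorithm, and the step count, learning rate, and baseline rate are arbitrary choices.

```python
import math
import random


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_online(success_probs: list[float], steps: int = 2000,
                 lr: float = 0.1, seed: int = 0) -> list[float]:
    """REINFORCE with a running-average baseline on a toy success/failure bandit."""
    rng = random.Random(seed)
    logits = [0.0] * len(success_probs)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        # Roll out the *current* policy: this is what makes the training online.
        a = rng.choices(range(len(probs)), weights=probs)[0]
        reward = 1.0 if rng.random() < success_probs[a] else 0.0
        advantage = reward - baseline
        baseline += 0.01 * (reward - baseline)
        # Gradient of log softmax w.r.t. the logits is one_hot(a) - probs.
        for i in range(len(logits)):
            logits[i] += lr * advantage * ((1.0 if i == a else 0.0) - probs[i])
    return softmax(logits)


print(train_online([0.2, 0.5, 0.9]))
```

After training, most of the probability mass sits on the action with the highest success rate, even though the policy never saw those rates directly, only rewards from its own rollouts.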
Device-Cloud Collaboration
Coordinates an on-device model with a cloud model to balance task performance against efficiency.
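One simple way to split work between the two sides is to let the on-device model act first and escalate a step to the cloud model only when it reports low confidence. The sketch below assumes exactly that policy; the `Proposal` type, the self-reported `confidence` score, the 0.7 threshold, and the stub models are hypothetical, and MAI-UI's actual collaboration criterion may differ.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Proposal:
    action: str
    confidence: float  # assumed self-reported score in [0, 1]


# A model maps the current observation (here just a string) to a proposed action.
Model = Callable[[str], Proposal]


def route_step(observation: str, device: Model, cloud: Model,
               threshold: float = 0.7) -> tuple[str, str]:
    """Run the on-device model first; escalate to the cloud model only when unsure."""
    local = device(observation)
    if local.confidence >= threshold:
        return "device", local.action
    return "cloud", cloud(observation).action


def device_model(obs: str) -> Proposal:
    # Stub: confident on simple screens, unsure otherwise.
    return Proposal("tap(540, 1200)", 0.9 if "simple" in obs else 0.4)


def cloud_model(obs: str) -> Proposal:
    # Stub standing in for a larger cloud-hosted model.
    return Proposal("open_app('Calendar')", 0.95)


print(route_step("simple task screen", device_model, cloud_model))   # handled on device
print(route_step("complex task screen", device_model, cloud_model))  # escalated to cloud
```

A threshold keeps easy steps local and cheap while reserving cloud calls for steps the small model is unsure about; the threshold value trades cost against accuracy.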
Real-World Demos
Watch MAI-UI in action across different real-world scenarios.
Demo 1: Office Scenario
Demo 2: Daily Life Scenario
Demo 3: Shopping Scenario
Demo 4: Travel Scenario
Device-Cloud Collaboration
Device-Cloud Collaboration: Simple Tasks
Device-Cloud Collaboration: Complex Tasks
Real-World Evaluation on MobileWorld
We also introduce the MobileWorld benchmark. It keeps the same rigorous, reproducible evaluation as AndroidWorld but is a more challenging online mobile-use benchmark, adding four features that better capture real-world agent behavior.
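Reproducible online benchmarks of this kind typically score each task with a deterministic check on the final device state and report success rates. A minimal sketch of that scoring step follows; the `Task` structure, the category labels, the task names, and the state dictionaries are hypothetical and are not MobileWorld's actual task format or API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable

State = dict[str, Any]


@dataclass
class Task:
    name: str
    category: str                    # hypothetical label, e.g. "gui", "agent-user", "mcp"
    check: Callable[[State], bool]   # deterministic verifier on the final device state


def success_rates(tasks: list[Task], final_states: dict[str, State]) -> dict[str, float]:
    """Return success rate (%) per category plus an "overall" entry."""
    totals: dict[str, int] = defaultdict(int)
    passed: dict[str, int] = defaultdict(int)
    for task in tasks:
        ok = task.check(final_states[task.name])
        for key in (task.category, "overall"):
            totals[key] += 1
            passed[key] += int(ok)
    return {key: 100.0 * passed[key] / totals[key] for key in totals}


tasks = [
    Task("set_alarm", "gui", lambda s: s.get("alarm") == "07:00"),
    Task("send_note", "agent-user", lambda s: s.get("note_sent") is True),
    Task("book_hotel", "mcp", lambda s: s.get("booking_id") is not None),
    Task("dark_mode", "gui", lambda s: s.get("theme") == "dark"),
]
final_states = {
    "set_alarm": {"alarm": "07:00"},
    "send_note": {"note_sent": False},
    "book_hotel": {"booking_id": "H123"},
    "dark_mode": {"theme": "light"},
}
print(success_rates(tasks, final_states))
```

Because each verifier is a pure function of the final state, rerunning the same episode yields the same score, which is what makes this style of evaluation reproducible.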
GUI Grounding Performance
ScreenSpot-Pro
| Model | Avg. (%) |
|---|---|
| Gemini-3-Pro | 72.7 |
| Seed1.8 | 73.1 |
| GTA1-7B | 50.1 |
| UI-Venus-7B | 50.8 |
| GUI-Owl-7B | 54.9 |
| GUI-Owl-32B | 58.0 |
| GTA1-32B | 63.6 |
| UI-Venus-72B | 61.9 |
| MAI-UI-2B | 57.4 |
| MAI-UI-2B + Zoom-In | 62.8 |
| MAI-UI-8B | 65.8 |
| MAI-UI-8B + Zoom-In | 70.9 |
| MAI-UI-32B | 67.9 |
| MAI-UI-32B + Zoom-In | 73.5 |
UI-Vision
| Model | Avg. (%) |
|---|---|
| InfiGUI-G1-3B | 22.0 |
| OS-Atlas-7B | 9.0 |
| UI-TARS-1.5-7B | 22.3 |
| UI-Venus-7B | 26.5 |
| InfiGUI-G1-7B | 26.1 |
| Phi-Ground | 27.2 |
| UI-TARS-72B | 25.5 |
| UI-Venus-72B | 36.8 |
| MAI-UI-2B | 30.3 |
| MAI-UI-2B + Zoom-In | 31.9 |
| MAI-UI-8B | 40.7 |
| MAI-UI-8B + Zoom-In | 42.4 |
| MAI-UI-32B | 47.1 |
| MAI-UI-32B + Zoom-In | 49.2 |
MMBench-GUI L2
| Model | Avg. (%) |
|---|---|
| InfiGUI-G1-3B | 73.4 |
| OS-Atlas-7B | 41.4 |
| UI-TARS-1.5-7B | 64.3 |
| UGround-V1-7B | 65.7 |
| GTA1-7B | 78.5 |
| GUI-Owl-7B | 80.5 |
| InfiGUI-G1-7B | 80.8 |
| GUI-Owl-32B | 83.0 |
| GTA1-32B | 83.4 |
| UI-TARS-DPO-72B | 74.3 |
| InternVL3-78B | 72.2 |
| MAI-UI-2B | 82.6 |
| MAI-UI-8B | 88.8 |
| MAI-UI-32B | 91.3 |
OSWorld-G
| Model | Avg. (%) |
|---|---|
| UI-TARS-1.5-7B | 52.8 |
| GTA1-7B | 55.1 |
| GUI-Owl-7B | 55.9 |
| UI-Venus-7B | 58.8 |
| OpenCUA-32B | 59.6 |
| GUI-Owl-32B | 58.0 |
| GTA1-32B | 65.2 |
| UI-Venus-72B | 70.4 |
| MAI-UI-2B | 52.0 |
| MAI-UI-2B + Zoom-In | 55.9 |
| MAI-UI-8B | 60.1 |
| MAI-UI-8B + Zoom-In | 64.2 |
| MAI-UI-32B | 67.6 |
| MAI-UI-32B + Zoom-In | 70.9 |
| Model | Avg. (%) |
|---|---|
| Operator | 57.8 |
| Jedi-3B | 61.0 |
| Jedi-7B | 63.8 |
| UI-TARS-1.5-7B | 64.2 |
| GTA1-7B | 67.7 |
| Qwen2.5-VL-32B | 59.6 |
| OpenCUA-32B | 70.2 |
| GTA1-32B | 72.2 |
| MAI-UI-2B | 63.5 |
| MAI-UI-2B + Zoom-In | 66.3 |
| MAI-UI-8B | 68.6 |
| MAI-UI-8B + Zoom-In | 72.9 |
| MAI-UI-32B | 73.9 |
| MAI-UI-32B + Zoom-In | 75.0 |
ScreenSpot-V2
| Model | Avg. (%) |
|---|---|
| Phi-Ground | 83.8 |
| OS-Atlas-7B | 85.1 |
| UI-TARS-1.5-7B | 89.0 |
| OpenCUA-7B | 92.3 |
| GTA1-7B | 92.4 |
| GUI-Owl-7B | 92.8 |
| UI-Venus-7B | 94.1 |
| GUI-Owl-32B | 93.2 |
| OpenCUA-32B | 93.4 |
| GTA1-32B | 95.2 |
| UI-Venus-72B | 95.3 |
| MAI-UI-2B | 92.5 |
| MAI-UI-8B | 95.2 |
| MAI-UI-32B | 96.5 |
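Several tables above report MAI-UI models with and without Zoom-In, but this page does not describe the procedure. A common coarse-to-fine pattern for GUI grounding is to predict once on the full screenshot, crop a window around that guess, upscale it, predict again, and map the second point back to full-screen coordinates. The sketch below implements only that generic pattern and its coordinate mapping; the `Predictor` interface, the 0.5 crop ratio, and the simulated model are hypothetical and are not MAI-UI's implementation.

```python
from typing import Callable

Point = tuple[float, float]
Box = tuple[float, float, float, float]  # (left, top, width, height) in screenshot pixels
# Hypothetical grounding model: given a screenshot region and the factor by which that
# region is upscaled before the model sees it, return the predicted target point in the
# upscaled region's own pixel coordinates.
Predictor = Callable[[Box, float], Point]


def crop_box(center: Point, img_w: float, img_h: float, ratio: float) -> Box:
    """Window of size ratio * screen, centred on `center`, shifted to stay inside the image."""
    w, h = img_w * ratio, img_h * ratio
    left = min(max(center[0] - w / 2, 0.0), img_w - w)
    top = min(max(center[1] - h / 2, 0.0), img_h - h)
    return left, top, w, h


def zoom_in_ground(predict: Predictor, img_w: float, img_h: float,
                   ratio: float = 0.5) -> Point:
    # Stage 1: coarse prediction on the full screenshot (no upscaling).
    coarse = predict((0.0, 0.0, img_w, img_h), 1.0)
    # Stage 2: crop around the coarse point and upscale the crop back to full-screen size.
    left, top, w, h = crop_box(coarse, img_w, img_h, ratio)
    scale = 1.0 / ratio
    fx, fy = predict((left, top, w, h), scale)
    # Map the fine prediction from upscaled-crop coordinates back to screenshot coordinates.
    return left + fx / scale, top + fy / scale


def make_fake_predictor(target: Point, coarse_error: Point) -> Predictor:
    """Simulated model that is off by `coarse_error` at full scale but exact when zoomed in."""
    def predict(region: Box, scale: float) -> Point:
        left, top, _, _ = region
        ex, ey = coarse_error if scale == 1.0 else (0.0, 0.0)
        return (target[0] + ex - left) * scale, (target[1] + ey - top) * scale
    return predict


print(zoom_in_ground(make_fake_predictor((900.0, 300.0), (40.0, -30.0)), 1080, 2400))
```

The crop window is clamped to the screenshot so the second pass never sees out-of-bounds pixels; the only bookkeeping needed afterwards is undoing the crop offset and the upscale factor.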
Citation
If you find MAI-UI useful in your research, please cite our papers:
@misc{zhou2025maiuitechnicalreportrealworld,
title={MAI-UI Technical Report: Real-World Centric Foundation GUI Agents},
author={Hanzhang Zhou and Xu Zhang and Panrong Tong and Jianan Zhang and Liangyu Chen and Quyu Kong and Chenglin Cai and Chen Liu and Yue Wang and Jingren Zhou and Steven Hoi},
year={2025},
eprint={2512.22047},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2512.22047},
}
@misc{kong2025mobileworldbenchmarkingautonomousmobile,
title={MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive, and MCP-Augmented Environments},
author={Quyu Kong and Xu Zhang and Zhenyu Yang and Nolan Gao and Chen Liu and Panrong Tong and Chenglin Cai and Hanzhang Zhou and Jianan Zhang and Liangyu Chen and Zhidan Liu and Steven Hoi and Yue Wang},
year={2025},
eprint={2512.19432},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2512.19432},
}
@misc{chen2025uiinsenhancingguigrounding,
title={UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning},
author={Liangyu Chen and Hanzhang Zhou and Chenglin Cai and Jianan Zhang and Panrong Tong and Quyu Kong and Xu Zhang and Chen Liu and Yuqi Liu and Wenxuan Wang and Yue Wang and Qin Jin and Steven Hoi},
year={2025},
eprint={2510.20286},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2510.20286},
}