Lightweight browser automation via Puppeteer, with optional vision mode.
Browser MCP (UI-TARS) Desktop is a lightweight browser automation tool powered by Puppeteer, with optional vision mode. It is part of UI-TARS Desktop, a GUI agent application that enables users to control their computers using natural language. The project leverages Vision-Language Models for interaction, automating tasks visually and integrating with file systems and command lines.
Both the local and remote operators are free to use.
UI-TARS Desktop supports Windows and macOS, with browser compatibility for web-based use.
The application processes data locally, ensuring privacy and security.
> [!IMPORTANT]
> [2025-03-18] We released a technical preview version of a new desktop app - Agent TARS, a multimodal AI agent that leverages browser operations by visually interpreting web pages and seamlessly integrating with command lines and file systems.
UI-TARS Desktop is a GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.
Paper | Hugging Face Models | Discord | ModelScope
Desktop Application | Midscene (use in browser)
Instruction | Local Operator | Remote Operator |
---|---|---|
Please help me open the autosave feature of VS Code and delay AutoSave operations for 500 milliseconds in the VS Code setting. |