Built-in Browser

The zero-configuration headless browser that runs inside every agent container, with a full set of automation tools.

The built-in browser is a headless Chromium instance that runs inside the agent's container. It is the default browser host and requires no setup -- every agent can use it immediately.

How It Works

When an agent calls browser_open, Superagent launches a Chromium process inside the agent's container and connects to it via the Chrome DevTools Protocol (CDP). The browser runs headlessly (no visible window), but you can watch the agent's activity in real time through the browser panel in the Superagent UI.
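
For readers unfamiliar with CDP, the sketch below shows the general shape of the JSON commands a CDP client sends over its connection to the browser. It is illustrative only: Superagent's internal client is not documented on this page, and makeCdpClient is a hypothetical helper name. Page.navigate and Page.captureScreenshot are standard CDP methods.

```javascript
// Minimal sketch of CDP command framing (hypothetical helper, not Superagent's code).
// A CDP command is a JSON object with an incrementing id, a method name, and params.
function makeCdpClient(send) {
  let nextId = 1;
  return {
    command(method, params = {}) {
      const message = { id: nextId++, method, params };
      send(JSON.stringify(message)); // a real client would write this to the WebSocket
      return message.id;
    },
  };
}

// Usage: collect outgoing frames in an array instead of opening a real socket.
const sentFrames = [];
const cdp = makeCdpClient((frame) => sentFrames.push(frame));
cdp.command('Page.navigate', { url: 'https://example.com' });
cdp.command('Page.captureScreenshot', { format: 'png' });
```

The browser replies with a message carrying the same id, which is how a client matches responses to the commands it sent.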

The built-in browser:

  • Requires no configuration. It is available to every agent out of the box.
  • Runs in an isolated container. Each agent gets its own browser instance with its own profile directory.
  • Preserves cookies and sessions. The browser uses a persistent profile, so sites remember the agent's login state across sessions.
  • Supports multiple tabs. Agents can open, switch between, and close tabs. The maximum number of concurrent tabs is configurable in Settings (default: 10).

Browser Tools

Agents interact with the browser through a set of MCP tools exposed by the browser MCP server. These tools are available automatically whenever an agent has browser access enabled.

Tool | Description
browser_open | Open the browser and navigate to a URL. If a tab with the same URL already exists, switches to it instead of opening a duplicate.
browser_close | Close the browser and free all resources. Call this when browsing is complete.
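
The tab-reuse behavior of browser_open can be pictured with a toy model. This is a sketch under assumptions, not Superagent's implementation: TabRegistry is a hypothetical name, URLs are compared as exact strings (the page does not say how they are matched), and throwing at the tab limit is a guess at what happens when the limit is reached.

```javascript
// Toy model of browser_open's tab reuse (hypothetical; not Superagent's implementation).
class TabRegistry {
  constructor(maxTabs = 10) { // 10 mirrors the documented default limit
    this.maxTabs = maxTabs;
    this.tabs = [];        // URLs of open tabs, in creation order
    this.activeIndex = -1; // no tab is active until one is opened
  }

  open(url) {
    const existing = this.tabs.indexOf(url); // assumption: exact string match
    if (existing !== -1) {
      this.activeIndex = existing; // switch to the tab instead of duplicating it
      return { index: existing, reused: true };
    }
    if (this.tabs.length >= this.maxTabs) {
      // Assumption: the real behavior at the limit is not documented here.
      throw new Error(`tab limit of ${this.maxTabs} reached`);
    }
    this.tabs.push(url);
    this.activeIndex = this.tabs.length - 1;
    return { index: this.activeIndex, reused: false };
  }
}

const registry = new TabRegistry();
const first = registry.open('https://example.com/a');
const second = registry.open('https://example.com/b');
const again = registry.open('https://example.com/a'); // reuses the first tab
```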

Page Inspection

Tool | Description
browser_snapshot | Get an accessibility tree snapshot of the current page. Returns interactive elements with refs (like @e1, @e2) that can be used with other tools. Supports interactive, compact, and json modes.
browser_screenshot | Take a screenshot of the current viewport or the full scrollable page. Optionally annotate the screenshot with numbered labels on interactive elements that correspond to snapshot refs.
browser_get_state | Get the current URL, a screenshot, and an accessibility snapshot in a single call. Useful for quickly understanding what the browser is showing.

Interaction

Tool | Description
browser_click | Click an element by its ref (e.g., @e1). Refs come from browser_snapshot.
browser_fill | Clear an input field and type a new value into it, identified by ref.
browser_select | Select an option from a <select> dropdown by ref and value.
browser_hover | Hover over an element to trigger menus, tooltips, or hover states.
browser_press | Press a keyboard key such as Enter, Tab, Escape, or a key combo like Control+a.
browser_scroll | Scroll the page in a given direction (up, down, left, right) by an optional pixel amount.
browser_upload | Upload a local file to an <input type="file"> element using a CSS selector.
browser_wait | Wait for a CSS selector to appear on the page before continuing.
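
A typical interaction chains several of these tools together. The sketch below expresses one such sequence as plain data. The argument names (ref, value, selector), the refs, and the .confirmation selector are all assumptions for illustration; this page does not list the exact tool schemas.

```javascript
// Sketch of a form-filling sequence expressed as tool calls.
// Argument names (ref, value, selector) are assumptions; check the actual tool schemas.
function buildContactFormSteps(refs, form) {
  return [
    { tool: 'browser_fill', args: { ref: refs.name, value: form.name } },
    { tool: 'browser_select', args: { ref: refs.country, value: form.country } },
    { tool: 'browser_click', args: { ref: refs.submit } },
    { tool: 'browser_wait', args: { selector: '.confirmation' } }, // hypothetical selector
    { tool: 'browser_snapshot', args: {} }, // inspect the resulting page
  ];
}

const steps = buildContactFormSteps(
  { name: '@e1', country: '@e2', submit: '@e3' }, // refs would come from browser_snapshot
  { name: 'Ada Lovelace', country: 'GB' },
);
```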

Advanced Operations

Tool | Description
browser_run | Run any agent-browser CLI command for advanced operations not covered by the dedicated tools.

The browser_run tool is a catch-all that exposes the full agent-browser command set. Some of the commands available through it include:

  • Navigation -- back, forward, reload
  • Tab management -- tab, tab new, tab <n>, tab close
  • JavaScript execution -- eval <js> to run arbitrary JavaScript in the page context
  • Element queries -- get text/html/value/attr/title/url/count/box <ref>
  • State checks -- is visible/enabled/checked <ref>
  • Cookie and storage management -- cookies, cookies set/clear, storage local/session
  • Frame switching -- frame <selector>, frame main
  • Dialog handling -- dialog accept, dialog dismiss
  • Browser settings -- set viewport/device/geo/offline/headers/media
  • Network interception -- network route/unroute/requests
  • Drag and drop -- drag <srcRef> <tgtRef>
  • Double-click, focus, type -- dblclick, focus, type
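
Because these commands are plain strings, they can be assembled from their parts. The sketch below builds browser_run payloads in the { command: '...' } shape used by the eval examples on this page. runCommand is a hypothetical helper, and the refs are made up.

```javascript
// Build browser_run payloads from command parts (hypothetical helper).
// The { command: '...' } shape follows the eval examples on this page.
function runCommand(...parts) {
  return { command: parts.join(' ') };
}

const payloads = [
  runCommand('tab', 'new'),           // tab management
  runCommand('get', 'text', '@e3'),   // element query
  runCommand('is', 'visible', '@e1'), // state check
  runCommand('dialog', 'accept'),     // dialog handling
  runCommand('back'),                 // navigation
];
```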

Screenshots and Snapshots

Agents have two complementary ways to understand what is on the page:

Screenshots capture a visual image of the browser viewport (or the full scrollable page). They are returned as images that the model can see directly. Annotated screenshots overlay numbered labels on interactive elements, making it easy for the agent to visually identify what to click. Each label [N] corresponds to ref @eN from the accessibility snapshot.

Accessibility snapshots return a structured text representation of the page's interactive elements. Each element gets a ref like @e1 that the agent uses with browser_click, browser_fill, and other interaction tools. Snapshots are more compact than screenshots and work well for form-heavy pages where the agent needs to identify specific input fields.

In practice, agents typically use browser_snapshot for most interactions and fall back to browser_screenshot when they need to understand the visual layout or debug rendering issues.
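
The link between screenshot labels and snapshot refs can be sketched in a few lines. Both helper names are hypothetical, and the sample snapshot text is made up: the only assumption about the real format is that refs appear literally as @e followed by a number.

```javascript
// Map an annotated-screenshot label number to its snapshot ref: [3] -> @e3.
function labelToRef(label) {
  if (!Number.isInteger(label)) {
    throw new TypeError('label must be an integer');
  }
  return `@e${label}`;
}

// Pull every distinct ref out of snapshot text, in order of first appearance.
// Assumption: refs appear literally as @e<number>; the surrounding format is not specified here.
function extractRefs(snapshotText) {
  return [...new Set(snapshotText.match(/@e\d+/g) ?? [])];
}

// Made-up snapshot text, for illustration only.
const sampleSnapshot = 'textbox "Email" @e1\ntextbox "Message" @e2\nbutton "Send" @e3';
const refs = extractRefs(sampleSnapshot);
```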

JavaScript Execution

Agents can execute arbitrary JavaScript in the browser page context using browser_run with the eval command:

browser_run({ command: 'eval document.title' })
browser_run({ command: 'eval JSON.stringify(Array.from(document.querySelectorAll("table tr")).map(r => r.textContent))' })

This is useful for extracting data that is not easily accessible through the accessibility snapshot, or for triggering client-side behavior.
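
As a variation on the table example, the sketch below builds an eval command that keeps each cell separate rather than flattening a row into one string. It assumes that browser_run hands back the evaluated string so the caller can parse it; the return format is not described on this page. The helper names, the selector, and the sample result are hypothetical.

```javascript
// Build an eval command that returns table rows as JSON, one array of cells per row.
// Assumption: browser_run returns the evaluated string so the caller can JSON.parse it.
function tableRowsCommand(tableSelector) {
  const selector = JSON.stringify(`${tableSelector} tr`); // quoted safely for embedding in JS
  const js =
    `JSON.stringify(Array.from(document.querySelectorAll(${selector}))` +
    `.map(r => Array.from(r.cells).map(c => c.textContent.trim())))`;
  return { command: `eval ${js}` };
}

// Parse the JSON text produced by the expression above.
function parseRows(resultText) {
  const rows = JSON.parse(resultText);
  if (!Array.isArray(rows)) throw new TypeError('expected an array of rows');
  return rows;
}

const payload = tableRowsCommand('table#prices'); // hypothetical selector
const rows = parseRows('[["Item","Price"],["Widget","9.99"]]'); // made-up sample result
```

Serializing inside the page with JSON.stringify keeps the result a single string, which is easier to pass through a text-based tool interface than a live DOM object.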

Human-in-the-Loop

When the agent encounters an obstacle that requires human interaction -- such as a login page, CAPTCHA, or 2FA prompt -- it calls the request_browser_input tool. This pauses the agent, shows you the browser preview with an "input needed" overlay, and waits until you complete the action and click Done. After you finish, the agent takes a fresh snapshot and continues from the new page state.
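
The pause-and-resume flow can be pictured as a two-state machine. This is a toy model only: the class, state names, and method names are hypothetical and do not describe how Superagent implements request_browser_input.

```javascript
// Toy model of the pause/resume flow (class, state, and method names are hypothetical).
class AgentSession {
  constructor() {
    this.state = 'running';
    this.log = [];
  }

  // Models the agent hitting a login page, CAPTCHA, or 2FA prompt.
  requestBrowserInput(reason) {
    if (this.state !== 'running') throw new Error('already waiting for input');
    this.state = 'awaiting_input';
    this.log.push(`paused: ${reason}`);
  }

  // Models the user clicking Done in the browser preview.
  done() {
    if (this.state !== 'awaiting_input') throw new Error('no input was requested');
    this.state = 'running';
    this.log.push('resumed: take fresh snapshot');
  }
}

const session = new AgentSession();
session.requestBrowserInput('2FA prompt');
const stateWhilePaused = session.state;
session.done();
```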

Common Use Cases

  • Web scraping -- navigate to a site, extract structured data from tables or lists, and compile it into a report.
  • Form automation -- fill out multi-step forms, upload documents, and submit applications.
  • Web app testing -- open a URL, interact with UI elements, take screenshots, and verify expected behavior.
  • Research -- search across multiple sites, read articles, and synthesize findings.
  • Account management -- check dashboards, update settings, and download reports from web-based tools (after you log in for the agent).

Limitations

  • The built-in browser runs inside the container and does not have access to your local filesystem, extensions, or saved passwords. If the agent needs to work with sessions you are already logged in to, consider using Chrome Integration instead.
  • Some websites employ bot detection that may block headless browsers. For sites with aggressive anti-bot measures, consider Browserbase, which offers stealth mode and residential proxies.
  • The browser cannot reach localhost URLs served by the host machine, because it runs in an isolated container.
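
To catch the localhost limitation early, a caller could check a URL before handing it to browser_open. The sketch below is one way to do that with the standard URL class. isLoopbackUrl is a hypothetical helper, and the list of loopback forms it recognizes is a reasonable but not exhaustive assumption.

```javascript
// Return true when a URL targets a loopback host. From inside a container, such an
// address typically refers to the container itself, not to the host machine.
function isLoopbackUrl(rawUrl) {
  const { hostname } = new URL(rawUrl); // throws TypeError on an invalid URL
  return (
    hostname === 'localhost' ||
    hostname.endsWith('.localhost') ||
    hostname === '[::1]' || // the URL class keeps brackets on IPv6 hosts
    /^127(\.\d{1,3}){3}$/.test(hostname)
  );
}

const candidates = ['http://localhost:3000', 'https://example.com/docs'];
const blocked = candidates.filter(isLoopbackUrl);
```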