Browser Use Overview

Give your agents the ability to browse the web, fill forms, take screenshots, and interact with web applications.

Superagent agents can control a full web browser to perform tasks that require interacting with websites. An agent can navigate to pages, click buttons, fill out forms, extract data, take screenshots, and run JavaScript -- all while you watch in real time through the browser panel in the Superagent UI.

What Browser Use Enables

With browser access, your agents can:

Navigate the web -- open URLs, follow links, use search engines, and move through multi-page workflows.
Interact with web apps -- click buttons, fill forms, select dropdowns, upload files, and submit data.
Extract information -- read page content through accessibility snapshots, take screenshots, and run JavaScript to pull structured data.
Handle multi-step flows -- complete checkout processes, fill multi-page forms, and navigate authenticated dashboards.
Request your help when needed -- pause and ask you to log in, solve a CAPTCHA, or complete a 2FA challenge, then resume automatically.

The Three Browser Options

Superagent offers three browser hosts. You choose which one to use in Settings > Browser.

Browser Host	Description	Best For
Built-in Browser	A headless Chromium browser that runs inside the agent's container. Works out of the box with zero configuration.	Quick tasks, web scraping, form automation, testing.
Google Chrome	Connects to your local Chrome installation and can use your existing profiles, cookies, and logged-in sessions.	Tasks that need access to your authenticated accounts without re-logging in.
Browserbase	A cloud browser service that runs sessions on remote infrastructure with anti-detection and proxy support.	Scalable automation, avoiding IP blocks, stealth browsing.

The built-in browser is the default. You can change the browser host at any time; the change applies to browser sessions started after it.
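As a rough illustration of how the "Best For" column can guide the choice, the sketch below maps two task requirements to a host. The enum, function, and parameter names are hypothetical and are not part of Superagent; the actual selection is made in Settings > Browser.

```python
from enum import Enum


class BrowserHost(Enum):
    # Labels mirror the table above; the enum itself is an illustration only.
    BUILT_IN = "Built-in Browser"
    GOOGLE_CHROME = "Google Chrome"
    BROWSERBASE = "Browserbase"


def suggest_host(needs_logged_in_sessions: bool = False,
                 needs_proxy_or_stealth: bool = False) -> BrowserHost:
    """Suggest a host from task requirements, following the 'Best For' column."""
    if needs_logged_in_sessions:
        # Existing profiles, cookies, and sessions live in local Chrome.
        return BrowserHost.GOOGLE_CHROME
    if needs_proxy_or_stealth:
        # Remote sessions with anti-detection and proxy support.
        return BrowserHost.BROWSERBASE
    # Zero-configuration default.
    return BrowserHost.BUILT_IN


print(suggest_host().value)
print(suggest_host(needs_logged_in_sessions=True).value)
```

The order of the checks is a design assumption: if a task needs both your logged-in accounts and stealth features, this sketch favors Google Chrome, but either choice could be reasonable.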

When to Use Browser Automation

Browser automation is useful when your agent needs to interact with a website that does not offer an API, or when the task is inherently visual. Common scenarios include:

Researching information across multiple websites.
Filling out web forms on behalf of a user.
Monitoring a web page for changes.
Extracting data from sites that only render content in a browser.
Testing a web application's user interface.
Navigating internal tools that require authentication.

If a service provides a dedicated API or MCP integration, prefer that over browser automation -- APIs are faster and more reliable than UI-based interaction, which can break whenever a page's layout changes.

The Browser Panel

When an agent opens a browser, a panel slides open on the right side of the chat interface. This panel provides a live view of what the agent sees and lets you interact directly.

Live Preview

The browser panel renders a real-time screencast of the browser viewport. Frames are streamed over a WebSocket connection and drawn to a canvas element, so you see exactly what the agent sees with minimal delay. The preview automatically scales to fit the panel width while preserving the browser's aspect ratio.
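The scaling described above comes down to one ratio. The following is a minimal sketch of that arithmetic, not the panel's actual rendering code; the function name and the sample dimensions are assumptions.

```python
from typing import Tuple


def fit_to_width(frame_width: int, frame_height: int,
                 panel_width: int) -> Tuple[int, int]:
    """Scale a frame to the panel width while preserving its aspect ratio."""
    if frame_width <= 0 or frame_height <= 0 or panel_width <= 0:
        raise ValueError("all dimensions must be positive")
    scale = panel_width / frame_width
    # Width matches the panel; height follows from the same scale factor.
    return panel_width, round(frame_height * scale)


# A 1280x720 viewport shown in a 400px-wide panel.
print(fit_to_width(1280, 720, 400))  # (400, 225)
```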

Tab Bar

When the agent has multiple tabs open, a tab bar appears at the top of the browser panel. Each tab shows its title, and the agent's currently active tab is marked with a blue indicator dot. You can:

Click a tab to switch the preview to that tab.
Right-click a tab to close it. The agent's active tab cannot be closed.
Toggle auto-follow using the eye icon -- when enabled, the preview automatically switches to whichever tab the agent is working in.
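The tab behavior above can be modeled as a small piece of state. This sketch is an assumption about how such logic might look, not Superagent's implementation; the class and method names are hypothetical, and it deliberately does not guess whether clicking a tab changes the auto-follow setting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TabPreview:
    auto_follow: bool = True
    agent_tab: Optional[str] = None   # tab the agent is working in
    shown_tab: Optional[str] = None   # tab currently shown in the preview

    def agent_switched(self, tab_id: str) -> None:
        """The agent moved to another tab."""
        self.agent_tab = tab_id
        # Follow the agent when auto-follow is on, or when nothing is shown yet.
        if self.auto_follow or self.shown_tab is None:
            self.shown_tab = tab_id

    def user_clicked(self, tab_id: str) -> None:
        """The user clicked a tab in the tab bar."""
        self.shown_tab = tab_id

    def can_close(self, tab_id: str) -> bool:
        """Any tab may be closed except the agent's active tab."""
        return tab_id != self.agent_tab


preview = TabPreview()
preview.agent_switched("search")
preview.auto_follow = False
preview.user_clicked("docs")
preview.agent_switched("checkout")
print(preview.shown_tab)  # docs
```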

Activity Log

Below the browser preview is an activity log that lists every browser tool call the agent has made in the current session. Each entry shows the tool name (such as "Click", "Fill Input", or "Screenshot") along with a brief summary of its parameters. You can expand any entry to see the full result text. This log is useful for understanding what the agent did and debugging any issues.
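To make the entry format concrete, here is a hypothetical model of a log entry with a collapsed one-line summary and a full result available on expansion. The field names, the summary format, and the truncation length are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class LogEntry:
    tool: str                 # display name, e.g. "Click" or "Fill Input"
    params: Dict[str, Any]    # parameters passed to the tool call
    result: str               # full result text, shown when expanded

    def summary(self, max_len: int = 60) -> str:
        """One-line view: tool name plus a brief summary of its parameters."""
        parts = ", ".join(f"{key}={value!r}" for key, value in self.params.items())
        line = f"{self.tool}: {parts}" if parts else self.tool
        # Truncate long parameter lists so the collapsed entry stays on one line.
        return line if len(line) <= max_len else line[: max_len - 3] + "..."


entry = LogEntry("Click", {"selector": "#submit"}, "Clicked element #submit")
print(entry.summary())  # Click: selector='#submit'
```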

Controls

A floating control pill at the bottom of the browser panel provides:

Pause / Resume -- temporarily pause the agent's execution, interact with the browser yourself, then resume.
Stop -- close the browser entirely. If the agent is actively running, you will see a confirmation dialog.
Expand / Collapse -- widen the browser panel for a larger preview, or shrink it back to the default width.

Human-in-the-Loop Input

Some actions require your direct involvement -- logging into a site, solving a CAPTCHA, or completing two-factor authentication. When the agent encounters one of these, it calls the request_browser_input tool, which:

Shows an overlay on the browser preview with a pulsing "Your input needed" indicator.
Displays a message card in the chat explaining what the agent needs you to do.
Pauses the agent until you click Done (after completing the task in the browser) or Dismiss (to skip and continue the conversation).

This workflow means the agent can handle most of a browsing task autonomously and only involve you for the steps that truly require a human.
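The pause-and-resolve flow above can be sketched as a request object that stays pending until the user responds. Only the tool name request_browser_input and the Done and Dismiss buttons come from this page; the classes, method names, and returned step labels below are hypothetical.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    DONE = "done"            # user completed the task in the browser
    DISMISSED = "dismissed"  # user skipped the request


class InputRequest:
    """Illustrative stand-in for a pending request_browser_input call."""

    def __init__(self, message: str) -> None:
        self.message = message            # text shown in the chat message card
        self.outcome: Optional[Outcome] = None

    @property
    def agent_paused(self) -> bool:
        # The agent waits for as long as the request is unresolved.
        return self.outcome is None

    def resolve(self, outcome: Outcome) -> str:
        """Record the user's choice and report what happens next."""
        self.outcome = outcome
        if outcome is Outcome.DONE:
            return "resume_browser_task"
        return "continue_conversation"


request = InputRequest("Please log in to continue.")
print(request.agent_paused)            # True
print(request.resolve(Outcome.DONE))   # resume_browser_task
```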

Resizable Panel

You can drag the left edge of the browser panel to resize it. The width is remembered across sessions. The minimum width is 320px and the maximum is 800px.
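The width limits amount to a clamp. A minimal sketch follows, using the 320px and 800px bounds stated above; the function and constant names are assumptions.

```python
MIN_WIDTH_PX = 320
MAX_WIDTH_PX = 800


def clamp_panel_width(requested_px: float) -> int:
    """Keep a dragged width within the panel's minimum and maximum."""
    return int(min(MAX_WIDTH_PX, max(MIN_WIDTH_PX, round(requested_px))))


print(clamp_panel_width(100))   # 320
print(clamp_panel_width(512))   # 512
print(clamp_panel_width(1200))  # 800
```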