Browser Viewer

The Browser Viewer is a browser-in-browser pane embedded in the Chat page’s session panel. It renders live screenshots from a Playwright Chromium instance running inside the agent container, giving real-time visibility into what an agent sees and does on the web.

Interaction Modes

Take Control

Direct interaction with the remote browser. All input is forwarded to the Playwright instance:
  • Click — click through to elements on the page.
  • Scroll — mouse wheel events forwarded as-is.
  • Keyboard — keystrokes sent directly to Playwright.
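
One detail worth sketching for click forwarding: if the pane displays the screenshot at a different size than the remote viewport, a click position has to be rescaled before it is forwarded. The following is a minimal sketch under that assumption. The `Size` type, the `to_page_coords` helper, and the 1280x720 viewport are hypothetical and are not taken from the Browser Viewer's code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Size:
    """Width and height in pixels (hypothetical helper type)."""
    width: int
    height: int


def to_page_coords(click_x: float, click_y: float,
                   displayed: Size, viewport: Size) -> tuple[int, int]:
    """Map a click on the displayed screenshot to remote viewport coordinates."""
    if displayed.width <= 0 or displayed.height <= 0:
        raise ValueError("displayed size must be positive")
    scale_x = viewport.width / displayed.width
    scale_y = viewport.height / displayed.height
    # Clamp so a click on the far edge still lands inside the viewport.
    page_x = min(max(round(click_x * scale_x), 0), viewport.width - 1)
    page_y = min(max(round(click_y * scale_y), 0), viewport.height - 1)
    return page_x, page_y


# A click at the centre of a 640x360 pane maps to the centre of a
# 1280x720 viewport (both sizes are assumed for illustration).
page_x, page_y = to_page_coords(320, 180, Size(640, 360), Size(1280, 720))
```

If the pane always renders the screenshot at its native size, both scale factors are 1 and the mapping reduces to clamping.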

Describe

Guided interaction for instructing the agent about a specific element:
  1. Click an element to select it.
  2. A floating popover appears showing element details (tag, id, classes, CSS selector, raw HTML).
  3. Type a natural-language description of what the agent should do with that element.
  4. Send the description to the chat thread for the agent to act on.
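
The steps above can be sketched as a function that combines the selected element's details with the typed instruction into one chat message. The `ElementDetails` fields mirror the details listed for the popover, but the class itself, the `build_describe_message` helper, and the message layout are assumptions for illustration. The actual message format is not documented here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ElementDetails:
    """Details shown in the popover (field names are hypothetical)."""
    tag: str
    selector: str
    html: str
    id: str = ""
    classes: list[str] = field(default_factory=list)


def build_describe_message(element: ElementDetails, instruction: str) -> str:
    """Combine the user's instruction with the selected element's details."""
    instruction = instruction.strip()
    if not instruction:
        raise ValueError("instruction must not be empty")
    lines = [instruction, "", f"Element: <{element.tag}>",
             f"Selector: {element.selector}"]
    # Optional details are included only when present.
    if element.id:
        lines.append(f"Id: {element.id}")
    if element.classes:
        lines.append("Classes: " + " ".join(element.classes))
    lines.append(f"HTML: {element.html}")
    return "\n".join(lines)


button = ElementDetails(
    tag="button",
    selector="#submit",
    html='<button id="submit" class="btn primary">Send</button>',
    id="submit",
    classes=["btn", "primary"],
)
message = build_describe_message(button, "Click this after filling in the form.")
```

Putting the instruction first keeps the human intent prominent, with the selector and raw HTML following as context the agent can use to locate the element.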

| Control | Behavior |
| --- | --- |
| Address bar | Displays the current URL. Type a new URL and press Enter to navigate. |
| Back button | Navigate to the previous page in session history. |
| Forward button | Navigate to the next page in session history. |
| Reload button | Reload the current page. |

Screenshot Refresh

Screenshots refresh automatically after every action (click, navigate, scroll, keypress, type). No manual refresh is required.
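
One way to get this behavior is to route every action through a single wrapper that captures a screenshot once the action completes. The sketch below illustrates that pattern only. `run_action`, `PageLike`, and `FakePage` are hypothetical names, and `FakePage` stands in for a real Playwright page, whose synchronous `screenshot()` method likewise returns image bytes.

```python
from typing import Callable, List, Protocol


class PageLike(Protocol):
    """The one method the wrapper needs from a page object."""
    def screenshot(self) -> bytes: ...


def run_action(page: PageLike, action: Callable[[PageLike], None]) -> bytes:
    """Perform one action, then return a fresh screenshot of the result."""
    action(page)
    return page.screenshot()


class FakePage:
    """In-memory stand-in for a browser page, used only for this sketch."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def click(self, x: int, y: int) -> None:
        self.log.append(f"click {x},{y}")

    def screenshot(self) -> bytes:
        self.log.append("screenshot")
        return b"fake-png-%d" % len(self.log)


page = FakePage()
shot = run_action(page, lambda p: p.click(10, 20))
# page.log is now ["click 10,20", "screenshot"]
```

Because the capture lives in the wrapper rather than in each action, new action types pick up the refresh without extra code.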

Proxy Chain

All interactions are proxied through the full service stack:
UI (browser) -> Next.js API route -> Express API -> Agentbox (Starlette)
The UI never communicates directly with the agent container.
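
The chain can be pictured as nested handlers, each forwarding the request to the next hop. The sketch below is a toy model in plain Python, not the actual Next.js or Express code. The handler names, the request shape, and the `via` trace field are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict

Request = Dict[str, Any]
Handler = Callable[[Request], Dict[str, Any]]


def agentbox(request: Request) -> Dict[str, Any]:
    """Final hop: the only layer that would touch the browser."""
    via = list(request.get("via", [])) + ["agentbox"]
    return {"ok": True, "path": request["path"], "via": via}


def make_proxy(name: str, upstream: Handler) -> Handler:
    """Build a hop that records its name and forwards to the next layer."""
    def handler(request: Request) -> Dict[str, Any]:
        forwarded = {**request, "via": list(request.get("via", [])) + [name]}
        return upstream(forwarded)
    return handler


# Wire the hops in the documented order: Next.js -> Express -> Agentbox.
express_api = make_proxy("express", agentbox)
nextjs_route = make_proxy("nextjs", express_api)

response = nextjs_route({"path": "/browser/click", "body": {"x": 1, "y": 2}})
# response["via"] is ["nextjs", "express", "agentbox"]
```

The UI only ever holds a reference to the outermost handler, which mirrors the rule that it never talks to the agent container directly.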

Agentbox Endpoints

| Endpoint | Description |
| --- | --- |
| /browser/click | Click at coordinates on the page |
| /browser/element | Get element details at coordinates (Describe mode) |
| /browser/navigate | Navigate to a URL |
| /browser/history | Go back or forward in session history |
| /browser/scroll | Scroll the page by a delta |
| /browser/type | Type text into the focused element |
| /browser/keypress | Send a single keystroke to the page |
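
As an illustration of how a caller in the proxy chain might address these endpoints, the sketch below builds (but does not send) HTTP requests with the Python standard library. Only the endpoint paths come from the table above. The base URL, the use of POST with a JSON body, and the payload field names (`x`, `y`, `url`) are assumptions, so check the Agentbox source for the real request shapes.

```python
import json
from urllib.request import Request

# Hypothetical base URL; the real host and port are deployment-specific.
AGENTBOX_URL = "http://agentbox.invalid:8000"

# Paths taken from the endpoint table.
KNOWN_ENDPOINTS = {
    "/browser/click",
    "/browser/element",
    "/browser/navigate",
    "/browser/history",
    "/browser/scroll",
    "/browser/type",
    "/browser/keypress",
}


def build_request(path: str, payload: dict) -> Request:
    """Build a JSON POST for one endpoint (method and body are assumed)."""
    if path not in KNOWN_ENDPOINTS:
        raise ValueError(f"unknown endpoint: {path}")
    body = json.dumps(payload).encode("utf-8")
    return Request(
        AGENTBOX_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Payload field names below are illustrative assumptions.
click = build_request("/browser/click", {"x": 640, "y": 360})
navigate = build_request("/browser/navigate", {"url": "https://example.com"})
```

Rejecting unknown paths up front keeps a typo from turning into a confusing 404 further down the chain.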