Agents can drive a real Chromium browser to navigate pages, click buttons, fill forms, extract content as Markdown or HTML, take screenshots, and run JavaScript. Everything happens as it would in a regular browser, except that the agent scripts each action.

What you can ask agents to do

  • "Go to example.com and find their pricing page."
  • "Fill out the contact form on this website with my details."
  • "Take a screenshot of this dashboard."
  • "Log into my account on this site and check my order status."
  • "Extract all product names and prices from this page as a table."
  • "Open the docs and the changelog in two tabs and compare them."

How it works

When an agent browses a website, it:

  1. Opens the page in a headless Chromium instance (Browserless under the hood).
  2. Waits for the page to load, including any JavaScript.
  3. Acts on the page through tools: navigate, click, hover, fill, scroll, press-key, evaluate, extract.
  4. Returns what it found to the conversation, typically as Markdown for content extraction, JSON for structured queries, or a screenshot.

You see each step in real time in the chat: which URL the agent visited, what it clicked, and what it pulled back.
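
The sequence above can be pictured as an ordered list of tool calls. The sketch below is only an illustration: the tool names come from step 3, but the `Step` type, the argument shapes, and the `validate_plan` helper are assumptions made for this example, not Frona's actual API.

```python
from dataclasses import dataclass, field

# Tool names as listed in step 3; everything else here is hypothetical.
TOOLS = {"navigate", "click", "hover", "fill", "scroll",
         "press-key", "evaluate", "extract"}


@dataclass
class Step:
    tool: str
    args: dict = field(default_factory=dict)  # illustrative argument shape


def validate_plan(steps):
    """Return a list of problems; an empty list means the plan is well-formed."""
    problems = []
    # A page has to be opened before anything can act on it.
    if not steps or steps[0].tool != "navigate":
        problems.append("plan must start with a navigate step")
    for i, step in enumerate(steps):
        if step.tool not in TOOLS:
            problems.append(f"step {i}: unknown tool {step.tool!r}")
    return problems


plan = [
    Step("navigate", {"url": "https://example.com/pricing"}),
    Step("click", {"selector": "a#plans"}),
    Step("extract", {"format": "markdown"}),
]
print(validate_plan(plan))
```

Each entry corresponds to one line you would see in the chat: the URL visited, the element clicked, and the content returned.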

Tools the agent has

The browser tool surface includes:

  • Navigation: navigate to URL, go back/forward, reload, close, new/list/switch/close tab.
  • Content: get the page as Markdown, get all links, get a DOM/ARIA snapshot, take a screenshot.
  • Interaction: click, hover, select, fill, press a key, scroll, wait for an element.
  • Scripting: evaluate arbitrary JavaScript in the page context.

Markdown extraction is the default for "read this page" requests. It strips page chrome (such as navigation bars) and ads, and gives the agent a clean text view. For form interaction or anything that needs precise targeting, the agent uses the DOM/ARIA snapshot to find selectors.
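
To make the idea of a "clean text view" concrete, here is a minimal sketch of HTML-to-Markdown extraction using only Python's standard library. It is not the extractor Frona uses; the set of skipped elements and the handled tags are assumptions chosen to keep the example short.

```python
from html.parser import HTMLParser

# Elements treated as non-content; this list is an assumption for the sketch.
SKIP = {"script", "style", "nav", "header", "footer", "aside"}
HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}
BLOCKS = ("p", "li", *HEADINGS)


class MarkdownExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a skipped element
        self.prefix = ""     # Markdown prefix for the current block
        self.buf = []        # text fragments of the current block
        self.blocks = []     # finished Markdown blocks

    def _flush(self):
        # Collapse whitespace and store the block if it has any text.
        text = " ".join("".join(self.buf).split())
        if text:
            self.blocks.append(self.prefix + text)
        self.buf, self.prefix = [], ""

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in BLOCKS:
            self._flush()
            self.prefix = HEADINGS.get(tag, "- " if tag == "li" else "")

    def handle_endtag(self, tag):
        if tag in SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in BLOCKS:
            self._flush()

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.buf.append(data)


def to_markdown(html):
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()
    return "\n\n".join(parser.blocks)


page = ("<nav>Home | About</nav><h1>Pricing</h1><p>Two  plans.</p>"
        "<ul><li>Free</li><li>Pro</li></ul><script>track()</script>")
print(to_markdown(page))
```

The navigation bar and the script are dropped, while headings, paragraphs, and list items survive as Markdown. A real extractor would also handle links, tables, and nested structure.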

Multiple tabs

Agents can hold several tabs open at once and switch between them. This is useful for workflows such as "compare these two pages" or "while I read this article, monitor the other tab for updates."

Browser profiles and persistence

Frona uses browser profiles to persist cookies, local storage, and session data across conversations. If an agent logs into a website today, it can still be logged in next week.

If you don't specify a profile for an agent, the browser uses the default profile. You can also create multiple named profiles to keep different browsing contexts separate: for example, a "Google" profile for your Google account and a "LinkedIn" profile for LinkedIn. Each profile has its own cookies and session state, stored in its own directory on disk.

Profiles are managed as a credential type in the vault system. Create them from the vault management interface.
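
The isolation between profiles follows from each one living in its own directory. The sketch below shows one way such a lookup could work. The base path, the `default` directory name, and the `profile_dir` helper are all hypothetical; Frona's actual on-disk layout is not described here.

```python
from pathlib import Path

DEFAULT_PROFILE = "default"  # hypothetical name for the fallback profile


def profile_dir(base, name=None):
    """Resolve the directory for a named profile, or for the default one."""
    name = name or DEFAULT_PROFILE
    # Reject names that could escape the base directory (e.g. "../etc").
    if not name.replace("-", "").replace("_", "").isalnum():
        raise ValueError(f"invalid profile name: {name!r}")
    return Path(base) / name


base = "/data/profiles"  # hypothetical location
print(profile_dir(base))            # falls back to the default profile
print(profile_dir(base, "google"))  # separate directory, separate cookies
```

Because two profiles never share a directory, a session stored under one profile cannot leak into another.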

When the agent needs your help

Some situations require a human touch:

  • CAPTCHAs. The agent can't solve them automatically.
  • Two-factor authentication. The agent can't enter your 2FA code.
  • Complex login flows. These include OAuth redirects, SSO screens, and unusual form layouts.

When this happens, the agent pauses and gives you a link to a browser debugger session. Click the link, handle the interaction in your browser tab, and the agent continues from where you left off.

Browsing vs. fetching

| Scenario | Best tool |
| --- | --- |
| Read an article or documentation page | Web fetch (faster, simpler) |
| Fill out a form or click buttons | Browser automation |
| Navigate a JavaScript-heavy app | Browser automation |
| Extract structured data from a page | Browser automation |
| Quick content retrieval | Web fetch |
| Log into a website | Browser automation |

For simple "read this page" requests, agents typically use web fetch instead. It's faster and doesn't need a full browser. Browser automation is the right choice when the task requires interacting with the page.
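
The choice between the two tools can be summarized as a simple rule. The sketch below encodes the scenarios from this section as a decision function. The `Request` fields and the `choose_tool` name are invented for illustration, and an agent's real decision is likely more nuanced than four boolean flags.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical flags describing what the task needs.
    needs_interaction: bool = False      # clicking buttons, filling forms
    needs_login: bool = False            # logging into a website
    javascript_heavy: bool = False       # content rendered client-side
    structured_extraction: bool = False  # e.g. a table of names and prices


def choose_tool(req):
    """Pick browser automation only when plain fetching is not enough."""
    if (req.needs_interaction or req.needs_login
            or req.javascript_heavy or req.structured_extraction):
        return "browser"
    return "web_fetch"


print(choose_tool(Request()))                  # reading an article
print(choose_tool(Request(needs_login=True)))  # checking an order status
```

Defaulting to web fetch keeps simple reads fast, and any single flag is enough to switch to the browser.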

Tips

  • Be specific about what you want. "Go to example.com/pricing and extract the plan names, prices, and features into a table" gets better results than "check out their pricing."
  • Mention login steps if needed. If the agent needs to log in first, say so: "Log into my account at example.com and then check order #12345."
  • Use screenshots for verification. Ask the agent to take a screenshot if you want to see what it sees.
  • Use named profiles for distinct identities. A work Google profile and a personal Google profile should be separate so the agent doesn't get confused about which account is active.

Requirements

Browser automation requires a running Browserless instance. See Deployment for setup.

Next steps