Using Tools

This guide explains how tools work from a user’s perspective: what you see, what you can do, and how to make the most of agent capabilities.

What tools look like in chat

When an agent decides to use a tool, you see it in the conversation:

The agent announces what tool it’s using and why
The tool runs (you might see a loading indicator)
The result appears in the chat
The agent uses the result to continue its response
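
The four steps above can be sketched as a small function. This is an illustrative model only, not the product's implementation: `ToolCall`, `run_tool_step`, and `fake_web_search` are hypothetical names introduced for this example, and the message wording is made up.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ToolCall:
    """A hypothetical record of one tool invocation requested by the agent."""
    tool: str      # tool name, e.g. "web_search"
    reason: str    # why the agent wants to use it
    argument: str  # the input passed to the tool


def run_tool_step(call: ToolCall, tools: Dict[str, Callable[[str], str]]) -> List[str]:
    """Return the chat messages a user would see for a single tool call."""
    messages = []
    # 1. The agent announces which tool it is using and why.
    messages.append(f"Using {call.tool}: {call.reason}")
    # 2. The tool runs (a UI might show a loading indicator here).
    result = tools[call.tool](call.argument)
    # 3. The result appears in the chat.
    messages.append(f"Result: {result}")
    # 4. The agent uses the result to continue its response.
    messages.append(f"Based on the result, here is what I found: {result}")
    return messages


def fake_web_search(query: str) -> str:
    """Stand-in for a real search tool; returns a canned string."""
    return f"top hit for '{query}'"


transcript = run_tool_step(
    ToolCall(tool="web_search", reason="looking up current docs", argument="python dataclasses"),
    {"web_search": fake_web_search},
)
for line in transcript:
    print(line)
```

A fast tool returns almost immediately at step 2, while a slow one (such as browsing a complex site) spends most of its time there.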

Some tools are fast (web search), while others take longer (browsing a complex website).

Browser tools in action

When an agent browses a website, you can follow along:

You see which URLs it visits
Screenshots show you what the page looks like
If the agent gets stuck (for example, on a CAPTCHA or a login page), it can hand control of the browser over to you

This is useful for tasks like:

Researching a topic across multiple websites
Filling out web forms
Extracting data from websites
Monitoring web pages

Asking agents to use specific tools

You don’t need to name tools directly. Just describe what you want, and the agent picks the right tool:

“Search the web for…” triggers web_search
“Go to this website and…” triggers browser tools
“Remember that I…” triggers remember
“Run this command…” triggers cli
“Create a file with…” triggers produce_file
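
The phrase-to-tool pairs above can be written down as a simple lookup table. The sketch below is a reading aid, not how the agent works: the agent chooses tools from the meaning of your request rather than by matching opening words, and `PHRASE_TO_TOOL` and `guess_tool` are hypothetical names used only here.

```python
from typing import Optional

# Phrase-to-tool pairs taken from the list above. "browser tools" is a
# category in the guide, not a single tool name, so it is kept as a label.
PHRASE_TO_TOOL = {
    "search the web for": "web_search",
    "go to this website and": "browser tools",
    "remember that i": "remember",
    "run this command": "cli",
    "create a file with": "produce_file",
}


def guess_tool(request: str) -> Optional[str]:
    """Return the tool a request would likely trigger, or None if no phrase matches."""
    normalized = request.strip().lower()
    for phrase, tool in PHRASE_TO_TOOL.items():
        if normalized.startswith(phrase):
            return tool
    return None


print(guess_tool("Search the web for Python release notes"))
print(guess_tool("Tell me a joke"))
```

A request that matches none of the phrases returns `None` here. In a real conversation the agent may still pick a tool for such a request, or answer without one.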

When tools need your help

There are two situations in which you’ll be asked to intervene:

Taking over the browser

If a website requires something the agent can’t handle (CAPTCHA, complex 2FA, unusual interactions), the agent uses request_user_takeover. You get a link to a live browser session where you can interact with the page directly. When you’re done, let the agent know and it continues.

Answering questions

The agent might need your input to proceed. It presents options or asks a yes/no question. Pick your answer and the conversation continues.

Tool limitations

Browser sessions are limited by the number of concurrent slots (default: 10)
CLI commands run in a sandbox with filesystem and network restrictions
Web search results depend on the configured search provider
Voice calls require Twilio configuration
Tool turns are capped per response (default: 200) to prevent runaway loops
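
Two of these limits, the tool-turn cap and the browser slot count, can be modelled in a few lines. This is a minimal sketch assuming the defaults listed above. `run_response`, `wants_tool`, and `try_open_browser_session` are hypothetical names, and the real system may enforce its limits differently.

```python
import threading
from typing import Callable

MAX_TOOL_TURNS = 200     # default cap on tool turns per response (from the list above)
MAX_BROWSER_SLOTS = 10   # default number of concurrent browser slots (from the list above)


def run_response(wants_tool: Callable[[int], bool], max_turns: int = MAX_TOOL_TURNS) -> int:
    """Run tool turns until the agent stops asking for tools or the cap is hit.

    `wants_tool` is a hypothetical callback: given the number of turns used so
    far, it returns True if the agent wants another tool call.
    """
    turns = 0
    while turns < max_turns and wants_tool(turns):
        turns += 1
    return turns


# A bounded semaphore models the pool of browser slots.
browser_slots = threading.BoundedSemaphore(MAX_BROWSER_SLOTS)


def try_open_browser_session() -> bool:
    """Claim a slot without blocking; False means all slots are in use."""
    return browser_slots.acquire(blocking=False)


# An agent that would loop forever is stopped at the cap.
print(run_response(lambda turns: True))
```

The cap turns a potential infinite loop into a bounded one. In this model, once every browser slot is taken, a new session is refused until an existing one is released.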