Snapshot
How agents read a page as a structured snapshot, and how the @N refs work.
Snapshot is how an agent reads a web page. It turns the current page into compact structured text and assigns temporary numbers (@1, @2, etc.) to the actionable elements — buttons, inputs, links. The agent decides what to do next from that snapshot, instead of reading the raw HTML or guessing screen coordinates.
What it solves
To a person, a web page is a visual interface. To an agent, it's tens of thousands of tokens of HTML, scripts, styles, and live state. Feeding that straight into an LLM has two recurring costs:
- High token cost. A typical admin page can run to 30k tokens, and the agent re-reads it before every action.
- Drowning in noise. Most of a full DOM (styles, scripts, hidden nodes) has nothing to do with the decision and just gets in the way.
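A rough calculation shows how quickly that cost adds up. Assume a 20-action task and a snapshot of about 400 tokens; both numbers are illustrative assumptions, and only the 30k-token page size comes from the list above.

```latex
\begin{aligned}
\text{raw HTML:} &\quad 20 \times 30{,}000 = 600{,}000 \text{ tokens} \\
\text{snapshot:} &\quad 20 \times 400 = 8{,}000 \text{ tokens}
\end{aligned}
```

Under those assumptions the same task reads 75 times fewer tokens.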
A snapshot is built from the browser's accessibility tree — the semantic view of a page the browser keeps for screen readers — and compresses the page into a few hundred tokens:
- Page title and current URL.
- Visible text and the main structure.
- Clickable, fillable, and selectable elements, with their roles and names.
- A temporary @N for each interactive element.
That's enough for the agent to decide which button to click and which field to fill — without the raw HTML or any screen coordinates.
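The compression step can be sketched in a few lines. This is a minimal illustration, not the product's implementation: it assumes a simplified accessibility-tree node type (AXNode, a hypothetical name), a hand-picked set of interactive roles, and a "Title:"/"URL:" header format of our own. Interactive elements get @N in document order; other named nodes become plain text lines; hidden nodes are dropped.

```typescript
// Minimal sketch of turning a simplified accessibility tree into snapshot text.
// AXNode, INTERACTIVE, and buildSnapshot are hypothetical names.
interface AXNode {
  role: string;         // e.g. "textbox", "button", "link", "heading"
  name: string;         // accessible name, e.g. "Submit"
  hidden?: boolean;     // hidden nodes (and their subtrees) are skipped
  children?: AXNode[];
}

// Accessibility roles treated as actionable, mapped to the short labels
// used in snapshot lines. A real tool would cover many more roles.
const INTERACTIVE = new Map<string, string>([
  ["textbox", "input"],
  ["searchbox", "input"],
  ["button", "button"],
  ["link", "link"],
  ["checkbox", "checkbox"],
  ["combobox", "select"],
]);

interface Snapshot {
  text: string;               // compact text handed to the model
  refs: Map<string, AXNode>;  // "@1" -> node; valid only for this snapshot
}

function buildSnapshot(title: string, url: string, root: AXNode): Snapshot {
  const lines = [`Title: ${title}`, `URL: ${url}`];
  const refs = new Map<string, AXNode>();
  let next = 1;

  const walk = (node: AXNode): void => {
    if (node.hidden) return;
    const label = INTERACTIVE.get(node.role);
    if (label !== undefined) {
      // Interactive element: assign the next @N and don't descend, since its
      // children usually just repeat its accessible name.
      const ref = `@${next++}`;
      refs.set(ref, node);
      lines.push(`${ref} [${label}] "${node.name}"`);
      return;
    }
    if (node.name) lines.push(node.name); // visible text or structure, no ref
    for (const child of node.children ?? []) walk(child);
  };

  walk(root);
  return { text: lines.join("\n"), refs };
}

const page: AXNode = {
  role: "document",
  name: "",
  children: [
    { role: "heading", name: "Search results" },
    { role: "textbox", name: "Search" },
    { role: "button", name: "Submit" },
    { role: "button", name: "Debug panel", hidden: true },
    { role: "link", name: "Next page" },
  ],
};

console.log(buildSnapshot("Results", "https://example.com/search", page).text);
```

The hidden "Debug panel" button never reaches the output, and the remaining buttons and links come out as @1 to @3.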
How @N refs work
Each interactive element in the snapshot gets a temporary number:
@1 [input] "Search"
@2 [button] "Submit"
@3 [link] "Next page"
The agent acts on those numbers, e.g. await click('@2').
@N is only valid for the current snapshot. After the page changes (navigation, refresh, dialog, form submit, tab switch, dynamic re-render), the old numbers may stop pointing anywhere. The dependable habit is to take a fresh snapshot after each change instead of holding onto a @N.
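The rule that an @N belongs to one snapshot can be made concrete. In this sketch each new snapshot bumps a generation counter, and resolving a ref issued by an older snapshot throws instead of silently hitting whatever now sits at that number. RefTable and StaleRefError are hypothetical names; a real tool may not fail this loudly, which is exactly why re-snapshotting is the safer habit.

```typescript
// Sketch: refs are scoped to the snapshot ("generation") that issued them.
// RefTable and StaleRefError are hypothetical names, not a real API.
class StaleRefError extends Error {
  name = "StaleRefError";
}

class RefTable<T> {
  private generation = 0;
  private refs = new Map<string, T>();

  // Called for every new snapshot: drop old refs and number the new elements.
  reset(elements: T[]): number {
    this.generation += 1;
    this.refs = new Map(elements.map((el, i): [string, T] => [`@${i + 1}`, el]));
    return this.generation;
  }

  // Resolve "@N" only if it came from the current snapshot.
  resolve(ref: string, issuedIn: number): T {
    if (issuedIn !== this.generation) {
      throw new StaleRefError(`${ref} is from snapshot ${issuedIn}; current is ${this.generation}`);
    }
    const el = this.refs.get(ref);
    if (el === undefined) {
      throw new StaleRefError(`${ref} does not exist in snapshot ${this.generation}`);
    }
    return el;
  }
}

const table = new RefTable<string>();
const first = table.reset(["Search box", "Submit button", "Next link"]);
console.log(table.resolve("@2", first)); // Submit button

table.reset(["Confirm button"]); // page changed, so a new snapshot was taken
try {
  table.resolve("@2", first);
} catch (e) {
  console.log((e as Error).message); // @2 is from snapshot 1; current is 2
}
```

The fix for a stale ref is never to retry the old number; it is to take a new snapshot and read the fresh refs.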
If you need a stable reference to an element across many steps, use the loc=... selector from the snapshot output, or write a CSS selector directly. See ego-browser.
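When neither selector is at hand, another option (one this page does not itself describe) is to skip storing @N altogether and look the element up again by role and name in each fresh snapshot. The sketch assumes the line format @N [role] "name" from the snapshot examples on this page; findRef is a hypothetical helper. Role plus name is not always unique, so a loc=... or CSS selector remains the more precise choice.

```typescript
// Sketch: find the current @N for an element by role and name in a fresh
// snapshot, assuming lines of the form `@N [role] "name"`.
// findRef is a hypothetical helper, not a real API.
function findRef(snapshotText: string, role: string, name: string): string | undefined {
  const line = /^(@\d+) \[([^\]]+)\] "(.*)"$/;
  for (const raw of snapshotText.split("\n")) {
    const m = line.exec(raw.trim());
    if (m && m[2] === role && m[3] === name) return m[1];
  }
  return undefined; // not on the page, or in a different form
}

// The same button can get a different number after a re-render.
const before = '@1 [input] "Search"\n@2 [button] "Submit"';
const after = '@1 [link] "Skip to content"\n@2 [input] "Search"\n@3 [button] "Submit"';
console.log(findRef(before, "button", "Submit")); // @2
console.log(findRef(after, "button", "Submit"));  // @3
```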
When you'll notice the snapshot
In most cases you don't touch the snapshot at all — the agent reads it on its own. You may see it surface in two places:
- The agent says "re-snapshot" or "page snapshot," meaning it detected a change and is re-reading the current state.
- The task result includes a snapshot excerpt as evidence.
Writing a task for the agent
A snapshot helps the agent see the page clearly, but the boundaries of the task are yours to set. A good task description tells the agent:
- The target page or site.
- What to read, fill, click, or download.
- What it must not do (delete, publish, pay, send mail, etc.).
- Whether to pause on captcha, payment, or authorization screens.
- The shape of the result you want (table, summary, screenshot, local file path).
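If you write similar tasks often, it can help to fill in the five points as fields before turning them into a sentence. This is purely illustrative: the agent takes a plain-language task, and TaskSpec and toPrompt are hypothetical names, not part of any product API.

```typescript
// Hypothetical structure mirroring the five points above.
interface TaskSpec {
  target: string;       // page or site
  actions: string[];    // what to read, fill, click, or download
  forbidden: string[];  // actions that are off-limits
  pauseOn: string[];    // screens where the agent must stop and ask
  result: string;       // shape of the result you want
}

// Joins the fields into one plain-language task description.
function toPrompt(spec: TaskSpec): string {
  const lines = [`Open ${spec.target}.`, ...spec.actions.map((a) => `${a}.`)];
  if (spec.forbidden.length > 0) lines.push(`Do not ${spec.forbidden.join(", ")}.`);
  if (spec.pauseOn.length > 0) lines.push(`Pause and ask me if you hit ${spec.pauseOn.join(" or ")}.`);
  lines.push(`Return ${spec.result}.`);
  return lines.join(" ");
}

console.log(
  toPrompt({
    target: "the orders dashboard",
    actions: ["Filter yesterday's orders", "Download the CSV"],
    forbidden: ["delete or modify any order"],
    pauseOn: ["a login challenge", "an export confirmation"],
    result: "the local path of the downloaded file",
  }),
);
```

Empty lists simply drop their sentence, so a read-only task with no pause points still produces a short, complete description.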
Examples:
Open my GitHub Notifications and list the PRs that need my review — repo name, title, URL.
Don't archive, don't mark anything as read.
Open the orders dashboard, filter yesterday's orders, and download the CSV.
Pause if you hit a login challenge or an export confirmation, and tell me where the file landed.
Reviewing what the agent did
When the task is finished, these are the signals to check:
- Did the agent state which pages it visited and the key actions it took?
- Does the returned result include things you can verify (titles, IDs, URLs, amounts, timestamps)?
- For a download, did it give you the local file path?
- For anything that modifies or submits, did it pause for confirmation before the final step?
- Are the relevant pages still open in the Space so you can look?
If something seems off, ask the agent to re-snapshot the current page rather than keep extrapolating from its earlier answer.
Common questions
Why does the agent say a ref is invalid?
The page changed after the previous snapshot. Ask it to take a new one.
Can a snapshot read everything?
Not always. Off-screen content, text inside images, complex canvases, and sandboxed cross-origin iframes may not show up fully. In those cases the agent might combine snapshots with screenshots, text extraction, or manual confirmation.
Does taking a snapshot change anything on the page?
No. A snapshot only reads page structure. The clicks, fills, submits, uploads, and deletes that happen after it are what actually modify state.
Do I need to remember @1 or @2?
No. Treat them as temporary numbers for the current page. The agent uses them; you set the goal and the boundaries.