Snapshot

How agents read a page as a structured snapshot, and how the @N refs work.

Snapshot is how an agent reads a web page. It turns the current page into compact structured text and assigns temporary numbers (@1, @2, etc.) to the actionable elements — buttons, inputs, links. The agent decides what to do next from that snapshot, instead of reading the raw HTML or guessing screen coordinates.

What it solves

To a person, a web page is a visual interface. To an agent, it's tens of thousands of tokens of HTML, scripts, styles, and live state. Feeding that straight into an LLM has two recurring costs:

  • High token cost. A typical admin page can run to 30k tokens, and the agent has to re-read it before every action.
  • Drowning in noise. Most of a full DOM (styles, scripts, hidden nodes) has nothing to do with the decision and just gets in the way.
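
To put rough numbers on the first cost, here is a back-of-the-envelope sketch. The 30k figure comes from the bullet above; the 20-action task length and the 300-token snapshot size are illustrative assumptions, not measurements.

```typescript
// Rough input-token cost of re-reading the page before every action.
// RAW_PAGE_TOKENS comes from the example above; SNAPSHOT_TOKENS and
// ACTIONS are assumptions made up for this sketch.
const RAW_PAGE_TOKENS = 30000;
const SNAPSHOT_TOKENS = 300; // assumed size of a compact snapshot
const ACTIONS = 20;          // assumed number of actions in one task

function totalReadCost(tokensPerRead: number, actions: number): number {
  return tokensPerRead * actions;
}

const rawCost = totalReadCost(RAW_PAGE_TOKENS, ACTIONS);
const snapshotCost = totalReadCost(SNAPSHOT_TOKENS, ACTIONS);

console.log(rawCost);                // 600000
console.log(snapshotCost);           // 6000
console.log(rawCost / snapshotCost); // 100
```

Under these assumptions the raw-HTML approach reads 100 times as many tokens over the task; the real ratio depends on the page and the task.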

A snapshot is built from the browser's accessibility tree — the semantic view of a page the browser keeps for screen readers — and compresses the page into a few hundred tokens:

  • Page title and current URL.
  • Visible text and the main structure.
  • Clickable, fillable, and selectable elements, with their roles and names.
  • A temporary @N for each interactive element.

That's enough for the agent to decide which button to click and which field to fill — without the raw HTML or any screen coordinates.
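
As a rough mental model, the contents listed above could be represented like this. The `Snapshot` and `SnapshotElement` types, their field names, and the `renderSnapshot` helper are hypothetical and chosen for illustration; they are not ego's actual data model or output format.

```typescript
// Hypothetical shape of a snapshot; all names here are assumptions.
interface SnapshotElement {
  ref: number;  // temporary number, rendered as @N
  role: string; // e.g. "button", "input", "link"
  name: string; // accessible name, e.g. "Submit"
}

interface Snapshot {
  title: string;
  url: string;
  text: string[];              // visible text, in reading order
  elements: SnapshotElement[]; // interactive elements only
}

// Render the snapshot as compact text, one line per item.
function renderSnapshot(s: Snapshot): string {
  const lines: string[] = [`title: ${s.title}`, `url: ${s.url}`];
  s.text.forEach((t) => lines.push(t));
  s.elements.forEach((e) => lines.push(`@${e.ref} [${e.role}] "${e.name}"`));
  return lines.join("\n");
}

const exampleSnapshot: Snapshot = {
  title: "Orders",
  url: "https://example.com/orders",
  text: ["Search orders"],
  elements: [
    { ref: 1, role: "input", name: "Search" },
    { ref: 2, role: "button", name: "Submit" },
  ],
};

console.log(renderSnapshot(exampleSnapshot));
```

Styles, scripts, and hidden nodes have no place in this structure, which is where the size reduction comes from.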

How @N refs work

Each interactive element in the snapshot gets a temporary number:

@1 [input]  "Search"
@2 [button] "Submit"
@3 [link]   "Next page"

The agent acts on those numbers, e.g. await click('@2').
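
A minimal sketch of how lines in that format could be parsed into a lookup table, so a ref can be picked by role and name. The line format is taken from the example above; `parseRefs`, `RefEntry`, and the regular expression are illustrative, and the real snapshot output may carry extra fields.

```typescript
interface RefEntry {
  ref: string;  // e.g. "@2"
  role: string; // e.g. "button"
  name: string; // e.g. "Submit"
}

// Matches lines such as: @2 [button] "Submit"
const REF_LINE = /^(@\d+)\s+\[([^\]]+)\]\s+"(.*)"$/;

function parseRefs(snapshotText: string): RefEntry[] {
  const entries: RefEntry[] = [];
  snapshotText.split("\n").forEach((line) => {
    const m = REF_LINE.exec(line.trim());
    // Lines that are not ref lines (titles, plain text) are skipped.
    if (m) entries.push({ ref: m[1], role: m[2], name: m[3] });
  });
  return entries;
}

const parsedRefs = parseRefs(`@1 [input]  "Search"
@2 [button] "Submit"
@3 [link]   "Next page"`);

const submitButton = parsedRefs.filter(
  (r) => r.role === "button" && r.name === "Submit"
)[0];
console.log(submitButton.ref); // @2
```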

An @N is valid only for the snapshot that issued it. After the page changes (navigation, refresh, dialog, form submit, tab switch, dynamic re-render), the old numbers may no longer point to anything. The dependable habit is to take a fresh snapshot after each change instead of holding on to an @N.
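
The snapshot-then-act habit can be sketched with a toy in-memory model that ties every ref to the snapshot it came from. `FakePage`, `takeSnapshot`, and this two-argument `click` are hypothetical stand-ins, not the ego-browser API, and the model is stricter than the real behavior: it rejects every ref from an older snapshot, whereas real refs only may stop working.

```typescript
// Toy model: each page change bumps a version, and a ref is accepted
// only together with the version of the snapshot that issued it.
class FakePage {
  private version = 0;
  private buttons = ["Submit"];

  takeSnapshot(): { version: number; refs: string[] } {
    return {
      version: this.version,
      refs: this.buttons.map((_, i) => `@${i + 1}`),
    };
  }

  // A click changes the page, so earlier snapshots become stale.
  click(ref: string, snapshotVersion: number): string {
    if (snapshotVersion !== this.version) {
      throw new Error(`stale ref ${ref}: take a fresh snapshot`);
    }
    this.version += 1;
    this.buttons = ["Confirm", "Cancel"];
    return "ok";
  }
}

const page = new FakePage();
const firstSnapshot = page.takeSnapshot();
page.click("@1", firstSnapshot.version); // valid: page unchanged since snapshot

let staleError = "";
try {
  page.click("@1", firstSnapshot.version); // stale: the first click changed the page
} catch (e) {
  staleError = (e as Error).message;
}
console.log(staleError); // stale ref @1: take a fresh snapshot

const freshSnapshot = page.takeSnapshot(); // re-snapshot, then act on the new refs
console.log(page.click("@2", freshSnapshot.version)); // ok
```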

If you need a reference that stays valid across many steps, use the loc=... selector from the snapshot output, or write a CSS selector directly. See ego-browser.

When you'll notice the snapshot

In most cases you don't touch the snapshot at all — the agent reads it on its own. You may see it surface in two places:

  1. The agent says "re-snapshot" or "page snapshot," meaning it detected a change and is re-reading the current state.
  2. The task result includes a snapshot excerpt as evidence.

Writing a task for the agent

A snapshot helps the agent see the page clearly, but the boundaries of the task are yours to set. A good task description tells the agent:

  • The target page or site.
  • What to read, fill, click, or download.
  • What it must not do (delete, publish, pay, send mail, etc.).
  • Whether to pause on captcha, payment, or authorization screens.
  • The shape of the result you want (table, summary, screenshot, local file path).

Examples:

Open my GitHub Notifications and list the PRs that need my review, with repo name, title, and URL for each.
Don't archive, don't mark anything as read.

Open the orders dashboard, filter yesterday's orders, and download the CSV.
Pause if you hit a login challenge or an export confirmation, and tell me where the file landed.

Reviewing what the agent did

When the task is finished, these are the signals to check:

  • Did the agent state which pages it visited and the key actions it took?
  • Does the returned result include things you can verify (titles, IDs, URLs, amounts, timestamps)?
  • For a download, did it give you the local file path?
  • For anything that modifies or submits, did it pause for confirmation before the final step?
  • Are the relevant pages still open in the Space so you can look?

If something seems off, ask the agent to re-snapshot the current page rather than to keep extrapolating from its earlier answer.

Common questions

Why does the agent say a ref is invalid?

The page changed after the previous snapshot. Ask it to take a new one.

Can a snapshot read everything?

Not always. Off-screen content, text inside images, complex canvases, and sandboxed cross-origin iframes may not show up fully. In those cases the agent might combine snapshots with screenshots, text extraction, or manual confirmation.

Does taking a snapshot change anything on the page?

No. A snapshot only reads page structure. The clicks, fills, submits, uploads, and deletes that happen after it are what actually modify state.

Do I need to remember @1 or @2?

No. Treat them as temporary numbers for the current page. The agent uses them; you set the goal and the boundaries.