
ego-browser

The browser automation runtime AI agents use to drive ego lite's real Chromium session.

ego-browser is the browser automation runtime ego lite ships for AI agents. It speaks the Chrome DevTools Protocol to the real Chromium session inside ego lite and takes a Node.js heredoc script as its entry point. The agent sends the whole JS flow over stdin in a single delivery, every helper is pre-injected into the script's scope, and browser state persists inside a Space.

ego-browser is not meant for humans to drive a browser by hand, and it isn't a replacement for Playwright or Puppeteer. The intended reader is an LLM agent.

Who it's for

  • AI coding agents that need to drive a browser: Claude Code, Codex, Cursor, custom SDK agents.
  • Teams building vertical agents — automating Lark, Google Docs, Salesforce, and similar back offices.
  • Agents that repeat fixed web flows: login, fill, export, search, read tables.
  • Anyone who has tried to stuff a full DOM or page of HTML into an LLM and hit the token wall.

Install

It comes with ego lite — see Quick start. After install, run ego-browser from any directory.

You can also install the skill standalone:

npx skills add github:CitroLabs/ego-lite/skills/ego-browser

Core loop

The typical rhythm for agents driving a page — everything in a single heredoc:

ego-browser nodejs <<'EOF'
const task = await useOrCreateTaskSpace('search github issues')

await openOrReuseTab('https://github.com/issues', { wait: true, timeout: 20 })

cliLog(await snapshotText())

EOF

  1. Reuse or create a Task Space (declare it in every heredoc; see Space).
  2. Open the target page.
  3. Read the snapshot (snapshotText()) to get a semantic tree with [ref=N, loc=..., url=...].
  4. Act on the page by @N ref or CSS selector.
  5. Print the final result with cliLog(...).

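The heredoc body runs in a Node.js process; code passed to js(...) runs in the page context. The two don't share variables, so don't reference Node-side values inside a js(...) string, and don't touch DOM globals such as document outside one.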

Helper reference

Every helper is available in the script's scope by its camelCase name. No import required.

Task Space

await listTaskSpaces()
const task = await useOrCreateTaskSpace('describe task')   // reuse or create
await completeTaskSpace(task.name)                         // done, keep the tab
await closeTaskSpace(task.name)                            // shut the space down

The task name should be a 3-to-6 word natural-language description of the task, such as 'search github issues'. Don't use placeholders.
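As a rough illustration of the naming rule, the check below counts words and rejects names made up entirely of filler words. isDescriptiveTaskName and the placeholder word list are hypothetical and not part of ego-browser; the runtime may apply different checks, or none.

```javascript
// Hypothetical helper: a sketch of the naming rule, not an ego-browser API.
// The placeholder word list is an assumption made for this example.
const PLACEHOLDER_WORDS = new Set(['foo', 'bar', 'baz', 'test', 'todo', 'placeholder', 'untitled'])

function isDescriptiveTaskName(name) {
  const words = String(name).trim().toLowerCase().split(/\s+/).filter(Boolean)
  // Documented rule: 3 to 6 words.
  const inRange = words.length >= 3 && words.length <= 6
  // Assumed rule: a name built only from filler words counts as a placeholder.
  const allPlaceholders = words.every((word) => PLACEHOLDER_WORDS.has(word))
  return inRange && !allPlaceholders
}

console.log(isDescriptiveTaskName('search github issues')) // true
console.log(isDescriptiveTaskName('foo bar baz'))          // false
```

An agent could run a check like this before calling useOrCreateTaskSpace, so that Task Spaces stay easy to tell apart in listTaskSpaces() output.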

Tabs and navigation

await listTabs()
await openOrReuseTab(url, { wait: true, timeout: 20 })
await gotoAndWait(url, { timeout: 20, settle: 1 })
await newTab(url)
await switchTab(tabId)
await currentTab()
await pageInfo()
await ensureRealTab()        // a fresh task space may have no tab yet

Observation

await snapshotText()                              // full-page semantic snapshot (default)
await snapshotText({ scope: 'only_within_viewport' })
await captureScreenshot('result.png')
await drainEvents()                               // consume the nav / network event queue

Mouse and scroll

click, doubleClick, hover, and dragMouse accept the same target formats (coordinates are in CSS pixels):

  • 'string': CSS selector or @ref. Clicks the element center.
  • [x, y] or {x, y}: viewport coordinates.
  • {selector, x, y}: relative offset from the element's top-left.
  • options.label: a 3-to-6 word description. Passing it makes the action trigger a visual highlight.

await click('@21', { label: 'check the login state' })
await click('button.primary', { label: 'click the submit button' })
await click([420, 260])
await hover('@5', { label: 'hover to reveal the menu' })
await dragMouse([from, to], { label: 'drag the card' })

await scrollBy(900)
await scroll({ dy: 900 })
await scrollToBottomUntil(
  async () => await js(String.raw`document.querySelectorAll('article').length`) >= 20,
  { step: 900, wait: 1, maxSteps: 20 },
)
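To make the target shapes listed above concrete, here is a small classifier that tells them apart. classifyTarget and its return labels are hypothetical; this sketch mirrors the documented formats and is not how ego-browser resolves targets internally. The [from, to] pair taken by dragMouse is out of scope here.

```javascript
// Hypothetical classifier for the documented target shapes (not ego-browser code).
function classifyTarget(target) {
  if (typeof target === 'string') {
    // '@21' style strings are snapshot refs; anything else is treated as a CSS selector.
    return target.startsWith('@') ? 'ref' : 'selector'
  }
  const isNum = (value) => typeof value === 'number' && Number.isFinite(value)
  if (Array.isArray(target)) {
    // [x, y]: viewport coordinates in CSS pixels.
    if (target.length === 2 && target.every(isNum)) return 'viewport-point'
    throw new TypeError('Unsupported target shape')
  }
  if (target !== null && typeof target === 'object' && isNum(target.x) && isNum(target.y)) {
    // {selector, x, y}: offset from the element's top-left; {x, y}: viewport coordinates.
    return typeof target.selector === 'string' ? 'element-offset' : 'viewport-point'
  }
  throw new TypeError('Unsupported target shape')
}

console.log(classifyTarget('@21'))                                  // ref
console.log(classifyTarget('button.primary'))                       // selector
console.log(classifyTarget([420, 260]))                             // viewport-point
console.log(classifyTarget({ selector: '.card', x: 10, y: 4 }))     // element-offset
```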

Keyboard and input

await typeText('hello world')
await fillInput('@2', 'user@test.com')
await pressKey('Enter')
await dispatchKey({ ... })

Files and network

await uploadFile('input[type="file"]', '/absolute/path/to/file.pdf')
await httpGet('https://api.example.com/data')   // GET issued in the page's context

Waiting

await wait(1)                                    // seconds
await waitForLoad()
await waitForElement('@1')
await waitForNetworkIdle()

wait() and timeout are in seconds. Only parameters ending in Ms are milliseconds.
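The unit convention can be summed up in a single function. toMilliseconds is a hypothetical name, and pollMs is an invented parameter used only to show the Ms suffix; neither comes from ego-browser.

```javascript
// Hypothetical converter illustrating the documented unit convention:
// parameters ending in "Ms" are milliseconds, all other durations are seconds.
function toMilliseconds(paramName, value) {
  return paramName.endsWith('Ms') ? value : value * 1000
}

console.log(toMilliseconds('timeout', 20)) // 20000
console.log(toMilliseconds('pollMs', 250)) // 250 (pollMs is an invented parameter name)
```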

Browser execution

js(source) calls Runtime.evaluate under the hood and takes a source string. Don't pass it a function plus arguments the way you would with Puppeteer: that produces a warning, the function is stringified with .toString(), and both its closure variables and the arguments are lost.
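The closure problem can be reproduced in plain Node.js. In the sketch below, new Function stands in for evaluation in another context, since code it creates only sees globals; this is an analogy, not how ego-browser runs js(). The names evaluateElsewhere, demoLostClosure, and buildPageSource are hypothetical.

```javascript
// Stand-in for page-side evaluation: code built by new Function only sees globals,
// much like source text evaluated in a different context.
function evaluateElsewhere(source) {
  return new Function(`return (${source})`)()
}

function demoLostClosure() {
  const limit = 5
  const pageFn = () => limit * 2 // closes over the local `limit`
  try {
    // Stringifying keeps the text "limit * 2" but not the value of `limit`.
    return evaluateElsewhere(`(${pageFn.toString()})()`)
  } catch (err) {
    return err.name
  }
}

// Safer pattern: serialize the value into the source string itself.
function buildPageSource(limit) {
  return `(() => ${JSON.stringify(limit)} * 2)()`
}

console.log(demoLostClosure())                     // ReferenceError
console.log(evaluateElsewhere(buildPageSource(5))) // 10
```

Inside a heredoc, the string returned by a builder like buildPageSource is what would be handed to js(...). JSON.stringify keeps strings, numbers, and plain objects safely quoted when they are spliced into the source.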

For multi-step logic, wrap it in an IIFE that returns once:

const data = await js(String.raw`(() => {
  const items = [...document.querySelectorAll('article')]
  return items.map(el => ({
    text: el.innerText,
    links: [...el.querySelectorAll('a')].map(a => a.href),
  }))
})()`)

await elementEval('@1', el => el.getBoundingClientRect())
await cdp('Page.captureScreenshot', { format: 'png' })

Output and self-discovery

cliLog(value)                  // the only output channel in a heredoc
cliLog(help('click'))          // look up a helper's usage

Recommended path

Start with snapshotText plus ref / loc. This keeps the semantics intact and avoids the brittleness of coordinates:

  1. Reuse or create the Task Space.
  2. Open or switch to the page (openOrReuseTab / gotoAndWait).
  3. snapshotText() to get the [ref=N, loc=..., url=...] tree. Refs get registered into the refMap automatically.
  4. Act on @N with click / fillInput / elementEval, or do a one-shot DOM extraction inside js(...).
  5. cliLog(...) the final result.
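Step 4 needs an @N to act on. Below is a minimal sketch of picking refs out of snapshot text by keyword. It assumes each node sits on one line and carries a [ref=N, ...] annotation. The sample snapshot is invented for illustration and may not match the real snapshotText() layout, and findRefs is a hypothetical helper.

```javascript
// Hypothetical helper: returns '@N' refs for snapshot lines that mention a keyword.
// Assumes one node per line, each with a "[ref=N, ...]" annotation.
function findRefs(snapshot, keyword) {
  const needle = keyword.toLowerCase()
  const refs = []
  for (const line of snapshot.split('\n')) {
    const match = line.match(/\[ref=(\d+)/)
    if (match && line.toLowerCase().includes(needle)) refs.push(`@${match[1]}`)
  }
  return refs
}

// Invented sample; real snapshot lines may look different.
const sampleSnapshot = [
  'button "Sign in" [ref=21, loc=button.primary]',
  'link "Issues" [ref=34, loc=a.issues]',
].join('\n')

console.log(findRefs(sampleSnapshot, 'sign in')) // [ '@21' ]
```

In a heredoc the snapshot argument would come from await snapshotText(), and a returned ref could be passed straight to click or fillInput while that snapshot is still the most recent one.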

Other useful paths to combine:

  • captureScreenshot + click([x, y]): visual layouts, canvas-driven UIs, virtual lists, pages with incomplete accessibility.
  • js / elementEval / cdp: extract DOM directly, inspect browser state, or anything that doesn't fit a standard helper cleanly.

Keep navigation, observation, scrolling, extraction, filtering, aggregation, and output inside a single ego-browser nodejs heredoc. Don't pipe the data through a second local node script.

Ref scope

@N is valid only against the refMap built by the most recent snapshotText() call; every call rebuilds the map. Ref numbers come from the element's CDP backendNodeId, so the same element usually keeps its number across snapshots, but @N is usable only if N appears in the most recent snapshot output.

Common causes of Unknown ref:

  • The element scrolled out of the viewport.
  • The DOM re-rendered.
  • A previous round used scope: 'only_within_viewport', and the next round didn't cover the element.

When you need a stable reference to the same element across several rounds, use the loc=... selector from the snapshot or write a CSS selector directly. This is also the basis for accumulated Experience — see Skills.
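The rebuild behavior is easier to see in a toy model. RefMapModel below is hypothetical: it imitates the documented rules (each snapshot replaces the map, refs are keyed by backendNodeId) and is not ego-browser's implementation. The error text is also an assumption.

```javascript
// Toy model of the documented ref rules; the real refMap lives inside ego-browser.
class RefMapModel {
  constructor() {
    this.refs = new Map()
  }

  // Each snapshot replaces the whole map, as snapshotText() is described to do.
  snapshot(nodes) {
    this.refs = new Map(nodes.map((node) => [node.backendNodeId, node.loc]))
  }

  // Resolves '@N' against the latest snapshot only.
  resolve(ref) {
    const id = Number(ref.slice(1))
    if (!this.refs.has(id)) throw new Error(`Unknown ref: ${ref}`)
    return this.refs.get(id)
  }
}

const model = new RefMapModel()
model.snapshot([
  { backendNodeId: 21, loc: 'button.primary' },
  { backendNodeId: 34, loc: 'a.issues' },
])
console.log(model.resolve('@34')) // a.issues

// A later snapshot that no longer covers node 34 drops its ref.
model.snapshot([{ backendNodeId: 21, loc: 'button.primary' }])
let outcome
try {
  outcome = model.resolve('@34')
} catch (err) {
  outcome = err.message
}
console.log(outcome) // Unknown ref: @34
```

Node 21 keeps the same number across both snapshots, which matches the backendNodeId behavior described above. The loc value recorded in the first snapshot is what stays usable once @34 is gone.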

Skill workspace

ego-browser doesn't carry mutable agent experience on its own. By default it loads helper extensions and learned site experience from the repo's skill bundle:

../../skills/ego-browser

Override via env var:

EGO_BROWSER_AGENT_WORKSPACE=/path/to/ego-browser ego-browser nodejs <<'EOF'
cliLog(await siteSkills())
EOF

Site experience under learnings/ is always active; every helper call reads it. The write and discovery model for Experience is described in Skills.

Validate learned experience:

npm run validate:learnings

Directory layout

package/ego-browser/
├── src/                      # browser-runtime / helpers / run.js
│   ├── browser-runtime.js    # ego runtime bridge on the browser side
│   ├── helpers.js            # helpers exposed to the agent script
│   ├── run.js                # CLI entry point (executes stdin)
│   └── learning/             # experience index, domain check, format check
├── artifacts/ego-browser/    # build output; npm bin points here
└── test/                     # unit tests

skills/ego-browser/
├── SKILL.md / SKILL.zh.md    # entry point for the agent
└── learnings/                # site experience directory

Notes

  • snapshotText() defaults to scope: 'full_page'. Pass 'only_within_viewport' only when you really need just the visible area.
  • js() returns the evaluation result directly. Don't JSON.parse(...) it again.
  • When writing regex inside a js() template string, double the backslashes (\\d, \\s) or switch to String.raw.
  • Source with a top-level return gets wrapped in an IIFE automatically. A return inside a nested callback can trigger the same wrapping, so write multi-statement code as an explicit (() => { ... })() from the start.
  • When the user has explicitly asked for ego-browser, the runtime is ready. Don't preflight with which ego-browser / node -v / a help dump — only do that if a first run actually errors.
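The backslash note above can be checked in plain Node.js without a browser. The snippet compares an ordinary template literal, a doubled-backslash version, and String.raw, using only standard JavaScript.

```javascript
// In an ordinary template literal, \d and \s are processed as escapes and lose the backslash.
const cooked = `/\d+\s/`
// Doubling the backslashes keeps them in the resulting string.
const doubled = `/\\d+\\s/`
// String.raw keeps the text exactly as typed.
const raw = String.raw`/\d+\s/`

console.log(cooked)          // /d+s/
console.log(doubled === raw) // true
```

The cooked variant would send /d+s/ to the page, which matches the letters d and s rather than digits and whitespace.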