ego-browser
The browser automation runtime AI agents use to drive ego lite's real Chromium session.
ego-browser is the browser automation runtime ego lite ships for AI agents. It speaks the Chrome DevTools Protocol to the real Chromium session inside ego lite and takes a Node.js heredoc script as its entry point: the agent writes the whole JS flow in a single stdin delivery, all helpers are pre-injected into the script's scope, and the browser state lives on inside a Space.
ego-browser is not meant for humans to drive a browser by hand, and it isn't a replacement for Playwright or Puppeteer. The intended reader is an LLM agent.
Who it's for
- AI coding agents that need to drive a browser: Claude Code, Codex, Cursor, custom SDK agents.
- Teams building vertical agents — automating Lark, Google Docs, Salesforce, and similar back offices.
- Repeating fixed web flows: login, fill, export, search, read tables.
- Anyone who has tried to stuff a full DOM or page of HTML into an LLM and hit the token wall.
Install
It comes with ego lite — see Quick start. After install, run ego-browser from any directory.
You can also install the skill standalone:
npx skills add github:CitroLabs/ego-lite/skills/ego-browser
Core loop
The typical rhythm for agents driving a page — everything in a single heredoc:
ego-browser nodejs <<'EOF'
const task = await useOrCreateTaskSpace('search github issues')
await openOrReuseTab('https://github.com/issues', { wait: true, timeout: 20 })
cliLog(await snapshotText())
EOF
- Reuse or create a Task Space (declare it in every heredoc — see Space).
- Open the target page.
- Read the snapshot (snapshotText()) to get a semantic tree with [ref=N, loc=..., url=...].
- Act on the page by @N ref or CSS selector.
- Print the final result with cliLog(...).
Inside the heredoc you're in a Node.js process; inside js(...) you're in the page context. Don't mix them.
Helper reference
Every helper is available in the script's scope by its camelCase name. No import required.
Task Space
await listTaskSpaces()
const task = await useOrCreateTaskSpace('describe task') // reuse or create
await completeTaskSpace(task.name) // done, keep the tab
await closeTaskSpace(task.name) // shut the space down
name should be a 3-to-6 word natural-language description of the task. Don't use placeholders.
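The naming rule above can be checked before a Task Space is created. The sketch below is a hypothetical pre-check: isDescriptiveTaskName and its placeholder list are illustrative assumptions, not part of ego-browser.

```javascript
// Hypothetical pre-check; ego-browser does not ship this function.
// The placeholder list is an illustrative assumption.
const PLACEHOLDER_NAMES = new Set(['task', 'test', 'todo', 'untitled'])

function isDescriptiveTaskName(name) {
  const trimmed = String(name).trim()
  if (PLACEHOLDER_NAMES.has(trimmed.toLowerCase())) return false
  // Count whitespace-separated words and require 3 to 6 of them.
  const words = trimmed.split(/\s+/).filter(Boolean)
  return words.length >= 3 && words.length <= 6
}
```

A name such as 'search github issues' passes this check, while a one-word or seven-word name does not.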
Navigation and state
await listTabs()
await openOrReuseTab(url, { wait: true, timeout: 20 })
await gotoAndWait(url, { timeout: 20, settle: 1 })
await newTab(url)
await switchTab(tabId)
await currentTab()
await pageInfo()
await ensureRealTab() // a fresh task space may have no tab yet
Observation
await snapshotText() // full-page semantic snapshot (default)
await snapshotText({ scope: 'only_within_viewport' })
await captureScreenshot('result.png')
await drainEvents() // consume the nav / network event queue
Mouse and scroll
click, doubleClick, hover, and dragMouse accept the same target format (CSS pixels):
- 'string': CSS selector or @ref. Clicks the element center.
- [x, y] or {x, y}: viewport coordinates.
- {selector, x, y}: relative offset from the element's top-left.
- options.label: 3-to-6 word description. Pass it and the action triggers a visual highlight.
await click('@21', { label: 'check the login state' })
await click('button.primary', { label: 'click the submit button' })
await click([420, 260])
await hover('@5', { label: 'hover to reveal the menu' })
await dragMouse([from, to], { label: 'drag the card' })
await scrollBy(900)
await scroll({ dy: 900 })
await scrollToBottomUntil(
async () => await js(String.raw`document.querySelectorAll('article').length`) >= 20,
{ step: 900, wait: 1, maxSteps: 20 },
)
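As a mental model for the target formats listed above, the sketch below classifies a target value into one of the documented shapes. describeTarget is a hypothetical function written for illustration; the real helpers may resolve targets differently.

```javascript
// Hypothetical classifier for illustration only; not the ego-browser implementation.
function describeTarget(target) {
  if (typeof target === 'string') {
    // '@21' style strings are snapshot refs; any other string is treated as a CSS selector.
    return /^@\d+$/.test(target)
      ? { kind: 'ref', ref: Number(target.slice(1)) }
      : { kind: 'selector', selector: target }
  }
  if (Array.isArray(target) && target.length === 2) {
    // [x, y] is a viewport coordinate pair.
    return { kind: 'viewport', x: target[0], y: target[1] }
  }
  if (target && typeof target === 'object') {
    const { selector, x, y } = target
    // With a selector, x and y are an offset from the element's top-left corner.
    if (typeof selector === 'string') return { kind: 'offset', selector, x, y }
    return { kind: 'viewport', x, y }
  }
  throw new TypeError('Unsupported target format')
}
```

The point of the sketch is that a string never means coordinates, and an object only means an element-relative offset when it carries a selector.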
Keyboard and input
await typeText('hello world')
await fillInput('@2', 'user@test.com')
await pressKey('Enter')
await dispatchKey({ ... })
Files and network
await uploadFile('input[type="file"]', '/absolute/path/to/file.pdf')
await httpGet('https://api.example.com/data') // GET issued in the page's context
Waiting
await wait(1) // seconds
await waitForLoad()
await waitForElement('@1')
await waitForNetworkIdle()
wait() and timeout are in seconds. Only parameters ending in Ms are milliseconds.
Browser execution
js(source) is Runtime.evaluate under the hood and takes a string. Don't pass it a function and arguments the way Puppeteer does — that produces a warning, gets wrapped in .toString(), and the closure variables and argument channel both vanish.
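The closure loss is ordinary JavaScript behavior and can be seen without a browser. The sketch below compares a stringified function with a source string that inlines the value. buildCountSource is a hypothetical helper name, not part of ego-browser.

```javascript
// Hypothetical helper: builds a source string suitable for js(...).
function buildCountSource(selector) {
  // JSON.stringify emits a quoted, escaped string literal,
  // so the value travels inside the source text itself.
  return `document.querySelectorAll(${JSON.stringify(selector)}).length`
}

const selector = 'article'

// A function that closes over `selector`. It is never called here,
// so `document` does not need to exist in Node.js.
const countInPage = () => document.querySelectorAll(selector).length

// toString() returns only the source text; the value 'article' is not part of it.
const stringified = countInPage.toString()

// The inlined version carries the value inside the string.
const source = buildCountSource(selector)
```

Here stringified still mentions the bare identifier selector, which would be undefined in the page, while source is a complete expression that could be handed to js(source).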
For multi-step logic, wrap it in an IIFE that returns once:
const data = await js(String.raw`(() => {
const items = [...document.querySelectorAll('article')]
return items.map(el => ({
text: el.innerText,
links: [...el.querySelectorAll('a')].map(a => a.href),
}))
})()`)
await elementEval('@1', el => el.getBoundingClientRect())
await cdp('Page.captureScreenshot', { format: 'png' })
Output and self-discovery
cliLog(value) // the only output channel in a heredoc
cliLog(help('click')) // look up a helper's usage
Recommended workflow
Start with snapshotText plus ref / loc — it keeps the semantics intact and avoids the brittleness of coordinates:
- Reuse or create the Task Space.
- Open or switch to the page (openOrReuseTab / gotoAndWait).
- snapshotText() to get the [ref=N, loc=..., url=...] tree. Refs get registered into the refMap automatically.
- Act on @N with click / fillInput / elementEval, or do a one-shot DOM extraction inside js(...).
- cliLog(...) the final result.
Other useful paths to combine:
- captureScreenshot + click([x, y]): visual layouts, canvas-driven UIs, virtual lists, pages with incomplete accessibility.
- js / elementEval / cdp: extract DOM directly, inspect browser state, or anything that doesn't fit a standard helper cleanly.
Keep navigation, observation, scrolling, extraction, filtering, aggregation, and output inside a single ego-browser nodejs heredoc. Don't pipe the data through a second local node script.
Ref scope
@N is only valid against the refMap of the most recent snapshotText. Every snapshotText() rebuilds the refMap. Ref numbers come from the element's CDP backendNodeId, so the same element usually carries the same number across snapshots — but for @N to be operable, N must appear in the most recent snapshot output.
Common causes of Unknown ref:
- The element scrolled out of the viewport.
- The DOM re-rendered.
- A previous round used scope: 'only_within_viewport', and the next round didn't cover the element.
When you need a stable reference to the same element across several rounds, use the loc=... selector from the snapshot or write a CSS selector directly. This is also the basis for accumulated Experience — see Skills.
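One way to apply this is to check whether a ref is present in the latest snapshot text before acting on it, and fall back to a selector otherwise. The sketch below assumes each node's marker starts with the literal text [ref=N, as in the format shown earlier. refsInSnapshot, pickTarget, and the sample text are hypothetical and only illustrate the idea.

```javascript
// Assumption: markers in the snapshot text start with "[ref=N".
function refsInSnapshot(snapshot) {
  const refs = new Set()
  for (const match of snapshot.matchAll(/\[ref=(\d+)/g)) {
    refs.add(Number(match[1]))
  }
  return refs
}

// Use '@N' when the ref is in the latest snapshot; otherwise fall back to a stable selector.
function pickTarget(snapshot, ref, fallbackSelector) {
  return refsInSnapshot(snapshot).has(ref) ? `@${ref}` : fallbackSelector
}

// Made-up sample; real snapshotText() output will differ in detail.
const sample = [
  'button "Sign in" [ref=21, loc=button.primary, url=https://example.com/]',
  'link "Docs" [ref=5, loc=a.docs, url=https://example.com/]',
].join('\n')
```

With this sample, pickTarget(sample, 21, 'button.primary') yields '@21', and a ref that is absent from the snapshot yields the fallback selector instead.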
Skill workspace
ego-browser doesn't carry mutable agent experience on its own. By default it loads helper extensions and learned site experience from the repo's skill bundle:
../../skills/ego-browser
Override via env var:
EGO_BROWSER_AGENT_WORKSPACE=/path/to/ego-browser ego-browser nodejs <<'EOF'
cliLog(await siteSkills())
EOF
Site experience under learnings/ is always active; every helper call reads it. The write and discovery model for Experience is described in Skills.
Validate learned experience:
npm run validate:learnings
Directory layout
package/ego-browser/
├── src/ # browser-runtime / helpers / run.js
│ ├── browser-runtime.js # ego runtime bridge on the browser side
│ ├── helpers.js # helpers exposed to the agent script
│ ├── run.js # CLI entry point (executes stdin)
│ └── learning/ # experience index, domain check, format check
├── artifacts/ego-browser/ # build output; npm bin points here
└── test/ # unit tests
skills/ego-browser/
├── SKILL.md / SKILL.zh.md # entry point for the agent
└── learnings/ # site experience directory
Notes
- snapshotText() defaults to scope: 'full_page'. Pass 'only_within_viewport' only when you really need just the visible area.
- js() returns the evaluation result directly. Don't JSON.parse(...) it again.
- When writing regex inside a js() template string, double the backslashes (\\d, \\s) or switch to String.raw.
- A top-level return gets wrapped in an IIFE automatically. A return inside a nested callback can trigger that too, so write complex expressions as (() => { ... })() upfront.
- When the user has explicitly asked for ego-browser, the runtime is ready. Don't preflight with which ego-browser / node -v / a help dump; only do that if a first run actually errors.
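The backslash note can be verified in plain Node.js. The sketch below builds the same regex test three ways and evaluates each source locally with new Function, which stands in for the page-side evaluation. The evaluate helper is an assumption for illustration, not an ego-browser API.

```javascript
// In an ordinary template literal, "\d" is read as an escape and collapses to "d".
const broken = `/^\d+$/.test('42')`

// Doubling the backslash keeps a single backslash in the resulting string.
const doubled = `/^\\d+$/.test('42')`

// String.raw leaves backslashes untouched.
const raw = String.raw`/^\d+$/.test('42')`

// Local stand-in for page-side evaluation of a source string.
const evaluate = (source) => new Function(`return (${source})`)()
```

The broken variant ends up testing for one or more literal d characters, so it does not match '42'; the doubled and String.raw variants produce identical source text and do match.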