Auto-run many browser tests in parallel with your AI agent
Playwright MCP drives the browser one tool call at a time: slow and heavy on tokens. Now Claude Code or Codex runs the whole test in ego (lite) with a few lines of code: 2.4x faster, and runs many tests in parallel, each in its own Space.
Just tell your agent what to check
Describe the check in plain language and let it run. In this video the product pages of a real storefront render broken, and the agent gets one ask: find every visual problem and fix it.
The same page goes from broken to right, on screen
The product pages render broken. The agent looks at them in ego (lite), fixes the CSS in the source, and the open page re-renders correct on screen.
Same AI agent, same task, finished faster
Playwright MCP runs one tool call at a time: each next step waits for the model to read the last result. ego (lite) runs many steps in one pass of JS code, so it finishes faster and uses fewer tokens.
| | ego (lite) | Playwright MCP | Chrome DevTools MCP |
|---|---|---|---|
| How the agent drives it | With the /ego-browser skill. | External MCP server. One tool call per action. | External MCP server. One tool call per action. |
| Setup | Install once, works with any agent. | npx install, launch modes to configure, frequent launch and Chrome-discovery errors. | Remote debugging, user-data-dir, cross-host setup. |
| Token use | Low. No MCP tool schemas, and only what the agent logs enters context: one snapshot per pass, not one after every action. | High. Users report 6x token growth and 200K-token context overflows. | High. Screenshots are token-expensive. |
| Login state | Your own logged-in browser, with the same login state as your Chrome. | Isolated browser by default; carrying logins means extension mode or storage-state setup. | Can attach, but grabs the tabs you're using. |
| iframe / shadow DOM / SDK widgets | Its page snapshot is built into the rendering engine, so it reads inside all of them. | Accessibility snapshot has an iframe blind spot. | Raw DOM, you handle it yourself. |
| Console / network / trace | Console, network, and traces, read straight from Chrome DevTools data. | Console, network, screenshots. | Deepest: Lighthouse, performance, memory. |
| Locator stability | Anchors to semantic labels, survives class changes. | Refs / accessibility, fairly stable. | Raw DOM selectors. |
| CAPTCHA / bot detection | Real human session, least likely to trip it. | Separate / headless browser, often flagged. | Better when attached, but grabs your tabs. |
| Tasks in parallel | Born to support many tasks running at once via Spaces. | One session per server; parallel runs mean managing extra isolated instances. | Single instance, large tab counts crash it. |
The friction was never the browser. It was the layer in between.
Across developer forums and issue trackers, most complaints are about MCP, the layer between your agent and the browser: the connection drops, the approvals pile up, the session breaks, the tokens balloon. ego (lite) comes with everything set up, so no more weird errors.
The browser works. The connection to it doesn’t.
- Connection: the tools show up but the agent won’t call them, runs stall on “waiting for approval,” and the same server behaves differently in each client
- Approvals: a popup per tool call, often 10 to 30 in one task, even with “run everything” on
- Reliability: a task that half-finishes, or a navigation that already succeeded yet still times out
Nothing in between: ego (lite) is the browser itself
- Stable by design: the agent talks straight to a local browser it owns, with no MCP transport in the middle to drop or stall
- Fewer approvals: one pass runs many steps instead of one tool call per action
- Predictable: drop-in with any AI agent, the same way every time
No test scripts. No selectors to maintain.
Playwright testing, Cypress testing, Selenium: you write the test, you own the selectors, and a class rename or a reshuffled layout breaks them. ego (lite) takes a plain-language check and locates elements by what they mean, so a UI change doesn’t break the test.
You write the spec, you maintain the selectors
- Hand-write and version a .spec for every flow you want covered
- Pin selectors like page.click(".btn-x7f3") that break the next time the UI changes
- Add waits and retries to fight flakiness yourself
- Build login automation or storage-state files to test behind a sign-in
- Re-run and repair the suite after every redesign
Just say what to check
- No spec to write: tell your agent the check in one sentence
- Locates elements by their meaning, not a CSS class, so a rename or reshuffle doesn’t break it
- Smart waits handle async content, no hand-rolled sleeps
- Tests behind your sign-in with no login automation
- Need visual regression testing? The agent diffs the screenshots
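To make the difference between class-based and meaning-based lookup concrete, here is a minimal sketch. It uses plain objects as a stand-in for page elements; the node shape and the helpers `findByClass` and `findByMeaning` are hypothetical illustrations, not ego (lite)'s API.

```javascript
// Hypothetical page model: plain objects standing in for DOM nodes.
// Nothing here is ego (lite)'s actual API.
const before = [
  { role: "button", label: "Add to cart", className: "btn-x7f3" },
  { role: "link", label: "Back to store", className: "nav-a1" },
];

// The same page after a redesign renamed every CSS class.
const after = before.map((node) => ({ ...node, className: "v2-" + node.className }));

// Class-based lookup: stops matching as soon as the class name changes.
function findByClass(nodes, className) {
  const match = nodes.find((node) => node.className === className);
  return match ?? null;
}

// Meaning-based lookup: matches on role and visible label, ignoring classes.
function findByMeaning(nodes, role, label) {
  const wanted = label.trim().toLowerCase();
  const match = nodes.find(
    (node) => node.role === role && node.label.trim().toLowerCase() === wanted
  );
  return match ?? null;
}
```

In this toy model, `findByClass(after, "btn-x7f3")` returns `null` after the rename, while `findByMeaning(after, "button", "Add to cart")` still finds the button.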
For a versioned, headless suite that runs unattended in CI, or for testing across Firefox and WebKit, keep Playwright. ego (lite) is the fast, zero-script inner loop while you build.
This is how ego lite works
A simple check (open the store, add a product, make sure checkout renders) is one call. The steps run back to back inside the browser, the log streams out as they land, and the agent reads the result once.
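As a rough illustration of that shape, the sketch below runs the three steps in a single async function and returns one result. The `page` object, its methods (`open`, `click`, `isVisible`), and `createFakePage` are all hypothetical stand-ins so the example runs on its own. The code the agent actually writes for ego (lite) will differ.

```javascript
// Hypothetical stand-in for a browser page; each method only records the call.
function createFakePage() {
  const calls = [];
  return {
    calls,
    async open(url) { calls.push(`open ${url}`); },
    async click(label) { calls.push(`click ${label}`); },
    async isVisible(label) { calls.push(`check ${label}`); return true; },
  };
}

// One pass: every step runs back to back, and the caller reads a single result.
async function runCheckoutCheck(page) {
  const log = [];
  await page.open("http://localhost:3000");
  log.push("opened store");
  await page.click("Add to cart");
  log.push("added product");
  const rendered = await page.isVisible("Checkout");
  log.push(rendered ? "checkout rendered" : "checkout missing");
  return { ok: rendered, log };
}
```

The point of the design is that the model is consulted once, after the whole function returns, rather than after each of the three steps.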
It handles what usually makes browser tests flaky
We ran the scenarios developers say are hardest to automate. With ego (lite), agents handled all of them. Its semantic snapshot is generated inside the Chromium rendering engine itself, so it sees into shadow roots and cross-document iframes where injected scripts go blind.
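Why a shadow root hides content from an ordinary document query can be modeled with a small tree walk. This is a simplified sketch over plain objects, assuming a made-up node shape (`tag`, `children`, `shadowRoot`). It illustrates the visibility gap only and does not show how ego (lite) builds its snapshot.

```javascript
// Hypothetical page tree: a custom element whose input lives inside its shadow root.
const tree = {
  tag: "body",
  children: [
    {
      tag: "checkout-form",
      children: [],
      shadowRoot: { tag: "#shadow-root", children: [{ tag: "input", children: [] }] },
    },
  ],
};

// Walk the tree and list every tag reached.
// With pierceShadow off, the walk follows only regular children,
// the way a plain document query stops at a shadow boundary.
function collectTags(node, { pierceShadow }) {
  const tags = [node.tag];
  if (pierceShadow && node.shadowRoot) {
    tags.push(...collectTags(node.shadowRoot, { pierceShadow }));
  }
  for (const child of node.children) {
    tags.push(...collectTags(child, { pierceShadow }));
  }
  return tags;
}
```

With `pierceShadow: false` the walk never reaches the `input`. With `pierceShadow: true` it does.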
Shadow DOM
Acts inside web-component shadow roots, where injected scripts go blind.
Payment iframes
Fills card fields inside cross-document iframes, like a Stripe checkout.
Custom date pickers
Picks a date range in a custom calendar with no test ids.
Drag and drop
Real HTML5 drag: moves a kanban card, the board follows.
Infinite scroll
Scrolls a lazy feed until every item is loaded, no hand-tuned waits.
Paginated tables
Walks every page of an admin table and pulls each row.
Conditional modals
Cookie banners and dialogs that pop up mid-flow get handled, not tripped over.
Autocomplete
Types an address, waits for the suggestions, picks the right one.
And a click that lands on a covered or disabled button is reported as a failure, not a false pass, the same way a real test framework treats it.
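The rule in that last sentence can be sketched as a small pre-click check. The element shape (`label`, `disabled`, `coveredBy`) and the `tryClick` helper are hypothetical, chosen only to show the failure-reporting behavior described above.

```javascript
// Hypothetical element model; `coveredBy` names an overlay on top of the element, if any.
function tryClick(element) {
  if (element.disabled) {
    return { ok: false, reason: `"${element.label}" is disabled` };
  }
  if (element.coveredBy) {
    return { ok: false, reason: `"${element.label}" is covered by ${element.coveredBy}` };
  }
  // Only an enabled, uncovered element counts as a successful click.
  return { ok: true, reason: null };
}
```

A click on `{ label: "Pay", disabled: false, coveredBy: "cookie banner" }` comes back as a failure with the reason attached, instead of silently passing.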
Your logged-in browser, still under your control
Letting an agent loose in the browser you’re signed in to is a fair thing to pause on. Here is where the boundaries are.
It works in its own Space
The agent runs in a separate workspace inside the browser. It shares your logins, but your tabs and windows stay untouched.
Every step is visible
A real, headed browser on your Mac, not a hidden headless process. Watch the run live, and interrupt it from your agent’s CLI at any point.
Nothing leaves your Mac
History, cookies, and sessions stay on your computer. ego (lite) doesn’t upload them.
And when a flow has real side effects, a checkout that charges or a delete that deletes, point the agent at localhost or staging, the same way you’d test it by hand.
How it works, step by step
Web app testing in three steps: point your agent at localhost or any URL, say what to check, and read back what failed.
A Chromium browser you use daily. One click imports your Chrome logins, so the agent tests as you.
What should it test?
/ego-browser Open localhost:3000, go through checkout, and tell me what’s broken
Type /ego-browser and just say what to check, in any language your agent understands.
| Page | Result |
|---|---|
| /checkout | OK |
| /product/gift-card | console error |
| /product/camera | image 404 |
Each broken page comes back with the console error or failed request that explains it, not just a pass or fail.
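A report like the one above could be assembled along these lines. This is a sketch under stated assumptions: `probe` is a hypothetical function that returns the console errors and failed requests seen on a page, and the sample data is made up for illustration. The pages are checked concurrently with `Promise.all`, which keeps the output in input order.

```javascript
// Turn one page's captured errors into a report row with the explaining detail.
async function checkPage(path, probe) {
  const { consoleErrors, failedRequests } = await probe(path);
  if (consoleErrors.length > 0) {
    return { page: path, result: "console error", detail: consoleErrors[0] };
  }
  if (failedRequests.length > 0) {
    const { url, status } = failedRequests[0];
    return { page: path, result: `failed request (${status})`, detail: url };
  }
  return { page: path, result: "OK", detail: null };
}

// Check every page concurrently; the rows come back in the order of `paths`.
async function checkAll(paths, probe) {
  return Promise.all(paths.map((path) => checkPage(path, probe)));
}

// Made-up probe data for illustration only.
const sampleData = {
  "/checkout": { consoleErrors: [], failedRequests: [] },
  "/product/gift-card": { consoleErrors: ["example console error"], failedRequests: [] },
  "/product/camera": { consoleErrors: [], failedRequests: [{ url: "/example.jpg", status: 404 }] },
};
const sampleProbe = async (path) => sampleData[path];
```

Calling `checkAll(Object.keys(sampleData), sampleProbe)` resolves to three rows, each carrying the first console error or failed request that explains the result.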
Where ego (lite) fits, and where it doesn’t
Built for the fast inner loop: reproduce, debug, and verify while you develop.
What ego lite is great at
The inner loop, in a real browser.
- Reproduce and debug a bug in a real, logged-in browser
- Test flows behind a sign-in with no login automation
- Read console errors, failed requests, and page state
- Run exploratory checks across many pages in parallel
- Verify a fix in the same session you found the bug
When to reach for Playwright
Honest about the boundary.
- Headless regression suites that run unattended in CI (ego lite is a headed Mac browser)
- Testing across Firefox and WebKit (ego lite is Chromium)
- A versioned, deterministic test suite you commit and own
Give your agent a real browser to test in
Download ego (lite) for Mac
FAQ
No. You bring an AI agent you already use, such as Claude Code, OpenAI Codex, or Cursor, and tell it what to check in plain language. The agent writes and runs the browser steps for you. There are no .spec files to maintain and no CSS selectors to hand-write, because ego lite locates elements from a semantic snapshot of the page.
Playwright MCP is an MCP server your agent drives one tool call per action, so a fifteen-step task means fifteen round trips to the model, which is slow and burns tokens. ego lite is the browser itself: the agent writes one short JavaScript pass that runs many steps at once, on your real logged-in browser. In our test on the same task in Claude Code with Opus 4.8, ego lite finished in about 18 seconds with 2 model round trips versus about 43 seconds and 9 round trips for Playwright MCP. It also runs many test tasks in parallel, each in its own Space, which a single MCP browser cannot.
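For reference, the ratios follow directly from the figures quoted in that answer. The inputs are the approximate timings stated above ("about 18 seconds" and "about 43 seconds"), so the results are rounded estimates rather than precise measurements.

```javascript
// Figures as quoted in the comparison above (both timings are approximate).
const ego = { seconds: 18, roundTrips: 2 };
const playwrightMcp = { seconds: 43, roundTrips: 9 };

// Wall-clock speedup and round-trip reduction.
const speedup = playwrightMcp.seconds / ego.seconds;
const roundTripRatio = playwrightMcp.roundTrips / ego.roundTrips;

console.log(`speedup: ${speedup.toFixed(1)}x`);            // speedup: 2.4x
console.log(`round trips: ${roundTripRatio.toFixed(1)}x fewer`); // round trips: 4.5x fewer
```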
Yes, and this is where it shines. ego lite is your own daily browser, already signed in, so the agent tests dashboards, internal admin panels, and account flows without you first building a login automation. Because it behaves like a real human session, it is also far less likely to trip CAPTCHAs or bot detection than a headless or freshly launched browser.
Each task runs in its own Space, a separate workspace inside the same browser, like extra windows of the same profile. Spaces share your logins but keep their own pages, so parallel runs don't collide with each other or with the tabs you're working in. To a website it looks like one signed-in user with several tabs open. If your app allows only one active session per account, run those flows one at a time or use a test account.
Yes, in the sense that the agent captures a baseline screenshot and a new one, diffs them pixel by pixel, and tells you what changed and where. It is the agent doing the diff in your browser, not a separate visual-testing product to set up, so there is no Percy or Chromatic to wire in. For a managed baseline service with review workflows and CI gating, a dedicated visual-testing tool is still the better fit.
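A pixel-by-pixel diff of that kind can be sketched as follows. It assumes both screenshots are already decoded into same-sized RGBA byte arrays. The `diffPixels` helper is an illustration, not the code the agent necessarily writes.

```javascript
// Compare two RGBA buffers of identical dimensions.
// Returns how many pixels differ and the bounding box that contains them.
function diffPixels(baseline, current, width, height) {
  if (baseline.length !== width * height * 4 || current.length !== baseline.length) {
    throw new Error("both buffers must be width * height * 4 bytes");
  }
  let changed = 0;
  let minX = width, minY = height, maxX = -1, maxY = -1;
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const i = (y * width + x) * 4; // offset of this pixel's red channel
      const same =
        baseline[i] === current[i] &&
        baseline[i + 1] === current[i + 1] &&
        baseline[i + 2] === current[i + 2] &&
        baseline[i + 3] === current[i + 3];
      if (!same) {
        changed++;
        minX = Math.min(minX, x);
        minY = Math.min(minY, y);
        maxX = Math.max(maxX, x);
        maxY = Math.max(maxY, y);
      }
    }
  }
  const box = changed === 0
    ? null
    : { x: minX, y: minY, width: maxX - minX + 1, height: maxY - minY + 1 };
  return { changed, box };
}
```

The `changed` count answers "what changed" and the bounding box answers "where". A real comparison would usually add a tolerance for anti-aliasing, which this sketch leaves out.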
Not today. ego lite is a real, headed browser on your Mac, built for testing while you develop: reproduce a bug, debug it, verify the fix, run checks across your pages. For a headless suite that runs unattended in CI, or for testing across Firefox and WebKit, keep your Playwright testing suite. ego lite is the fast inner loop, not the CI runner.
Functional and runtime issues a real user would hit: a flow that breaks, a button that does nothing, an uncaught JavaScript error, a failed network request, a console error, a value that comes out wrong. The agent reads the page state plus the console and network, so it reports what failed and where, not just a green or red.
The agent works in its own Space, so your tabs and windows stay untouched, and everything happens in a real, visible browser window you can watch and interrupt from your agent's CLI at any time. Your browsing data stays on your own computer: ego lite doesn't upload your history, cookies, or sessions. For flows with real side effects, like a checkout that actually charges, point the agent at localhost or staging, the same way you'd test by hand. If you connect an external agent or model provider, that provider has its own data policy.