Test pages for browser automation challenges. Each page demonstrates different web features that can be tricky to automate.
Simple login form with username and password fields
Run the admin fingerprint collector from the demo site without logging in
Patient ID lookup form that rejects invalid IDs (for testing input_required recovery)
Missing expected element / precondition scenarios (no-result, conditional action, stalled post-state) that classify as UPSTREAM_ERROR / PREREQUISITE_NOT_MET - for the Fix-with-Builder-Agent button
Package tracking form with conditional popup for certain inputs
Drag-and-drop file uploader with metadata display
Controllable upload target for network-signal classification (?mode=sync|commit|async, ?status=, ?delay=, ?commitStatus=, ?verdict=). Issues a real browser POST so the network listener captures server-side upload failures (5xx/4xx/slow, upload-ok-but-commit-failed, async accept/reject).
Basic select dropdowns including one inside an iframe
Various select types: native, custom, searchable, multi-select, radio listbox, datalist
Password change forms (?variant=kinnser|availity|ecw|onehealthcare|reject-first|alphanumeric)
Standalone 2FA setup screens (?variant=dismissable|method-selection|authenticator|authenticator-qr-only|sms-verify|email-verify)
Stateful login → TFA setup → code entry → dashboard. After setup, subsequent logins ask for code only. Add ?reset=true to clear state (?variant=dismissable|authenticator|authenticator-qr-only)
Form with breakable selectors for XPath recovery propagation testing (?broken=true swaps selectors)
Button with drifting id (changes every page load) for testing the full healing promotion pipeline: AI_FALLBACK → XPath capture → writeback → static graduation
Button + input with per-variant attributes for XPath cascade tier testing (?variant=default|data-testid|id-attr|aria|placeholder|role|text-content|class-only|broken|heavy-dom|multi-action|shifted-attrs|table|delayed-render)
Full CRM dashboard: login (admin/password), cookie banner, sidebar nav, 15-row data table, modal contact form with 3-tab wizard, toast notifications, 1000+ DOM elements (?variant=default|logged-out|modal-open|cookie-banner|slow-load|404|duplicate-contacts, ?reset=true)
Healthcare claims portal: login (provider/claims123), patient search with shuffled columns, 4-step claim wizard, accordion sidebar, CAPTCHA overlay, version drift, session expiry (?variant=default|expired-session|captcha|maintenance|rate-limit|slow-results|deleted-patient|password-expired|force-logout, ?version=1|2, ?reset=true)
Full flow: login → password expired → dashboard with claim form. Tests end-to-end password update recovery.
Dense insurance claims form for enrichment testing
Two identical Submit buttons; an XPath selector matches both, triggering the deterministic multiple-matching classification under INCORRECT_FORM_INPUTS
DISMISSIBLE popup that auto-dismisses on a timer before the injected close-recovery click executes. Exercises the popup-gone no-op path: recovery should take no action and resume to the Continue button. ?dismiss_after=0 keeps it visible (recovery must actively close it); ?dismiss_after=<ms> tunes the timing.
Test opening popup windows and handling multiple windows
Simple popup window with close button
Success popup - target window in multi-window test
Trap popup - decoy window in multi-window test
15 scenarios for unexpected UI state: 1-4 DISMISSIBLE (cookie banner, newsletter, promo, survey), 5-12 DECISION_REQUIRED generic (duplicate record, destructive confirm, stale record, scheduling conflict, outstanding balance, coverage warning, duplicate submission, threshold exceeded), 13-15 DECISION_REQUIRED 1:1 prod reproductions (ECW code-already-exists Yes/No, Noridian Consent to Monitoring Accept/Reject, ECW Duplicate Patient Warning Proceed/Cancel)
Fully-fledged multi-PAGE EHR portal - real route navigation (login → dashboard → patients → chart → coding → review → done) with clickable sidebar nav, a patient results table with selectable rows, and chart tabs. ?seed=N carries across pages and deterministically scatters DISMISSIBLE / DECISION_REQUIRED popups at varied timings (on entry, after a delay, after the step's action). Knobs: ?density, ?only=decision|dismissible, ?popups=N. Login is never interrupted; every seed exercises ≥1 decision modal
1:1 ecwcloud.com - multi-step modal chain: duplicate-patient warning → code-already-exists; tests counter reset on verified dismissal between modals
1:1 essentials.availity.com - bottom-rectangle consent banner spanning full width with 4 buttons (Accept All / Reject All / Customize / ×); tests non-centered modal placement
1:1 eprg.wellmed.net - full-screen overlay with scrollable legal text and a single 'I Acknowledge' CTA disabled until scroll-to-end; tests scroll-required single-CTA case
1:1 app.azaleahealth.com - bottom-right corner toast (not overlay) with 3 buttons (Yes / Maybe Later / Don't ask again); tests non-overlay corner placement
1:1 thespot.fcso.com - single-CTA modal whose click causes navigation to a /login flow; tests identifyRecoveryNode smart-resume after state-loss
1:1 app.propertymeld.com - wrong-modal (Edit Meld Residents) appears unexpectedly on Save Note; only × close button, no labeled CTAs; tests close-only dismissal path
1:1 www.portal.jnjwithme.com - single-CTA acknowledge modal whose action terminates the workflow as failed (patient already enrolled, no recovery path)
1:1 ecw.cmc-pa.com - information-dense modal with embedded error list inside the body, OK / Cancel buttons; tests information-rich modal extraction
1:1 app.weinfuse.com - feature-prompt modal triggered after file upload with 3 buttons (Scan Now / Maybe Later / Don't Show Again) and illustration; tests feature-prompt 3-option dismissal
1:1 familycarecenter.insynchcs.com - wrong-context modal (Zip Search) appearing when opening a gender dropdown; modal has a text-input field plus 2 buttons (V2 gap: modal_action MVP supports buttons only, this exposes the input-inside-modal limit)
Multiple levels of nested iframes with different sandbox attributes
Form elements inside iframe with security context display
Single-row table extraction inside an iframe. Use //tr[@data-row-id='row-2'] as the container XPath and short child XPaths like /td[@class='typeDesc'].
W3Schools-style result frame: createElement('iframe') + appendChild (about:blank) + document.write, recreated on each Run. Add ?delay=2000 to reproduce the slow-render injection race.
Frame served with script-src 'none' so the main-world inject is blocked; target is streamed in after ?delay ms. Realistic trigger for the isolated-world XPath fallback's early-bail.
Sandboxed iframe without allow-same-origin - opaque origin, so the parent can't read contentDocument (same as cross-origin). Forces the iframe relay path; regression guard for cross-origin find/click on a single host.
Embeds https://example.com - a real cross-origin frame that allows framing. contentDocument is null to the parent, so only the iframe relay reaches it. Regression guard for the relay path; target the 'Example Domain' heading.
Iframes mounted inside open and closed shadow roots - querySelectorAll('iframe') and a document-scoped MutationObserver don't cross shadow boundaries, so the frame may never be injected.
3+ levels of same-origin nesting with a click/type target in the deepest frame. Native (coordinate) actions must compose the frame offset across every level. ?level=N, ?delay=ms.
Scaled (transform: scale) and scrolled-container iframes - native click coordinates must account for the shifted real position of the frame.
Frame src holds the HTTP response for ?delay ms (default 3s), so the frame sits at about:blank through a real network wait before any content exists.
loading='lazy' iframe far below the fold - contentWindow stays about:blank until scrolled into view, so injection must handle a frame that exists long before it loads.
The same iframe element is renavigated/reloaded to a new document after injection; the extension must re-establish frameId and listeners on a frame it has already seen.
Click/type target inside a srcdoc frame (document URL is about:srcdoc, not about:blank) to exercise the marker/injection logic on that path.
Origin chain parent → cross-origin → back to parent's origin; the postMessage relay must hop across a cross-origin boundary and back to reach the innermost target.
Injects many iframes in one synchronous burst to stress the MutationObserver's 100ms debounce - checks none are dropped. ?count=N.
Removes the iframe and appends a fresh one reusing the same id (SPA route churn) - tests that stale frameId references and listeners are not reused.
Simulates chart notes structure with nested PDF viewer
Clinical notes form with embedded PDF iframe
Scroll to elements below the fold on a long page
Scrollable containers: log viewer, chat history, horizontal carousel
Terms and Conditions modal with scroll-to-bottom to enable OK button
Nested scrollable containers: sidebar + main content with inner scrollable areas
Content that loads on scroll (pagination trigger, lazy loading)
Scrollable iframes, sticky headers, and CSS scroll-snap
Scrollable dropdown option lists: custom ARIA, native select, hidden JCF-style
Collapsible sections that push content below the fold when expanded
Windowed list (only visible rows in DOM) and overflow:hidden clipped content
Scrollable containers inside shadow DOM and floating popover/tooltip
Smooth scroll, bidirectional grid, CSS transform positioning, keyboard tab focus
Wide strip requiring horizontal scroll to reveal off-screen columns (?layout=page|container). Live window.scrollX readout for verifying left/right scroll.
Target off-screen both down AND right - exercises to-element 2D auto-reveal (scrolls both axes). Live scrollX/scrollY readout.
Bordered container scrollable in both axes with a far-right target inside - exercises region find-element / full-container horizontal. Live container.scrollLeft readout.
Target near the top edge that can't be centered (page won't scroll above 0); loads scrolled to the bottom. Verifies scroll-to-element succeeds when the target is revealed but un-centerable at the start boundary.
Wide horizontal grid with a smaller vertical-only notes pane overlapping its center. A horizontal scroll must pick the grid, not the vertical decoy - verifies the builder's axis-filtered container selection. Live grid.scrollLeft + decoyPane.scrollTop readouts.
Progressive loading with overlay, skeletons, lazy content, and delayed dropdowns (?delay=X&stealth=true hides indicators)
Button appears after configurable delay (?delay=X seconds)
Whole page stays in a loading state for 20 seconds before rendering content
Shows a 503 Service Unavailable page on first paint, then swaps to a normal dashboard after ?dwell seconds - for testing that a stale outage frame in S1 history is not mis-classified as SERVICE_UNAVAILABLE (?dwell=, ?status=)
Configurable redirect loop with various redirect methods
Hover dropdowns, hover cards, and scroll-to-agree modal
Test downloading a dynamically generated PDF file
Main-frame navigations to PDF URLs (link, location.href, redirect, form submit) with inline/none/attachment dispositions
PDF opened via target=_blank, window.open, and popup windows with inline/none/attachment dispositions
PDFs as iframe src (sub_frame) with inline/none/attachment dispositions, plus JS-set src
PDFs via embed and object tags with inline/none/attachment dispositions
Client-side generated blob: and data: PDFs - navigation, new tab, iframe, embed, anchor download
Three distinct PDFs as iframes on one page - tests multi-file capture (upload all)
Single-use-token PDF (proves token-safe tshark capture) and a ~15MB PDF (large reassembly)
Configurable large file download (up to 500MB) for testing file handling
Tests 10 different download methods with a known binary file to diagnose corruption
Controllable download target for network-signal classification (?mode=sync|async, ?status=, ?delay=, ?drip=1&dripMs=, ?size=, ?redirect=). Triggers a real browser download so the network listener captures server-side download failures (404/500/503, slow/drip timeout, redirect-to-login, async export).
Printable certification page with 15 iframes. Use with trigger_print in a tab-churn loop to measure per-iteration degradation (?patient, ?cert_date, ?office, ?provider, ?delay)
Multi-step form with random popups and conditional fields
Display page showing tracking status (reached via /shipping)
Login with 2FA, then paginated claims table with lookup and EOB downloads
Tabbed medical portal (Claims, Eligibility, Auth) for testing enrichment classification
Stable id/class/position on every load. GREEN path - graduated xpath resolves cleanly.
Rotating button id; role/aria stable. YELLOW path - cascade recovers and fingerprint matches.
?variant=a|b swaps Submit and Cancel at the same position. Tests fingerprint-mismatch immediate revert.
?show=false removes the button entirely. Tests cascade-exhausted AI fallback and hard failure.
Stable input id/name/label for input_text graduation. GREEN path for the input_text action type.
Rotating input id/name, stable label/aria/testid. YELLOW path for input_text cascade recovery.
Five Submit buttons reshuffled on every load. Tests the observing/candidate → disqualified path.
?broken=true rotates a graduated button's id without removing the element. Tests post-graduation silent xpath failure → revert.
?broken=true renames the table id so STATIC extract_datamodel misses. With enable_llm_dom_fallback=true on the node and ENABLE_LLM_DOM_FALLBACK=true on the backend, the run rescues via LLM_DOM.
Patient table inside iframe-within-iframe. STATIC only traverses one iframe level; LLM_DOM with recursive serialization extracts correctly.
Claims data rendered on <canvas> (simulating PDF viewer). Zero DOM text nodes - only LLM_VISION can read the rendered pixels.
Medication table inside shadow root within shadow root. XPath cannot cross shadow boundaries; requires LLM_DOM with shadow serialization or LLM_VISION.
Schedule table renders after JS delay (?delay=5000). With ?randomize=true the container ID changes each load. STATIC fires against empty DOM; LLM_DOM captures the settled state.
Benefits data inside a cross-origin iframe. contentDocument is null; LLM_DOM cannot serialize it. Only LLM_VISION (screenshot compositing) works.
200 claims in a virtual-scrolling list - only ~20 rows exist in the DOM at once. Tests partial extraction and scroll+extract patterns.
Fires the same URL (/api/method-echo) with GET/POST/PUT/PATCH/DELETE, each returning a method-tagged payload. Verifies EXTRACT_NETWORK's HTTP-method filter selects the right response when one URL is hit by many methods. ?auto=1 fires all on load; ?methods=GET,POST limits which.
Fires a no-cors fetch to a Cloudflare endpoint that publishes DNS HTTPS records (Chrome reliably picks HTTP/3). Pair with EXTRACT_NETWORK on the same URL to verify whether our tshark-on-cc_tun0 capture stack sees HTTP/3 traffic. The trace response body contains an http= line revealing which protocol Chrome actually used. ?target=<url>, ?iter=N, ?auto=0.
One distinct, fixed-position 'Download Report' button with stable pixels. Click it via LLM_VISION: first run grounds + caches the crop, a repeat run re-locates it with no VLM (DIRECT). ?theme=dark flips colors but keeps structure - Sobel matching should still hit.
Two visually-identical blank input fields. A DIRECT crop cross-matches both, so the password field binds to its 'Password' label anchor + offset and verifies landing before clicking. ?filled=true prefills email to prove the label anchor stays stable.
A row of N identical 'Select' buttons (?count=N, default 5). None is visually unique, so the matcher's top-2 margin is ~0 and it must abstain and defer to the VLM rather than guess (precision over recall).
Same fixed slot, different element by query: ?state=a blue 'Submit' (default), ?state=b green 'Cancel', ?show=false nothing. Cache on a, re-run on b - the cached crop should fail structural match and abstain, not mis-click.
Four-step sequential flow (Start → Add Details → Review → Submit), each step revealing the next distinct LLM_VISION target. Chains four vision clicks into four independent cache entries; a repeat run should produce four consecutive L1 hits.
Eval substrate for the dynamic-intent guard. ?case=<id> renders one realistic widget with a single target (data-testid="target"). 15 dynamic targets that must NOT cache (today's date, latest order, default card, primary contact, recommended plan, cheapest/oldest, next-available slot, selected/unread/new/your-task/pending, top result, highest bid) + 6 fixed controls whose labels merely contain a relative word and MUST stay cacheable (Today button, Sort-Newest toggle, Default-View tab, Primary tab, New button, Featured filter).
Pixel-identical button whose onClick is rewired to a destructive action on run 2 (?variant=build|attack). Cache is purely visual - predicted FALSE HIT (confident wrong click).
Safe button cached at a slot; on run 2 a pixel-identical destructive decoy takes that slot while the real button moves out of search_radius=240 (?variant=build|attack). Predicted FALSE HIT.
Approve $50 → $5000: meaning flips but Sobel structure stays >accept_score=0.85 (?variant=build|attack, ?digits=long stresses the boundary). Predicted FALSE HIT (long → safe abstain).
Featureless field anchored on a distant heading; run 2 inserts an identical field between them so heading+offset drifts onto the twin and loose verify passes (?variant=build|attack). FALSE HIT on the anchored path (Obs 15).
Forces an L1 miss with a transient decoy present so the unconditional build() overwrites the good entry with bad pixels/coords (?variant=build|miss-poison|attack). Predicted durable POISON → FALSE HIT.
DIRECT locate uses scales=[1.0] only; run 2 enlarges the real target out of band and plants a same-1.0x decoy in-window (?variant=build|attack, ?zoom=, ?decoy=off). FALSE HIT (decoy on) / SAFE ABSTAIN (decoy off).
No per-field labels, so the featureless box is forced to anchor on a distant 'Billing' heading that can't co-move; run 2 inserts an identical box between them so heading+offset drifts onto the twin and loose verify passes (?variant=build|attack). Predicted FALSE HIT - the genuine anchored-drift trigger #4 missed.
Structure-preserving relabel on a fixed-geometry, same-color button: Confirm→Cancel or Pay $50→£50 (?variant=build|attack, ?case=confirm-cancel|currency). Sobel stays >accept_score=0.85 so the cache HITS the meaning-flipped button - the case #3 only abstained on by luck. Predicted FALSE HIT; motivates an OCR/CLIP semantic veto.
Target invoice row sits below the fold - a grounding/CLICK node must scroll the container until it's visible, then act. ?scroll=inner|page, ?count=N (vary target depth between build/replay), ?target=N, ?present=false (remove target → end-of-scroll FAIL). OCR-friendly unique row ids for the cold OCR-gated pass.
Icon-only cold-path weak spot: a distinctive star badge below the fold among repeated low-distinctiveness filler glyphs. No text - cold OCR pre-filter can't help; warm pass relocates the cached glyph. ?scroll=inner|page, ?count=N, ?target=N, ?present=false.
Target lives in a scrollable side panel (not the page), off-center so it doesn't sit under the agent's fixed blind-scroll cursor - tests whether scroll-to-target works for paneled UIs. ?side=left|right, ?count=N, ?target=N, ?present=false.
People names (non-sequential, fixed array) in a small centered overflow-y-auto box - removes the ordinal cue invoice numbers gave the grounder, to test whether it can scroll to a target it can't locate by number. ?count=N, ?target=N, ?present=false.
Names in a WIDE fixed-height scrollable container that spans past both observed scroll aim points (x~495, x~960) so the wheel reliably hits it - isolates the numbers-vs-names scroll-inference question without the small-box physical-miss confound. ?count=N, ?target=N, ?present=false.
NextGen-style modal: form fields + OK/Cancel/Clear/Search and a SMALL scrollable results grid (own scrollbar) in the lower portion, target location row below its fold. Reproduces the customer pattern where the wheel must land inside a small off-center sub-region. ?offset=left|center|right, ?count=N, ?target=N, ?present=false.
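
Most pages above are tuned through query parameters (?variant=, ?delay=, ?reset=true and so on). A minimal sketch of composing such URLs in a test script follows. The host in BASE_URL and the page paths in the usage lines are hypothetical placeholders, since this listing does not give the real routes; only the parameter names come from the descriptions above.

```python
from urllib.parse import urlencode

# Hypothetical host: this listing does not state where the pages are served.
BASE_URL = "http://localhost:3000"


def build_test_url(path, **knobs):
    """Return BASE_URL + path with the given knobs as a query string.

    Knobs set to None are skipped. Booleans are rendered as true/false,
    matching the ?reset=true / ?show=false style used in the listing.
    """
    params = []
    for name, value in knobs.items():
        if value is None:
            continue
        if isinstance(value, bool):
            value = "true" if value else "false"
        params.append((name, str(value)))
    query = urlencode(params)
    return f"{BASE_URL}{path}" + (f"?{query}" if query else "")


# "/tfa-flow" is an assumed path used only for illustration.
print(build_test_url("/tfa-flow", variant="authenticator", reset=True))
# prints http://localhost:3000/tfa-flow?variant=authenticator&reset=true
```

Knobs are emitted in the order they are passed, so the resulting URLs are stable and easy to compare in test logs.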
Part of the CloudCruise browser automation platform