CLOUDCRUISE UNITED

LLM-only path

ExtractDatamodel — Virtual / Lazy List

200 claims, but the DOM only renders ~20 rows at a time (virtual scrolling). STATIC extraction only sees the currently mounted rows — the rest don't exist in the DOM. This tests whether extraction correctly handles partial data and whether LLM_DOM / LLM_VISION can be combined with scrolling to capture the full dataset.

Total rows: 200 | In DOM: 15 | Scroll position: 0px
Claim ID    Patient        Amount     Date        Status
CLM-10001   Adams, John    $4644.67   04/01/2026  Approved
CLM-10002   Baker, Lisa    $949.54    04/02/2026  Pending
CLM-10003   Clark, Maria   $4881.07   04/03/2026  Denied
CLM-10004   Davis, Tom     $4479.66   04/04/2026  In Review
CLM-10005   Evans, Ruth    $2065.54   04/05/2026  Approved
CLM-10006   Ford, Nina     $3167.01   04/06/2026  Pending
CLM-10007   Grant, Paul    $2972.72   04/07/2026  Denied
CLM-10008   Hill, Sara     $3303.92   04/08/2026  In Review
CLM-10009   Irwin, Ray     $3958.79   04/09/2026  Approved
CLM-10010   Jones, Amy     $1148.21   04/10/2026  Pending
CLM-10011   Adams, John    $4947.65   04/11/2026  Denied
CLM-10012   Baker, Lisa    $2746.96   04/12/2026  In Review
CLM-10013   Clark, Maria   $2254.16   04/13/2026  Approved
CLM-10014   Davis, Tom     $511.50    04/14/2026  Pending
CLM-10015   Evans, Ruth    $2077.17   04/15/2026  Denied
Why STATIC fails here
  • Virtual scrolling only renders rows in or near the viewport — most of the 200 rows never exist in the DOM simultaneously
  • XPath //tr[@data-claim-id] only matches ~20 rows at any given time
  • LLM_DOM has the same limitation — the serialized HTML only contains rendered rows
  • Full extraction requires a scroll+extract loop: scroll, extract visible portion, scroll again, merge results
  • Running LLM_VISION at each scroll position and deduplicating the merged rows (for example, by claim ID) is the most reliable approach for these patterns
  • Real-world examples: AG Grid, React Virtualized, infinite-scroll appointment lists
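
The scroll+extract loop described above can be sketched roughly as follows. This is a minimal illustration, not CloudCruise's implementation: `extract_visible` and `scroll_down` are hypothetical callables standing in for "run the extraction node on the current viewport" and "scroll the list by one step", and `FakeVirtualList` is a made-up stand-in for the page so the loop can run on its own. Rows are deduplicated by `claim_id`, on the assumption that claim IDs are unique.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def extract_all_rows(
    extract_visible: Callable[[], List[Row]],  # hypothetical: rows currently rendered
    scroll_down: Callable[[], bool],           # hypothetical: False when at the bottom
    max_steps: int = 100,                      # safety cap so the loop always ends
) -> List[Row]:
    """Scroll, extract the visible portion, scroll again, merge and dedupe."""
    merged: Dict[str, Row] = {}  # dicts keep insertion order
    for _ in range(max_steps):
        for row in extract_visible():
            # Overlapping windows return some rows twice; keep the first sighting.
            merged.setdefault(row["claim_id"], row)
        if not scroll_down():
            break
    return list(merged.values())


class FakeVirtualList:
    """Simulated virtual list: only `window` rows are 'mounted' at a time."""

    def __init__(self, total: int = 200, window: int = 20, step: int = 15) -> None:
        self.rows = [{"claim_id": f"CLM-{10001 + i}"} for i in range(total)]
        self.window, self.step, self.top = window, step, 0

    def extract_visible(self) -> List[Row]:
        return self.rows[self.top:self.top + self.window]

    def scroll_down(self) -> bool:
        last_top = max(len(self.rows) - self.window, 0)
        if self.top >= last_top:
            return False  # already showing the final window
        self.top = min(self.top + self.step, last_top)
        return True


fake = FakeVirtualList()
all_rows = extract_all_rows(fake.extract_visible, fake.scroll_down)
print(len(all_rows))  # 200
```

Scrolling by less than one full window (15 rows against a 20-row window here) means consecutive extractions overlap, which is why the merge step has to deduplicate. The overlap trades a few redundant rows for a lower chance of skipping a row at a window boundary.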
Workflow node config (single-page extraction)
{
  "action": "EXTRACT_DATAMODEL",
  "parameters": {
    "execution": "LLM_VISION",
    "prompt": "Extract all visible claims from this table. Note: this is a virtual list — only visible rows are in the DOM.",
    "extract_data_model": {
      "type": "object",
      "properties": {
        "claims": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "claim_id": { "type": "string" },
              "patient":  { "type": "string" },
              "amount":   { "type": "string" },
              "date":     { "type": "string" },
              "status":   { "type": "string" }
            }
          }
        },
        "has_more_rows": { "type": "boolean" }
      }
    }
  }
}
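
The schema above declares no required fields, so a single-page result may come back with fields missing. A small check before merging pages can catch that. The sketch below assumes the node's output is a plain object that mirrors `extract_data_model`; the actual output envelope is not shown here, and `check_page` and `ROW_FIELDS` are hypothetical names used only for illustration.

```python
from typing import Any, Dict, List

# Field names taken from the extract_data_model schema above.
ROW_FIELDS = ("claim_id", "patient", "amount", "date", "status")


def check_page(result: Dict[str, Any]) -> List[str]:
    """Return the problems found in one extraction result (empty list if none)."""
    claims = result.get("claims")
    if not isinstance(claims, list):
        return ["'claims' is missing or not an array"]
    problems: List[str] = []
    if not isinstance(result.get("has_more_rows"), bool):
        problems.append("'has_more_rows' is missing or not a boolean")
    for i, row in enumerate(claims):
        if not isinstance(row, dict):
            problems.append(f"claims[{i}] is not an object")
            continue
        for field in ROW_FIELDS:
            if not isinstance(row.get(field), str):
                problems.append(f"claims[{i}].{field} is missing or not a string")
    return problems


good = {
    "claims": [{
        "claim_id": "CLM-10001", "patient": "Adams, John",
        "amount": "$4644.67", "date": "04/01/2026", "status": "Approved",
    }],
    "has_more_rows": True,
}
bad = {"claims": [{"claim_id": "CLM-10002", "patient": "Baker, Lisa"}]}

print(check_page(good))       # []
print(len(check_page(bad)))   # 4
```

A page that fails the check could be re-extracted at the same scroll position rather than merged, so that an incomplete row does not end up in the final dataset.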