CLOUDCRUISE UNITED

LLM-only path

ExtractDatamodel — Nested Iframes

Patient data lives inside an iframe that is itself inside another iframe. STATIC extraction only traverses one iframe level via getNonProtectedIframes, so XPath evaluation never reaches the inner content. LLM_DOM receives the fully serialized DOM (including nested iframe content) and extracts correctly.

Structure: page → outer iframe → inner iframe → table

Why STATIC fails here
  • evaluateXPathOnIframes only checks top-level iframes via getNonProtectedIframes(document)
  • The outer iframe's contentDocument is checked, but the inner iframe inside it is never traversed
  • The target table exists only inside the inner iframe, so the XPath evaluation returns null
  • LLM_DOM works because getFullDocumentWithIframes recursively serializes nested iframes into the DOM string sent to the backend
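
The difference between the two traversal strategies can be sketched in TypeScript. This is a minimal illustration, not the actual implementation of evaluateXPathOnIframes or getFullDocumentWithIframes: DocModel, findTableOneLevel, findTableRecursive, and the table id "patients" are hypothetical names, the real DOM is replaced by a plain object tree, and the sketch assumes the top-level document itself is also checked.

```typescript
// Hypothetical, simplified stand-in for a browser document (NOT the real DOM API).
interface DocModel {
  tableIds: string[];   // ids of tables that sit directly in this document
  iframes: DocModel[];  // content documents of the accessible iframes in this document
}

// Models the one-level behavior: the document and the content of its
// direct iframes are checked, but nothing nested deeper than that.
function findTableOneLevel(doc: DocModel, id: string): boolean {
  if (doc.tableIds.indexOf(id) !== -1) return true;
  return doc.iframes.some((frame) => frame.tableIds.indexOf(id) !== -1);
}

// Models recursive traversal: every nested iframe is visited in turn.
function findTableRecursive(doc: DocModel, id: string): boolean {
  if (doc.tableIds.indexOf(id) !== -1) return true;
  return doc.iframes.some((frame) => findTableRecursive(frame, id));
}

// page → outer iframe → inner iframe → table
const inner: DocModel = { tableIds: ["patients"], iframes: [] };
const outer: DocModel = { tableIds: [], iframes: [inner] };
const page: DocModel = { tableIds: [], iframes: [outer] };

console.log(findTableOneLevel(page, "patients"));  // false: the inner iframe is never reached
console.log(findTableRecursive(page, "patients")); // true
```

Starting the one-level search from the outer iframe instead of the page would find the table, which is why the failure only shows up once a second level of nesting is involved.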

Workflow node config
{
  "action": "EXTRACT_DATAMODEL",
  "parameters": {
    "execution": "LLM_DOM",
    "extract_data_model": {
      "type": "object",
      "properties": {
        "patients": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "mrn":       { "type": "string" },
              "name":      { "type": "string" },
              "dob":       { "type": "string" },
              "diagnosis": { "type": "string" }
            }
          }
        }
      }
    }
  }
}
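
For consumers of this node's output, the schema above could be mirrored by TypeScript types and a small runtime check. This is a sketch under assumptions: Patient, ExtractionResult, isPatient, and isExtractionResult are hypothetical names, and it assumes the extracted result is returned as an object shaped like extract_data_model. Because the schema declares no required fields, every property is treated as optional.

```typescript
// Hypothetical types mirroring extract_data_model; no field is marked required in the schema.
interface Patient {
  mrn?: string;
  name?: string;
  dob?: string;
  diagnosis?: string;
}

interface ExtractionResult {
  patients?: Patient[];
}

const PATIENT_FIELDS = ["mrn", "name", "dob", "diagnosis"] as const;

// True when the value is a plain object.
function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value === null ? false
    : typeof value === "object" && !Array.isArray(value);
}

// A patient is valid when each schema field is either absent or a string.
function isPatient(value: unknown): value is Patient {
  if (!isPlainObject(value)) return false;
  return PATIENT_FIELDS.every(
    (field) => value[field] === undefined || typeof value[field] === "string"
  );
}

// The result is valid when "patients" is absent or an array of valid patients.
function isExtractionResult(value: unknown): value is ExtractionResult {
  if (!isPlainObject(value)) return false;
  const patients = value.patients;
  if (patients === undefined) return true;
  return Array.isArray(patients) && patients.every((p) => isPatient(p));
}

// Placeholder values only, not real extraction output.
const sample: unknown = { patients: [{ mrn: "placeholder-mrn", name: "placeholder-name" }] };
console.log(isExtractionResult(sample)); // true
```

A check like this makes it easier to notice when the LLM_DOM path returns a value of the wrong type for a field, which a purely structural XPath extraction would not produce in the same way.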