CLOUDCRUISE UNITED

LLM-only path

ExtractDatamodel — Nested Shadow DOM

Medication data lives inside a shadow root that is itself nested inside another shadow root (nested web components). Standard XPath evaluation via fontoxpath cannot cross shadow boundaries, so STATIC extraction fails. LLM_DOM extracts correctly if the snapshot serializer includes shadow content; otherwise LLM_VISION is required.
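
The LLM_DOM versus LLM_VISION choice can be sketched as a simple probe on the serialized snapshot. This is an illustration only: `pickExecution` and the `probe` argument are hypothetical names, not part of any CloudCruise API, and the sketch assumes you know some text that only appears inside the nested shadow root.

```typescript
type ExecutionMode = "LLM_DOM" | "LLM_VISION";

// Hypothetical helper (assumption, not a platform API).
// `snapshot` is the serialized DOM string handed to the LLM;
// `probe` is any text known to live inside the nested shadow root.
function pickExecution(snapshot: string, probe: string): ExecutionMode {
  // If the probe text made it into the snapshot, shadow content was
  // serialized and LLM_DOM has something to read. Otherwise fall back
  // to LLM_VISION, which works from the rendered page instead.
  return snapshot.includes(probe) ? "LLM_DOM" : "LLM_VISION";
}
```

A plain substring check is deliberately crude; it only answers whether shadow content reached the snapshot at all.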

Why STATIC fails here
  • Shadow DOM creates an encapsulated subtree: XPath evaluated against the document cannot reach nodes inside shadow roots
  • Double nesting (shadow → shadow) means that even if level 1 is traversed, level 2 is still hidden
  • The April 27 snapshot improvements serialize open shadow content into the DOM string, so LLM_DOM can read it
  • LLM_VISION always works: the content is rendered visually regardless of shadow encapsulation
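
The points above can be illustrated with a minimal sketch that contrasts a light-DOM walk with a walk that descends into open shadow roots. It is not the actual snapshot serializer: `ElementLike` is a hypothetical stand-in for a DOM element so the code runs without a browser, and the `med-list` / `med-row` tag names are invented for the example.

```typescript
// Structural stand-in for a DOM element (assumption, not a real DOM type).
// In a real page, `shadowRoot` is non-null only for open shadow roots.
interface ElementLike {
  tagName: string;
  text?: string;
  children: ElementLike[];
  shadowRoot?: { children: ElementLike[] } | null;
}

// Light-DOM-only walk: mirrors what a document-level query can see.
function collectLight(el: ElementLike, out: string[] = []): string[] {
  out.push(el.tagName);
  for (const child of el.children) collectLight(child, out);
  return out;
}

// Shadow-piercing walk: descends into every open shadow root it meets,
// so the nesting depth does not matter.
function collectDeep(el: ElementLike, out: string[] = []): string[] {
  out.push(el.tagName);
  if (el.shadowRoot) {
    for (const child of el.shadowRoot.children) collectDeep(child, out);
  }
  for (const child of el.children) collectDeep(child, out);
  return out;
}

// Hypothetical page: <med-list> hosts a shadow root containing <med-row>,
// which hosts a second shadow root containing the actual data cell.
const page: ElementLike = {
  tagName: "body",
  children: [{
    tagName: "med-list",
    children: [],
    shadowRoot: {
      children: [{
        tagName: "med-row",
        children: [],
        shadowRoot: { children: [{ tagName: "span", text: "RX-1", children: [] }] },
      }],
    },
  }],
};

const lightTags = collectLight(page); // ["body", "med-list"]
const deepTags = collectDeep(page);   // ["body", "med-list", "med-row", "span"]
```

The light walk stops at the outer host, which is the STATIC failure mode; the recursive walk reaches the inner `span` because it re-checks for a shadow root at every element.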

Workflow node config
{
  "action": "EXTRACT_DATAMODEL",
  "parameters": {
    "execution": "LLM_DOM",
    "extract_data_model": {
      "type": "object",
      "properties": {
        "medications": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "rx_id":       { "type": "string" },
              "drug":        { "type": "string" },
              "prescriber":  { "type": "string" },
              "refills":     { "type": "number" },
              "last_filled": { "type": "string" }
            }
          }
        }
      }
    }
  }
}
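
Because LLM output is not guaranteed to match the data model, it may help to check the result before using it. A minimal sketch of such a check follows. The `Medication` type mirrors the config above; `isMedication` and `isExtractionResult` are hypothetical helpers, and treating every field as required is an assumption (the schema itself declares no `required` list).

```typescript
// Mirrors the "medications" item shape from the node config above.
interface Medication {
  rx_id: string;
  drug: string;
  prescriber: string;
  refills: number;
  last_filled: string;
}

const STRING_FIELDS = ["rx_id", "drug", "prescriber", "last_filled"] as const;

// Returns true only when every field is present with the expected type.
// Assumption: all fields are treated as required.
function isMedication(value: unknown): value is Medication {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const stringsOk = STRING_FIELDS.every((key) => typeof record[key] === "string");
  const refills = record["refills"];
  return stringsOk && typeof refills === "number" && Number.isFinite(refills);
}

// Checks the whole extraction result: an object with a "medications" array
// whose items all pass isMedication.
function isExtractionResult(value: unknown): value is { medications: Medication[] } {
  if (typeof value !== "object" || value === null) return false;
  const meds = (value as Record<string, unknown>)["medications"];
  return Array.isArray(meds) && meds.every(isMedication);
}
```

A result where `refills` comes back as the string "2" rather than the number 2 would be rejected, which is a common way for LLM extraction to drift from the schema.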