CLOUDCRUISE UNITED

ORANGE path

ExtractDatamodel — LLM_DOM Fallback

The STATIC EXTRACT_DATAMODEL node reads from //table[@id='product-table']. Appending ?broken=true to the page URL renames the table id, so the static XPath no longer matches. If the node sets enable_llm_dom_fallback: true and the backend runs with ENABLE_LLM_DOM_FALLBACK=true, the run retries with execution=LLM_DOM and extracts the same data directly from the page HTML.
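The gating described above can be sketched roughly as follows. This is a minimal illustration, not the backend's actual code: the function names, the use of LookupError to signal a missed selector, and the case-insensitive parsing of the env value are all assumptions made for the example. Only the two flag names and the execution values come from this page.

```python
import os
from typing import Callable, Mapping, Optional


def llm_dom_fallback_enabled(node_params: Mapping, env: Optional[Mapping] = None) -> bool:
    """Both switches must be on: the backend env var and the node parameter."""
    env = os.environ if env is None else env
    # Assumption: the backend treats the string "true" (any case) as enabled.
    backend_on = str(env.get("ENABLE_LLM_DOM_FALLBACK", "")).strip().lower() == "true"
    node_on = node_params.get("enable_llm_dom_fallback") is True
    return backend_on and node_on


def extract_with_fallback(
    node_params: dict,
    run_static: Callable[[dict], dict],
    run_llm_dom: Callable[[dict], dict],
    env: Optional[Mapping] = None,
) -> dict:
    """Try the STATIC extraction first; retry via LLM_DOM only if both flags allow it."""
    try:
        return run_static(node_params)
    except LookupError:
        # Hypothetical failure signal: the static XPath matched nothing.
        if not llm_dom_fallback_enabled(node_params, env):
            raise
        retry_params = {**node_params, "execution": "LLM_DOM"}
        return run_llm_dom(retry_params)
```

Passing the two runners in as callables keeps the sketch independent of how the real STATIC and LLM_DOM executors are implemented.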

Current table id: product-table
SKU    Name                 Price    Status
P-001  Wireless Mouse       $29.99   In stock
P-002  Mechanical Keyboard  $149.50  In stock
P-003  USB-C Hub            $39.00   Out of stock
P-004  Monitor Stand        $79.99   In stock
P-005  Laptop Sleeve        $24.50   Out of stock
Workflow node config
{
  "action": "EXTRACT_DATAMODEL",
  "parameters": {
    "selector": "//table[@id='product-table']",
    "execution": "STATIC",
    "enable_llm_dom_fallback": true,
    "extract_data_model": {
      "type": "object",
      "properties": {
        "products": {
          "type": "array",
          "path": "//table[@id='product-table']//tbody/tr",
          "items": {
            "type": "object",
            "properties": {
              "sku":      { "type": "string", "path": ".//td[1]" },
              "name":     { "type": "string", "path": ".//td[2]" },
              "price":    { "type": "number", "path": ".//td[3]" },
              "in_stock": { "type": "string", "path": ".//td[4]" }
            }
          }
        }
      }
    }
  }
}
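To make the paths in this config concrete, here is a small sketch that applies them to a cut-down copy of the table using Python's standard xml.etree.ElementTree. It is only an approximation of what a STATIC extraction might do: the sample markup, the "." prefix on the row path (ElementTree rejects absolute paths on an element), the replacement id "renamed-table", and the "$" stripping used to coerce price to a number are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page; only two rows are reproduced.
PAGE = """<html><body>
<table id="{table_id}">
  <thead><tr><th>SKU</th><th>Name</th><th>Price</th><th>Status</th></tr></thead>
  <tbody>
    <tr><td>P-001</td><td>Wireless Mouse</td><td>$29.99</td><td>In stock</td></tr>
    <tr><td>P-003</td><td>USB-C Hub</td><td>$39.00</td><td>Out of stock</td></tr>
  </tbody>
</table>
</body></html>"""

# Paths copied from the node config.
ROW_PATH = "//table[@id='product-table']//tbody/tr"
FIELD_PATHS = {
    "sku": ".//td[1]",
    "name": ".//td[2]",
    "price": ".//td[3]",
    "in_stock": ".//td[4]",
}


def extract_products(html: str) -> list:
    """Return one dict per matched row; an empty list means the static path missed."""
    root = ET.fromstring(html)
    # ElementTree only accepts relative paths, so the row path is prefixed with ".".
    rows = root.findall("." + ROW_PATH)
    products = []
    for row in rows:
        item = {key: row.find(path).text.strip() for key, path in FIELD_PATHS.items()}
        # Assumed coercion for the "number" type: drop the currency symbol.
        item["price"] = float(item["price"].lstrip("$"))
        products.append(item)
    return products


control = extract_products(PAGE.format(table_id="product-table"))
broken = extract_products(PAGE.format(table_id="renamed-table"))
```

With the original id, control holds both rows; with the renamed id, broken is empty, which is the situation the LLM_DOM fallback is meant to handle.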
Test plan
  1. Backend env: ENABLE_LLM_DOM_FALLBACK=true
  2. Workflow flag: enable_xpath_recovery=true
  3. Node param: enable_llm_dom_fallback: true
  4. Run against this URL with no query — STATIC succeeds (control)
  5. Run against ?broken=true — STATIC fails, LLM_DOM rescues
  6. Verify: the run completes, data is extracted, a healing-table row appears, and the logs show "LLM_DOM fallback: STATIC extract_datamodel failed; retrying via LLM_DOM"
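Part of the final verification step could be automated along these lines. This is a hedged sketch: the shape of the extracted result (a dict with a "products" list) is inferred from the data model above, the log lines are assumed to be available as plain strings, and verify_fallback_run is a hypothetical helper name. The healing-table check is not covered, since this page does not say how that table is queried.

```python
EXPECTED_SKUS = ["P-001", "P-002", "P-003", "P-004", "P-005"]
# Substring of the log message quoted in the test plan.
FALLBACK_MARKER = "STATIC extract_datamodel failed; retrying via LLM_DOM"


def verify_fallback_run(extracted: dict, log_lines: list) -> list:
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []

    # Check 1: all five SKUs from the page were extracted, in page order.
    skus = [product.get("sku") for product in extracted.get("products", [])]
    if skus != EXPECTED_SKUS:
        problems.append(f"unexpected SKUs: {skus}")

    # Check 2: the logs mention the STATIC failure and the LLM_DOM retry.
    if not any(FALLBACK_MARKER in line for line in log_lines):
        problems.append("fallback log message not found")

    return problems
```

For the control run in step 4, the SKU check should still pass but the log marker should be absent, so only the first check applies there.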