Skip to content

xml

pinky_core.xml

XML utilities for Snowflake SQL — UDF handlers and helpers.

Two UDF handlers are provided depending on the SQL context:

xml_path(node, path) Operates on a Snowflake VARIANT produced by PARSE_XML(). Pure Python, zero dependencies — can be inlined as AS $$ ... $$. Custom XPath-like engine covering the most common navigation patterns.

xml_path_str(xml_string, path) Operates on a raw XML string. Uses lxml (libxml2) for full XPath 1.0 support including namespaces. Requires PACKAGES = ('lxml') in the UDF DDL.

Deployment::

-- VARIANT input (inline, no packages)
CREATE OR REPLACE FUNCTION UDF_GET_XMLPATH("NODE" VARIANT, "PATH" VARCHAR)
RETURNS VARIANT LANGUAGE PYTHON RUNTIME_VERSION = '3.11'
HANDLER = 'xml_path' AS $$ <paste module source> $$;

-- String input (lxml, Anaconda channel)
CREATE OR REPLACE FUNCTION UDF_GET_XMLPATH_STR("XML" VARCHAR, "PATH" VARCHAR)
RETURNS VARIANT LANGUAGE PYTHON RUNTIME_VERSION = '3.11'
PACKAGES = ('lxml') HANDLER = 'xml_path_str'
AS $$ <paste module source> $$;

Usage in SQL::

-- From PARSE_XML output
SELECT UDF_GET_XMLPATH(PARSE_XML(xml_col), 'Cat_Summary/Name/text()')::VARCHAR
FROM my_table;

-- From raw XML string
SELECT UDF_GET_XMLPATH_STR(xml_col, '//Name/text()')::VARCHAR
FROM my_table;

xml_path(node, path)

Entry point registered as the Snowflake UDF handler.

Parameters:

Name Type Description Default
node Variant

VARIANT produced by Snowflake's PARSE_XML() or XMLGET().

required
path str | None

XPath-like expression string.

required

Returns:

Type Description
Variant

Matching VARIANT node/value, or None if not found.

Source code in src/pinky_core/xml.py
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
def xml_path(node: Variant, path: str | None) -> Variant:
    """Entry point registered as the Snowflake UDF handler.

    Args:
        node: VARIANT produced by Snowflake's PARSE_XML() or XMLGET().
        path: XPath-like expression string.

    Returns:
        Matching VARIANT node/value, or None if not found.
    """
    if node is None or path is None:
        return None
    if "|" in path and not re.search(r"\[[^\]]*\|[^\]]*\]", path):
        for p in path.split("|"):
            r = _evaluate(node, p.strip())
            if r is not None:
                return r
        return None
    return _evaluate(node, path)

xml_path_str(xml_string, path)

XPath navigation on a raw XML string using lxml (libxml2).

Full XPath 1.0 support including namespaces, axes, and all standard functions. Requires lxml — available in the Snowflake Anaconda channel (PACKAGES = ('lxml')), no ARTIFACT_REPOSITORY needed.

Parameters:

Name Type Description Default
xml_string str | None

Raw XML string to parse.

required
path str | None

XPath 1.0 expression.

required

Returns:

Type Description
Variant

Text content, attribute value, serialised element, or numeric result.

Variant

None if not found, input is None, or the XML is malformed.

Source code in src/pinky_core/xml.py
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
def xml_path_str(xml_string: str | None, path: str | None) -> Variant:
    """XPath navigation on a raw XML string using lxml (libxml2).

    Full XPath 1.0 support including namespaces, axes, and all standard
    functions. Requires ``lxml`` — available in the Snowflake Anaconda channel
    (``PACKAGES = ('lxml')``), no ``ARTIFACT_REPOSITORY`` needed.

    Args:
        xml_string: Raw XML string to parse.
        path: XPath 1.0 expression.

    Returns:
        Text content, attribute value, serialised element, or numeric result.
        None if not found, input is None, or the XML is malformed.
    """
    if xml_string is None or path is None:
        return None
    try:
        from lxml import etree  # lazy import — lxml only needed for this function

        root = etree.fromstring(xml_string.encode())
        results = root.xpath(path)
        # lxml returns scalars directly for numeric/string XPath expressions (count, string…)
        if isinstance(results, (int, float, str, bytes)):
            return results.decode() if isinstance(results, bytes) else results
        if not results:
            return None
        first = results[0]
        if isinstance(first, (int, float)):
            return first
        if isinstance(first, bytes):
            return first.decode()
        if isinstance(first, str):
            return first
        # Element node — return text content
        return first.text
    except Exception:
        return None