Updated on 2024-12-19 GMT+08:00

Processing XML

To process values of the XML data type, GaussDB (DWS) provides the xpath and xpath_exists functions, as well as the XMLTABLE table function.

xpath(xpath, xml [, nsarray])

Description: Returns an array of XML values corresponding to the set of nodes produced by the xpath expression. If the xpath expression returns a scalar value instead of a set of nodes, an array of individual elements is returned.

The second parameter xml must be a complete XML document, which must have a root node element.

The third parameter is optional and is an array mapping of a namespace. The array should be a two-dimensional text array, and the length of the second dimension should be 2. (It should be an array of arrays, each containing exactly two elements). The first element of each array item is the alias of the namespace name, and the second element is the namespace URI. The alias provided in this array does not have to be the same as the alias used in the XML document itself. In other words, in the context of both XML documents and xpath functions, aliases are local.

Return type: XML value array

Example:

1
2
3
4
5
SELECT xpath('/my:a/text()', '<my:a xmlns:my="http://example.com">test</my:a>', ARRAY[ARRAY['my', 'http://example.com']]);
 xpath
--------
 {test}
(1 row)

Process the default (anonymous) namespace:

1
2
3
4
5
SELECT xpath('//mydefns:b/text()', '<a xmlns="http://example.com"><b>test</b></a>',ARRAY[ARRAY['mydefns', 'http://example.com']]);
 xpath
--------
 {test}
(1 row)

xpath_exists(xpath, xml [, nsarray])

Description: The xpath_exists function is a special form of the xpath function. This function does not return an XML value that satisfies the xpath function; it returns a Boolean value indicating whether the query is satisfied. This function is equivalent to the standard XMLEXISTS predicate, but it also provides support for a namespace mapping parameter.

Return type: bool

Example:

1
2
3
4
5
SELECT xpath_exists('/my:a/text()', '<my:a xmlns:my="http://example.com">test</my:a>', ARRAY[ARRAY['my', 'http://example.com']]);
 xpath_exists
--------------
 t
(1 row)

xmltable

Description: Generates a table based on the input XML data, XPath expression, and column definition. An xmltable is similar to a function in syntax, but it can appear only as a table in the FROM clause of a query.

Return value: setof record

Syntax:

1
2
3
4
5
6
XMLTABLE ( [ XMLNAMESPACES ( namespace_uri AS namespace_name [,  ...] ), ]
                row_expression PASSING [ BY  { REF | VALUE } ]
document_expression [ BY  { REF | VALUE } ]
COLUMNS name  { type  [ PATH column_expression  ] [ DEFAULT default_expression ] [ NOT NULL | NULL ] | FOR ORDINALITY }
[, ...]
)

Parameter:

  • The optional XMLNAMESPACES clause is a comma-separated list of namespace definitions, where each namespace_uri is a text-type expression and each namespace_name is a simple identifier. XMLNAMESPACES specifies the XML namespaces used in the document and their aliases. The default namespace declaration is not supported.
  • The mandatory parameter row_expression is an XPath 1.0 expression. This expression calculates the sequence of XML nodes based on the provided XML document document_expression. The sequence is the sequence of converting xmltable to output lines. If the document_expression value is NULL or an empty node set generated by row_expression, no line is returned.
  • The document_expression parameter is used to input an XML document. The input document must be in the XML format. XML fragment data or XML documents in incorrect format are not accepted. The BY REF and BY VALUE clauses do not take effect. They are used only to implement SQL standard compatibility.
  • The COLUMNS clause specifies the column list definition in the output table. The column name and column data type are mandatory, and the path, default value, and whether the clause is empty are optional.
    • column_expression of a column is an XPath 1.0 expression used to calculate the value of the column extracted from the current row based on row_expression. If column_expression is not specified, the field name is used as an implicit path.
    • A column can be marked as NOT NULL. If column_expression in the NOT NULL column does not return any data, and there is no DEFAULT clause or the calculation result of default_expression is NULL, an error is reported.
    • The columns marked as FOR ORDINALITY are filled with row numbers starting from 1. The sequence is the node sequence retrieved from the row_expression result set. A maximum of one column can be marked as FOR ORDINALITY.

      XPath 1.0 does not specify the order for nodes, so the order in which results are returned depends on the order in which data is obtained.

Example:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
SELECT * FROM XMLTABLE('/ROWS/ROW'
PASSING '<ROWS><ROW id="1"><CITY_ID>1002a</CITY_ID><CITY_NAME>snowcity</CITY_NAME></ROW><ROW id="2"><CITY_ID>1003b</CITY_ID><CITY_NAME>icecity</CITY_NAME></ROW><ROW id="3"><CITY_ID>1004c</CITY_ID><CITY_NAME>windcity</CITY_NAME></ROW></ROWS>'
COLUMNS id INT PATH '@id',
ordinality FOR ORDINALITY,
CITY_id TEXT PATH 'CITY_ID',CITY_name TEXT PATH 'CITY_NAME' NOT NULL);
 id | ordinality | city_id | city_name
----+------------+---------+-----------
  1 |          1 | 1002a   | snowcity
  2 |          2 | 1003b   | icecity
  3 |          3 | 1004c   | windcity
(3 rows)