Updated on 2023-03-21 GMT+08:00

String Functions

Table 1 String Functions

Function

Return Type

Description

string1 || string2

STRING

Returns the concatenation of string1 and string2.

CHAR_LENGTH(string)

CHARACTER_LENGTH(string)

INT

Returns the number of characters in the string.

UPPER(string)

STRING

Returns the string in uppercase.

LOWER(string)

STRING

Returns the string in lowercase.

POSITION(string1 IN string2)

INT

Returns the position (starting from 1) of the first occurrence of string1 in string2. Returns 0 if string1 is not found in string2.

TRIM([ BOTH | LEADING | TRAILING ] string1 FROM string2)

STRING

Returns a string that removes leading and/or trailing characters string1 from string2.

LTRIM(string)

STRING

Returns the specified string with leading whitespace removed.

For example, LTRIM(' This is a test String.') returns "This is a test String.".

RTRIM(string)

STRING

Returns the specified string with trailing whitespace removed.

For example, RTRIM('This is a test String. ') returns "This is a test String.".

REPEAT(string, integer)

STRING

Returns a string that repeats the base string integer times.

For example, REPEAT('This is a test String.', 2) returns "This is a test String.This is a test String.".

REGEXP_REPLACE(string1, string2, string3)

STRING

Returns string1 with every substring that matches the regular expression string2 replaced with string3.

For example, REGEXP_REPLACE('foobar', 'oo|ar', '') returns "fb".

REGEXP_REPLACE('ab\ab', '\\', 'e') returns "abeab".

OVERLAY(string1 PLACING string2 FROM integer1 [ FOR integer2 ])

STRING

Returns a string that replaces integer2 characters of string1 with string2, starting from position integer1.

The default value of integer2 is the length of string2.

For example, OVERLAY('This is an old string' PLACING ' new' FROM 10 FOR 5) returns "This is a new string".

SUBSTRING(string FROM integer1 [ FOR integer2 ])

STRING

Returns a substring of the specified string starting from position integer1 with length integer2. If integer2 is not specified, the substring from integer1 to the end of the string is returned.

REPLACE(string1, string2, string3)

STRING

Returns a new string in which all non-overlapping occurrences of string2 in string1 are replaced with string3.

For example, REPLACE('hello world', 'world', 'flink') returns "hello flink"; REPLACE('ababab', 'abab', 'z') returns "zab".

REPLACE('ab\\ab', '\\', 'e') returns "abeab".

REGEXP_EXTRACT(string1, string2[, integer])

STRING

Returns the string extracted from string1 using the regular expression string2 and the regex match group index integer.

Returns NULL if any argument is NULL or the regular expression is invalid.

For example, REGEXP_EXTRACT('foothebar', 'foo(.*?)(bar)', 2) returns "bar".

INITCAP(string)

STRING

Returns a new form of string with the first character of each word converted to uppercase and the remaining characters converted to lowercase.

CONCAT(string1, string2,...)

STRING

Returns a string that concatenates string1, string2, ….

For example, CONCAT('AA', 'BB', 'CC') returns "AABBCC".

CONCAT_WS(string1, string2, string3,...)

STRING

Returns a string that concatenates string2, string3, … with a separator string1. The separator is added between the strings to be concatenated. Returns NULL if string1 is NULL. If other arguments are NULL, this function automatically skips NULL arguments.

For example, CONCAT_WS('~', 'AA', NULL, 'BB', '', 'CC') returns "AA~BB~~CC".

LPAD(string1, integer, string2)

STRING

Returns a new string from string1 left-padded with string2 to a length of integer characters.

If any argument is NULL, NULL is returned.

If integer is negative, NULL is returned.

If the length of string1 is longer than integer, returns string1 shortened to integer characters.

For example, LPAD('hi',4,'??') returns "??hi".

LPAD('hi',1,'??') returns "h".

RPAD(string1, integer, string2)

STRING

Returns a new string from string1 right-padded with string2 to a length of integer characters.

If any argument is NULL, NULL is returned.

If integer is negative, NULL is returned.

If the length of string1 is longer than integer, returns string1 shortened to integer characters.

For example, RPAD('hi',4,'??') returns "hi??".

RPAD('hi',1,'??') returns "h".

FROM_BASE64(string)

STRING

Returns the base64-decoded result from string.

Returns NULL if string is NULL.

For example, FROM_BASE64('aGVsbG8gd29ybGQ=') returns "hello world".

TO_BASE64(string)

STRING

Returns the base64-encoded result from string.

Returns NULL if string is NULL.

For example, TO_BASE64('hello world') returns "aGVsbG8gd29ybGQ=".

ASCII(string)

INT

Returns the numeric value of the first character of string.

Returns NULL if string is NULL.

For example, ASCII('abc') returns 97.

ASCII(CAST(NULL AS VARCHAR)) returns NULL.

CHR(integer)

STRING

Returns the ASCII character whose binary value is equivalent to integer.

If integer is larger than 255, the modulus of integer divided by 256 is calculated first, and the CHR of the modulus is returned.

Returns NULL if integer is NULL.

For example, CHR(97) returns a.

CHR(353) returns a.

DECODE(binary, string)

STRING

Decodes the first argument into a String using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16').

If either argument is NULL, the result will also be NULL.

ENCODE(string1, string2)

BINARY

Encodes string1 into a BINARY using the character set string2 (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16').

If either argument is NULL, the result will also be NULL.

INSTR(string1, string2)

INT

Returns the position of the first occurrence of string2 in string1.

Returns NULL if any argument is NULL.

LEFT(string, integer)

STRING

Returns the leftmost integer characters from the string.

Returns an empty string if integer is negative.

Returns NULL if any argument is NULL.

RIGHT(string, integer)

STRING

Returns the rightmost integer characters from the string.

Returns an empty string if integer is negative.

Returns NULL if any argument is NULL.

LOCATE(string1, string2[, integer])

INT

Returns the position of the first occurrence of string1 in string2 after position integer.

Returns 0 if not found.

The value of integer defaults to 0.

Returns NULL if any argument is NULL.

PARSE_URL(string1, string2[, string3])

STRING

Returns the specified part from the URL.

Valid values for string2 include 'HOST', 'PATH', 'QUERY', 'REF', 'PROTOCOL', 'AUTHORITY', 'FILE', and 'USERINFO'.

Returns NULL if any argument is NULL.

If string2 is QUERY, the key in QUERY can be specified as string3.

Example:

parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST') returns 'facebook.com'.

parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY', 'k1') returns 'v1'.

REGEXP(string1, string2)

BOOLEAN

Performs a regular expression search on the specified string and returns a BOOLEAN value indicating whether the specified match pattern is found. If it is found, TRUE is returned. string1 indicates the specified string, and string2 indicates the regular expression.

Returns NULL if any argument is NULL.

REVERSE(string)

STRING

Returns the reversed string.

Returns NULL if any argument is NULL.

NOTE:

The function name must be enclosed in backquotes, for example, `REVERSE`.

SPLIT_INDEX(string1, string2, integer1)

STRING

Splits string1 by the delimiter string2 and returns the integer1-th (zero-based) string of the split strings.

Returns NULL if integer1 is negative.

Returns NULL if any argument is NULL.

STR_TO_MAP(string1[, string2, string3])

MAP

Returns a map after splitting string1 into key/value pairs using delimiters.

string2 is the pair delimiter. Its default value is ','.

string3 is the key-value delimiter. Its default value is '='.

SUBSTR(string[, integer1[, integer2]])

STRING

Returns a substring of string starting from position integer1 with length integer2.

If integer2 is not specified, the substring extends to the end of string.

JSON_VAL(STRING json_string, STRING json_path)

STRING

Returns the value of the specified json_path from the json_string. For details about how to use this function, see JSON_VAL Function.

NOTE:

The following rules are listed in descending order of priority.

  1. The two arguments json_string and json_path cannot be NULL.
  2. The value of json_string must be a valid JSON string. Otherwise, the function returns NULL.
  3. If json_string is an empty string, the function returns an empty string.
  4. If json_path is an empty string or the path does not exist, the function returns NULL.
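
Several functions in Table 1 are listed without an example. The query below is a minimal sketch of how some of them behave, assuming a Flink SQL environment in which a SELECT over literals can be run directly. The literal values and column aliases are illustrative and do not come from the original table. The expected results in the comments follow from the descriptions in Table 1.

```sql
-- Literal-only query: no source table is required.
SELECT
  POSITION('lo' IN 'hello')             AS pos,          -- 4 (positions start from 1)
  TRIM(BOTH 'x' FROM 'xxhixx')          AS trimmed,      -- 'hi'
  SUBSTRING('hello world' FROM 7 FOR 5) AS sub_for,      -- 'world'
  SUBSTR('hello world', 7)              AS sub_to_end,   -- 'world'
  INITCAP('hello wORLD')                AS capitalized,  -- 'Hello World'
  INSTR('hello', 'l')                   AS first_l,      -- 3
  SPLIT_INDEX('a,b,c', ',', 1)          AS second_item,  -- 'b' (index is zero-based)
  STR_TO_MAP('k1=v1,k2=v2')             AS kv;           -- map with k1 -> v1 and k2 -> v2
```

In a job that reads from a real source, replace the literals with column references; the function calls themselves stay the same.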

JSON_VAL Function

  • Syntax
STRING JSON_VAL(STRING json_string, STRING json_path)
Table 2 Parameters

Parameter

Data Type

Description

json_string

STRING

JSON object to be parsed

json_path

STRING

Path expression for parsing the JSON string. For the supported expressions, see Table 3.

Table 3 Supported expressions

Expression

Description

$

Root node in the path

[]

Access array elements

*

Array wildcard

.

Access child elements

  • Example
    1. Test input data.
      The test data source is Kafka. The messages are as follows (the second one is malformed: it ends with ] instead of }):
      {name:James,age:24,sex:male,grade:{math:95,science:[80,85],english:100}}
      {name:James,age:24,sex:male,grade:{math:95,science:[80,85],english:100}]
    2. Use JSON_VAL in SQL statements.
      CREATE TABLE kafkaSource (
        `message` string
      ) WITH (
        'connector' = 'kafka',
        'topic' = '<yourSourceTopic>',
        'properties.bootstrap.servers' = '<yourKafkaAddress1>:<yourKafkaPort>,<yourKafkaAddress2>:<yourKafkaPort>',
        'properties.group.id' = '<yourGroupId>',
        'scan.startup.mode' = 'latest-offset',
        "format" = "csv",
        "csv.field-delimiter" = "\u0001",
        "csv.quote-character" = "''"
      );
      
      CREATE TABLE kafkaSink(
        message1 STRING,
        message2 STRING,
        message3 STRING,
        message4 STRING,
        message5 STRING,  
        message6 STRING
      ) WITH (
        'connector' = 'kafka',
        'topic' = '<yourSinkTopic>',
        'properties.bootstrap.servers' = '<yourKafkaAddress1>:<yourKafkaPort>,<yourKafkaAddress2>:<yourKafkaPort>',
        "format" = "json"
      );
      
      INSERT INTO kafkaSink SELECT
      JSON_VAL(message,""),
      JSON_VAL(message,"$.name"),
      JSON_VAL(message,"$.grade.science"),
      JSON_VAL(message,"$.grade.science[*]"),
      JSON_VAL(message,"$.grade.science[1]"),
      JSON_VAL(message,"$.grade.dddd")
      FROM kafkaSource;
    3. Check the output result of the Kafka topic in the sink.
      {"message1":null,"message2":"James","message3":"[80,85]","message4":"[80,85]","message5":"85","message6":null}
      {"message1":null,"message2":null,"message3":null,"message4":null,"message5":null,"message6":null}