Overview

Table 1 lists the string functions supported by DLI.

**Table 1** String functions
Function	Syntax	Value Type	Description
ascii	ascii(string <str>)	BIGINT	Returns the numeric value of the first character in a string.
concat	concat(array<T> <a>, array<T> <b>[,...]), concat(string <str1>, string <str2>[,...])	ARRAY or STRING	Concatenates multiple arrays or multiple strings and returns the result. This function accepts any number of inputs.
concat_ws	concat_ws(string <separator>, string <str1>, string <str2>[,...]), concat_ws(string <separator>, array<string> <a>)	STRING	Returns a string concatenated from multiple input strings, or from the elements of an array, joined by the specified separator.
char_matchcount	char_matchcount(string <str1>, string <str2>)	BIGINT	Returns the number of characters in str1 that appear in str2.
encode	encode(string <str>, string <charset>)	BINARY	Returns str encoded in the charset format.
find_in_set	find_in_set(string <str1>, string <str2>)	BIGINT	Returns the position (starting from 1) of str1 in str2, where str2 is a comma-separated list.
get_json_object	get_json_object(string <json>, string <path>)	STRING	Extracts the value at the specified JSON path from a JSON object. Returns NULL if the JSON object is invalid.
instr	instr(string <str>, string <substr>)	INT	Returns the index of the earliest occurrence of substr in str. Returns NULL if either argument is NULL, and 0 if substr does not exist in str. The first character in str has index 1.
instr1	instr1(string <str1>, string <str2>[, bigint <start_position>[, bigint <nth_appearance>]])	BIGINT	Returns the position of str2 in str1.
initcap	initcap(string A)	STRING	Converts the first letter of each word of a string to upper case and all other letters to lower case.
keyvalue	keyvalue(string <str>,[string <split1>,string <split2>,] string <key>)	STRING	Splits str by split1, converts each group into a key-value pair by split2, and returns the value corresponding to the key.
length	length(string <str>)	BIGINT	Returns the length of a string.
lengthb	lengthb(string <str>)	STRING	Returns the length of a specified string in bytes.
levenshtein	levenshtein(string A, string B)	INT	Returns the Levenshtein distance between two strings, for example, levenshtein('kitten','sitting') = 3.
locate	locate(string <substr>, string <str>[, bigint <start_pos>])	BIGINT	Returns the position of substr in str.
lower/lcase	lower(string A), lcase(string A)	STRING	Converts all characters of a string to lowercase.
lpad	lpad(string <str1>, int <length>, string <str2>)	STRING	Returns a string of the specified length. If str1 is shorter than length, it is left-padded with str2 to that length.
ltrim	ltrim([<trimChars>,] string <str>)	STRING	Trims spaces from the left-hand side of a string.
parse_url	parse_url(string urlString, string partToExtract [, string keyToExtract])	STRING	Returns the specified part of a given URL. Valid values of partToExtract include HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO. For example, parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST') returns 'facebook.com'. When the second parameter is set to QUERY, the third parameter can be used to extract the value of a specific parameter. For example, parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY', 'k1') returns 'v1'.
printf	printf(String format, Obj... args)	STRING	Prints the input in a specific format.
regexp_count	regexp_count(string <source>, string <pattern>[, bigint <start_position>])	BIGINT	Returns the number of substrings that match a specified pattern in the source, starting from the start_position position.
regexp_extract	regexp_extract(string <source>, string <pattern>[, bigint <groupid>])	STRING	Matches the string source based on the pattern grouping rule and returns the string content that matches groupid.
replace	replace(string <str>, string <old>, string <new>)	STRING	Replaces the substring that matches a specified string in a string with another string.
regexp_replace	For Spark 2.4.5: regexp_replace(string <source>, string <pattern>, string <replace_string>) For Spark 3.3.1: regexp_replace(string <source>, string <pattern>, string <replace_string>[, bigint <occurrence>])	STRING	For Spark 2.4.5: Replaces every substring in source that matches pattern with replace_string and returns the result string. For Spark 3.3.1: Replaces the occurrence-th substring in source that matches pattern, and every later match, with replace_string and returns the result string.
regexp_replace1	regexp_replace1(string <source>, string <pattern>, string <replace_string>[, bigint <occurrence>])	STRING	Replaces the occurrence-th substring in source that matches pattern with replace_string and returns the result string.
regexp_instr	regexp_instr(string <source>, string <pattern>[, bigint <start_position>[, bigint <occurrence>[, bigint <return_option>]]])	BIGINT	Returns the start or end position of the occurrence-th substring that matches pattern, searching source from start_position.
regexp_substr	regexp_substr(string <source>, string <pattern>[, bigint <start_position>[, bigint <occurrence>]])	STRING	Returns the occurrence-th substring that matches pattern, searching source from start_position.
repeat	repeat(string <str>, bigint <n>)	STRING	Returns str repeated n times.
reverse	reverse(string <str>)	STRING	Returns a string in reverse order.
rpad	rpad(string <str1>, int <length>, string <str2>)	STRING	Right-pads str1 with str2 to the specified length.
rtrim	rtrim([<trimChars>,] string <str>), rtrim(trailing [<trimChars>] from <str>)	STRING	Trims spaces from the right-hand side of a string.
soundex	soundex(string <str>)	STRING	Returns the soundex string from str, for example, soundex('Miller') = M460.
space	space(bigint <n>)	STRING	Returns a specified number of spaces.
substr/substring	substr(string <str>, bigint <start_position>[, bigint <length>]), substring(string <str>, bigint <start_position>[, bigint <length>])	STRING	Returns the substring of str, starting from start_position and with a length of length.
substring_index	substring_index(string <str>, string <separator>, int <count>)	STRING	Truncates str at the count-th occurrence of separator. If count is positive, separators are counted from the left and the part before that separator is returned. If count is negative, separators are counted from the right and the part after that separator is returned.
split_part	split_part(string <str>, string <separator>, bigint <start>[, bigint <end>])	STRING	Splits a specified string based on a specified separator and returns a substring from the start to end position.
translate	translate(string|char|varchar input, string|char|varchar from, string|char|varchar to)	STRING	Translates the input string by replacing the characters or string specified by from with the characters or string specified by to. For example, translate("abcde", "bcd", "BCD") replaces bcd in abcde with BCD.
trim	trim([<trimChars>,]string <str>), trim([BOTH] [<trimChars>] from <str>)	STRING	Trims spaces from both ends of a string.
upper/ucase	upper(string A), ucase(string A)	STRING	Converts all characters of a string to the upper case.
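
Several rows in Table 1 quote worked examples (levenshtein, parse_url) or describe counting rules (substring_index). The following is a minimal Python sketch that models those three behaviors so the quoted results can be cross-checked. It is an illustration only, not DLI's implementation: the Python function names simply mirror the SQL ones, parse_url covers only the HOST and QUERY parts, and the handling of a zero count in substring_index is an assumption.

```python
from urllib.parse import urlsplit, parse_qs


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def parse_url(url: str, part: str, key: str = None):
    """Model of the HOST and QUERY cases only (other parts are not sketched)."""
    pieces = urlsplit(url)
    if part == "HOST":
        return pieces.hostname
    if part == "QUERY":
        if key is None:
            return pieces.query
        values = parse_qs(pieces.query).get(key)
        return values[0] if values else None
    raise ValueError("part not covered by this sketch: " + part)


def substring_index(s: str, sep: str, count: int) -> str:
    """Keep the text before the count-th separator (counted from the right if negative)."""
    parts = s.split(sep)  # assumes a non-empty separator
    if count > 0:
        return sep.join(parts[:count])
    if count < 0:
        return sep.join(parts[count:])
    return ""  # assumption: a count of 0 yields an empty string


url = "http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1"
print(levenshtein("kitten", "sitting"))
print(parse_url(url, "HOST"))
print(parse_url(url, "QUERY", "k1"))
print(substring_index("a.b.c", ".", 2))
print(substring_index("a.b.c", ".", -1))
```

The first three printed values correspond to the examples quoted in the levenshtein and parse_url rows. The last two show the positive and negative count cases for substring_index on a sample string chosen for this sketch.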
