String Functions and Operators
String Operators
|| Indicates the character connection.
SELECT 'he'||'llo'; –hello
String Functions
These functions assume that the input string contains valid UTF-8 encoded Unicode code points. They do not explicitly check whether UTF-8 data is valid. For invalid UTF-8 data, the function may return an incorrect result. You can use from_utf8 to correct invalid UTF-8 data.
In addition, these functions operate on Unicode code points, not on characters (or font clusters) visible to users. Some languages combine multiple code points into a single user-perceived character (which is the basic unit of the language writing system), but functions treat each code point as a separate unit.
The lower and upper functions do not perform locale-related, context-related, or one-to-many mappings required by certain languages.
- chr(n) → varchar
Description: Returns the value of a character whose Unicode encoding value is n.
select chr(100); --d
- char_length(string) → bigint
- character_length(string) → bigint
- codepoint(string) → integer
Description: Returns the Unicode encoding of a single character.
select codepoint('d'); --100
- concat(string1, string2) → varchar
Description: Concatenates strings.
select concat('hello','world'); -- helloworld
- concat_ws(string0, string1, ..., stringN) → varchar
Description: Concatenates string1, string2, ..., and stringN into a string using string0 as the separator. If string0 is null, the return value is null. If the parameter following the separator is null, the parameter will be skipped during concatenation.
select concat_ws(',','hello','world'); -- hello,world select concat_ws(NULL,'def'); --NULL select concat_ws(',','hello',NULL,'world'); -- hello,world select concat_ws(',','hello','','world'); -- hello,,world
- concat_ws(string0, array(varchar)) → varchar
Description: Concatenates elements in an array using string0 as the separator. If string0 is null, the return value is null. Any null value in the array will be skipped.
select concat_ws(NULL,ARRAY['abc']);--NULL select concat_ws(',',ARRAY['abc',NULL,NULL,'xyz']); -- abc,xyz select concat_ws(',',ARRAY['hello','world']); -- hello,world
- decode(binary bin, string charset) →varchar
Description: Encodes the first parameter into a string based on the specified character set. The supported character sets include UTF-8, UTF-16BE, UTF-16LE, and UTF-16. If the first parameter is null, null is returned.
select decode(X'70 61 6e 64 61','UTF-8'); _col0 ------- panda (1 row) select decode(X'00 70 00 61 00 6e 00 64 00 61','UTF-16BE'); _col0 ------- panda (1 row)
- encode(string str, string charset) →binary
Description: Encodes a string based on the specified character set.
select encode('panda','UTF-8'); _col0 ---------------- 70 61 6e 64 61 (1 row)
- find_in_set (string str, string strList) →int
Description: Returns the position of the first occurrence of the string in the comma-separated strList. If a parameter is null, null is returned.
select find_in_set('ab', 'abc,b,ab,c,def'); -- 3
- format_number(number x, int d) →string
Description: Formats the number x to #,###,###.##, reserves d decimal places, and returns the result as a string.
select format_number(541211.212,2); -- 541,211.21
- format(format,args...) → varchar
Description: For details, see Format.
- locate(string substr, string str, int pos]) →int
Description: Returns the position of the first occurrence of the substring after the pos position in the string. If the condition is not met, 0 is returned.
select locate('aaa','bbaaaaa',6);-- 0 select locate('aaa','bbaaaaa',1);-- 3 select locate('aaa','bbaaaaa',4);-- 4
- length(string) → bigint
Description: Returns the length of the string.
select length('hello');-- 5
- levenshtein_distance(string1, string2) → bigint
Description: Calculates the Levenshtein distance between string1 and string2, that is, the minimum number of single-character edits (insertions, deletions, or substitutions) required to convert string1 to string2.
select levenshtein_distance('helo word','hello,world'); -- 3
- hamming_distance(string1, string2) → bigint
Description: Returns the Hamming distance between character strings 1 and 2, that is, the number of different characters in the corresponding positions. Note that the lengths of the two strings must be the same.
select hamming_distance('abcde','edcba');-- 4
- instr(string,substring) → bigint
- levenshtein(string1, string2) → bigint
For details, see levenshtein_distance(string1, string2).
- levenshtein_distance(string1, string2) → bigint
Description: Returns the Levenshtein edit distance between string 1 and string 2, that is, the minimum number of single-character edits (insertion, deletion, or replacement) required to change string 1 to string 2.
select levenshtein_distance('apple','epplea');-- 2
- lower(string) → varchar
Description: Converts characters into lowercase letters.
select lower('HELLo!');-- hello!
- lcase(string A) → varchar
- ltrim(string) → varchar
Description: Removes spaces at the beginning of a character string.
select ltrim(' hello');-- hello
- lpad(string, size, padstring) → varchar
Description: Pads the string to the right to resize it using padstring. If size is less than the length of the string, the result is truncated to size characters. The size cannot be negative, and the padding string must not be empty.
select lpad('myk',5,'dog'); -- domyk
- luhn_check(string) → boolean
Description: Tests whether a numeric string is valid based on the Luhn algorithm.
This checksum function, also known as mod 10, is widely used to validate a variety of identification numbers, such as credit card numbers and ID card numbers.
select luhn_check('79927398713'); -- true select luhn_check('79927398714'); -- false
- octet_length(string str) → int
Description: Returns the number of bytes for saving the string encoded using UTF-8.
select octet_length('query');--5
- parse_url(string urlString, string partToExtract [, string keyToExtract]) →string
Description: Returns the specified part of a URL. The valid value of the partToExtract parameter is HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO. keyToExtract is an optional parameter, which is used to select the value corresponding to the key in QUERY.
select parse_url('https://www.example.com/index.html','HOST'); _col0 ---------- www.example.com (1 row) -- Query the value of service in QUERY of the URL. select parse_url('https://www.example.com/query/index.html?name=panda','QUERY','name'); _col0 ------- panda (1 row)
- position(substring IN string) →bigint
Description: Returns the position of the first occurrence of a substring in the parent string.
select position('ab' in 'sssababa');-- 4
- quote(String text) → string
Description: Returns a string enclosed in single quotation marks. Strings containing single quotation marks are not supported.
select quote('DONT');-- 'DONT' select quote(NULL);-- NULL
- repeat2(string str, int n) → string
Description: Returns a string obtained by repeating the str string for n times.
select repeat2('abc',4); _col0 -------------- abcabcabcabc (1 row)
- replace(string, 'a')→ varchar
Description: Removes the character a from the character string.
select replace('hello','e');-- hllo
- replace(string, 'a', 'b')→ varchar
Description: Replaces all a characters in a string with b.
select replace('hello','l','m');-- hemmo
- reverse(string) → varchar
Description: reverses the string.
select reverse('hello');-- olleh
- rpad(string, size, padstring) → varchar
Description: Pads the string to the right to resize it using padstring. If size is less than the length of the string, the result is truncated to size characters. The size cannot be negative, and the padding string must not be empty.
select rpad('myk',5,'dog'); -- mykdo
- rtrim(string) → varchar
Description: Removes spaces at the end of a character string.
select rtrim('hello world! ');-- hello world!
- space(int n) → varchar
Description: Returns n spaces.
select space(4); _col0 ------- (1 row) select length(space(4)); _col0 ------- 4 (1 row)
- split(string, delimiter) → array
Description: Splits the string by delimiters into an array.
select split('a:b:c:d',':');-- [a, b, c, d]
- split(string, delimiter, limit) → array
Description: Splits a string into an array by delimiter. limit indicates the number of elements. The last element contains all the characters of the last string. limit must be a number.
select split('a:b:c:d',':',2);-- [a, b:c:d] select split('a:b:c:d',':',4);-- [a, b, c, d]
- split_part(string, delimiter, index) → varchar
Description: Splits a string into an array by delimiter and extracts the elements whose index value is index. The index starts from 1. If the index exceeds the array length, NULL is returned.
select split_part('a:b:c:d',':',2); -- b select split_part('a:b:c:d',':',5); -- NULL
- split_to_map (string, entryDelimiter, keyValueDelimiter) → map<varchar, varchar>
Description: Splits a string into mapped key-value pairs by entryDelimiter, and each key-value pair differentiates keys and values by keyValueDelimiter.
select split_to_map('li:18,wang:17',',',':');--{wang=17, li=18}
- split_to_multimap(string, entryDelimiter, keyValueDelimiter) -> map(varchar, array(varchar)
Description: Splits a string by entryDelimiter and keyValueDelimiter and returns a map. Each key corresponds to a value of the array type. entryDelimiter splits a string into key-value pairs, and keyValueDelimiter splits a key-value pair into a key and a value.
select split_to_multimap('li:18,wang:17,li:19,wang:18',',',':');--{wang=[17, 18], li=[18, 19]}
- strpos(string, substring) → bigint
Description: Returns the position of the first occurrence of substring in a string. The value starts from 1. If the value is not found, the value 0 is returned. Example:
select strpos('hello world!','l'); --3 select strpos('hello world!','da'); --0
- str_to_map() For details, see split_to_map().
- substr(string, start) → varchar
Description: Truncates a character string from the start position.
select substr('hello world',3);-- llo world
- substr(string, start, length) → varchar
Description: Truncates a character string from the start position. The truncated length is length.
Generally, it is used to truncate the timestamp format.
Select substr('2019-03-10 10:00:00',1,10); --Truncate to March 10, 2019. Select substr('2019-03-10 10:00:00',1,7); --Truncate to March 2019.
- substring(string, start) → varchar
- substring_index(string A, string delim, int count) → varchar
Description: If count is a positive number, all content before the count delimiter from the left is returned. If count is a negative number, all content after the count delimiter from the right is returned.
select substring_index('one.two.three','.',2); _col0 ----------------- one.two (1 row) select substring_index('one.two.three','.',-2); _col0 ----------------- two.three (1 row) select substring_index('one.two.three','.',0); _col0 ------- NULL (1 row)
- soundex(string A) →varchar
Description: Returns code (soundex) consisting of four characters to evaluate the similarity of two strings in pronunciation. The rules are as follows:
Table 1 Character mapping rule Character
Digit
a, e, h, i, o, u, w, and y
0
b. f, p, and v
1
c. g, j, k, q, s, x, and z
2
d and t
3
l
4
m and n
5
r
6
- Extracts the first letter of a string as the first value of soundex.
- Replaces the latter letters with digits one by one based on the preceding letter mapping rules. If there are consecutive equal numbers, retain only one number and delete.
- If the result contains more than four digits, the first four digits are used. If the result contains less than four digits, pad 0s to the end.
select soundex('Miller'); _col0 ------- M460 (1 row)
- translate(string|char|varchar input, string|char|varchar from, string|char|varchar to) →varchar
Description: Replaces the string specified by the from parameter with the string specified by the to parameter for an input string. If one of the three parameters is null, NULL is returned.
select translate('aabbcc','bb','BB'); _col0 -------- aaBBcc (1 row)
- trim(string) → varchar
Description: Removes spaces at the beginning and end of a character string.
select trim(' hello world! ');-- hello world!
- btrim(String str1,String str2) → varchar
Description: Removes all characters contained in str2 from the beginning and end of str1.
select btrim('hello','hlo');-- e
- upper(string) → varchar
Description: Converts character strings to uppercase letters.
select upper('heLLo');-- HELLO
- ucase(string A) → varchar
- base64decode(STRING str)
Description: Performs Base64 reverse encoding on the character string.
SELECT to_base64(CAST('hello world' as varbinary));-- aGVsbG8gd29ybGQ= select base64decode('aGVsbG8gd29ybGQ=');-- hello world
- jaro_distance(STRING str1, STRING str2)
Description: Compares the similarity between two character strings.
select JARO_DISTANCE('hello', 'hell');-- 0.9333333333333332
- FNV_HASH(type v)
Description: Calculates the hash value of a character string.
select FNV_HASH('hello');-- -6615550055289275125
- word_stem(word) → varchar
Description: Returns the stem of an English word.
select word_stem('greating');-- great
- word_stem(word, lang) → varchar
Description: Returns the stem of a word in a specified language.
select word_stem('ultramoderne','fr');-- ultramodern
- translate(source, from, to) → varchar
Description: Returns the translated source string by replacing the characters found in the source string with the corresponding characters in the target string. If the from string contains duplicate items, only the first one is used. If the source character does not exist in the from string, the source character is copied without translation. If the index of the matching character in the from string exceeds the length of the to string, the source character is omitted from the result string.
SELECT translate('abcd', '', ''); -- 'abcd' SELECT translate('abcd', 'a', 'z'); -- 'zbcd' SELECT translate('abcda', 'a', 'z'); -- 'zbcdz' SELECT translate('Palhoça', 'ç','c'); -- 'Palhoca' SELECT translate('abcd', 'a', ''); -- 'bcd' SELECT translate('abcd', 'a', 'zy'); -- 'zbcd' SELECT translate('abcd', 'ac', 'z'); -- 'zbd' SELECT translate('abcd', 'aac', 'zq'); -- 'zbd'
- normalize(string) → varchar
Description: Returns a standard string in NFC format.
select normalize('e'); _col0 ------- e (1 row)
- normalize(string, form) → varchar
Description: Unicode allows you to write the same character in different bytes. For example, é consists of 0xC3 and 0xA9, and é consists of 0x65, 0xCC, and 0x81.
normalize() returns a standard string based on the Unicode standard formats (including NFC, NFD, NFKC, and NFKD) specified by the parameter format. If no parameter format is specified, NFC is used by default.
select to_utf8('é'); _col0 ------- c3 a9 (1 row) select to_utf8('é'); _col0 ---------- 65 cc 81 (1 row) select normalize('é',NFC)=normalize('é',NFC); _col0 ------- true (1 row)
- to_utf8(string) → varbinary
Description: Encodes a string into a UTF-8 string.
select to_utf8('panda'); _col0 ---------------- 70 61 6e 64 61 (1 row)
- from_utf8(binary) → varchar
Description: Encodes a binary string into a UTF-8 string. An invalid UTF-8 sequence will be replaced by the Unicode character U+FFFD.
select from_utf8(X'70 61 6e 64 61'); _col0 ------- panda (1 row)
- from_utf8(binary, replace) → varchar
Description: Encodes a binary string into a UTF-8 string. An invalid UTF-8 sequence will be replaced by the replace parameter. The value of the replace parameter must be a single character or empty to prevent invalid characters from being removed.
select from_utf8(X'70 61 6e 64 61 b1','!'); _col0 -------- panda! (1 row)
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot