Updated on 2024-12-31 GMT+08:00

String Functions and Operators

String Operators

|| Indicates the character connection.

SELECT 'he'||'llo'; --hello

String Functions

These functions assume that the input string contains valid UTF-8 encoded Unicode code points. They do not explicitly check whether UTF-8 data is valid. For invalid UTF-8 data, the function may return an incorrect result. You can use from_utf8 to correct invalid UTF-8 data.

In addition, these functions operate on Unicode code points, not on characters (or font clusters) visible to users. Some languages combine multiple code points into a single user-perceived character (which is the basic unit of the language writing system), but functions treat each code point as a separate unit.

The lower and upper functions do not perform locale-related, context-related, or one-to-many mappings required by certain languages.

  • chr(n) → varchar

    Description: Returns the value of a character whose Unicode encoding value is n.

    select chr(100); --d
  • char_length(string) → bigint

    For details, see length(string).

  • character_length(string) → bigint

    For details, see length(string).

  • codepoint(string) → integer

    Description: Returns the Unicode encoding of a single character.

    select codepoint('d'); --100
  • concat(string1, string2) → varchar

    Description: Concatenates strings.

    select concat('hello','world'); -- helloworld
  • concat_ws(string0, string1, ..., stringN) → varchar

    Description: Concatenates string1, string2, ..., and stringN into a string using string0 as the separator. If string0 is null, the return value is null. If the parameter following the separator is null, the parameter will be skipped during concatenation.

    select concat_ws(',','hello','world'); -- hello,world
    select concat_ws(NULL,'def'); --NULL
    select concat_ws(',','hello',NULL,'world'); -- hello,world
    select concat_ws(',','hello','','world'); -- hello,,world
  • concat_ws(string0, array(varchar)) → varchar

    Description: Concatenates elements in an array using string0 as the separator. If string0 is null, the return value is null. Any null value in the array will be skipped.

    select concat_ws(NULL,ARRAY['abc']);--NULL
    select concat_ws(',',ARRAY['abc',NULL,NULL,'xyz']); -- abc,xyz
    select concat_ws(',',ARRAY['hello','world']); -- hello,world
  • decode(binary bin, string charset) →varchar

    Description: Encodes the first parameter into a string based on the specified character set. The supported character sets include UTF-8, UTF-16BE, UTF-16LE, and UTF-16. If the first parameter is null, null is returned.

    select decode(X'70 61 6e 64 61','UTF-8');
     _col0 
    -------
     panda 
    (1 row)
     
    select decode(X'00 70 00 61 00 6e 00 64 00 61','UTF-16BE');
     _col0 
    -------
     panda 
    (1 row)
  • encode(string str, string charset) →binary

    Description: Encodes a string based on the specified character set.

    select encode('panda','UTF-8');
         _col0      
    ----------------
     70 61 6e 64 61 
    (1 row)
  • find_in_set (string str, string strList) →int

    Description: Returns the position of the first occurrence of the string in the comma-separated strList. If a parameter is null, null is returned.

    select find_in_set('ab', 'abc,b,ab,c,def'); -- 3
  • format_number(number x, int d) →string

    Description: Formats the number x to #,###,###.##, reserves d decimal places, and returns the result as a string.

    select format_number(541211.212,2); -- 541,211.21
  • format(format,args...) → varchar

    Description: For details, see Format.

  • locate(string substr, string str, int pos]) →int

    Description: Returns the position of the first occurrence of the substring after the pos position in the string. If the condition is not met, 0 is returned.

    select locate('aaa','bbaaaaa',6);-- 0
    select locate('aaa','bbaaaaa',1);-- 3
    select locate('aaa','bbaaaaa',4);-- 4
  • length(string) → bigint

    Description: Returns the length of the string.

    select length('hello');-- 5
  • levenshtein_distance(string1, string2) → bigint

    Description: Calculates the Levenshtein distance between string1 and string2, that is, the minimum number of single-character edits (insertions, deletions, or substitutions) required to convert string1 to string2.

    select levenshtein_distance('helo word','hello,world'); -- 3
  • hamming_distance(string1, string2) → bigint

    Description: Returns the Hamming distance between character strings 1 and 2, that is, the number of different characters in the corresponding positions. Note that the lengths of the two strings must be the same.

    select hamming_distance('abcde','edcba');-- 4
  • instr(string,substring) → bigint
    Description: Locates the first occurrence of a substring in a string.
    select instr('abcde', 'cd');--3
  • levenshtein(string1, string2) → bigint

    For details, see levenshtein_distance(string1, string2).

  • levenshtein_distance(string1, string2) → bigint

    Description: Returns the Levenshtein edit distance between string 1 and string 2, that is, the minimum number of single-character edits (insertion, deletion, or replacement) required to change string 1 to string 2.

    select levenshtein_distance('apple','epplea');-- 2
  • lower(string) → varchar

    Description: Converts characters into lowercase letters.

    select lower('HELLo!');-- hello!
  • lcase(string A) → varchar

    Description: Same as lower(string).

  • ltrim(string) → varchar

    Description: Removes spaces at the beginning of a character string.

    select ltrim('   hello');-- hello
  • lpad(string, size, padstring) → varchar

    Description: Pads the string to the right to resize it using padstring. If size is less than the length of the string, the result is truncated to size characters. The size cannot be negative, and the padding string must not be empty.

    select lpad('myk',5,'fish'); -- domyk
  • luhn_check(string) → boolean

    Description: Tests whether a numeric string is valid based on the Luhn algorithm.

    This checksum function, also known as mod 10, is widely used to validate a variety of identification numbers, such as credit card numbers and ID card numbers.

    select luhn_check('79927398713'); -- true
    select luhn_check('79927398714'); -- false
  • octet_length(string str) → int

    Description: Returns the number of bytes for saving the string encoded using UTF-8.

    select octet_length('query');--5
  • parse_url(string urlString, string partToExtract [, string keyToExtract]) →string

    Description: Returns the specified part of a URL. The valid value of the partToExtract parameter is HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO. keyToExtract is an optional parameter, which is used to select the value corresponding to the key in QUERY.

    select parse_url('https://www.example.com/index.html','HOST');
      _col0   
    ----------
     www.example.com
    (1 row)
     
    -- Query the value of service in QUERY of the URL.
    select parse_url('https://www.example.com/query/index.html?name=panda','QUERY','name');
     _col0 
    -------
     panda 
    (1 row)
  • position(substring IN string) →bigint

    Description: Returns the position of the first occurrence of a substring in the parent string.

    select position('ab' in 'sssababa');-- 4
  • quote(String text) → string

    Description: Returns a string enclosed in single quotation marks. Strings containing single quotation marks are not supported.

    select quote('DONT');-- 'DONT'
    select quote(NULL);-- NULL
  • repeat2(string str, int n) → string

    Description: Returns a string obtained by repeating the str string for n times.

    select repeat2('abc',4);
        _col0     
    --------------
     abcabcabcabc 
    (1 row)
  • replace(string, 'a')→ varchar

    Description: Removes the character a from the character string.

    select replace('hello','e');-- hllo
  • replace(string, 'a', 'b')→ varchar

    Description: Replaces all a characters in a string with b.

    select replace('hello','l','m');-- hemmo
  • reverse(string) → varchar

    Description: reverses the string.

    select reverse('hello');-- olleh
  • rpad(string, size, padstring) → varchar

    Description: Pads the string to the right to resize it using padstring. If size is less than the length of the string, the result is truncated to size characters. The size cannot be negative, and the padding string must not be empty.

    select rpad('myk',5,'fish'); -- mykdo
  • rtrim(string) → varchar

    Description: Removes spaces at the end of a character string.

    select rtrim('hello world!   ');-- hello world!
  • space(int n) → varchar

    Description: Returns n spaces.

    select space(4);
     _col0 
    -------
           
    (1 row)
     
    select length(space(4));
     _col0 
    -------
         4 
    (1 row)
  • split(string, delimiter) → array

    Description: Splits the string by delimiters into an array.

    select split('a:b:c:d',':');-- [a, b, c, d]
  • split(string, delimiter, limit) → array

    Description: Splits a string into an array by delimiter. limit indicates the number of elements. The last element contains all the characters of the last string. limit must be a number.

    select split('a:b:c:d',':',2);-- [a, b:c:d]
    select split('a:b:c:d',':',4);-- [a, b, c, d]
  • split_part(string, delimiter, index) → varchar

    Description: Splits a string into an array by delimiter and extracts the elements whose index value is index. The index starts from 1. If the index exceeds the array length, NULL is returned.

    select split_part('a:b:c:d',':',2); -- b
    select split_part('a:b:c:d',':',5); -- NULL
  • split_to_map (string, entryDelimiter, keyValueDelimiter) → map<varchar, varchar>

    Description: Splits a string into mapped key-value pairs by entryDelimiter, and each key-value pair differentiates keys and values by keyValueDelimiter.

    select split_to_map('li:18,wang:17',',',':');--{wang=17, li=18}
  • split_to_multimap(string, entryDelimiter, keyValueDelimiter) -> map(varchar, array(varchar)

    Description: Splits a string by entryDelimiter and keyValueDelimiter and returns a map. Each key corresponds to a value of the array type. entryDelimiter splits a string into key-value pairs, and keyValueDelimiter splits a key-value pair into a key and a value.

    select split_to_multimap('li:18,wang:17,li:19,wang:18',',',':');--{wang=[17, 18], li=[18, 19]}
  • strpos(string, substring) → bigint

    Description: Returns the position of the first occurrence of substring in a string. The value starts from 1. If the value is not found, the value 0 is returned. Example:

    select strpos('hello world!','l'); --3
    select strpos('hello world!','da'); --0
  • str_to_map() For details, see split_to_map().
  • substr(string, start) → varchar

    Description: Truncates a character string from the start position.

    select substr('hello world',3);-- llo world
  • substr(string, start, length) → varchar

    Description: Truncates a character string from the start position. The truncated length is length.

    Generally, it is used to truncate the timestamp format.

    Select substr('2019-03-10 10:00:00',1,10); --Truncate to March 10, 2019.
    Select substr('2019-03-10 10:00:00',1,7); --Truncate to March 2019.
  • substring(string, start) → varchar

    For details, see substr(string, start).

  • substring_index(string A, string delim, int count) → varchar

    Description: If count is a positive number, all content before the count delimiter from the left is returned. If count is a negative number, all content after the count delimiter from the right is returned.

    select substring_index('one.two.three','.',2);
          _col0      
    -----------------
     one.two 
    (1 row)
     
    select substring_index('one.two.three','.',-2);
          _col0      
    -----------------
     two.three 
    (1 row)
     
    select substring_index('one.two.three','.',0);
     _col0 
    -------
     NULL  
    (1 row)
  • soundex(string A) →varchar

    Description: Returns code (soundex) consisting of four characters to evaluate the similarity of two strings in pronunciation. The rules are as follows:

    Table 1 Character mapping rule

    Character

    Digit

    a, e, h, i, o, u, w, and y

    0

    b. f, p, and v

    1

    c. g, j, k, q, s, x, and z

    2

    d and t

    3

    l

    4

    m and n

    5

    r

    6

    • Extracts the first letter of a string as the first value of soundex.
    • Replaces the latter letters with digits one by one based on the preceding letter mapping rules. If there are consecutive equal numbers, retain only one number and delete.
    • If the result contains more than four digits, the first four digits are used. If the result contains less than four digits, pad 0s to the end.
      select soundex('Miller');
       _col0 
      -------
       M460  
      (1 row)
  • translate(string|char|varchar input, string|char|varchar from, string|char|varchar to) →varchar

    Description: Replaces the string specified by the from parameter with the string specified by the to parameter for an input string. If one of the three parameters is null, NULL is returned.

    select translate('aabbcc','bb','BB');
     _col0  
    --------
     aaBBcc 
    (1 row)
  • trim(string) → varchar

    Description: Removes spaces at the beginning and end of a character string.

    select trim('  hello world!  ');-- hello world!
  • btrim(String str1,String str2) → varchar

    Description: Removes all characters contained in str2 from the beginning and end of str1.

    select btrim('hello','hlo');-- e
  • upper(string) → varchar

    Description: Converts character strings to uppercase letters.

    select upper('heLLo');-- HELLO
  • ucase(string A) → varchar

    Description: Same as upper(string).

  • base64decode(STRING str)

    Description: Performs Base64 reverse encoding on the character string.

    SELECT to_base64(CAST('hello world' as varbinary));-- aGVsbG8gd29ybGQ=
    select base64decode('aGVsbG8gd29ybGQ=');-- hello world
  • jaro_distance(STRING str1, STRING str2)

    Description: Compares the similarity between two character strings.

    select JARO_DISTANCE('hello', 'hell');-- 0.9333333333333332 
  • FNV_HASH(type v)

    Description: Calculates the hash value of a character string.

    select FNV_HASH('hello');-- -6615550055289275125 
  • word_stem(word) → varchar

    Description: Returns the stem of an English word.

    select word_stem('greating');-- great
  • word_stem(word, lang) → varchar

    Description: Returns the stem of a word in a specified language.

    select word_stem('ultramoderne','fr');-- ultramodern
  • translate(source, from, to) → varchar

    Description: Returns the translated source string by replacing the characters found in the source string with the corresponding characters in the target string. If the from string contains duplicate items, only the first one is used. If the source character does not exist in the from string, the source character is copied without translation. If the index of the matching character in the from string exceeds the length of the to string, the source character is omitted from the result string.

    SELECT translate('abcd', '', ''); -- 'abcd'
    SELECT translate('abcd', 'a', 'z'); -- 'zbcd'
    SELECT translate('abcda', 'a', 'z'); -- 'zbcdz'
    SELECT translate('Palhoça', 'ç','c'); -- 'Palhoca'
    SELECT translate('abcd', 'a', ''); -- 'bcd'
    SELECT translate('abcd', 'a', 'zy'); -- 'zbcd'
    SELECT translate('abcd', 'ac', 'z'); -- 'zbd'
    SELECT translate('abcd', 'aac', 'zq'); -- 'zbd'
Unicode functions
  • normalize(string) → varchar

    Description: Returns a standard string in NFC format.

    select normalize('e');
     _col0 
    -------
     e     
    (1 row)
  • normalize(string, form) → varchar

    Description: Unicode allows you to write the same character in different bytes. For example, é consists of 0xC3 and 0xA9, and consists of 0x65, 0xCC, and 0x81.

    normalize() returns a standard string based on the Unicode standard formats (including NFC, NFD, NFKC, and NFKD) specified by the parameter format. If no parameter format is specified, NFC is used by default.

    select to_utf8('é');
     _col0 
    -------
     c3 a9 
    (1 row)
     
    select to_utf8('é');
      _col0   
    ----------
     65 cc 81 
    (1 row)
     
    select normalize('é',NFC)=normalize('é',NFC);
     _col0 
    -------
     true  
    (1 row)
  • to_utf8(string) → varbinary

    Description: Encodes a string into a UTF-8 string.

    select to_utf8('panda');
         _col0      
    ----------------
     70 61 6e 64 61 
    (1 row)
  • from_utf8(binary) → varchar

    Description: Encodes a binary string into a UTF-8 string. An invalid UTF-8 sequence will be replaced by the Unicode character U+FFFD.

    select from_utf8(X'70 61 6e 64 61');
     _col0 
    -------
     panda 
    (1 row)
  • from_utf8(binary, replace) → varchar

    Description: Encodes a binary string into a UTF-8 string. An invalid UTF-8 sequence will be replaced by the replace parameter. The value of the replace parameter must be a single character or empty to prevent invalid characters from being removed.

    select from_utf8(X'70 61 6e 64 61 b1','!');
     _col0  
    --------
     panda! 
    (1 row)