1.Hive内嵌函数说明
官方地址:hive函数大全官方地址
Hive字符处理函数系列1:hive内嵌字符串函数1
String Functions
Return Type
Name(Signature)
Description
regexp_extract(string subject, string pattern, int index)
Returns the string extracted using the pattern. For example, regexp_extract('foothebar', 'foo(.*?)(bar)', 2) returns 'bar.' Note that some care is necessary in using predefined character classes: using '\s' as the second argument will match the letter s; '\\s' is necessary to match whitespace, etc. The 'index' parameter is the Java regex Matcher group() method index. See docs/api/java/util/regex/Matcher.html for more information on the 'index' or Java regex group() method..
抽取字符串subject中符合正则表达式pattern的第index个部分的子字符串,注意些预定义字符的使用,如第二个参数如果使用'\s'将被匹配到s,'\\s'才是匹配空格
hive> select regexp_extract('foothebar', 'foo(.*?)(bar)', 2) ---注意2是匹配分组的
'bar.'
regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT)
Returns the string resulting from replacing all substrings in INITIAL_STRING that match the java regular expression syntax defined in PATTERN with instances of REPLACEMENT. For example, regexp_replace("foobar", "oo|ar", "") returns 'fb.' Note that some care is necessary in using predefined character classes: using '\s' as the second argument will match the letter s; '\\s' is necessary to match whitespace, etc..
按照Java正则表达式PATTERN将字符串INTIAL_STRING中符合条件的部分成REPLACEMENT所指定的字符串,如里REPLACEMENT这空的话,抽符合正则的部分将被去掉 如:regexp_replace("foobar", "oo|ar", "") = 'fb.' 注意些预定义字符的使用,如第二个参数如果使用'\s'将被匹配到s,'\\s'才是匹配空格
string
replace(string A, string OLD, string NEW)
Returns the string A with all non-overlapping occurrences of OLD replaced with NEW (as of Hive 1.3.0 and 2.1.0). Example: select replace("ababab", "abab", "Z"); returns "Zab".
split(string str, string pat)
Splits str around pat (pat is a regular expression)..
按照正则表达式pat来分割字符串str,并将分割后的数组字符串的形式返回
string
translate(string|char|varchar input, string|char|varchar from, string|char|varchar to)
Translates the input string by replacing the characters present in the from
string with the corresponding characters in the to
string. This is similar to the translate
function in PostgreSQL. If any of the parameters to this UDF are NULL, the result is NULL as well. (Available as of Hive 0.10.0, for string types)
Char/varchar support added as of Hive 0.14.0.
解释:
hive>select translate('abcdefga','abc','wo')
wodefgw 注意,结果不是wodefga,
hive>select replace('abcdefga','abc','wo')
wodefga 注意,两个结果,这就是replace和translate的区别。
map<string,string>
str_to_map(text[, delimiter1, delimiter2])
Splits text into key-value pairs using two delimiters. Delimiter1 separates text into K-V pairs, and Delimiter2 splits each K-V pair. Default delimiters are ',' for delimiter1 and ':' for delimiter2.
解释:
使用两个分隔符将文本拆分为键值对。
**分隔符1将文本分成K-V对,分隔符2分割每个K-V对。对于分隔符1默认分隔符是 ',',对于分隔符2默认分隔符是 '='**。
hive> select
str_to_map(concat(path_id,':',filter_name )) ---是map类型的
from FDM_SOR.T_FIBA_MULTI_UBA_CFG_PATH_DETAIL_D
group by path_id,filter_Name
hive>
2.split,regexp_replace,regexp_extract的使用
2.1 split函数,支持使用正则表达式对字符串进行切割,返回值为数组
SELECT
SPLIT('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1','\\?') [0] AS A,
--对url进行使用?进行切割,返回值是数组,这里取?前面的值。
SPLIT('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1','\\?') [1] AS A1,
SPLIT('http://facebook.com/index.html','\\?') [0] AS B ,
SPLIT('http://facebook.com/index.html','\\?') [1] AS B1,
SPLIT('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1','\\\w\\/') [0] AS C1
FROM FDM_SOR.T_PLPLFIS_TB_LOAN_APPLY_CEBANK_ED
注意:所有正则表达式中的预定义字符比如?,},|等需要在这里用\\进行反转义才能表达本意。比如正则表达式中\w表示匹配字母,所以也属于预定义字符,单独的\w表示匹配的是字母w,而\\\w才表示匹配字母。
2.2. regexp_replace函数,比较简单,难的是里面参数正则表达式的书写。
select
case when regexp_replace(uniscid,'[0-9A-HJ-NPQRTUWXY]{2}\\d{6}[0-9A-HJ-NPQRTUWXY]{10}','~~fbietl~~') = '~~fbietl~~' then uniscid
else null end uniscid,
from fdm_sor.aaaaaaaaaaaaaaa;
2.3 replace和translate的区别
replace:字符串级别的代替
translate:字符级别的代替
hive>select translate('abcdefga','abc','wo') wodefgw 注意,结果不是wodefga, hive>select replace('abcdefga','abc','wo') wodefga 注意,两个结果,这就是replace和translate的区别。