HBase Filters and Their Shell Equivalents

Stella981

http://www.cnblogs.com/skyl/p/4807793.html

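Comparison operators: CompareFilter.CompareOp
A comparison operator defines the relation to test. The available values are:

  • EQUAL: equal to
  • GREATER: greater than
  • GREATER_OR_EQUAL: greater than or equal to
  • LESS: less than
  • LESS_OR_EQUAL: less than or equal to
  • NOT_EQUAL: not equal to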

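Comparators: ByteArrayComparable
A comparator determines how the target is matched. The following subclasses are available:

  • BinaryComparator: matches the complete byte array
  • BinaryPrefixComparator: matches a prefix of the byte array
  • BitComparator: bitwise comparison, rarely used
  • NullComparator: tests for null, rarely used
  • RegexStringComparator: matches a regular expression
  • SubstringComparator: matches a substring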
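The matching rules of the four commonly used comparators can be sketched in plain Java. This is a minimal illustration, not the HBase classes themselves: `ComparatorSketch` and its method names are hypothetical, and the regex helper assumes that RegexStringComparator searches within the value (`find()` semantics) rather than requiring a full match.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;
import java.util.regex.Pattern;

// Plain-Java stand-ins for the matching rule of each comparator when it is
// combined with CompareOp.EQUAL. Hypothetical helpers, not HBase classes.
class ComparatorSketch {

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // BinaryComparator: the whole byte array must match.
    static boolean binaryEquals(byte[] value, byte[] expected) {
        return Arrays.equals(value, expected);
    }

    // BinaryPrefixComparator: only the leading bytes are compared.
    static boolean binaryPrefixEquals(byte[] value, byte[] prefix) {
        if (value.length < prefix.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(value, prefix.length), prefix);
    }

    // SubstringComparator: case-insensitive "contains".
    static boolean substringEquals(byte[] value, String needle) {
        String text = new String(value, StandardCharsets.UTF_8).toLowerCase(Locale.ROOT);
        return text.contains(needle.toLowerCase(Locale.ROOT));
    }

    // RegexStringComparator: assumed to use find(), so the pattern may match
    // anywhere in the value unless it is anchored with ^ and $.
    static boolean regexEquals(byte[] value, String regex) {
        String text = new String(value, StandardCharsets.UTF_8);
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        byte[] address = utf8("address");
        System.out.println(binaryEquals(address, utf8("add")));       // false
        System.out.println(binaryPrefixEquals(address, utf8("add"))); // true
        System.out.println(substringEquals(address, "DRE"));          // true
        System.out.println(regexEquals(utf8("240"), ".4"));           // true
        System.out.println(regexEquals(utf8("240"), "4$"));           // false
    }
}
```

In the shell filter strings used later in this article, the same four comparators are selected with the prefixes binary:, binaryprefix:, substring: and regexstring:.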

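1. Combining filters -- FilterList (the shell has no FilterList; its filter string combines filters with AND / OR instead)
A FilterList represents a chain of filters that is applied to the target data set. The filters in the chain are combined with either AND (FilterList.Operator.MUST_PASS_ALL) or OR (FilterList.Operator.MUST_PASS_ONE).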
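The two operators behave like "all predicates pass" and "at least one predicate passes". The following plain-Java sketch illustrates only that logic; `FilterListSketch`, its `Operator` enum and `ageRange()` are hypothetical stand-ins built on `java.util.function.Predicate`, not the HBase API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Illustrates AND / OR combination of filters with ordinary predicates.
class FilterListSketch {

    enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

    // MUST_PASS_ALL: every filter must accept the value (AND).
    // MUST_PASS_ONE: one accepting filter is enough (OR).
    static <T> boolean passes(Operator op, List<Predicate<T>> filters, T value) {
        if (op == Operator.MUST_PASS_ALL) {
            return filters.stream().allMatch(f -> f.test(value));
        }
        return filters.stream().anyMatch(f -> f.test(value));
    }

    // Two sample conditions on a numeric age: >= 15 and <= 30.
    static List<Predicate<Integer>> ageRange() {
        List<Predicate<Integer>> filters = new ArrayList<>();
        filters.add(age -> age >= 15);
        filters.add(age -> age <= 30);
        return filters;
    }

    public static void main(String[] args) {
        System.out.println(passes(Operator.MUST_PASS_ALL, ageRange(), 24)); // true
        System.out.println(passes(Operator.MUST_PASS_ALL, ageRange(), 40)); // false
        System.out.println(passes(Operator.MUST_PASS_ONE, ageRange(), 40)); // true
    }
}
```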

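// Imports assumed by the Java examples in this article. The code appears to target the
// HBase 0.96 to 1.x client API; new HTable(conf, name) is deprecated in later releases
// in favor of ConnectionFactory and Connection.getTable.
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.*;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;

// Combine filters: fetch all rows whose info:age lies between 15 and 30.
// Values are compared as raw bytes, so "15" and "30" are ordered lexicographically, not numerically.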
private static void scanFilter() throws IOException,
        UnsupportedEncodingException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
    
    // And
    FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);
    // >=15
    SingleColumnValueFilter filter1 = new SingleColumnValueFilter("info".getBytes(), "age".getBytes(), CompareOp.GREATER_OR_EQUAL, "15".getBytes());
    // <=30
    SingleColumnValueFilter filter2 = new SingleColumnValueFilter("info".getBytes(), "age".getBytes(), CompareOp.LESS_OR_EQUAL, "30".getBytes());
    filterList.addFilter(filter1);
    filterList.addFilter(filter2);        
    
    Scan scan = new Scan();
    // set Filter
    scan.setFilter(filterList);
    
    ResultScanner rs = ht.getScanner(scan);
    for(Result result : rs){
        for(Cell cell : result.rawCells()){
            System.out.println(new String(CellUtil.cloneRow(cell))+"\t"
                    +new String(CellUtil.cloneFamily(cell))+"\t"
                    +new String(CellUtil.cloneQualifier(cell))+"\t"
                    +new String(CellUtil.cloneValue(cell),"UTF-8")+"\t"
                    +cell.getTimestamp());
        }
    }
    ht.close();
}
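One caveat about the range example above: the bounds "15" and "30" are byte arrays, and HBase orders byte arrays lexicographically. If ages are stored as plain decimal strings, values such as "2" or "150" also fall inside the range. The sketch below reproduces that ordering in plain Java; `LexicographicRangeSketch` and the zero-padding helper `pad` are hypothetical, and zero-padding is just one possible remedy.

```java
import java.nio.charset.StandardCharsets;

class LexicographicRangeSketch {

    // Unsigned, byte-by-byte comparison: the ordering applied to raw values.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Mirrors the two-filter range check: value >= low AND value <= high.
    static boolean inRange(String value, String low, String high) {
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        return compareBytes(v, low.getBytes(StandardCharsets.UTF_8)) >= 0
            && compareBytes(v, high.getBytes(StandardCharsets.UTF_8)) <= 0;
    }

    // Hypothetical remedy: store ages zero-padded to a fixed width.
    static String pad(int age) {
        return String.format("%03d", age);
    }

    public static void main(String[] args) {
        System.out.println(inRange("24", "15", "30"));            // true
        System.out.println(inRange("2", "15", "30"));             // true, although 2 < 15
        System.out.println(inRange("150", "15", "30"));           // true, although 150 > 30
        System.out.println(inRange(pad(2), pad(15), pad(30)));    // false
        System.out.println(inRange(pad(150), pad(15), pad(30)));  // false
        System.out.println(inRange(pad(24), pad(15), pad(30)));   // true
    }
}
```

With fixed-width values the byte order and the numeric order agree, so the two SingleColumnValueFilter conditions select the intended rows.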

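2. Column value filter -- SingleColumnValueFilter
Tests a column value for equality (CompareOp.EQUAL), inequality (CompareOp.NOT_EQUAL), or a one-sided range (e.g. CompareOp.GREATER). When the column matches, the whole row is returned, not just that column. Constructors:
2.1. Comparing against a byte array (no direct shell form; in the shell the value is written with a comparator prefix, e.g. 'binary:18')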
SingleColumnValueFilter(byte[] family, byte[] qualifier, CompareFilter.CompareOp compareOp, byte[] value)

// SingleColumnValueFilter example
private static void scanFilter01() throws IOException,
        UnsupportedEncodingException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
    
    SingleColumnValueFilter scvf = new SingleColumnValueFilter("info".getBytes(), "age".getBytes(), CompareOp.EQUAL, "18".getBytes());
    Scan scan = new Scan();
    scan.setFilter(scvf);
    ResultScanner rs = ht.getScanner(scan);
    for(Result result : rs){
        for(Cell cell : result.rawCells()){
            System.out.println(new String(CellUtil.cloneRow(cell))+"\t"
                    +new String(CellUtil.cloneFamily(cell))+"\t"
                    +new String(CellUtil.cloneQualifier(cell))+"\t"
                    +new String(CellUtil.cloneValue(cell),"UTF-8")+"\t"
                    +cell.getTimestamp());
        }
    }
    ht.close();
}

2.2. Comparing against a comparator (ByteArrayComparable)
SingleColumnValueFilter(byte[] family, byte[] qualifier, CompareFilter.CompareOp compareOp, ByteArrayComparable comparator)

// SingleColumnValueFilter example 2 -- RegexStringComparator
private static void scanFilter02() throws IOException,
        UnsupportedEncodingException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
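    // Regular-expression comparison on the value: RegexStringComparator
    // Matches info:age values that contain any character followed by "4".
    // The pattern is searched within the value; use "4$" to require that the value ends with "4".
    RegexStringComparator comparator = new RegexStringComparator(".4");
    // Only the fourth argument differs from the previous example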
    SingleColumnValueFilter scvf = new SingleColumnValueFilter("info".getBytes(), "age".getBytes(), CompareOp.EQUAL, comparator);
    Scan scan = new Scan();
    scan.setFilter(scvf);
    ResultScanner rs = ht.getScanner(scan);
    for(Result result : rs){
        for(Cell cell : result.rawCells()){
            System.out.println(new String(CellUtil.cloneRow(cell))+"\t"
                    +new String(CellUtil.cloneFamily(cell))+"\t"
                    +new String(CellUtil.cloneQualifier(cell))+"\t"
                    +new String(CellUtil.cloneValue(cell),"UTF-8")+"\t"
                    +cell.getTimestamp());
        }
    }
    ht.close();
}

hbase(main):032:0> scan 'users',{FILTER=>"SingleColumnValueFilter('info','age',=,'regexstring:.4')"}
ROW                                 COLUMN+CELL                                                                                         
 xiaoming01                         column=address:contry, timestamp=1442000277200, value=\xE4\xB8\xAD\xE5\x9B\xBD                      
 xiaoming01                         column=address:country, timestamp=1442000228945, value=\xE4\xB8\xAD\xE5\x9B\xBD                     
 xiaoming01                         column=info:age, timestamp=1441998917568, value=24                                                  
 xiaoming02                         column=info:age, timestamp=1441998917594, value=24                                                  
 xiaoming03                         column=info:age, timestamp=1441998919607, value=24                                                  
3 row(s) in 0.0130 seconds

// SingleColumnValueFilter example 3 -- SubstringComparator
private static void scanFilter03() throws IOException,
        UnsupportedEncodingException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
    
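    // Tests whether a substring occurs in the value (case-insensitive): SubstringComparator
    // Select the rows whose info:age value contains "4"
    SubstringComparator comparator = new SubstringComparator("4");
    // Only the fourth argument differs from the first example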
    SingleColumnValueFilter scvf = new SingleColumnValueFilter("info".getBytes(), "age".getBytes(), CompareOp.EQUAL, comparator);
    Scan scan = new Scan();
    scan.setFilter(scvf);
    ResultScanner rs = ht.getScanner(scan);
    for(Result result : rs){
        for(Cell cell : result.rawCells()){
            System.out.println(new String(CellUtil.cloneRow(cell))+"\t"
                    +new String(CellUtil.cloneFamily(cell))+"\t"
                    +new String(CellUtil.cloneQualifier(cell))+"\t"
                    +new String(CellUtil.cloneValue(cell),"UTF-8")+"\t"
                    +cell.getTimestamp());
        }
    }
    ht.close();
}

hbase(main):033:0> scan 'users',{FILTER=>"SingleColumnValueFilter('info','age',=,'substring:4')"}
ROW                                 COLUMN+CELL                                                                                         
 xiaoming01                         column=address:contry, timestamp=1442000277200, value=\xE4\xB8\xAD\xE5\x9B\xBD                      
 xiaoming01                         column=address:country, timestamp=1442000228945, value=\xE4\xB8\xAD\xE5\x9B\xBD                     
 xiaoming01                         column=info:age, timestamp=1441998917568, value=24                                                  
 xiaoming02                         column=info:age, timestamp=1441998917594, value=24                                                  
 xiaoming03                         column=info:age, timestamp=1441998919607, value=24                                                  
3 row(s) in 0.0180 seconds

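3. Column name filters
Because HBase stores its data internally as key-value pairs, a column name filter tests the column name (ColumnFamily:Qualifier) of each cell in a row, just as the previous section tested the column value.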

3.1. Filtering by column family -- FamilyFilter
FamilyFilter(CompareFilter.CompareOp familyCompareOp, ByteArrayComparable familyComparator)

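Notes:
1. If you are looking for a known column family, scan.addFamily(family) is more efficient than a filter.
2. Because HBase's support for multiple column families is still limited, this filter is of little use for now.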

// FamilyFilter: filter data by column family
private static void scanFilter04() throws IOException,
        UnsupportedEncodingException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");

    // Keep the cells whose column family equals 'address'
    //FamilyFilter familyFilter = new FamilyFilter(CompareOp.EQUAL, new BinaryComparator("address".getBytes()));

    // Keep the cells whose column family starts with 'add'
    FamilyFilter familyFilter = new FamilyFilter(CompareOp.EQUAL, new BinaryPrefixComparator("add".getBytes()));
    
    Scan scan = new Scan();
    scan.setFilter(familyFilter);
    ResultScanner rs = ht.getScanner(scan);
    for(Result result : rs){
        for(Cell cell : result.rawCells()){
            System.out.println(new String(CellUtil.cloneRow(cell))+"\t"
                    +new String(CellUtil.cloneFamily(cell))+"\t"
                    +new String(CellUtil.cloneQualifier(cell))+"\t"
                    +new String(CellUtil.cloneValue(cell),"UTF-8")+"\t"
                    +cell.getTimestamp());
        }
    }
    ht.close();
}

hbase(main):021:0> scan 'users',{FILTER=>"FamilyFilter(=,'binaryprefix:add')"}
ROW                                 COLUMN+CELL                                                                                         
 xiaoming                           column=address:city, timestamp=1441997498965, value=hangzhou                                        
 xiaoming                           column=address:contry, timestamp=1441997498911, value=china                                         
 xiaoming                           column=address:province, timestamp=1441997498939, value=zhejiang                                    
 xiaoming01                         column=address:contry, timestamp=1442000277200, value=\xE4\xB8\xAD\xE5\x9B\xBD                      
 xiaoming01                         column=address:country, timestamp=1442000228945, value=\xE4\xB8\xAD\xE5\x9B\xBD                     
 zhangyifei                         column=address:city, timestamp=1441997499108, value=jieyang                                         
 zhangyifei                         column=address:contry, timestamp=1441997499077, value=china                                         
 zhangyifei                         column=address:province, timestamp=1441997499093, value=guangdong                                   
 zhangyifei                         column=address:town, timestamp=1441997500711, value=xianqiao                                        
3 row(s) in 0.0400 seconds

3.2. Filtering by qualifier (column name) -- QualifierFilter
QualifierFilter(CompareFilter.CompareOp op, ByteArrayComparable qualifierComparator)

Note: this filter is likely to be used more often than FamilyFilter.

// QualifierFilter: filter data by qualifier (column name)
private static void scanFilter05() throws IOException,
        UnsupportedEncodingException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
    
    // Keep the columns whose qualifier equals 'age', in every row
    //QualifierFilter qualifierFilter = new QualifierFilter(CompareOp.EQUAL, new BinaryComparator("age".getBytes()));

    // Keep the columns whose qualifier starts with 'age' (including 'age' itself)
    //QualifierFilter qualifierFilter = new QualifierFilter(CompareOp.EQUAL, new BinaryPrefixComparator("age".getBytes()));

    // Keep the columns whose qualifier contains 'age' (including 'age' itself)
    //QualifierFilter qualifierFilter = new QualifierFilter(CompareOp.EQUAL, new SubstringComparator("age"));

    // Keep the columns whose qualifier matches the regular expression '.ge'
    QualifierFilter qualifierFilter = new QualifierFilter(CompareOp.EQUAL, new RegexStringComparator(".ge"));
    
    Scan scan = new Scan();
    scan.setFilter(qualifierFilter);
    ResultScanner rs = ht.getScanner(scan);
    for(Result result : rs){
        for(Cell cell : result.rawCells()){
            System.out.println(new String(CellUtil.cloneRow(cell))+"\t"
                    +new String(CellUtil.cloneFamily(cell))+"\t"
                    +new String(CellUtil.cloneQualifier(cell))+"\t"
                    +new String(CellUtil.cloneValue(cell),"UTF-8")+"\t"
                    +cell.getTimestamp());
        }
    }
    ht.close();
}

hbase(main):020:0> scan 'users',{FILTER=>"QualifierFilter(=,'regexstring:.ge')"}
ROW                                 COLUMN+CELL                                                                                         
 xiaoming                           column=info:age, timestamp=1441997971945, value=38                                                  
 xiaoming01                         column=info:age, timestamp=1441998917568, value=24                                                  
 xiaoming02                         column=info:age, timestamp=1441998917594, value=24                                                  
 xiaoming03                         column=info:age, timestamp=1441998919607, value=24                                                  
 zhangyifei                         column=info:age, timestamp=1442247255446, value=18                                                  
5 row(s) in 0.0460 seconds

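3.3. Filtering by column-name prefix -- ColumnPrefixFilter (QualifierFilter can achieve the same)
ColumnPrefixFilter(byte[] prefix)
Note: the same column name can occur in several column families; this filter returns the matching columns from all of them.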

// ColumnPrefixFilter example
private static void scanFilter06() throws IOException,
        UnsupportedEncodingException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
    
    // Match all columns whose names start with 'ag'
    ColumnPrefixFilter columnPrefixFilter = new ColumnPrefixFilter("ag".getBytes());
            
    Scan scan = new Scan();
    scan.setFilter(columnPrefixFilter);
    ResultScanner rs = ht.getScanner(scan);
    for(Result result : rs){
        for(Cell cell : result.rawCells()){
            System.out.println(new String(CellUtil.cloneRow(cell))+"\t"
                    +new String(CellUtil.cloneFamily(cell))+"\t"
                    +new String(CellUtil.cloneQualifier(cell))+"\t"
                    +new String(CellUtil.cloneValue(cell),"UTF-8")+"\t"
                    +cell.getTimestamp());
        }
    }
    ht.close();
}

hbase(main):018:0> scan 'users',{FILTER=>"ColumnPrefixFilter('ag')"}
ROW                                 COLUMN+CELL                                                                                         
 xiaoming                           column=info:age, timestamp=1441997971945, value=38                                                  
 xiaoming01                         column=info:age, timestamp=1441998917568, value=24                                                  
 xiaoming02                         column=info:age, timestamp=1441998917594, value=24                                                  
 xiaoming03                         column=info:age, timestamp=1441998919607, value=24                                                  
 zhangyifei                         column=info:age, timestamp=1442247255446, value=18                                                  
5 row(s) in 0.0280 seconds

3.4. Filtering by several column-name prefixes -- MultipleColumnPrefixFilter
MultipleColumnPrefixFilter behaves much like ColumnPrefixFilter, but accepts several prefixes.

// MultipleColumnPrefixFilter example
private static void scanFilter07() throws IOException,
        UnsupportedEncodingException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");

    // Match all columns whose names start with 'a' or 'c' (the prefixes are passed as an array of byte arrays)
    byte[][] prefixes = new byte[][]{"a".getBytes(), "c".getBytes()};
    MultipleColumnPrefixFilter multipleColumnPrefixFilter = new MultipleColumnPrefixFilter(prefixes);

    Scan scan = new Scan();
    scan.setFilter(multipleColumnPrefixFilter);
    ResultScanner rs = ht.getScanner(scan);
    for(Result result : rs){
        for(Cell cell : result.rawCells()){
            System.out.println(new String(CellUtil.cloneRow(cell))+"\t"
                    +new String(CellUtil.cloneFamily(cell))+"\t"
                    +new String(CellUtil.cloneQualifier(cell))+"\t"
                    +new String(CellUtil.cloneValue(cell),"UTF-8")+"\t"
                    +cell.getTimestamp());
        }
    }
    ht.close();
}

hbase(main):017:0> scan 'users',{FILTER=>"MultipleColumnPrefixFilter('a','c')"}
ROW                                 COLUMN+CELL                                                                                         
 xiaoming                           column=address:city, timestamp=1441997498965, value=hangzhou                                        
 xiaoming                           column=address:contry, timestamp=1441997498911, value=china                                         
 xiaoming                           column=info:age, timestamp=1441997971945, value=38                                                  
 xiaoming                           column=info:company, timestamp=1441997498889, value=alibaba                                         
 xiaoming01                         column=address:contry, timestamp=1442000277200, value=\xE4\xB8\xAD\xE5\x9B\xBD                      
 xiaoming01                         column=address:country, timestamp=1442000228945, value=\xE4\xB8\xAD\xE5\x9B\xBD                     
 xiaoming01                         column=info:age, timestamp=1441998917568, value=24                                                  
 xiaoming02                         column=info:age, timestamp=1441998917594, value=24                                                  
 xiaoming03                         column=info:age, timestamp=1441998919607, value=24                                                  
 zhangyifei                         column=address:city, timestamp=1441997499108, value=jieyang                                         
 zhangyifei                         column=address:contry, timestamp=1441997499077, value=china                                         
 zhangyifei                         column=info:age, timestamp=1442247255446, value=18                                                  
 zhangyifei                         column=info:company, timestamp=1441997499039, value=alibaba                                         
5 row(s) in 0.0430 seconds

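3.5. Filtering by column range (not row range) -- ColumnRangeFilter

  1. Retrieves a range of columns. For example, a row may hold a million columns while you only want those whose names lie between bbbb and dddd.
  2. The filter was introduced in HBase 0.92.
  3. The same column name can occur in several column families; this filter returns the matching columns from all of them.

Constructor:
ColumnRangeFilter(byte[] minColumn, boolean minColumnInclusive, byte[] maxColumn, boolean maxColumnInclusive)
Parameters:

  • minColumn - lower bound of the column range; if null, there is no lower bound
  • minColumnInclusive - whether the range includes minColumn
  • maxColumn - upper bound of the column range; if null, there is no upper bound
  • maxColumnInclusive - whether the range includes maxColumn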
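How the four parameters interact can be sketched in plain Java. This is only an illustration of the range test under the assumption that qualifiers are ordered byte by byte; `ColumnRangeSketch` and `inColumnRange` are hypothetical names, not part of HBase.

```java
import java.nio.charset.StandardCharsets;

class ColumnRangeSketch {

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // A null bound means "unbounded" on that side.
    static boolean inColumnRange(String qualifier,
                                 String minColumn, boolean minInclusive,
                                 String maxColumn, boolean maxInclusive) {
        byte[] q = qualifier.getBytes(StandardCharsets.UTF_8);
        if (minColumn != null) {
            int c = compareBytes(q, minColumn.getBytes(StandardCharsets.UTF_8));
            if (c < 0 || (c == 0 && !minInclusive)) {
                return false;
            }
        }
        if (maxColumn != null) {
            int c = compareBytes(q, maxColumn.getBytes(StandardCharsets.UTF_8));
            if (c > 0 || (c == 0 && !maxInclusive)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Same bounds as ColumnRangeFilter('a', true, 'c', false)
        System.out.println(inColumnRange("age", "a", true, "c", false));      // true
        System.out.println(inColumnRange("birthday", "a", true, "c", false)); // true
        System.out.println(inColumnRange("company", "a", true, "c", false));  // false
        System.out.println(inColumnRange("company", "a", true, null, false)); // true
    }
}
```

Note that "company" is excluded by the upper bound 'c' even though that bound is a single character: any longer name that starts with 'c' sorts after 'c'.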

// ColumnRangeFilter example
private static void scanFilter08() throws IOException,
        UnsupportedEncodingException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");

    // Match all columns whose names are >= 'a' and < 'c' (names starting with 'c' are excluded)
    ColumnRangeFilter columnRangeFilter = new ColumnRangeFilter("a".getBytes(), true, "c".getBytes(), false);

    Scan scan = new Scan();
    scan.setFilter(columnRangeFilter);
    ResultScanner rs = ht.getScanner(scan);
    for(Result result : rs){
        for(Cell cell : result.rawCells()){
            System.out.println(new String(CellUtil.cloneRow(cell))+"\t"
                    +new String(CellUtil.cloneFamily(cell))+"\t"
                    +new String(CellUtil.cloneQualifier(cell))+"\t"
                    +new String(CellUtil.cloneValue(cell),"UTF-8")+"\t"
                    +cell.getTimestamp());
        }
    }
    ht.close();
}

hbase(main):016:0> scan 'users',{FILTER=>"ColumnRangeFilter('a',true,'c',false)"}
ROW                                 COLUMN+CELL                                                                                         
 xiaoming                           column=info:age, timestamp=1441997971945, value=38                                                  
 xiaoming                           column=info:birthday, timestamp=1441997498851, value=1987-06-17                                     
 xiaoming01                         column=info:age, timestamp=1441998917568, value=24                                                  
 xiaoming02                         column=info:age, timestamp=1441998917594, value=24                                                  
 xiaoming03                         column=info:age, timestamp=1441998919607, value=24                                                  
 zhangyifei                         column=info:age, timestamp=1442247255446, value=18                                                  
 zhangyifei                         column=info:birthday, timestamp=1441997498990, value=1987-4-17                                      
5 row(s) in 0.0340 seconds

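4. Row key filter -- RowFilter
When you need a range of rows identified by their row keys, setting startRow and stopRow on the Scan is more efficient. However, startRow and stopRow can only match the leading characters of a row key, not characters in the middle. For more complex conditions on the row key, use RowFilter.
Constructor: RowFilter(CompareFilter.CompareOp rowCompareOp, ByteArrayComparable rowComparator)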
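For the prefix case that startRow and stopRow do cover, a common approach is to use the prefix as the start row and the prefix with its last byte incremented as the exclusive stop row. The sketch below computes such a stop row in plain Java; `PrefixStopRowSketch` and `stopRowForPrefix` are hypothetical names, and the empty-array result for an all-0xFF prefix is an assumption meaning "scan to the end of the table".

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class PrefixStopRowSketch {

    // Returns the smallest key that sorts after every key starting with prefix.
    // Trailing 0xFF bytes cannot be incremented, so they are dropped first.
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0]; // no upper bound exists
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++;
        return stop;
    }

    public static void main(String[] args) {
        byte[] start = "xiaoming".getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRowForPrefix(start);
        // Rows xiaoming, xiaoming01, xiaoming02, ... all sort in [start, stop)
        System.out.println(new String(stop, StandardCharsets.UTF_8)); // xiaominh
    }
}
```

A condition such as "row key contains '01'" cannot be expressed this way, which is where RowFilter with a SubstringComparator comes in.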

// RowFilter example
private static void scanFilter09() throws IOException,
        UnsupportedEncodingException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");

    // Match all rows whose row key contains '01'
    RowFilter rowFilter = new RowFilter(CompareOp.EQUAL, new SubstringComparator("01"));
    
    Scan scan = new Scan();
    scan.setFilter(rowFilter);
    ResultScanner rs = ht.getScanner(scan);
    for(Result result : rs){
        for(Cell cell : result.rawCells()){
            System.out.println(new String(CellUtil.cloneRow(cell))+"\t"
                    +new String(CellUtil.cloneFamily(cell))+"\t"
                    +new String(CellUtil.cloneQualifier(cell))+"\t"
                    +new String(CellUtil.cloneValue(cell),"UTF-8")+"\t"
                    +cell.getTimestamp());
        }
    }
    ht.close();
}

hbase(main):013:0> scan 'users',{FILTER=>"RowFilter(=,'substring:01')"}
ROW                                 COLUMN+CELL                                                                                         
 xiaoming01                         column=address:contry, timestamp=1442000277200, value=\xE4\xB8\xAD\xE5\x9B\xBD                      
 xiaoming01                         column=address:country, timestamp=1442000228945, value=\xE4\xB8\xAD\xE5\x9B\xBD                     
 xiaoming01                         column=info:age, timestamp=1441998917568, value=24                                                  
1 row(s) in 0.0190 seconds

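5. PageFilter (in the shell, the filter string "PageFilter(3)" should be accepted)
Specifies a page size in rows and returns a result set of that many rows.
Note that this filter cannot guarantee that the number of rows returned is less than or equal to the page size. The filter is applied separately on each region server, so it only guarantees that each region returns no more rows than the page size.
Constructor: PageFilter(long pageSize)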
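The per-region behavior described above can be simulated without a cluster. In this plain-Java sketch each inner list stands for the rows of one region; the class `PageFilterSketch`, the sample split into two regions and the helper names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PageFilterSketch {

    // The page limit is enforced independently inside every region.
    static List<String> scanWithPerRegionLimit(List<List<String>> regions, int pageSize) {
        List<String> out = new ArrayList<>();
        for (List<String> region : regions) {
            out.addAll(region.subList(0, Math.min(pageSize, region.size())));
        }
        return out;
    }

    // Client-side cap: keep only the first pageSize rows that arrived.
    static List<String> capOnClient(List<String> rows, int pageSize) {
        return new ArrayList<>(rows.subList(0, Math.min(pageSize, rows.size())));
    }

    // Hypothetical layout: the sample table split into two regions.
    static List<List<String>> sampleRegions() {
        return Arrays.asList(
            Arrays.asList("xiaoming", "xiaoming01", "xiaoming02"),
            Arrays.asList("xiaoming03", "zhangyifei"));
    }

    public static void main(String[] args) {
        List<String> rows = scanWithPerRegionLimit(sampleRegions(), 2);
        System.out.println(rows.size());          // 4, more than the page size of 2
        System.out.println(capOnClient(rows, 2)); // [xiaoming, xiaoming01]
    }
}
```

This is why a client that needs an exact page size still has to stop reading on its own side once enough rows have arrived.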

// PageFilter example
private static void scanFilter10() throws IOException,
        UnsupportedEncodingException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");

    // Start at row key "xiaoming" and take 3 rows (including xiaoming)
    PageFilter pageFilter = new PageFilter(3L);
    
    Scan scan = new Scan();
    scan.setStartRow("xiaoming".getBytes());
    scan.setFilter(pageFilter);
    ResultScanner rs = ht.getScanner(scan);
    for(Result result : rs){
        for(Cell cell : result.rawCells()){
            System.out.println(new String(CellUtil.cloneRow(cell))+"\t"
                    +new String(CellUtil.cloneFamily(cell))+"\t"
                    +new String(CellUtil.cloneQualifier(cell))+"\t"
                    +new String(CellUtil.cloneValue(cell),"UTF-8")+"\t"
                    +cell.getTimestamp());
        }
    }
    ht.close();
}

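Note: because this filter cannot guarantee that the number of rows returned is at most the page size, a better way to return a fixed number of rows is ResultScanner.next(int nbRows):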

// Modified version of the demo above
private static void scanFilter11() throws IOException,
        UnsupportedEncodingException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
   
    // Start at row key "xiaoming" and take 3 rows (including xiaoming)
    //PageFilter pageFilter = new PageFilter(3L);
    
    Scan scan = new Scan();
    scan.setStartRow("xiaoming".getBytes());
    //scan.setFilter(pageFilter);
    ResultScanner rs = ht.getScanner(scan);
    // Ask the scanner for at most 3 rows
    for(Result result : rs.next(3)){
        for(Cell cell : result.rawCells()){
            System.out.println(new String(CellUtil.cloneRow(cell))+"\t"
                    +new String(CellUtil.cloneFamily(cell))+"\t"
                    +new String(CellUtil.cloneQualifier(cell))+"\t"
                    +new String(CellUtil.cloneValue(cell),"UTF-8")+"\t"
                    +cell.getTimestamp());
        }
    }
    ht.close();
}

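6. SkipFilter (in the shell, written with the SKIP operator in the filter string)
Applies the wrapped filter to every column of a row: if any single column fails the condition, the whole row is filtered out.
Constructor: SkipFilter(Filter filter)

For example, if every column in a row holds the weight of a different item, then in practice all values must be greater than zero, and we want to drop every row that contains a column whose value is 0. ValueFilter and SkipFilter can be combined to do this:
scan.setFilter(new SkipFilter(new ValueFilter(CompareOp.NOT_EQUAL, new BinaryComparator(Bytes.toBytes(0)))));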
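The row-level effect of wrapping a cell filter in SkipFilter can be sketched with ordinary collections. The code below is a plain-Java illustration of the semantics only; `SkipFilterSketch`, the sample rows and the weights are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class SkipFilterSketch {

    // A row is kept only if every one of its cells passes the wrapped condition.
    static List<String> skipRows(Map<String, List<Integer>> rows, Predicate<Integer> cellPasses) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> row : rows.entrySet()) {
            if (row.getValue().stream().allMatch(cellPasses)) {
                kept.add(row.getKey());
            }
        }
        return kept;
    }

    // Hypothetical data: row key -> item weights stored in that row.
    static Map<String, List<Integer>> sampleRows() {
        Map<String, List<Integer>> rows = new LinkedHashMap<>();
        rows.put("crate1", Arrays.asList(3, 5, 2));
        rows.put("crate2", Arrays.asList(4, 0, 7)); // contains a zero weight
        return rows;
    }

    public static void main(String[] args) {
        // Wrapped condition: weight != 0, like ValueFilter(NOT_EQUAL, 0)
        System.out.println(skipRows(sampleRows(), w -> w != 0)); // [crate1]
    }
}
```

Without the SkipFilter wrapper, a ValueFilter alone would only remove the zero-valued cell and still return the other cells of crate2.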

// SkipFilter example
private static void scanFilter12() throws IOException,
        UnsupportedEncodingException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
    
    // Drop every row that contains a cell whose value equals "24"
    SkipFilter skipFilter = new SkipFilter(new ValueFilter(CompareOp.NOT_EQUAL, new BinaryComparator("24".getBytes())));
    
    Scan scan = new Scan();
    scan.setFilter(skipFilter);
    ResultScanner rs = ht.getScanner(scan);
    for(Result result : rs){
        for(Cell cell : result.rawCells()){
            System.out.println(new String(CellUtil.cloneRow(cell))+"\t"
                    +new String(CellUtil.cloneFamily(cell))+"\t"
                    +new String(CellUtil.cloneQualifier(cell))+"\t"
                    +new String(CellUtil.cloneValue(cell),"UTF-8")+"\t"
                    +cell.getTimestamp());
        }
    }
    ht.close();
}

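7. Utility -- FirstKeyOnlyFilter
This filter returns only the first cell of each row, which makes it useful for counting rows efficiently. Beyond that it is probably of limited practical use.
Constructor: public FirstKeyOnlyFilter()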

// FirstKeyOnlyFilter example
private static void scanFilter13() throws IOException,
        UnsupportedEncodingException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
    
    // Return only the first cell of each row
    FirstKeyOnlyFilter firstKeyOnlyFilter = new FirstKeyOnlyFilter();

    Scan scan = new Scan();
    scan.setFilter(firstKeyOnlyFilter);
    ResultScanner rs = ht.getScanner(scan);
    int i = 0;
    for(Result result : rs){
        for(Cell cell : result.rawCells()){
            System.out.println(new String(CellUtil.cloneRow(cell))+"\t"
                    +new String(CellUtil.cloneFamily(cell))+"\t"
                    +new String(CellUtil.cloneQualifier(cell))+"\t"
                    +new String(CellUtil.cloneValue(cell),"UTF-8")+"\t"
                    +cell.getTimestamp());
            i++;
        }
    }
    // Print the total number of rows (one cell per row, so the cell count equals the row count)
    System.out.println(i);
    ht.close();
}

hbase(main):009:0> scan 'users',{FILTER=>'FirstKeyOnlyFilter()'}
ROW                                COLUMN+CELL                                                                                         
 xiaoming                          column=address:city, timestamp=1441997498965, value=hangzhou                                        
 xiaoming01                        column=address:contry, timestamp=1442000277200, value=\xE4\xB8\xAD\xE5\x9B\xBD                      
 xiaoming02                        column=info:age, timestamp=1441998917594, value=24                                                  
 xiaoming03                        column=info:age, timestamp=1441998919607, value=24                                                  
 zhangyifei                        column=address:city, timestamp=1441997499108, value=jieyang                                         
5 row(s) in 0.0240 seconds
