Elasticsearch Study Notes, Part 8: Aggregations in Elasticsearch

Stella981

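First, prepare the data: the index contains four fields, fieldA, fieldB, fieldC and fieldD (the original post showed the sample documents in a screenshot). Every example below is implemented in two ways: as a basic REST command and with the Java API.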


1). First, group the documents by the field fieldC and count each group, which corresponds to a GROUP BY in SQL:

curl -XPOST "http://121.40.128.155:9200/tempindex/_search?pretty" -d '{
  "size": 0,
  "aggs": {
    "fieldC_count": {
      "terms": {
        "field": "fieldC"
      }
    }
  }
}'

The response is as follows:
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "fieldC_count" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "java",
        "doc_count" : 3
      }, {
        "key" : "c++",
        "doc_count" : 1
      }, {
        "key" : "ptyhone",
        "doc_count" : 1
      } ]
    }
  }
}

How is the same query implemented with the Java API?

// EsSearchManager is the author's own helper class that holds the Elasticsearch client
EsSearchManager esSearchManager = EsSearchManager.getInstance();
SearchRequestBuilder searchReq = esSearchManager.client.prepareSearch("tempindex");
searchReq.setTypes("tempindex");
// GROUP BY fieldC; size(100) returns at most 100 buckets
TermsBuilder termsb = AggregationBuilders.terms("my_fieldC").field("fieldC").size(100);
searchReq.addAggregation(termsb);
SearchResponse searchRes = searchReq.execute().actionGet();
Terms fieldCTerms = searchRes.getAggregations().get("my_fieldC");
for (Terms.Bucket fieldCBucket : fieldCTerms.getBuckets()) {
    // the fieldC value that defines this group
    String groupbyKey = fieldCBucket.getKey().toString();
    // COUNT(*) for this group
    long countValue = fieldCBucket.getDocCount();
}
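
To make the GROUP BY analogy concrete, here is a minimal plain-Java sketch of what the terms aggregation computes. It does not use the Elasticsearch client at all. The class name TermsCountSketch and the sample values are hypothetical; the values were only chosen so that the counts match the buckets in the response above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java analogue of a terms aggregation on fieldC (no Elasticsearch involved).
class TermsCountSketch {

    // Groups equal values together and counts each group,
    // like SQL "SELECT fieldC, COUNT(*) ... GROUP BY fieldC".
    static Map<String, Long> countByValue(List<String> fieldCValues) {
        return fieldCValues.stream()
                .collect(Collectors.groupingBy(v -> v, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Hypothetical fieldC values of five documents (assumption, not the author's data).
        List<String> fieldC = Arrays.asList("java", "c++", "java", "ptyhone", "java");
        countByValue(fieldC).forEach((key, docCount) -> System.out.println(key + " -> " + docCount));
    }
}
```

Each map entry plays the role of one bucket: the map key is the bucket "key" and the count is its "doc_count". Unlike this sketch, Elasticsearch sorts buckets by descending doc_count by default and only returns the top buckets.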

2). Compute the maximum and minimum values of a field

curl -XPOST "http://121.40.128.155:9200/tempindex/_search?pretty" -d '{
  "size": 0,
  "aggs": {
    "max_fieldA": {
      "max": {
        "field": "fieldA"
      }
    },
    "min_fieldA": {
      "min": {
        "field": "fieldA"
      }
    }
  }
}'

The response is as follows:
{
  "took" : 10,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "max_fieldA" : {
      "value" : 25.0
    },
    "min_fieldA" : {
      "value" : 10.0
    }
  }
}

The corresponding Java API implementation is as follows:

EsSearchManager esSearchManager = EsSearchManager.getInstance();
SearchRequestBuilder searchReq = esSearchManager.client.prepareSearch("tempindex");
searchReq.setTypes("tempindex");
MaxBuilder maxBuilder = AggregationBuilders.max("max_fieldA").field("fieldA");
searchReq.addAggregation(maxBuilder);
MinBuilder minBuilder = AggregationBuilders.min("min_fieldA").field("fieldA");
searchReq.addAggregation(minBuilder);
SearchResponse searchRes = searchReq.execute().actionGet();
InternalMax internalMax=searchRes.getAggregations().get("max_fieldA");
System.out.println(internalMax.getName() +"="+ internalMax.getValue());
InternalMin internalMin=searchRes.getAggregations().get("min_fieldA");
System.out.println(internalMin.getName() +"="+ internalMin.getValue());
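
As a rough illustration of what the max and min metric aggregations return, the following plain-Java sketch computes both values over an in-memory list in a single pass. The class name MaxMinSketch and the fieldA values are hypothetical; the values were picked only so that the result matches the 25.0 and 10.0 in the response above.

```java
import java.util.DoubleSummaryStatistics;
import java.util.stream.DoubleStream;

// Plain-Java analogue of the max/min metric aggregations on fieldA (no Elasticsearch involved).
class MaxMinSketch {

    // Collects min and max (plus count, sum, average) over all values in one pass.
    static DoubleSummaryStatistics stats(double... fieldAValues) {
        return DoubleStream.of(fieldAValues).summaryStatistics();
    }

    public static void main(String[] args) {
        // Hypothetical fieldA values of five documents (assumption, not the author's data).
        DoubleSummaryStatistics s = stats(10, 15, 20, 25, 18);
        System.out.println("max_fieldA=" + s.getMax());
        System.out.println("min_fieldA=" + s.getMin());
    }
}
```

Note that metric aggregations such as max and min produce a single value for the whole result set rather than a list of buckets, which is why the response contains only a "value" field for each.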

3). Average: group by one field (fieldC) and compute the average of another field (fieldB) within each group

curl -XPOST "http://121.40.128.155:9200/tempindex/_search?pretty" -d '{
"size": 0,
"aggs": {
    "per_count": {
        "terms": {
            "field": "fieldC"
        },
        "aggs": {
            "avg_fieldB": {
                "avg": {
                    "field": "fieldB"
                }
            }
        }
    }
}
}'    

The response is as follows:
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "per_count" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "java",
        "doc_count" : 3,
        "avg_fieldB" : {
          "value" : 20.0
        }
      }, {
        "key" : "c++",
        "doc_count" : 1,
        "avg_fieldB" : {
          "value" : 35.0
        }
      }, {
        "key" : "ptyhone",
        "doc_count" : 1,
        "avg_fieldB" : {
          "value" : 25.0
        }
      } ]
    }
  }
}

The corresponding Java API implementation is as follows:

EsSearchManager esSearchManager = EsSearchManager.getInstance();
SearchRequestBuilder searchReq = esSearchManager.client.prepareSearch("tempindex");
searchReq.setTypes("tempindex");
TermsBuilder termsb = AggregationBuilders.terms("my_fieldC").field("fieldC").size(100);
termsb.subAggregation(AggregationBuilders.avg("my_avg_fieldB").field("fieldB"));
searchReq.setQuery(QueryBuilders.matchAllQuery()).addAggregation(termsb);
SearchResponse searchRes = searchReq.execute().actionGet();
Terms fieldCTerms = searchRes.getAggregations().get("my_fieldC");
for (Terms.Bucket fieldCBucket : fieldCTerms.getBuckets()) {
    // group key (a fieldC value) and the number of documents in the group
    String fieldCValue = fieldCBucket.getKey().toString();
    long fieldCCount = fieldCBucket.getDocCount();
    // sub-aggregation result: average of fieldB within this group
    Avg avgAgg = fieldCBucket.getAggregations().get("my_avg_fieldB");
    double avgFieldB = avgAgg.getValue();
    System.out.println("fieldCValue="+fieldCValue);
    System.out.println("fieldCCount="+fieldCCount);
    System.out.println("avgFieldB="+avgFieldB);
}
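
The nesting of an avg aggregation inside a terms aggregation can be pictured as "group first, then average inside each group". The plain-Java sketch below shows that idea without the Elasticsearch client. The class AvgPerGroupSketch, its Doc type and the sample documents are all hypothetical; the fieldB values were chosen only to reproduce the averages in the response above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java analogue of terms(fieldC) with a nested avg(fieldB) sub-aggregation.
class AvgPerGroupSketch {

    // Minimal stand-in for an indexed document (hypothetical type).
    static final class Doc {
        final String fieldC;
        final double fieldB;

        Doc(String fieldC, double fieldB) {
            this.fieldC = fieldC;
            this.fieldB = fieldB;
        }
    }

    // Hypothetical sample documents (assumption, not the author's data).
    static List<Doc> sampleDocs() {
        return Arrays.asList(
                new Doc("java", 10), new Doc("java", 20), new Doc("java", 30),
                new Doc("c++", 35), new Doc("ptyhone", 25));
    }

    // Outer step groups by fieldC; inner step averages fieldB inside each group.
    static Map<String, Double> avgFieldBByFieldC(List<Doc> docs) {
        return docs.stream().collect(Collectors.groupingBy(
                (Doc d) -> d.fieldC,
                TreeMap::new,
                Collectors.averagingDouble((Doc d) -> d.fieldB)));
    }

    public static void main(String[] args) {
        avgFieldBByFieldC(sampleDocs())
                .forEach((key, avg) -> System.out.println(key + " avg_fieldB=" + avg));
    }
}
```

The downstream collector here corresponds to the inner "aggs" object of the REST request: it is evaluated once per bucket, on that bucket's documents only.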

4). Sum: group by one field (fieldC) and compute the sum of another field (fieldB) within each group

curl -XPOST "http://121.40.128.155:9200/tempindex/_search?pretty" -d '{
"size": 0,
"aggs": {
    "per_count": {
        "terms": {
            "field": "fieldC"
        },
        "aggs": {
            "sum_fieldB": {
                "sum": {
                    "field": "fieldB"
                }
            }
        }
    }
}
}'
The response is as follows:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "per_count" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "java",
        "doc_count" : 3,
        "sum_fieldB" : {
          "value" : 60.0
        }
      }, {
        "key" : "c++",
        "doc_count" : 1,
        "sum_fieldB" : {
          "value" : 35.0
        }
      }, {
        "key" : "ptyhone",
        "doc_count" : 1,
        "sum_fieldB" : {
          "value" : 25.0
        }
      } ]
    }
  }
}

The corresponding Java code is as follows:

EsSearchManager esSearchManager = EsSearchManager.getInstance();
SearchRequestBuilder searchReq = esSearchManager.client.prepareSearch("tempindex");
searchReq.setTypes("tempindex");
TermsBuilder termsb = AggregationBuilders.terms("my_fieldC").field("fieldC").size(100);
termsb.subAggregation(AggregationBuilders.sum("my_sum_fieldB").field("fieldB"));
searchReq.setQuery(QueryBuilders.matchAllQuery()).addAggregation(termsb);
SearchResponse searchRes = searchReq.execute().actionGet();
Terms fieldCTerms = searchRes.getAggregations().get("my_fieldC");
for (Terms.Bucket fieldCBucket : fieldCTerms.getBuckets()) {
    // group key (a fieldC value) and the number of documents in the group
    String fieldCValue = fieldCBucket.getKey().toString();
    long fieldCCount = fieldCBucket.getDocCount();
    // the sub-aggregation is a sum, so read it as Sum (reading it as Avg fails with a ClassCastException)
    Sum sumAgg = fieldCBucket.getAggregations().get("my_sum_fieldB");
    double sumFieldB = sumAgg.getValue();
    System.out.println("fieldCValue="+fieldCValue);
    System.out.println("fieldCCount="+fieldCCount);
    System.out.println("sumFieldB="+sumFieldB);
}
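
For comparison, the per-group sum can be sketched in plain Java as well. This version collects a summary per group, so the document count and the sum of fieldB come out together, much like "doc_count" and "sum_fieldB" sit side by side in each bucket of the response above. The class SumPerGroupSketch, its Doc type and the sample documents are hypothetical and were chosen only to match that response.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java analogue of terms(fieldC) with a nested sum(fieldB) sub-aggregation.
class SumPerGroupSketch {

    // Minimal stand-in for an indexed document (hypothetical type).
    static final class Doc {
        final String fieldC;
        final double fieldB;

        Doc(String fieldC, double fieldB) {
            this.fieldC = fieldC;
            this.fieldB = fieldB;
        }
    }

    // Hypothetical sample documents (assumption, not the author's data).
    static List<Doc> sampleDocs() {
        return Arrays.asList(
                new Doc("java", 10), new Doc("java", 20), new Doc("java", 30),
                new Doc("c++", 35), new Doc("ptyhone", 25));
    }

    // Groups by fieldC and summarizes fieldB per group (count, sum, min, max, average).
    static Map<String, DoubleSummaryStatistics> fieldBStatsByFieldC(List<Doc> docs) {
        return docs.stream().collect(Collectors.groupingBy(
                (Doc d) -> d.fieldC,
                TreeMap::new,
                Collectors.summarizingDouble((Doc d) -> d.fieldB)));
    }

    public static void main(String[] args) {
        fieldBStatsByFieldC(sampleDocs()).forEach((key, s) ->
                System.out.println(key + " doc_count=" + s.getCount() + " sum_fieldB=" + s.getSum()));
    }
}
```

With these sample values the "java" group has three documents and a sum of 60.0, which is consistent with the average of 20.0 from the previous section (60 / 3).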

The above summarizes some of the basic ES aggregation APIs together with calling demos. For the full code, see the GitHub repository https://github.com/winstonelei/BigDataTools

For more information, see the official API documentation: https://www.elastic.co/guide/en/elasticsearch/client/java-api/2.3/_bucket_aggregations.html
