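I. Aggregation types:
- Metric Aggregation: mathematical computations that produce statistics over document fields
- Bucket Aggregation: groups documents into buckets, each holding the documents that meet a given criterion
- Pipeline Aggregation: runs a second-level aggregation over the results of other aggregations
- Matrix Aggregation: operates on multiple fields and produces a result matrix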
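Request body syntax:
"aggs" : {
    "<aggs_name>" : {                       // name of the aggregation
        "<aggs_type>" : {                   // type of the aggregation
            <aggs_body>                     // aggregation body: which fields to aggregate on
        }
        [, "aggs" : { [<sub_aggs>]+ } ]     // optional sub-aggregations
    }
    [, "<aggs_name_2>" : { ... } ]          // optional sibling aggregations at the same level
}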
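The template nests the same shape recursively, so it can be built programmatically. A minimal Python sketch, assuming a hypothetical helper `agg()` (not part of any Elasticsearch client) and assuming `state` is a keyword field, builds one aggregation with a sub-aggregation plus one sibling aggregation:

```python
import json

def agg(agg_type, body, sub_aggs=None):
    """Hypothetical helper: build {agg_type: body[, "aggs": sub_aggs]}."""
    node = {agg_type: body}
    if sub_aggs:
        node["aggs"] = sub_aggs  # optional sub-aggregations
    return node

request = {
    "size": 0,  # return only aggregation results, no hits
    "aggs": {
        # a terms aggregation (assumes "state" is a keyword field) with an avg sub-aggregation
        "state_terms": agg("terms", {"field": "state"},
                           {"resolvedTime_avg": agg("avg", {"field": "resolvedTime"})}),
        # a sibling aggregation at the same level
        "flowAmount_max": agg("max", {"field": "flowAmount"}),
    },
}

print(json.dumps(request, indent=2))
```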
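II. ES aggregations
1. Metric aggregations:
- max, min, sum, avg, count (value_count), distinct (cardinality)
- Stats and extended stats aggregations
- Percentiles and percentile ranks aggregations
- Top hits aggregation
- Geo bounds and geo centroid aggregations, for geo_point fields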
Examples - max, min, sum, avg, count, distinct:
Max:
{
  "size": 0,
  "aggs": {
    "flowAmount_max": {
      "max": {
        "field": "age"
      }
    }
  }
}
Count (number of values present in the field, via value_count):
{
  "size": 0,
  "aggs": {
    "users_count": {
      "value_count": {
        "field": "username"
      }
    }
  }
}
Distinct count (cardinality; the result is approximate for high-cardinality fields):
{
  "size": 0,
  "aggs": {
    "labels_cardinality": {
      "cardinality": {
        "field": "labels"
      }
    }
  }
}
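To make the semantics of these three requests concrete, here is a minimal local Python sketch over hypothetical sample documents (not how Elasticsearch computes them). Note that value_count counts values, so a multi-valued field contributes each element, and that the real cardinality aggregation is approximate, whereas the set below is exact. The helper `field_values` is hypothetical.

```python
# Hypothetical sample documents; field names follow the examples above.
docs = [
    {"age": 30, "username": "alice", "labels": ["a", "b"]},
    {"age": 25, "username": "bob", "labels": ["b"]},
    {"age": 41, "labels": ["c"]},  # this document has no username
]

def field_values(docs, field):
    """Hypothetical helper: collect every value of `field`, flattening multi-valued fields."""
    out = []
    for d in docs:
        v = d.get(field)
        if v is None:
            continue  # documents without the field are skipped
        out.extend(v if isinstance(v, list) else [v])
    return out

age_max = max(field_values(docs, "age"))                     # like "max"
users_count = len(field_values(docs, "username"))            # like "value_count"
labels_cardinality = len(set(field_values(docs, "labels")))  # like "cardinality", but exact
print(age_max, users_count, labels_cardinality)
```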
Examples - stats and extended stats:
Stats (count, min, max, avg, sum):
{
  "size": 0,
  "aggs": {
    "flowAmount_stats": {
      "stats": {
        "field": "amount"
      }
    }
  }
}
Extended stats (adds sum of squares, variance, standard deviation, etc. on top of stats):
{
  "size": 0,
  "aggs": {
    "flowAmount_extended_stats": {
      "extended_stats": {
        "field": "amount"
      }
    }
  }
}
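A minimal Python sketch of the extra extended_stats values, assuming population variance and the default sigma of 2 for the standard-deviation bounds; it illustrates the arithmetic only, and the sample amounts are hypothetical:

```python
import math

def stats(xs):
    """count, min, max, avg, sum, as returned by the stats aggregation."""
    n, s = len(xs), sum(xs)
    return {"count": n, "min": min(xs), "max": max(xs), "avg": s / n, "sum": s}

def extended_stats(xs, sigma=2.0):
    """stats plus sum of squares, population variance, std deviation and bounds."""
    st = stats(xs)
    sum_sq = sum(x * x for x in xs)
    variance = sum_sq / st["count"] - st["avg"] ** 2  # population variance
    std = math.sqrt(variance)
    st.update({
        "sum_of_squares": sum_sq,
        "variance": variance,
        "std_deviation": std,
        "std_deviation_bounds": {"upper": st["avg"] + sigma * std,
                                 "lower": st["avg"] - sigma * std},
    })
    return st

amounts = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical "amount" values
print(extended_stats(amounts))
```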
Examples - percentiles and percentile ranks:
Percentiles:
{
  "size": 0,
  "aggs": {
    "createTime_percentiles": {
      "percentiles": {
        "field": "createTime",
        "percents": [1, 5, 25, 50, 75, 95, 99, 99.9]
      }
    }
  }
}
Percentile ranks:
{
  "size": 0,
  "aggs": {
    "flowAmount_percentiles_ranks": {
      "percentile_ranks": {
        "field": "salesAmount",
        "values": [5, 10]
      }
    }
  }
}
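The two aggregations are inverses of each other: percentiles maps a percentage to a value, percentile_ranks maps a value to a percentage. A minimal exact Python sketch on hypothetical data, using linear interpolation between ranks (Elasticsearch itself estimates both with an approximate algorithm, so real results can differ slightly):

```python
def percentile(xs, p):
    """Exact p-th percentile with linear interpolation between neighbouring ranks."""
    s = sorted(xs)
    pos = (len(s) - 1) * p / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def percentile_rank(xs, v):
    """Percentage of values less than or equal to v."""
    return 100.0 * sum(1 for x in xs if x <= v) / len(xs)

sales = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical salesAmount values
print({p: percentile(sales, p) for p in (1, 5, 25, 50, 75, 95, 99, 99.9)})
print({v: percentile_rank(sales, v) for v in (5, 10)})
```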
Example - top hits:
Top hits (for each of the top 10 flowAmount terms, the 2 documents with the highest id):
{
  "size": 0,
  "aggs": {
    "top_tags": {
      "terms": {
        "field": "flowAmount",
        "size": 10
      },
      "aggs": {
        "flowAmount_top_hits": {
          "top_hits": {
            "sort": [
              { "id": { "order": "desc" } }
            ],
            "_source": {
              "includes": ["id", "state", "flowAmount", "createTime"]
            },
            "size": 2
          }
        }
      }
    }
  }
}
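A minimal Python sketch of what the request above returns: terms buckets on flowAmount, each carrying its top 2 documents by id descending, with only the `_source.includes` fields kept. The helper `terms_top_hits` and the sample documents are hypothetical, and this is a local illustration rather than the Elasticsearch implementation.

```python
from collections import defaultdict

def terms_top_hits(docs, term_field, sort_field, hits_size, includes, terms_size=10):
    """Hypothetical helper: terms buckets, each with its top docs by sort_field desc."""
    groups = defaultdict(list)
    for d in docs:
        if term_field in d:
            groups[d[term_field]].append(d)
    # keep the `terms_size` buckets with the most documents
    top_terms = sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True)[:terms_size]
    result = {}
    for key, group in top_terms:
        top = sorted(group, key=lambda d: d[sort_field], reverse=True)[:hits_size]
        # "_source": {"includes": [...]} keeps only the listed fields
        result[key] = [{f: d[f] for f in includes if f in d} for d in top]
    return result

docs = [  # hypothetical documents
    {"id": 1, "state": "FINISHED", "flowAmount": 5, "createTime": 100, "note": "x"},
    {"id": 2, "state": "OPEN", "flowAmount": 5, "createTime": 200},
    {"id": 3, "state": "FINISHED", "flowAmount": 5, "createTime": 300},
    {"id": 4, "state": "OPEN", "flowAmount": 8, "createTime": 400},
]
hits = terms_top_hits(docs, "flowAmount", "id", 2,
                      ["id", "state", "flowAmount", "createTime"])
print(hits)
```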
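2. Bucket aggregations:
- Terms - terms aggregation
- Filter - filter aggregation
- Filters - multi-filter aggregation
- Range - range aggregation
- Date Range - date range aggregation
- Histogram / Date Histogram - numeric and date histogram aggregations
- Missing - missing-value aggregation
- IP Range - IP range aggregation (IPv4)
- Nested - nested aggregation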
Examples - Terms, Filter, Filters:
Terms aggregation:
{
  "size": 0,
  "aggs": {
    "labels_terms": {
      "terms": {
        "field": "username",
        "size": 10,
        "order": { "_count": "desc" },
        "min_doc_count": 300
      }
    }
  }
}
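A minimal Python sketch of the terms options used above (size, order by _count descending, min_doc_count), computed locally over hypothetical usernames. Breaking ties by key ascending is a choice made in this sketch. The helper `terms_agg` is hypothetical.

```python
from collections import Counter

def terms_agg(values, size=10, min_doc_count=1):
    """Hypothetical helper: count docs per term, drop rare terms, sort, keep `size` buckets."""
    counts = Counter(values)
    buckets = [{"key": k, "doc_count": c}
               for k, c in counts.items() if c >= min_doc_count]
    # "order": {"_count": "desc"}; ties are broken by key ascending in this sketch
    buckets.sort(key=lambda b: (-b["doc_count"], b["key"]))
    return buckets[:size]

users = ["alice"] * 3 + ["bob"] + ["carol"] * 3  # hypothetical usernames, one per document
print(terms_agg(users, size=10, min_doc_count=2))
```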
Filter aggregation (average resolvedTime of FINISHED documents):
{
  "size": 0,
  "aggs": {
    "responseTime_filter": {
      "filter": {
        "match": { "state": "FINISHED" }
      },
      "aggs": {
        "responseTime_avg": {
          "avg": { "field": "resolvedTime" }
        }
      }
    }
  }
}
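The filter aggregation produces a single bucket, and its sub-aggregations only see documents in that bucket. A minimal Python sketch on hypothetical documents, where an exact equality check stands in for the `match` query and the helper `filter_avg` is hypothetical:

```python
def filter_avg(docs, predicate, field):
    """Hypothetical helper: single filter bucket plus an avg sub-aggregation."""
    bucket = [d for d in docs if predicate(d)]
    vals = [d[field] for d in bucket if d.get(field) is not None]  # avg skips docs without the field
    return {"doc_count": len(bucket),
            "avg": sum(vals) / len(vals) if vals else None}  # None stands in for null

docs = [  # hypothetical documents
    {"state": "FINISHED", "resolvedTime": 10},
    {"state": "FINISHED", "resolvedTime": 20},
    {"state": "OPEN", "resolvedTime": 99},
    {"state": "FINISHED"},  # matches the filter but has no resolvedTime
]
print(filter_avg(docs, lambda d: d.get("state") == "FINISHED", "resolvedTime"))
```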
Filters aggregation:
{
  "size": 0,
  "aggs": {
    "messages": {
      "filters": {
        "filters": {
          "state_match": {
            "match": { "state": "FINISHED" }
          },
          "labels_match": {
            "match": { "labels": "TEST_ISSUE" }
          }
        }
      }
    }
  }
}
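Unlike terms, the filters aggregation builds one bucket per named filter, and a document can fall into several buckets at once. A minimal Python sketch on hypothetical documents, with plain predicates standing in for the `match` queries and a hypothetical helper `filters_agg`:

```python
def filters_agg(docs, named_predicates):
    """Hypothetical helper: one bucket per named filter; buckets may overlap."""
    return {name: {"doc_count": sum(1 for d in docs if pred(d))}
            for name, pred in named_predicates.items()}

docs = [  # hypothetical documents
    {"state": "FINISHED", "labels": ["TEST_ISSUE"]},  # counted in both buckets
    {"state": "OPEN", "labels": ["TEST_ISSUE"]},
    {"state": "FINISHED", "labels": []},
]
result = filters_agg(docs, {
    "state_match": lambda d: d.get("state") == "FINISHED",
    "labels_match": lambda d: "TEST_ISSUE" in d.get("labels", []),
})
print(result)
```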
Examples - Range, Date Range:
Range aggregation ("from" is inclusive, "to" is exclusive):
{
  "size": 0,
  "aggs": {
    "age_range": {
      "range": {
        "field": "flowAmount",
        "ranges": [
          { "to": 1 },
          { "from": 1, "to": 5 },
          { "from": 5, "to": 10 },
          { "from": 10 }
        ]
      },
      "aggs": {
        "createTime_max": {
          "max": { "field": "createTime" }
        }
      }
    }
  }
}
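A minimal Python sketch of the range buckets above with their max sub-aggregation, on hypothetical documents. It shows the boundary rule: a value equal to "to" goes to the next bucket. The helper `range_agg` is hypothetical.

```python
def range_agg(docs, field, ranges, sub_field):
    """Hypothetical helper: range buckets (from inclusive, to exclusive) plus a max sub-agg."""
    buckets = []
    for r in ranges:
        lo = r.get("from", float("-inf"))  # open-ended when "from" is absent
        hi = r.get("to", float("inf"))     # open-ended when "to" is absent
        hits = [d for d in docs if field in d and lo <= d[field] < hi]
        sub = [d[sub_field] for d in hits if sub_field in d]
        buckets.append({"from": r.get("from"), "to": r.get("to"),
                        "doc_count": len(hits),
                        "sub_max": max(sub) if sub else None})
    return buckets

docs = [  # hypothetical documents
    {"flowAmount": 0.5, "createTime": 10},
    {"flowAmount": 1, "createTime": 20},
    {"flowAmount": 4.9, "createTime": 30},
    {"flowAmount": 5, "createTime": 40},
    {"flowAmount": 12, "createTime": 50},
]
ranges = [{"to": 1}, {"from": 1, "to": 5}, {"from": 5, "to": 10}, {"from": 10}]
result = range_agg(docs, "flowAmount", ranges, "createTime")
print([(b["doc_count"], b["sub_max"]) for b in result])
```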
Date range aggregation (for date fields):
{
  "size": 0,
  "aggregations": {
    "splitCreateTime": {
      "date_range": {
        "field": "dateTime",
        "format": "yyyy-MM-dd",
        "time_zone": "+08:00",
        "ranges": [
          { "from": "2020-10-14", "to": "2020-10-15" },
          { "from": "2020-10-15", "to": "2020-10-16" },
          { "from": "2020-10-16", "to": "2020-10-17" }
        ]
      }
    }
  }
}
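The time_zone matters because each "yyyy-MM-dd" bound is interpreted as midnight in +08:00, not UTC. A minimal Python sketch, assuming the dates are stored as epoch milliseconds; the helpers and sample timestamps are hypothetical:

```python
from datetime import datetime, timedelta, timezone

TZ = timezone(timedelta(hours=8))  # corresponds to "time_zone": "+08:00"

def day_start_ms(day):
    """Hypothetical helper: parse yyyy-MM-dd as midnight in +08:00, return epoch ms."""
    dt = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=TZ)
    return int(dt.timestamp() * 1000)

def date_range_agg(timestamps_ms, ranges):
    """Hypothetical helper: count timestamps per [from, to) range."""
    out = []
    for r in ranges:
        lo, hi = day_start_ms(r["from"]), day_start_ms(r["to"])
        out.append({"key": f'{r["from"]}-{r["to"]}',
                    "doc_count": sum(1 for t in timestamps_ms if lo <= t < hi)})
    return out

ranges = [{"from": "2020-10-14", "to": "2020-10-15"},
          {"from": "2020-10-15", "to": "2020-10-16"},
          {"from": "2020-10-16", "to": "2020-10-17"}]
stamps = [day_start_ms("2020-10-14") + 1000,  # early on 10-14 in +08:00
          day_start_ms("2020-10-16"),         # exactly at the 10-16 boundary
          day_start_ms("2020-10-17")]         # excluded: "to" is exclusive
print(date_range_agg(stamps, ranges))
```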
Examples - Histogram, Date Histogram, Missing:
Histogram (note: extended_bounds only adds empty buckets when min_doc_count is 0):
{
  "size": 0,
  "aggs": {
    "flowAmount_histogram": {
      "histogram": {
        "field": "flowAmount",
        "interval": 5,
        "min_doc_count": 1,
        "extended_bounds": {
          "min": 0,
          "max": 50
        },
        "order": { "_count": "desc" },
        "keyed": true,
        "missing": 0
      }
    }
  }
}
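A minimal Python sketch of the numeric histogram: each value goes to bucket floor(value / interval) * interval, "missing" substitutes a value for documents without the field, and extended_bounds only adds empty buckets when min_doc_count is 0. This is a simplified local illustration on hypothetical values. It does not fill empty gaps outside extended_bounds, and the helper `histogram_agg` is hypothetical.

```python
import math
from collections import Counter

def histogram_agg(values, interval, min_doc_count=1, extended_bounds=None, missing=None):
    """Hypothetical helper: bucket key = floor(value / interval) * interval."""
    # "missing": documents without a value are treated as having `missing`
    vals = [missing if v is None else v for v in values]
    vals = [v for v in vals if v is not None]
    counts = Counter(math.floor(v / interval) * interval for v in vals)
    if min_doc_count == 0 and extended_bounds:
        # extended_bounds only contributes empty buckets when min_doc_count is 0
        k = math.floor(extended_bounds["min"] / interval) * interval
        while k <= extended_bounds["max"]:
            counts.setdefault(k, 0)
            k += interval
    kept = {k: c for k, c in counts.items() if c >= min_doc_count}
    # "order": {"_count": "desc"} with "keyed": true -> a dict keyed by bucket, count desc
    return dict(sorted(kept.items(), key=lambda kv: (-kv[1], kv[0])))

amounts = [1, 3, 7, None, 12, 14, 13]  # hypothetical; None = document without flowAmount
print(histogram_agg(amounts, 5, min_doc_count=1, missing=0))
print(histogram_agg(amounts, 5, min_doc_count=0,
                    extended_bounds={"min": 0, "max": 20}, missing=0))
```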
Date histogram (for date fields; calendar_interval requires ES 7.2+, older versions use interval):
{
  "size": 0,
  "aggs": {
    "dateTime_histogram": {
      "date_histogram": {
        "field": "dateTime",
        "calendar_interval": "day",
        "format": "yyyy-MM-dd",
        "time_zone": "+08:00"
      }
    }
  }
}
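With a daily interval and time_zone +08:00, a timestamp one millisecond before local midnight lands in the previous day's bucket. A minimal Python sketch, assuming epoch-millisecond timestamps. It omits the empty in-between buckets that a date histogram returns by default, and the helper `date_histogram_day` is hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

TZ = timezone(timedelta(hours=8))  # "time_zone": "+08:00"

def date_histogram_day(timestamps_ms):
    """Hypothetical helper: bucket epoch-ms timestamps into calendar days in +08:00."""
    counts = Counter(
        datetime.fromtimestamp(t / 1000, tz=TZ).strftime("%Y-%m-%d")  # "format": "yyyy-MM-dd"
        for t in timestamps_ms
    )
    return dict(sorted(counts.items()))  # ascending key order

midnight = 1602259200000  # 2020-10-10 00:00 in +08:00
print(date_histogram_day([midnight - 1, midnight, midnight + 3_600_000]))
```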
Missing aggregation (counts documents that have no resolvedTime value):
{
  "size": 0,
  "aggs": {
    "resolvedTime_missing": {
      "missing": {
        "field": "resolvedTime"
      }
    }
  }
}
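A minimal Python sketch of the missing bucket on hypothetical documents, treating an absent field, a null, and an empty array as "no value". The helper `missing_agg` is hypothetical.

```python
def missing_agg(docs, field):
    """Hypothetical helper: count docs whose field is absent, null, or an empty array."""
    return {"doc_count": sum(1 for d in docs if d.get(field) in (None, []))}

docs = [{"resolvedTime": 5}, {}, {"resolvedTime": None}, {"resolvedTime": []}]  # hypothetical
print(missing_agg(docs, "resolvedTime"))
```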
Examples - IP Range (IPv4), Nested:
IPv4 range aggregation (for ip fields):
{
  "size": 0,
  "aggs": {
    "ipv4_ranges": {
      "ip_range": {
        "field": "ipv4",
        "ranges": [
          { "to": "10.0.0.10" },
          { "from": "10.0.0.10" }
        ]
      }
    }
  }
}
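IP ranges compare addresses numerically rather than as strings, with "from" inclusive and "to" exclusive, just like numeric ranges. A minimal Python sketch using the standard `ipaddress` module on hypothetical addresses, with a hypothetical helper `ip_range_agg`:

```python
from ipaddress import ip_address

def ip_range_agg(ips, ranges):
    """Hypothetical helper: IP range buckets, "from" inclusive, "to" exclusive."""
    addrs = [ip_address(ip) for ip in ips]
    out = []
    for r in ranges:
        lo = ip_address(r["from"]) if "from" in r else None
        hi = ip_address(r["to"]) if "to" in r else None
        n = sum(1 for a in addrs
                if (lo is None or a >= lo) and (hi is None or a < hi))
        out.append({"from": r.get("from"), "to": r.get("to"), "doc_count": n})
    return out

ips = ["9.255.255.255", "10.0.0.5", "10.0.0.10", "10.0.0.200"]  # hypothetical values
ranges = [{"to": "10.0.0.10"}, {"from": "10.0.0.10"}]
print([b["doc_count"] for b in ip_range_agg(ips, ranges)])
```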
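Nested aggregation (for nested fields; the nested aggregation sets the path, and the range sub-aggregation buckets historyLog.operateTime):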
{
  "size": 0,
  "aggs": {
    "historyLog_nested": {
      "nested": {
        "path": "historyLog"
      },
      "aggs": {
        "operateTime_range": {
          "range": {
            "field": "historyLog.operateTime",
            "ranges": [
              { "to": 1602259200000 },
              { "from": 1602259200000, "to": 1603123200000 },
              { "from": 1603123200000 }
            ]
          }
        }
      }
    }
  }
}
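Inside a nested aggregation, each nested object is counted on its own, so the bucket counts refer to historyLog entries, not to root documents. A minimal Python sketch on hypothetical documents, with a hypothetical helper `nested_range`:

```python
def nested_range(docs, path, field, ranges):
    """Hypothetical helper: nested switches to objects under `path`; range buckets them."""
    nested = [obj for d in docs for obj in d.get(path, [])]  # one entry per nested object
    buckets = []
    for r in ranges:
        lo, hi = r.get("from", float("-inf")), r.get("to", float("inf"))
        buckets.append(sum(1 for o in nested if field in o and lo <= o[field] < hi))
    return {"doc_count": len(nested), "bucket_counts": buckets}

docs = [  # hypothetical documents
    {"historyLog": [{"operateTime": 1602000000000}, {"operateTime": 1602500000000}]},
    {"historyLog": [{"operateTime": 1603500000000}]},
    {},  # a document with no historyLog entries
]
ranges = [{"to": 1602259200000},
          {"from": 1602259200000, "to": 1603123200000},
          {"from": 1603123200000}]
print(nested_range(docs, "historyLog", "operateTime", ranges))
```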
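3. Pipeline aggregations:
A pipeline aggregation works on the output of other aggregations rather than on document sets, and adds information to that output. buckets_path specifies the path to the metric it consumes.
Example - pipeline aggregation (average of the per-range resolvedTime sums):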
{
  "size": 0,
  "aggs": {
    "flowAmount_range": {
      "range": {
        "field": "createTime",
        "ranges": [
          { "to": 1602259200000 },
          { "from": 1602259200000, "to": 1603123200000 },
          { "from": 1603123200000, "to": 1603987200000 },
          { "from": 1603987200000 }
        ]
      },
      "aggs": {
        "resolvedTime_sum": {
          "sum": { "field": "resolvedTime" }
        }
      }
    },
    "resolvedTime_avg": {
      "avg_bucket": {
        "buckets_path": "flowAmount_range>resolvedTime_sum"
      }
    }
  }
}
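The path "flowAmount_range>resolvedTime_sum" reads resolvedTime_sum from every bucket of flowAmount_range, and avg_bucket averages those values. A minimal Python sketch of that second pass on hypothetical bucket output, modelling a gap as None and skipping it (as the default "skip" gap policy does). The helper `avg_bucket` is hypothetical.

```python
def avg_bucket(buckets, metric):
    """Hypothetical helper: average one metric across sibling buckets, skipping gaps."""
    vals = [b[metric] for b in buckets if b.get(metric) is not None]
    return sum(vals) / len(vals) if vals else None

# Hypothetical output of the flowAmount_range buckets with their resolvedTime_sum
buckets = [
    {"doc_count": 2, "resolvedTime_sum": 30.0},
    {"doc_count": 1, "resolvedTime_sum": 10.0},
    {"doc_count": 4, "resolvedTime_sum": None},  # a gap, modelled as None
    {"doc_count": 3, "resolvedTime_sum": 20.0},
]
print(avg_bucket(buckets, "resolvedTime_sum"))
```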
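III. Usage notes:
- When querying, return only the fields the business scenario actually needs to improve efficiency; source filtering can be applied with fetchSource.
- When bulk-writing historical data, increase refresh_interval to reduce the refresh frequency (a refresh makes documents buffered in memory searchable).
- When querying, use the max_result_window setting to cap how many hits can be paged through with from + size.
- When using scroll, use max_open_scroll_context to limit the number of open scroll contexts (scroll IDs), and call clearScroll after each scroll finishes.
- When adding a new field to an existing ES index, always add the field to the index mapping first, then write the data.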