es聚合查询
ES提供了强大的聚合分析功能,按照操作上细化,可以主要分为四种,如下表所示:
聚合方式 解释
Bucket Aggregation 一些满足特定条件的文档的集合
Metric Aggregation 一些数学计算,可以对文档字段统计分析
Pipeline Aggregation 对其他的聚合结果进行二次聚合
Metrix Aggregation 支持对多个字段的操作并提供一个结果矩阵
这些只是理论意义上的细化,在实际的应用过程中,我们并没有说针对那种场景使用那种聚合分析。都是为了满足我们的业务,在实现的过程中同时会使用到多种聚合的方式。
一、四种聚合方式
1.1 Bucket(分桶)
分桶就是将具有某一类共同特征的数据归为一类,然后求其总数,例如: 男女、公司同一工作岗位的员工、商品高中低档等。在对数据分桶后还可以进一步分桶,例如:0~ 20岁男性、21~50岁男性、50岁以上男性;同一工作岗位男性、女性;高档商品好评、中评、差评的商品。
1.2、Metric(计算)
计算具有一类特征的数据的统计值,例如平均值、最大值、最小值等。
1.3、Pipeline(管道)
pipeline与Linux操作系统中的管道操作(将上一步操作的结果作为下一步操作的数据源)类似。即将上一次聚合操作的结果作为下一次聚合操作的数据源。
1.4、Metrix(矩阵)
矩阵就是同时可以支持多值的输出,例如对分桶的数据同时求平均、最大、最小值;
二、具体的案例
首先在ES中插入一批的测试数据,在插入测试数据之前先定义mapping。
2.1、mapping的定义
PUT employee
{
"mappings": {
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "keyword"
},
"job": {
"type": "keyword"
},
"age": {
"type": "integer"
},
"gender": {
"type": "keyword"
}
}
}
}
2.2、插入数据
PUT employee/_bulk
{"index": {"_id": 1}}
{"id": 1, "name": "Bob", "job": "java", "age": 21, "sal": 8000, "gender": "male"}
{"index": {"_id": 2}}
{"id": 2, "name": "Rod", "job": "html", "age": 31, "sal": 18000, "gender": "female"}
{"index": {"_id": 3}}
{"id": 3, "name": "Gaving", "job": "java", "age": 24, "sal": 12000, "gender": "male"}
{"index": {"_id": 4}}
{"id": 4, "name": "King", "job": "dba", "age": 26, "sal": 15000, "gender": "female"}
{"index": {"_id": 5}}
{"id": 5, "name": "Jonhson", "job": "dba", "age": 29, "sal": 16000, "gender": "male"}
{"index": {"_id": 6}}
{"id": 6, "name": "Douge", "job": "java", "age": 41, "sal": 20000, "gender": "female"}
{"index": {"_id": 7}}
{"id": 7, "name": "cutting", "job": "dba", "age": 27, "sal": 7000, "gender": "male"}
{"index": {"_id": 8}}
{"id": 8, "name": "Bona", "job": "html", "age": 22, "sal": 14000, "gender": "female"}
{"index": {"_id": 9}}
{"id": 9, "name": "Shyon", "job": "dba", "age": 20, "sal": 19000, "gender": "female"}
{"index": {"_id": 10}}
{"id": 10, "name": "James", "job": "html", "age": 18, "sal": 22000, "gender": "male"}
{"index": {"_id": 11}}
{"id": 11, "name": "Golsling", "job": "java", "age": 32, "sal": 23000, "gender": "female"}
{"index": {"_id": 12}}
{"id": 12, "name": "Lily", "job": "java", "age": 24, "sal": 2000, "gender": "male"}
{"index": {"_id": 13}}
{"id": 13, "name": "Jack", "job": "html", "age": 23, "sal": 3000, "gender": "female"}
{"index": {"_id": 14}}
{"id": 14, "name": "Rose", "job": "java", "age": 36, "sal": 6000, "gender": "female"}
{"index": {"_id": 15}}
{"id": 15, "name": "Will", "job": "dba", "age": 38, "sal": 4500, "gender": "male"}
{"index": {"_id": 16}}
{"id": 16, "name": "smith", "job": "java", "age": 32, "sal": 23000, "gender": "male"}
数据说明:插入的数据为员工信息,name是员工的姓名,job是员工的工种,age为员工的年龄,sal为员工的薪水,gender为员工的性别。
2.3、聚合查询
查询工种的数量
GET employee/_search
{
"size": 0,
"aggs": {
"job_category_count": {
"cardinality": {
"field": "job"
}
}
}
}
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 16,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"job_category_count": {
"value": 3
}
}
}
查询不同工种分桶信息
{
"size": 0,
"aggs": {
"job_category_count": {
"terms": {
"field": "job"
}
}
}
}
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 16,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"job_category_count": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "java",
"doc_count": 7
},
{
"key": "dba",
"doc_count": 5
},
{
"key": "html",
"doc_count": 4
}
]
}
}
}
查询不同工种的员工的数量,并查询每个工种最大年龄的员工信息。
GET employee/_search
{
"size": 0,
"aggs": {
"job_analysis": {
"terms": {
"field": "job"
},
"aggs": {
"age_top_1": {
"top_hits": {
"size": 1,
"sort": [
{
"age": {
"order": "desc"
}
}
]
}
}
}
}
}
}
{
"took": 75,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 16,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"job_analysis": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "java",
"doc_count": 7,
"age_top_1": {
"hits": {
"total": {
"value": 7,
"relation": "eq"
},
"max_score": null,
"hits": [
{
"_index": "employee",
"_type": "_doc",
"_id": "6",
"_score": null,
"_source": {
"id": 6,
"name": "Douge",
"job": "java",
"age": 41,
"sal": 20000,
"gender": "female"
},
"sort": [
41
]
}
]
}
}
},
{
"key": "dba",
"doc_count": 5,
"age_top_1": {
"hits": {
"total": {
"value": 5,
"relation": "eq"
},
"max_score": null,
"hits": [
{
"_index": "employee",
"_type": "_doc",
"_id": "15",
"_score": null,
"_source": {
"id": 15,
"name": "Will",
"job": "dba",
"age": 38,
"sal": 4500,
"gender": "male"
},
"sort": [
38
]
}
]
}
}
},
{
"key": "html",
"doc_count": 4,
"age_top_1": {
"hits": {
"total": {
"value": 4,
"relation": "eq"
},
"max_score": null,
"hits": [
{
"_index": "employee",
"_type": "_doc",
"_id": "2",
"_score": null,
"_source": {
"id": 2,
"name": "Rod",
"job": "html",
"age": 31,
"sal": 18000,
"gender": "female"
},
"sort": [
31
]
}
]
}
}
}
]
}
}
}
查询工资范围在 0~5000, 5001~8000, 8001~12000, 12001~18000, 18001+ 员工的人数
GET employee/_search
{
"size": 0,
"aggs": {
"sal_range_info": {
"range": {
"field": "sal",
"ranges": [
{
"to": 5000
},
{
"from": 5001,
"to": 8000
},
{
"from": 8001,
"to": 12000
},
{
"from": 12001,
"to": 18000
},
{
"from": 18001
}
]
}
}
}
}
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 16,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"sal_range_info": {
"buckets": [
{
"key": "*-5000.0",
"to": 5000.0,
"doc_count": 3
},
{
"key": "5001.0-8000.0",
"from": 5001.0,
"to": 8000.0,
"doc_count": 2
},
{
"key": "8001.0-12000.0",
"from": 8001.0,
"to": 12000.0,
"doc_count": 0
},
{
"key": "12001.0-18000.0",
"from": 12001.0,
"to": 18000.0,
"doc_count": 3
},
{
"key": "18001.0-*",
"from": 18001.0,
"doc_count": 5
}
]
}
}
}
以每5000为一个区间,查询工资在对应范围内的员工的数量
GET employee/_search
{
"size": 0,
"aggs": {
"sal_histogram": {
"histogram": {
"field": "sal",
"interval": 5000,
"extended_bounds": {
"min": 0,
"max": 25000
}
}
}
}
}
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 16,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"sal_histogram": {
"buckets": [
{
"key": 0.0,
"doc_count": 3
},
{
"key": 5000.0,
"doc_count": 3
},
{
"key": 10000.0,
"doc_count": 2
},
{
"key": 15000.0,
"doc_count": 4
},
{
"key": 20000.0,
"doc_count": 4
},
{
"key": 25000.0,
"doc_count": 0
}
]
}
}
}
查询每个工种的数量,以及不同工种的工资统计信息
GET employee/_search
{
"size": 0,
"aggs": {
"job_and_salary_info": {
"terms": {
"field": "job"
},
"aggs": {
"sal_info": {
"stats": {
"field": "sal"
}
}
}
}
}
}
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 16,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"job_and_salary_info": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "java",
"doc_count": 7,
"sal_info": {
"count": 7,
"min": 2000.0,
"max": 23000.0,
"avg": 13428.57142857143,
"sum": 94000.0
}
},
{
"key": "dba",
"doc_count": 5,
"sal_info": {
"count": 5,
"min": 4500.0,
"max": 19000.0,
"avg": 12300.0,
"sum": 61500.0
}
},
{
"key": "html",
"doc_count": 4,
"sal_info": {
"count": 4,
"min": 3000.0,
"max": 22000.0,
"avg": 14250.0,
"sum": 57000.0
}
}
]
}
}
}
不同工种下男女员工的数量,以及男女员工的薪资信息
GET employee/_search
{
"size": 0,
"aggs": {
"job_gender_sal_info": {
"terms": {
"field": "job"
},
"aggs": {
"gender_info": {
"terms": {
"field": "gender"
},
"aggs": {
"sal_info": {
"stats": {
"field": "sal"
}
}
}
}
}
}
}
}
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 16,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"job_gender_sal_info": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "java",
"doc_count": 7,
"gender_info": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "male",
"doc_count": 4,
"sal_info": {
"count": 4,
"min": 2000.0,
"max": 23000.0,
"avg": 11250.0,
"sum": 45000.0
}
},
{
"key": "female",
"doc_count": 3,
"sal_info": {
"count": 3,
"min": 6000.0,
"max": 23000.0,
"avg": 16333.333333333334,
"sum": 49000.0
}
}
]
}
},
{
"key": "dba",
"doc_count": 5,
"gender_info": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "male",
"doc_count": 3,
"sal_info": {
"count": 3,
"min": 4500.0,
"max": 16000.0,
"avg": 9166.666666666666,
"sum": 27500.0
}
},
{
"key": "female",
"doc_count": 2,
"sal_info": {
"count": 2,
"min": 15000.0,
"max": 19000.0,
"avg": 17000.0,
"sum": 34000.0
}
}
]
}
},
{
"key": "html",
"doc_count": 4,
"gender_info": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "female",
"doc_count": 3,
"sal_info": {
"count": 3,
"min": 3000.0,
"max": 18000.0,
"avg": 11666.666666666666,
"sum": 35000.0
}
},
{
"key": "male",
"doc_count": 1,
"sal_info": {
"count": 1,
"min": 22000.0,
"max": 22000.0,
"avg": 22000.0,
"sum": 22000.0
}
}
]
}
}
]
}
}
}
查询平均工资最低的部门的平均工资,以及最低工资
GET employee/_search
{
"size": 0,
"aggs": {
"jobs": {
"terms": {
"field": "job"
},
"aggs": {
"sal_info": {
"avg": {
"field": "sal"
}
}
}
},
"min_avg_sal": {
"max_bucket": {
"buckets_path": "jobs>sal_info"
}
}
}
}
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 16,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"jobs": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "java",
"doc_count": 7,
"sal_info": {
"value": 13428.57142857143
}
},
{
"key": "dba",
"doc_count": 5,
"sal_info": {
"value": 12300.0
}
},
{
"key": "html",
"doc_count": 4,
"sal_info": {
"value": 14250.0
}
}
]
},
"min_avg_sal": {
"value": 14250.0,
"keys": [
"html"
]
}
}
}
三、ES自带航空数据案例
查询到达各目的地的航班的数量
GET kibana_sample_data_flights/_search
{
"size": 0,
"aggs": {
"dest_info": {
"terms": {
"field": "DestCountry"
}
}
}
}
查询到达各航班的的数量,以及票价的最大值,平均值
GET kibana_sample_data_flights/_search
{
"size": 0,
"aggs": {
"dest_info": {
"terms": {
"field": "DestCountry"
},
"aggs": {
"max_ticket_price": {
"max": {
"field": "AvgTicketPrice"
}
},
"avg_ticket_price": {
"avg": {
"field": "AvgTicketPrice"
}
}
}
}
}
}
查询到达各航班的的数量,以及票价的聚合信息以及天气的基本信息
GET kibana_sample_data_flights/_search
{
"size": 0,
"aggs": {
"dest_info": {
"terms": {
"field": "DestCountry"
},
"aggs": {
"ticket_info": {
"stats": {
"field": "AvgTicketPrice"
}
},
"weather_info": {
"terms": {
"field": "DestWeather"
}
}
}
}
}
}
评论