Elasticsearch Study Notes (Part 5)

宇文氏 · Posted on 2017-5-21 06:19:50

Aggregations in ES

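We have one more requirement to meet: letting managers run analytics over the employee directory. Elasticsearch has a feature called aggregations that lets you generate sophisticated analytics over your data. It is similar to GROUP BY in SQL, but much more powerful.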

For example, let's find the interests most widely shared among all employees:
GET /megacorp/employee/_search
{
  "aggs": {
    "all_interests": {
      "terms": { "field": "interests" }
    }
  }
}
Ignore the syntax for now and just look at the results:

{
  "took": 229,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "1",
        "_score": 1,
        "_source": {
          "first_name": "John",
          "last_name": "Smith",
          "age": 32,
          "about": "I like to collect rock albums",
          "interests": [
            "music"
          ]
        }
      },
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "2",
        "_score": 1,
        "_source": {
          "first_name": "Lily",
          "last_name": "Smith",
          "age": 29,
          "about": "I like to go shopping!",
          "interests": [
            "music"
          ]
        }
      },
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "3",
        "_score": 1,
        "_source": {
          "first_name": "Tom",
          "last_name": "Smith",
          "age": 18,
          "about": "I like to play basketball!",
          "interests": [
            "music"
          ]
        }
      }
    ]
  },
  "aggregations": {
    "all_interests": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "music",
          "doc_count": 3
        }
      ]
    }
  }
}
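We can see that all three employees are interested in music. These figures are not precomputed; they are calculated on the fly from the documents that match the query. If we want to know the most widely shared interests among everyone named "Tom", we just add the appropriate query: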
GET /megacorp/employee/_search
{
  "query": {
    "match": {
      "first_name": "Tom"
    }
  },
  "aggs": {
    "all_interests": {
      "terms": {
        "field": "interests"
      }
    }
  }
}
The all_interests aggregation now covers only the documents that match the query:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.30685282,
    "hits": [
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "3",
        "_score": 0.30685282,
        "_source": {
          "first_name": "Tom",
          "last_name": "Smith",
          "age": 18,
          "about": "I like to play basketball!",
          "interests": [
            "music"
          ]
        }
      }
    ]
  },
  "aggregations": {
    "all_interests": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "music",
          "doc_count": 1
        }
      ]
    }
  }
}
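
To make the two results above more concrete, here is a minimal Python sketch of what a terms aggregation computes: one bucket per distinct field value, with a count of the documents that contain it. It is only an illustration, not how Elasticsearch is implemented. The names EMPLOYEES and terms_buckets are made up for this example, the documents are copied from the responses above, and the "Tom" filter uses plain string equality as a simplified stand-in for the match query.

```python
from collections import Counter

# Sample documents copied from the search responses above (trimmed).
EMPLOYEES = [
    {"first_name": "John", "age": 32, "interests": ["music"]},
    {"first_name": "Lily", "age": 29, "interests": ["music"]},
    {"first_name": "Tom", "age": 18, "interests": ["music"]},
]


def terms_buckets(docs, field):
    """Count how many documents contain each distinct value of `field`."""
    counts = Counter()
    for doc in docs:
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]  # treat a single value as a one-element list
        # Each distinct value counts once per document.
        for value in set(values):
            counts[value] += 1
    # Largest bucket first, similar to the default terms ordering.
    return [{"key": k, "doc_count": n} for k, n in counts.most_common()]


# No query: every document is aggregated.
all_buckets = terms_buckets(EMPLOYEES, "interests")
print(all_buckets)  # [{'key': 'music', 'doc_count': 3}]

# With a query: only the matching documents reach the aggregation.
# Plain equality here is a simplification of the match query.
toms = [d for d in EMPLOYEES if d["first_name"] == "Tom"]
tom_buckets = terms_buckets(toms, "interests")
print(tom_buckets)  # [{'key': 'music', 'doc_count': 1}]
```

The point of the second call is the order of operations: the query narrows the set of documents first, and the aggregation only sees what is left.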
Aggregations also allow hierarchical rollups. For example, let's compute the average age of employees for each interest:
GET /megacorp/employee/_search
{
    "aggs" : {
        "all_interests" : {
            "terms" : { "field" : "interests" },
            "aggs" : {
                "avg_age" : {
                    "avg" : { "field" : "age" }
                }
            }
        }
    }
}

The aggregations part of the response:

"aggregations": {
  "all_interests": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "music",
        "doc_count": 3,
        "avg_age": {
          "value": 26.333333333333332,
          "value_as_string": "26.333333333333332"
        }
      }
    ]
  }
}
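This result is richer than the previous ones. We still get the list of interests with their counts (the number of employees who have each interest), but each interest now also carries an avg_age field showing the average age of the employees who share it.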
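
The bucket-then-metric logic can be sketched in plain Python as well: group the documents by interest, then average the ages inside each group. This is a rough illustration under the same assumptions as the sample data in this post, not Elasticsearch's actual implementation, and the function name avg_age_by_interest is invented for the example.

```python
from collections import defaultdict

# Ages and interests copied from the sample documents in this post.
EMPLOYEES = [
    {"age": 32, "interests": ["music"]},
    {"age": 29, "interests": ["music"]},
    {"age": 18, "interests": ["music"]},
]


def avg_age_by_interest(docs):
    """Group documents by interest, then average `age` within each group."""
    ages = defaultdict(list)
    for doc in docs:
        # Each distinct interest of a document puts that document in one bucket.
        for interest in set(doc["interests"]):
            ages[interest].append(doc["age"])
    buckets = [
        {
            "key": key,
            "doc_count": len(vals),
            "avg_age": {"value": sum(vals) / len(vals)},
        }
        for key, vals in ages.items()
    ]
    # Largest bucket first.
    buckets.sort(key=lambda b: b["doc_count"], reverse=True)
    return buckets


result = avg_age_by_interest(EMPLOYEES)
print(result)  # one "music" bucket with 3 documents and an average age of about 26.33
```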
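Even if you don't understand the syntax yet, you can probably sense that this feature makes quite complex aggregations possible, and that you can process any kind of data with it.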
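
As a first handle on the syntax, the three requests in this post share one shape: a top-level "aggs" object, a name you choose (all_interests), the aggregation type (terms) with its options, and optionally a nested "aggs" for sub-aggregations. The sketch below builds those bodies as Python dictionaries. The helper terms_request is hypothetical and only produces JSON; it does not send anything to a cluster.

```python
import json


def terms_request(field, agg_name, query=None, sub_aggs=None):
    """Build a search body with one terms aggregation (hypothetical helper)."""
    agg = {"terms": {"field": field}}
    if sub_aggs:
        # Sub-aggregations are nested under the bucket aggregation.
        agg["aggs"] = sub_aggs
    body = {"aggs": {agg_name: agg}}
    if query:
        # The query limits which documents the aggregation sees.
        body["query"] = query
    return body


# Body of the third request in this post: average age per interest.
nested_body = terms_request(
    "interests",
    "all_interests",
    sub_aggs={"avg_age": {"avg": {"field": "age"}}},
)
print(json.dumps(nested_body, indent=2))

# Body of the second request: interests of everyone named "Tom".
tom_body = terms_request(
    "interests",
    "all_interests",
    query={"match": {"first_name": "Tom"}},
)
print(json.dumps(tom_body, indent=2))
```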
