This is day 17 of my participation in the August Writing Challenge.
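Thanks for reading, and I hope this helps. If you find any flaws in this post, please leave a comment, or add me through the contact details on my profile page and message me privately. Thanks to everyone who takes the time to share advice. I'm XiaoLin, a guy who writes bugs and raps, too.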
7. IK Analyzer
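By default ElasticSearch tokenizes text with the standard analyzer, which splits Chinese text into single characters and therefore does not suit Chinese-language sites. To get better search results, ElasticSearch needs a Chinese-friendly analyzer, and the IK analyzer is one that supports Chinese word segmentation.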
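7.1 Installing the IK analyzer online
Delete the existing data of the ElasticSearch service (required). Note that this removes every index stored on the node.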
# In the ES installation directory, delete the data directory
rm -rf data
Install the IK analyzer
# Run from the ES installation directory
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.8.0/elasticsearch-analysis-ik-6.8.0.zip
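The IK version must exactly match the ElasticSearch version you run; to install a different version, replace 6.8.0 in the URL with your version number.
Test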
GET /_analyze
{
"text": "中华人民共和国国歌",
"analyzer": "ik_smart"
}
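7.2 Installing the IK analyzer offline
You can also download the IK analyzer package locally and then install it. The packages are published on the project's GitHub releases page (https://github.com/medcl/elasticsearch-analysis-ik/releases), and the project README lists which IK version matches which ElasticSearch version.
Download the IK release that matches your ElasticSearch version
# Download it from the releases page and upload it to the server, or fetch it with wget; for another version, change the version number.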
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.2.4/elasticsearch-analysis-ik-6.2.4.zip
Unzip
unzip elasticsearch-analysis-ik-6.2.4.zip
Move the extracted directory into the plugins directory of the ES installation, renaming it to analysis-ik
mv elasticsearch elasticsearch-6.2.4/plugins/analysis-ik
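Restart ElasticSearch for the plugin to take effect.
The IK configuration file of a local installation is at <ES installation directory>/plugins/analysis-ik/config/IKAnalyzer.cfg.xml.
Test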
DELETE /ems
PUT /ems
{
"mappings":{
"emp":{
"properties":{
"name":{
"type":"text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_max_word"
},
"age":{
"type":"integer"
},
"bir":{
"type":"date"
},
"content":{
"type":"text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_max_word"
},
"address":{
"type":"keyword"
}
}
}
}
}
PUT /ems/emp/_bulk
{"index":{}}
{"name":"小黑","age":23,"bir":"2012-12-12","content":"为开发团队选择一款优秀的MVC框架是件难事儿,在众多可行的方案中决择需要很高的经验和水平","address":"北京"}
{"index":{}}
{"name":"王小黑","age":24,"bir":"2012-12-12","content":"Spring 框架是一个分层架构,由 7 个定义良好的模块组成。Spring 模块构建在核心容器之上,核心容器定义了创建、配置和管理 bean 的方式","address":"上海"}
{"index":{}}
{"name":"张小五","age":8,"bir":"2012-12-12","content":"Spring Cloud 作为Java 语言的微服务框架,它依赖于Spring Boot,有快速开发、持续交付和容易部署等特点。Spring Cloud 的组件非常多,涉及微服务的方方面面,并在开源社区Spring 和Netflix 、Pivotal 两大公司的推动下越来越完善","address":"无锡"}
{"index":{}}
{"name":"win7","age":9,"bir":"2012-12-12","content":"Spring的目标是致力于全方位的简化Java开发。 这势必引出更多的解释, Spring是如何简化Java开发的?","address":"南京"}
{"index":{}}
{"name":"梅超风","age":43,"bir":"2012-12-12","content":"Redis是一个开源的使用ANSI C语言编写、支持网络、可基于内存亦可持久化的日志型、Key-Value数据库,并提供多种语言的API","address":"杭州"}
{"index":{}}
{"name":"张无忌","age":59,"bir":"2012-12-12","content":"ElasticSearch是一个基于Lucene的搜索服务器。它提供了一个分布式多用户能力的全文搜索引擎,基于RESTful web接口","address":"北京"}
GET /ems/emp/_search
{
"query":{
"term":{
"content":"框架"
}
},
"highlight": {
"pre_tags": ["<span style='color:red'>"],
"post_tags": ["</span>"],
"fields": {
"*":{}
}
}
}
7.3 IK analyzer modes
7.3.1 ik_max_word
ik_max_word: splits text at the finest granularity, exhausting every possible combination. For example, "中华人民共和国国歌" is split into "中华人民共和国, 中华人民, 中华, 华人, 人民共和国, 人民, 人, 民, 共和国, 共和, 和, 国国, 国歌".
7.3.2 ik_smart
ik_smart: splits text at the coarsest granularity. For example, "中华人民共和国国歌" is split into "中华人民共和国, 国歌".
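To make the difference between the two modes concrete, here is a toy dictionary-based segmenter. It is not IK's actual algorithm (IK uses large built-in dictionaries and its own ambiguity resolution), and the mini dictionary is an assumption chosen for this example, so its max-word output is shorter than the real list above. It only illustrates the idea: the "max word" style emits every dictionary word found anywhere in the text, while the "smart" style takes the longest match from left to right.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy segmenter illustrating finest vs coarsest splitting; NOT IK's real algorithm.
class ToySegmenter {
    // Hypothetical mini dictionary; IK ships a much larger main.dic.
    static final Set<String> DICT = Set.of(
            "中华人民共和国", "中华人民", "中华", "华人",
            "人民共和国", "人民", "共和国", "共和", "国歌");

    // ik_max_word-like: every dictionary word starting at every position.
    static List<String> maxWord(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            for (int j = i + 1; j <= text.length(); j++) {
                String candidate = text.substring(i, j);
                if (DICT.contains(candidate)) {
                    tokens.add(candidate);
                }
            }
        }
        return tokens;
    }

    // ik_smart-like: greedy forward longest match, falling back to one character.
    static List<String> smart(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = i + 1; // fallback: a single character
            for (int j = text.length(); j > i + 1; j--) {
                if (DICT.contains(text.substring(i, j))) {
                    end = j;
                    break;
                }
            }
            tokens.add(text.substring(i, end));
            i = end;
        }
        return tokens;
    }
}
```

With this dictionary, smart("中华人民共和国国歌") yields [中华人民共和国, 国歌], matching the ik_smart example, while maxWord returns nine overlapping words.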
7.4 Configuring extension words
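IK supports a custom **extension dictionary** and a **stopword dictionary**. The extension dictionary holds words that are not keywords by default but that you want ES to treat as search keywords. The stopword dictionary holds words that would be keywords but that, for business reasons, you do not want to be searchable. Dictionary files must be UTF-8 encoded, otherwise they do not take effect.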
Both dictionaries are defined by editing the IKAnalyzer.cfg.xml file in the config directory of the IK analyzer.
Edit IKAnalyzer.cfg.xml:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<comment>IK Analyzer extension configuration</comment>
<!-- Configure your own extension dictionary here -->
<entry key="ext_dict">ext_dict.dic</entry>
<!-- Configure your own extension stopword dictionary here -->
<entry key="ext_stopwords">ext_stopword.dic</entry>
</properties>
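Create the ext_dict.dic file (and ext_stopword.dic, if you use stopwords) in the config directory of the IK analyzer.
The files must be UTF-8 encoded to take effect.
Add the extension words to ext_dict.dic, one word per line.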
Restart ElasticSearch for the changes to take effect.
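The dictionary files themselves are plain text with one term per line. The sketch below uses only the JDK to write and read such a file with an explicit UTF-8 charset, and shows what a stopword list does to a token stream conceptually. The class name DicFiles and the temp-file path used in the usage note are hypothetical; IK reads the real files itself on startup.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch: a .dic file is plain text, one term per line, UTF-8 encoded.
class DicFiles {
    // Write the terms one per line with an explicit UTF-8 charset; returns the path.
    static Path write(Path file, List<String> terms) {
        try {
            return Files.write(file, terms, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read the terms back as UTF-8, trimming whitespace and skipping blank lines.
    static List<String> read(Path file) {
        try {
            List<String> terms = new ArrayList<>();
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                String term = line.trim();
                if (!term.isEmpty()) {
                    terms.add(term);
                }
            }
            return terms;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Conceptually, a stopword dictionary removes matching tokens from the output.
    static List<String> removeStopwords(List<String> tokens, Set<String> stopwords) {
        return tokens.stream()
                .filter(token -> !stopwords.contains(token))
                .collect(Collectors.toList());
    }
}
```

For example, DicFiles.write(Paths.get("/tmp/ext_dict.dic"), List.of("明天你好", "中国力量")) produces a file in the expected format; copy such a file into the IK config directory.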
8. Filter queries
8.1 Filtering
Strictly speaking, search operations in ElasticSearch come in two kinds:
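- Query: the query used so far. By default it computes a relevance score for every returned document and sorts the results by that score.
- Filter: a filter only selects the documents that match, without computing scores, and its results can be cached.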
So, purely in terms of performance, filtering is faster than querying.
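Filters suit yes/no conditions that narrow down a large data set, while queries suit cases where the results must be ranked by relevance. In practice, filter the data first and then use a query to match and score what remains.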
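The difference between the two can be modeled in a few lines. In this toy sketch (not how Lucene implements it), the filter is a yes/no test whose result, a set of matching document ids, is cached and reused, while the query assigns each candidate a score, here simply the number of occurrences of a term, and ranks by it. The cache key deliberately ignores the data, as if there were only one fixed index; names such as ageAtLeast and scoreAndRank are illustrative.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of filter context vs query context (not Lucene's implementation).
class FilterVsQuery {
    // Filter results cached by a description of the filter (single fixed index assumed).
    static final Map<String, BitSet> FILTER_CACHE = new HashMap<>();

    // Filter: a yes/no decision per document, no score; the result is cached.
    static BitSet ageAtLeast(int[] ages, int min) {
        return FILTER_CACHE.computeIfAbsent("age>=" + min, key -> {
            BitSet matching = new BitSet(ages.length);
            for (int i = 0; i < ages.length; i++) {
                if (ages[i] >= min) {
                    matching.set(i);
                }
            }
            return matching;
        });
    }

    // Query: score only the filtered candidates and rank them, highest score first.
    static List<Integer> scoreAndRank(String[] contents, BitSet candidates, String term) {
        Map<Integer, Integer> scores = new HashMap<>();
        for (int i = candidates.nextSetBit(0); i >= 0; i = candidates.nextSetBit(i + 1)) {
            int score = countOccurrences(contents[i], term);
            if (score > 0) {
                scores.put(i, score);
            }
        }
        List<Integer> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((a, b) -> Integer.compare(scores.get(b), scores.get(a)));
        return ranked;
    }

    static int countOccurrences(String text, String term) {
        if (term.isEmpty()) {
            return 0; // avoid an endless loop on an empty term
        }
        int count = 0;
        for (int at = text.indexOf(term); at >= 0; at = text.indexOf(term, at + term.length())) {
            count++;
        }
        return count;
    }
}
```

With ages {23, 8, 43} and the filter age >= 10, the second document is never scored at all, and calling ageAtLeast again with the same bound returns the cached bit set instead of re-evaluating it.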
8.2 Filter syntax
GET /ems/emp/_search
{
"query": {
"bool": {
"must": [
{"match_all": {}}
],
"filter": {
"range": {
"age": {
"gte": 10
}
}
}
}
}
}
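When a bool query contains both filter and query clauses, the filter narrows down the candidate documents and the query scores only those. Elasticsearch automatically caches frequently used filters to speed things up.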
8.3 Filter types
8.3.1 term
GET /ems/emp/_search # filter with term
{
"query": {
"bool": {
"must": [
{"term": {
"name": {
"value": "小黑"
}
}}
],
"filter": {
"term": {
"content":"框架"
}
}
}
}
}
8.3.2 terms
GET /dangdang/book/_search # filter with terms
{
"query": {
"bool": {
"must": [
{"term": {
"name": {
"value": "中国"
}
}}
],
"filter": {
"terms": {
"content":[
"科技",
"声音"
]
}
}
}
}
}
8.3.3 range
GET /ems/emp/_search
{
"query": {
"bool": {
"must": [
{"term": {
"name": {
"value": "中国"
}
}}
],
"filter": {
"range": {
"age": {
"gte": 7,
"lte": 20
}
}
}
}
}
}
8.3.4 exists
Matches documents in which the specified field exists, i.e. has a non-null value (aaa below is a placeholder field name).
GET /ems/emp/_search
{
"query": {
"bool": {
"must": [
{"term": {
"name": {
"value": "中国"
}
}}
],
"filter": {
"exists": {
"field":"aaa"
}
}
}
}
}
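Which values count as "existing" is easy to get wrong. The sketch below models the value rules from the Elasticsearch exists-query documentation for a flat _source: a missing field, null or [] does not exist (and an array holding only nulls has no indexable value either), while an empty string does exist. It deliberately ignores mapping-level causes such as index: false, ignore_above or null_value, and the class name is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Toy model of the exists filter for a flat _source document.
// Only the documented value rules are modeled; mapping settings are ignored.
class ExistsSketch {
    static boolean exists(Map<String, Object> source, String field) {
        Object value = source.get(field);
        if (value == null) {
            return false; // field missing or explicitly null
        }
        if (value instanceof List) {
            // [] and [null] have no indexable value; [null, "foo"] does
            return ((List<?>) value).stream().anyMatch(Objects::nonNull);
        }
        return true; // any other value, including the empty string ""
    }
}
```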
8.3.5 ids
Matches documents whose _id is in the given list.
GET /ems/emp/_search
{
"query": {
"bool": {
"must": [
{"term": {
"name": {
"value": "中国"
}
}}
],
"filter": {
"ids": {
"values": ["1","2","3"]
}
}
}
}
}
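9. Working with ElasticSearch from Java
ElasticSearch is not a replacement for a database. Its core strength is search, so the data that users need to search is what belongs in ElasticSearch.
9.1 Adding the dependencies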
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch</artifactId>
<version>6.8.0</version>
</dependency>
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>transport</artifactId>
<version>6.8.0</version>
</dependency>
<dependency>
<groupId>org.elasticsearch.plugin</groupId>
<artifactId>transport-netty4-client</artifactId>
<version>6.8.0</version>
</dependency>
9.2 Creating the index and type
PUT /dangdang
{
"mappings": {
"book":{
"properties": {
"name":{
"type":"text",
"analyzer": "ik_max_word"
},
"age":{
"type":"integer"
},
"sex":{
"type":"keyword"
},
"content":{
"type":"text",
"analyzer": "ik_max_word"
}
}
}
}
}
9.3 Java API examples
9.3.1 Creating the client object
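import java.net.InetAddress;
import java.net.UnknownHostException;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.TransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;
// Create the ES client; 9300 is the transport port, not the HTTP port 9200
@Test
public void init() throws UnknownHostException {
// Settings.EMPTY assumes the cluster keeps the default name "elasticsearch"
PreBuiltTransportClient preBuiltTransportClient = new PreBuiltTransportClient(Settings.EMPTY);
preBuiltTransportClient.addTransportAddress(new TransportAddress(
InetAddress.getByName("192.168.202.200"),9300));
preBuiltTransportClient.close();
}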
9.3.2 Creating an index
// Create an index
@Test
public void createIndex() throws UnknownHostException, ExecutionException, InterruptedException {
PreBuiltTransportClient preBuiltTransportClient = new PreBuiltTransportClient(Settings.EMPTY);
preBuiltTransportClient.addTransportAddress(new TransportAddress(
InetAddress.getByName("192.168.202.200"),9300));
// Build the create-index request
CreateIndexRequest ems = new CreateIndexRequest("ems");
// Execute the request and wait for the response
CreateIndexResponse createIndexResponse = preBuiltTransportClient.admin().indices().create(ems).get();
System.out.println(createIndexResponse.isAcknowledged());
}
9.3.3 Deleting an index
// Delete an index
@Test
public void deleteIndex() throws UnknownHostException, ExecutionException, InterruptedException {
PreBuiltTransportClient preBuiltTransportClient = new PreBuiltTransportClient(Settings.EMPTY);
preBuiltTransportClient.addTransportAddress(new TransportAddress(
InetAddress.getByName("192.168.202.200"),9300));
// Build the delete-index request
DeleteIndexRequest ems = new DeleteIndexRequest("ems");
// Execute the deletion
AcknowledgedResponse acknowledgedResponse = preBuiltTransportClient.admin().indices().delete(ems).get();
System.out.println(acknowledgedResponse.isAcknowledged());
}
9.3.4 Creating an index with a type and mapping
// Create an index with a type and its mapping
@Test
public void createIndexWithMapping() throws UnknownHostException, ExecutionException, InterruptedException {
PreBuiltTransportClient preBuiltTransportClient = new PreBuiltTransportClient(Settings.EMPTY);
preBuiltTransportClient.addTransportAddress(new TransportAddress(
InetAddress.getByName("192.168.202.200"),9300));
// Build the create-index request
CreateIndexRequest ems = new CreateIndexRequest("ems");
// Define the mapping as JSON
String json = "{\"properties\":{\"name\":{\"type\":\"text\",\"analyzer\":\"ik_max_word\"},\"age\":{\"type\":\"integer\"},\"sex\":{\"type\":\"keyword\"},\"content\":{\"type\":\"text\",\"analyzer\":\"ik_max_word\"}}}";
// Set the type name and its mapping
ems.mapping("emp",json, XContentType.JSON);
// Execute the creation
CreateIndexResponse createIndexResponse = preBuiltTransportClient.admin().indices().create(ems).get();
System.out.println(createIndexResponse.isAcknowledged());
}
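9.3.5 Indexing a document
9.3.5.1 Indexing with a specified id
// Index one document with the specified id "1"
// transportClient is assumed to be a TransportClient field initialized in a setup method, as in 9.3.1
@Test
public void createIndexOptionId() {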
Emp emp = new Emp("xiaolin", 18, "男", "我是xiaolin,来自中国");
String s = JSONObject.toJSONString(emp);
IndexResponse indexResponse = transportClient.prepareIndex("ems", "emp", "1").setSource(s, XContentType.JSON).get();
System.out.println(indexResponse.status());
}
9.3.5.2 Indexing with an auto-generated id
// Index one document and let ElasticSearch generate the id
@Test
public void createIndexAutoId() {
Emp emp = new Emp("XiaoLin", 18 , "男", "我是XiaoLin");
String s = JSONObject.toJSONString(emp);
IndexResponse indexResponse = transportClient.prepareIndex("ems", "emp")
.setSource(s, XContentType.JSON).get();
System.out.println(indexResponse.status());
}
9.3.6 Updating a document
// Partially update one document: only the fields in the JSON are changed, the others are kept
@Test
public void testUpdate() throws IOException {
Emp emp = new Emp();
emp.setName("明天你好");
String s = JSONObject.toJSONString(emp);
UpdateResponse updateResponse = transportClient.prepareUpdate("ems", "emp", "1")
.setDoc(s,XContentType.JSON).get();
System.out.println(updateResponse.status());
}
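setDoc performs a partial update: the fields in the supplied JSON overwrite the stored ones and all other fields are kept. A flat toy model of that merge looks like this (Elasticsearch additionally merges object fields recursively, which is not modeled). One consequence worth checking: fastjson's JSONObject.toJSONString omits null fields by default, so new Emp() with only the name set sends just name. If Emp declared age as a primitive int (the class is not shown), 0 would be sent and would overwrite the stored age.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a partial-document update on a flat document: fields in the
// partial doc overwrite stored ones, every other stored field is kept.
class PartialUpdateSketch {
    static Map<String, Object> merge(Map<String, Object> stored, Map<String, Object> partial) {
        Map<String, Object> result = new HashMap<>(stored);
        result.putAll(partial); // partial fields win
        return result;
    }
}
```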
9.3.7 Bulk operations
// Bulk request: index, update and delete in one round trip
@Test
public void testBulk() throws IOException {
// Index the first document
IndexRequest request1 = new IndexRequest("ems","emp","1");
Emp emp = new Emp("中国科技", 23, "男", "这是好人");
request1.source(JSONObject.toJSONString(emp),XContentType.JSON);
// Index the second document
IndexRequest request2 = new IndexRequest("ems","emp","2");
Emp emp2 = new Emp("中国科技", 23, "男", "这是好人");
request2.source(JSONObject.toJSONString(emp2),XContentType.JSON);
// Update document 1
UpdateRequest updateRequest = new UpdateRequest("ems","emp","1");
Emp empUpdate = new Emp();
empUpdate.setName("中国力量");
updateRequest.doc(JSONObject.toJSONString(empUpdate),XContentType.JSON);
// Delete document 2
DeleteRequest deleteRequest = new DeleteRequest("ems","emp","2");
BulkResponse bulkItemResponses = transportClient.prepareBulk()
.add(request1)
.add(request2)
.add(updateRequest)
.add(deleteRequest)
.get();
BulkItemResponse[] items = bulkItemResponses.getItems();
for (BulkItemResponse item : items) {
System.out.println(item.status());
}
}
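The REST counterpart of this bulk request (the same format used with PUT /ems/emp/_bulk in section 7.2) is newline-delimited JSON: an action line, followed by a source line for index and a {"doc": ...} line for update, with no source line for delete, and the body must end with a newline. The sketch below builds that body as a string. The source field names (name, age, sex, content) are assumed from the mappings used earlier, since the Emp class is not shown.

```java
// Builds the newline-delimited body of a REST _bulk request mirroring the Java
// example: two index actions, one update, one delete.
class BulkBodySketch {
    // Action/metadata line, e.g. {"index":{"_index":"ems","_type":"emp","_id":"1"}}
    static String action(String op, String index, String type, String id) {
        return "{\"" + op + "\":{\"_index\":\"" + index + "\",\"_type\":\"" + type
                + "\",\"_id\":\"" + id + "\"}}\n";
    }

    static String build() {
        // Field names assumed from the earlier mappings; Emp itself is not shown.
        String source = "{\"name\":\"中国科技\",\"age\":23,\"sex\":\"男\",\"content\":\"这是好人\"}\n";
        StringBuilder body = new StringBuilder();
        body.append(action("index", "ems", "emp", "1")).append(source);
        body.append(action("index", "ems", "emp", "2")).append(source);
        // update: the partial document goes inside "doc"
        body.append(action("update", "ems", "emp", "1"))
            .append("{\"doc\":{\"name\":\"中国力量\"}}\n");
        // delete: action line only, no source line
        body.append(action("delete", "ems", "emp", "2"));
        return body.toString(); // ends with "\n", as _bulk requires
    }
}
```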
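9.3.8 Searching documents
9.3.8.1 Match all and sort
/**
* Match all documents and sort the results
* SortOrder.ASC = ascending, SortOrder.DESC = descending
* addSort("age", SortOrder.ASC) sorts by age in ascending order
* addSort("age", SortOrder.DESC) sorts by age in descending order
*/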
@Test
public void testMatchAllQuery() throws UnknownHostException {
TransportClient transportClient = new PreBuiltTransportClient(Settings.EMPTY).addTransportAddress(new TransportAddress(InetAddress.getByName("172.16.251.142"), 9300));
SearchResponse searchResponse = transportClient.prepareSearch("dangdang").setTypes("book").setQuery(QueryBuilders.matchAllQuery()).addSort("age", SortOrder.DESC).get();
SearchHits hits = searchResponse.getHits();
System.out.println("Total hits: "+hits.totalHits);
for (SearchHit hit : hits) {
// when sorting by a field, scores are not computed by default, so this prints NaN
System.out.print("Score: "+hit.getScore());
System.out.print(", source: "+hit.getSourceAsString());
System.out.println(", name field: "+hit.getSourceAsMap().get("name"));
System.out.println("=================================================");
}
}
9.3.8.2 Pagination
/**
* Paginated search
* from: offset of the first hit to return, 0 by default; from = (pageNow-1)*size
* size: number of hits to return, 10 by default
*/
@Test
public void testMatchAllQueryFormAndSize() throws UnknownHostException {
TransportClient transportClient = new PreBuiltTransportClient(Settings.EMPTY).addTransportAddress(new TransportAddress(InetAddress.getByName("172.16.251.142"), 9300));
SearchResponse searchResponse = transportClient.prepareSearch("dangdang").setTypes("book").setQuery(QueryBuilders.matchAllQuery()).setFrom(0).setSize(2).get();
SearchHits hits = searchResponse.getHits();
System.out.println("Total hits: "+hits.totalHits);
for (SearchHit hit : hits) {
System.out.print("Score: "+hit.getScore());
System.out.print(", source: "+hit.getSourceAsString());
System.out.println(", name field: "+hit.getSourceAsMap().get("name"));
System.out.println("=================================================");
}
}
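Converting a page number into from/size is simple arithmetic, but from + size is capped by the index setting index.max_result_window, which defaults to 10000; deeper pages are rejected unless you raise that setting or switch to scroll or search_after. A minimal sketch of the conversion, assuming 1-based page numbers as in the comment above (the Paging class is hypothetical):

```java
// Converts a 1-based page number into the offset for setFrom(...),
// checking the default index.max_result_window limit of 10000.
class Paging {
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    static int from(int pageNow, int size) {
        if (pageNow < 1 || size < 1) {
            throw new IllegalArgumentException("pageNow and size must be at least 1");
        }
        long from = (long) (pageNow - 1) * size; // long avoids int overflow on huge pages
        if (from + size > DEFAULT_MAX_RESULT_WINDOW) {
            throw new IllegalArgumentException("from + size exceeds index.max_result_window");
        }
        return (int) from;
    }
}
```

Usage with the builder above: setFrom(Paging.from(pageNow, size)).setSize(size).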
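9.3.8.3 Returning selected fields
/**
* Return only selected _source fields (all fields are returned by default)
* setFetchSource(includes, excludes): argument 1 = fields to include, argument 2 = fields to exclude
* setFetchSource("*","age") returns every field except age
* setFetchSource("name",null) returns only the name field
* setFetchSource(new String[]{...},new String[]{...}) accepts several patterns on each side
*/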
@Test
public void testMatchAllQuerySource() throws UnknownHostException {
TransportClient transportClient = new PreBuiltTransportClient(Settings.EMPTY).addTransportAddress(new TransportAddress(InetAddress.getByName("172.16.251.142"), 9300));
SearchResponse searchResponse = transportClient.prepareSearch("dangdang").setTypes("book").setQuery(QueryBuilders.matchAllQuery()).setFetchSource("*","age").get();
SearchHits hits = searchResponse.getHits();
System.out.println("Total hits: "+hits.totalHits);
for (SearchHit hit : hits) {
System.out.print("Score: "+hit.getScore());
System.out.print(", source: "+hit.getSourceAsString());
System.out.println(", name field: "+hit.getSourceAsMap().get("name"));
System.out.println("=================================================");
}
}
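The include/exclude logic of setFetchSource can be modeled as follows: a field is returned if it matches an include pattern (or no includes were given) and matches no exclude pattern, where * is a wildcard. This toy version only handles top-level fields (Elasticsearch also supports dotted paths into objects), and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Toy model of _source include/exclude filtering (top-level fields only).
class SourceFilterSketch {
    // A field matches a pattern if it equals it, with '*' standing for any sequence.
    static boolean matches(String pattern, String field) {
        String[] parts = pattern.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.matches(regex.toString(), field);
    }

    static boolean matchesAny(String[] patterns, String field) {
        for (String pattern : patterns) {
            if (matches(pattern, field)) {
                return true;
            }
        }
        return false;
    }

    // Keep a field if it is included (no includes = include all) and not excluded.
    static Map<String, Object> filter(Map<String, Object> source, String[] includes, String[] excludes) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            String field = entry.getKey();
            boolean included = includes.length == 0 || matchesAny(includes, field);
            if (included && !matchesAny(excludes, field)) {
                result.put(field, entry.getValue());
            }
        }
        return result;
    }
}
```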
9.3.8.4 term query
/**
* term query: the search term is not analyzed
*/
@Test
public void testTerm() throws UnknownHostException {
TransportClient transportClient = new PreBuiltTransportClient(Settings.EMPTY).addTransportAddress(new TransportAddress(InetAddress.getByName("172.16.251.142"), 9300));
TermQueryBuilder queryBuilder = QueryBuilders.termQuery("name","中国");
SearchResponse searchResponse = transportClient.prepareSearch("dangdang").setTypes("book").setQuery(queryBuilder).get();
}
9.3.8.5 range query
/**
* range query
* lt: less than
* lte: less than or equal to
* gt: greater than
* gte: greater than or equal to
*/
@Test
public void testRange() throws UnknownHostException {
TransportClient transportClient = new PreBuiltTransportClient(Settings.EMPTY).addTransportAddress(new TransportAddress(InetAddress.getByName("172.16.251.142"), 9300));
RangeQueryBuilder rangeQueryBuilder = QueryBuilders.rangeQuery("age").lt(45).gte(8);
SearchResponse searchResponse = transportClient.prepareSearch("dangdang").setTypes("book").setQuery(rangeQueryBuilder).get();
}
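The four bound operators map directly onto comparisons: gte and lte are inclusive, gt and lt are exclusive, and any bound left unset is open-ended. So rangeQuery("age").lt(45).gte(8) matches 8 <= age < 45. A small builder-style model that mirrors those method names (a hypothetical class, not the ES API):

```java
// Toy model of range bounds, mirroring rangeQuery("age").lt(45).gte(8):
// gte/lte are inclusive, gt/lt exclusive, and an unset bound is open-ended.
class RangeSketch {
    private Integer gte, gt, lte, lt; // null means the bound is not set

    RangeSketch gte(int value) { this.gte = value; return this; }
    RangeSketch gt(int value) { this.gt = value; return this; }
    RangeSketch lte(int value) { this.lte = value; return this; }
    RangeSketch lt(int value) { this.lt = value; return this; }

    boolean matches(int value) {
        return (gte == null || value >= gte)
                && (gt == null || value > gt)
                && (lte == null || value <= lte)
                && (lt == null || value < lt);
    }
}
```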
9.3.8.6 prefix query
/**
* prefix query: matches terms that start with the given prefix
*
*/
@Test
public void testPrefix() throws UnknownHostException {
TransportClient transportClient = new PreBuiltTransportClient(Settings.EMPTY).addTransportAddress(new TransportAddress(InetAddress.getByName("172.16.251.142"), 9300));
PrefixQueryBuilder prefixQueryBuilder = QueryBuilders.prefixQuery("name", "中");
SearchResponse searchResponse = transportClient.prepareSearch("dangdang").setTypes("book").setQuery(prefixQueryBuilder).get();
}
9.3.8.7 wildcard query
/**
* wildcard query: * matches any character sequence, ? matches a single character
*
*/
@Test
public void testwildcardQuery() throws UnknownHostException {
TransportClient transportClient = new PreBuiltTransportClient(Settings.EMPTY).addTransportAddress(new TransportAddress(InetAddress.getByName("172.16.251.142"), 9300));
WildcardQueryBuilder wildcardQueryBuilder = QueryBuilders.wildcardQuery("name", "中*");
SearchResponse searchResponse = transportClient.prepareSearch("dangdang").setTypes("book").setQuery(wildcardQueryBuilder).get();
}
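Wildcard patterns are matched against the indexed terms (for an ik_max_word field, the individual tokens), not against the whole field value: * matches any sequence of characters, including none, and ? matches exactly one character. The toy matcher below shows those semantics by translating the pattern into a regular expression; Lucene compiles it into an automaton instead, and escaping with a backslash is not modeled. Patterns that start with * are expensive in Elasticsearch because every term has to be checked.

```java
import java.util.regex.Pattern;

// Toy model of wildcard-query semantics on a single indexed term:
// '*' = any sequence (possibly empty), '?' = exactly one character.
class WildcardSketch {
    static boolean matches(String pattern, String term) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*' || c == '?') {
                // flush the literal run, quoted so characters like '.' stay literal
                regex.append(Pattern.quote(literal.toString()));
                literal.setLength(0);
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        regex.append(Pattern.quote(literal.toString()));
        // DOTALL so that '.' also matches line breaks inside a term
        return Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(term).matches();
    }
}
```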
9.3.8.8 ids query
/**
* ids query: matches documents by _id
*/
@Test
public void testIds() throws UnknownHostException {
TransportClient transportClient = new PreBuiltTransportClient(Settings.EMPTY).addTransportAddress(new TransportAddress(InetAddress.getByName("172.16.251.142"), 9300));
IdsQueryBuilder idsQueryBuilder = QueryBuilders.idsQuery().addIds("1","2");
SearchResponse searchResponse = transportClient.prepareSearch("dangdang").setTypes("book").setQuery(idsQueryBuilder).get();
}
9.3.8.9 fuzzy query
/**
* fuzzy query: matches terms within a small edit distance of the given term
*/
@Test
public void testFuzzy() throws UnknownHostException {
TransportClient transportClient = new PreBuiltTransportClient(Settings.EMPTY).addTransportAddress(new TransportAddress(InetAddress.getByName("172.16.251.142"), 9300));
FuzzyQueryBuilder fuzzyQueryBuilder = QueryBuilders.fuzzyQuery("content", "国人");
SearchResponse searchResponse = transportClient.prepareSearch("dangdang").setTypes("book").setQuery(fuzzyQueryBuilder).get();
}
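A fuzzy query matches indexed terms within an edit distance of the query term. By default fuzziness is AUTO: terms of 0 to 2 characters must match exactly, 3 to 5 characters allow one edit, longer terms allow two, and swapping two adjacent characters counts as a single edit. The sketch below models that rule with an optimal-string-alignment distance, a simplification of what Lucene's automata compute; the class and method names are hypothetical. One consequence: the two-character term 国人 in the example above gets no edits under AUTO, so it only matches the exact term 国人.

```java
// Sketch of the default fuzzy matching rule: an edit distance in which insertion,
// deletion, substitution and swapping two adjacent characters each cost 1
// (optimal string alignment variant), compared with the AUTO edit budget.
class FuzzySketch {
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i;
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), d[i - 1][j - 1] + cost);
                boolean swapped = i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2) && a.charAt(i - 2) == b.charAt(j - 1);
                if (swapped) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[a.length()][b.length()];
    }

    // fuzziness AUTO: length 0-2 -> exact match, 3-5 -> 1 edit, more than 5 -> 2 edits
    static int autoFuzziness(String term) {
        int length = term.length();
        return length <= 2 ? 0 : (length <= 5 ? 1 : 2);
    }

    static boolean fuzzyMatches(String queryTerm, String indexedTerm) {
        return distance(queryTerm, indexedTerm) <= autoFuzziness(queryTerm);
    }
}
```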
9.3.8.10 bool query
/**
* bool query: combines must, must_not and should clauses
*/
@Test
public void testBool() throws UnknownHostException {
TransportClient transportClient = new PreBuiltTransportClient(Settings.EMPTY).addTransportAddress(new TransportAddress(InetAddress.getByName("172.16.251.142"), 9300));
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
boolQueryBuilder.should(QueryBuilders.matchAllQuery());
boolQueryBuilder.mustNot(QueryBuilders.rangeQuery("age").lte(8));
boolQueryBuilder.must(QueryBuilders.termQuery("name","中国"));
SearchResponse searchResponse = transportClient.prepareSearch("dangdang").setTypes("book").setQuery(boolQueryBuilder).get();
}
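The bool query above combines three clause types. A toy model of its matching rule, with scoring left out and filter clauses not modeled: every must clause has to match, no must_not clause may match, and should clauses are only required (at least one of them) when there are no must or filter clauses; otherwise they merely raise the score. That is also why should(matchAllQuery()) does not change which documents match here. The Book class and its fields are hypothetical stand-ins for the dangdang documents.

```java
import java.util.List;
import java.util.function.Predicate;

// Toy model of bool-query matching (no scoring, no filter clauses).
class BoolSketch {
    static final class Book {
        final String name;
        final int age;

        Book(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    static boolean matches(Book doc, List<Predicate<Book>> must,
                           List<Predicate<Book>> mustNot, List<Predicate<Book>> should) {
        boolean allMust = must.stream().allMatch(p -> p.test(doc));
        boolean noMustNot = mustNot.stream().noneMatch(p -> p.test(doc));
        // should is optional when must clauses exist; otherwise at least one must match
        boolean shouldOk = !must.isEmpty() || should.isEmpty()
                || should.stream().anyMatch(p -> p.test(doc));
        return allMust && noMustNot && shouldOk;
    }
}
```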
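9.3.8.11 Highlighting
/**
* Highlighted search
* .highlighter(highlightBuilder) applies the highlight settings to the request
* requireFieldMatch(false) also highlights fields the query did not target, so several fields can be highlighted
* field(...) adds a field to highlight
* preTags("<span style='color:red'>") sets the tag inserted before each match
* postTags("</span>") sets the tag inserted after each match
*/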
@Test
public void testHighlight() throws UnknownHostException {
TransportClient transportClient = new PreBuiltTransportClient(Settings.EMPTY).addTransportAddress(new TransportAddress(InetAddress.getByName("172.16.251.142"), 9300));
TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("name", "中国");
HighlightBuilder highlightBuilder = new HighlightBuilder();
highlightBuilder.requireFieldMatch(false).field("name").field("content").preTags("<span style='color:red'>").postTags("</span>");
SearchResponse searchResponse = transportClient.prepareSearch("dangdang").setTypes("book").highlighter(highlightBuilder).setQuery(termQueryBuilder).get();
SearchHits hits = searchResponse.getHits();
System.out.println("Total hits: "+hits.totalHits);
for (SearchHit hit : hits) {
Map<String, Object> sourceAsMap = hit.getSourceAsMap();
Map<String, HighlightField> highlightFields = hit.getHighlightFields();
System.out.println("======== before highlighting ========");
for(Map.Entry<String,Object> entry:sourceAsMap.entrySet()){
System.out.println("key: "+entry.getKey() +" value: "+entry.getValue());
}
System.out.println("======== after highlighting ========");
for (Map.Entry<String,Object> entry:sourceAsMap.entrySet()){
HighlightField highlightField = highlightFields.get(entry.getKey());
if (highlightField!=null){
System.out.println("key: "+entry.getKey() +" value: "+ highlightField.fragments()[0]);
}else{
System.out.println("key: "+entry.getKey() +" value: "+entry.getValue());
}
}
}
}
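What the loop above does with the response can be modeled like this: the highlighter returns the field text with each matched term wrapped in the pre/post tags, and a field without a match has no highlight entry, so the code falls back to the _source value. This toy version does plain substring replacement on one string; the real highlighter works on analyzed tokens and returns fragments (hence fragments()[0]). If no tags are set, Elasticsearch uses <em> and </em>.

```java
// Toy model of one highlighted field: every occurrence of the matched term is
// wrapped in the pre/post tags; null means "no highlight for this field".
class HighlightSketch {
    static String highlight(String fieldValue, String term, String preTag, String postTag) {
        if (fieldValue == null || term.isEmpty() || !fieldValue.contains(term)) {
            return null;
        }
        return fieldValue.replace(term, preTag + term + postTag);
    }
}
```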