Elasticsearch Batch Retrieval with mget

2022-03-10 Author: escray

These notes are compiled from the Core Knowledge part of 中华石杉's Elasticsearch top expert course series on Bilibili.

Advantages of batch retrieval

If you fetch documents one at a time, retrieving 100 documents means sending 100 network requests, which is a significant overhead.

With a batch query, the same 100 documents need only 1 network request, so the number of network round trips drops by a factor of 100.
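As a rough back-of-the-envelope sketch of that saving, the snippet below compares the total network wait for the two approaches. The 2 ms round-trip time and the function name are assumptions chosen for illustration, and the model ignores server-side processing and payload size.

```python
def total_round_trip_ms(num_requests, rtt_ms):
    """Network wait if every request costs one round trip (simplified model)."""
    return num_requests * rtt_ms


RTT_MS = 2.0  # assumed round-trip time, for illustration only

one_by_one = total_round_trip_ms(100, RTT_MS)  # 100 separate GET requests
batched = total_round_trip_ms(1, RTT_MS)       # a single mget request

print(one_by_one, batched)
```

Under this model only the round-trip count changes; the real speedup depends on how much of each request is network wait.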

mget syntax

Querying one document at a time

GET test_index/_doc/1
GET test_index/_doc/2

Batch query with mget

GET _mget
{
  "docs":[
    {
      "_index": "test_index",
      "_type": "_doc",
      "_id": 1
    },
    {
      "_index": "test_index",
      "_type": "_doc",
      "_id": 2
    }
  ]
}

#! Deprecation: [types removal] Specifying types in multi get requests is deprecated.
{
  "docs" : [
    {
      "_index" : "test_index",
      "_type" : "_doc",
      "_id" : "1",
      "_version" : 1,
      "_seq_no" : 36,
      "_primary_term" : 1,
      "found" : true,
      "_source" : {
        "test_field1" : "test1",
        "test_field2" : "test2"
      }
    },
...
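Each entry in the response's docs array carries a found flag, so a client usually has to separate hits from misses. A minimal Python sketch, assuming the response has already been parsed into a dict; the function name and the second sample entry (ID 99) are made up for illustration:

```python
def split_mget_response(response):
    """Return ({_id: _source} for found docs, [ids that were not found])."""
    found, missing = {}, []
    for doc in response.get("docs", []):
        if doc.get("found"):
            found[doc["_id"]] = doc.get("_source")
        else:
            missing.append(doc["_id"])
    return found, missing


sample = {
    "docs": [
        {"_index": "test_index", "_id": "1", "found": True,
         "_source": {"test_field1": "test1", "test_field2": "test2"}},
        {"_index": "test_index", "_id": "99", "found": False},  # hypothetical miss
    ]
}

found, missing = split_mget_response(sample)
print(found, missing)
```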

GET _mget
{
  "docs":[
    {
      "_index": "test_index",
      "_id": 1
    },
    {
      "_index": "test_index",
      "_id": 2
    }
  ]
}

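If the documents being queried are in different types under the same index, put the index in the URL and give only the type and ID for each document: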

GET test_index/_mget
{
  "docs": [
    {
      "_type": "_doc",
      "_id": 1
    },
    {
      "_type": "_doc",
      "_id": 2
    }
  ]
}

#! Deprecation: [types removal] Specifying types in multi get requests is deprecated.

GET test_index/_mget
{
  "docs": [
    {
      "_id": 1
    },
    {
      "_id": 2
    }
  ]
}

If all the documents are in the same type under the same index, the request is simplest: just list the IDs.

GET test_index/_doc/_mget
{
  "ids": [1, 2]
}

#! Deprecation: [types removal] Specifying types in multi get requests is deprecated.

GET test_index/_mget
{
  "ids": [1, 2]
}
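The two typeless request shapes shown above (a docs list with _index per entry, and the short ids form) can be generated from a list of IDs. This is a minimal sketch; build_mget_body is a hypothetical helper, not part of any Elasticsearch client:

```python
import json


def build_mget_body(ids, index=None):
    """Build an mget request body.

    Without an index, use the short {"ids": [...]} form (the index goes in the URL).
    With an index, emit one {"_index", "_id"} entry per document.
    """
    if index is None:
        return {"ids": list(ids)}
    return {"docs": [{"_index": index, "_id": doc_id} for doc_id in ids]}


# Body for: GET test_index/_mget
print(json.dumps(build_mget_body([1, 2]), indent=2))
# Body for: GET _mget
print(json.dumps(build_mget_body([1, 2], index="test_index"), indent=2))
```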

The following content comes from

Multi get (mget) API: Retrieves multiple JSON documents by ID

Filter source fields

GET test_index/_mget
{
  "docs": [
    {
      "_id": 1,
      "_source": false
    },
    {
      "_id": 2,
      "_source": ["field2"]
    },
    {
      "_id": 3,
      "_source": {
        "include": ["user"],
        "exclude": ["user.location"]
      }
    }
  ]
}
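To make the include/exclude form for document 3 concrete, here is a simplified local illustration in Python of the effect it has on a document. It is not how Elasticsearch implements source filtering: it supports no wildcards and treats includes as top-level keys only. The function name and the sample document are made up.

```python
import copy


def filter_source(source, include, exclude):
    """Keep only top-level `include` keys, then drop dotted `exclude` paths."""
    kept = {k: copy.deepcopy(v) for k, v in source.items() if k in include}
    for path in exclude:
        *parents, leaf = path.split(".")
        node = kept
        for key in parents:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break  # the path does not exist in this document
        if isinstance(node, dict):
            node.pop(leaf, None)
    return kept


doc = {  # hypothetical document
    "user": {"name": "alice", "location": "somewhere"},
    "message": "hello",
}
filtered = filter_source(doc, include=["user"], exclude=["user.location"])
print(filtered)
```

The user object survives without its location field, and message is dropped because it was not included.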

Stored fields

Use the stored_fields attribute to specify the set of stored fields you want to retrieve. Any requested fields that are not stored are ignored.

GET test_index/_mget
{
  "docs": [
    {
      "_id": 1,
      "stored_fields": ["field1", "field2"]
    },
    {
      "_id": 3,
      "stored_fields": ["user.name", "user.age"]
    }
  ]
}

{
  "docs" : [
    {
      "_index" : "test_index",
      "_type" : "test_type",
      "_id" : "1",
      "_version" : 1,
      "_seq_no" : 36,
      "_primary_term" : 1,
      "found" : true
    },
    {
      "_index" : "test_index",
      "_type" : "test_type",
      "_id" : "3",
      "_version" : 1,
      "_seq_no" : 38,
      "_primary_term" : 1,
      "found" : true
    }
  ]
}

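The request succeeded, but I could not find any stored_fields content in the response. Presumably that is because none of the requested fields is mapped with "store": true, so they are ignored as described above; stored values would otherwise be returned under a fields key in each document.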
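When a requested field really is stored, Elasticsearch returns its values under a fields key in the corresponding docs entry, always as arrays. A minimal Python sketch of reading that key from a parsed response; the function name is hypothetical, and the second sample entry (ID 7, with a field1 value) is invented to show the non-empty case:

```python
def stored_fields_by_id(response):
    """Map each _id to its "fields" object, or {} when none was returned."""
    return {doc["_id"]: doc.get("fields", {}) for doc in response.get("docs", [])}


sample = {
    "docs": [
        # Shaped like the response above: found, but no stored fields returned
        {"_index": "test_index", "_id": "1", "found": True},
        # Hypothetical entry for an index where field1 is mapped with "store": true
        {"_index": "test_index", "_id": "7", "found": True,
         "fields": {"field1": ["value1"]}},
    ]
}

stored = stored_fields_by_id(sample)
print(stored)
```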

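You can also specify default stored_fields in the query string; a document that lists its own stored_fields overrides the default.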

GET test_index/_mget?stored_fields=field1,field2
{
  "docs": [
    {
      "_id": 1
    },
    {
      "_id": 3,
      "stored_fields": ["user.name", "user.age"]
    }
  ]
}
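If the request path is assembled in code rather than typed into the console, the default stored_fields list has to be joined with commas and placed in the query string. A small sketch using only the Python standard library; mget_path is a hypothetical helper:

```python
from urllib.parse import urlencode


def mget_path(index, **params):
    """Build the mget path for an index, with optional query-string parameters."""
    # safe="," keeps the comma-separated field list readable instead of %2C
    query = urlencode(params, safe=",")
    return f"/{index}/_mget" + (f"?{query}" if query else "")


path = mget_path("test_index", stored_fields=",".join(["field1", "field2"]))
print(path)
```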

Specify document routing

(I have not learned about routing yet.)

GET test_index/_mget?routing=key1
{
  "docs": [
    {
      "_id": 1,
      "routing": "key2"
    },
    {
      "_id": 2
    }
  ]
}

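There is also a default routing here: routing=key1 in the query string applies to every document that does not specify its own routing.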
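The precedence between the query-string routing and a per-document routing value can be sketched in Python as follows. This only mirrors the rule locally; effective_routing is a hypothetical helper, and the actual resolution happens inside Elasticsearch.

```python
def effective_routing(docs, default_routing=None):
    """Return {_id: routing}; a per-document value wins over the default."""
    return {doc["_id"]: doc.get("routing", default_routing) for doc in docs}


# Same docs as the request above, with routing=key1 as the default
docs = [{"_id": 1, "routing": "key2"}, {"_id": 2}]
resolved = effective_routing(docs, default_routing="key1")
print(resolved)
```

Document 1 is fetched with routing key2, while document 2 falls back to key1.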

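Oddly enough, although I have no concept of routing yet and have not set up any routing, the request above runs without error. Does that count as another kind of user-friendliness?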

The importance of mget

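In general, whenever you need to fetch multiple documents at once, use the batch API to keep the number of network round trips as low as possible. This can improve performance several times over, sometimes by an order of magnitude or more, so it is very important.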