A Strange Elasticsearch Node
I started three Elasticsearch nodes, version 7.10.1, on a Windows 10 desktop, and edited each node's configuration file as follows:
# master node
cluster.name: es-study-cluster
node.name: master
network.host: 127.0.0.1
http.port: 9200
# slave-1
cluster.name: es-study-cluster
node.name: slave-1
network.host: 127.0.0.1
http.port: 9201
# slave-2
cluster.name: es-study-cluster
node.name: slave-2
network.host: 127.0.0.1
http.port: 9202
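According to the tutorials I had read online, the three nodes should automatically join one cluster, giving one master and two standbys. Instead...
The No. 2 "slave" node went off and set up its own camp.
Viewing each node's status in the browser gave the following: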
# master
{
"name" : "master",
"cluster_name" : "es-study-cluster",
"cluster_uuid" : "ras0xD8ARh-WGaJ4VwUY1Q",
"version" : {
"number" : "7.10.1",
"build_flavor" : "default",
"build_type" : "zip",
"build_hash" : "1c34507e66d7db1211f66f3513706fdf548736aa",
"build_date" : "2020-12-05T01:00:33.671820Z",
"build_snapshot" : false,
"lucene_version" : "8.7.0",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}
# slave-1
{
"name" : "slave-1",
"cluster_name" : "es-study-cluster",
"cluster_uuid" : "ras0xD8ARh-WGaJ4VwUY1Q",
"version" : {
"number" : "7.10.1",
"build_flavor" : "default",
"build_type" : "zip",
"build_hash" : "1c34507e66d7db1211f66f3513706fdf548736aa",
"build_date" : "2020-12-05T01:00:33.671820Z",
"build_snapshot" : false,
"lucene_version" : "8.7.0",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}
# slave-2
{
"name" : "slave-2",
"cluster_name" : "es-study-cluster",
"cluster_uuid" : "PtZ0JU5bQBG_OBXh3tMyVQ",
"version" : {
"number" : "7.10.1",
"build_flavor" : "default",
"build_type" : "zip",
"build_hash" : "1c34507e66d7db1211f66f3513706fdf548736aa",
"build_date" : "2020-12-05T01:00:33.671820Z",
"build_snapshot" : false,
"lucene_version" : "8.7.0",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}
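Note that slave-2's cluster_name is fine, but its cluster_uuid differs from that of master and slave-1.
As an Elasticsearch newcomer, I was at a bit of a loss.
To check whether this was just slave-2 being moody, I started one more node, slave-3, and it joined the group without any trouble.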
{
"name" : "slave-3",
"cluster_name" : "es-study-cluster",
"cluster_uuid" : "ras0xD8ARh-WGaJ4VwUY1Q",
"version" : {
"number" : "7.10.1",
"build_flavor" : "default",
"build_type" : "zip",
"build_hash" : "1c34507e66d7db1211f66f3513706fdf548736aa",
"build_date" : "2020-12-05T01:00:33.671820Z",
"build_snapshot" : false,
"lucene_version" : "8.7.0",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}
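The by-hand comparison of cluster_uuid values can also be scripted. What follows is only a minimal sketch, assuming the nodes are reachable over plain HTTP on 127.0.0.1 at the ports configured above (9200 to 9202); the function names `fetch_root_info`, `group_by_cluster_uuid` and `main` are made up for this illustration and are not part of Elasticsearch.

```python
import json
from collections import defaultdict
from urllib.request import urlopen


def fetch_root_info(port, host="127.0.0.1", timeout=3.0):
    """Return the JSON document a node serves at its HTTP root (GET /)."""
    with urlopen(f"http://{host}:{port}/", timeout=timeout) as resp:
        return json.load(resp)


def group_by_cluster_uuid(infos):
    """Map each cluster_uuid to the sorted names of the nodes reporting it."""
    groups = defaultdict(list)
    for info in infos:
        groups[info["cluster_uuid"]].append(info["name"])
    return {uuid: sorted(names) for uuid, names in groups.items()}


def main():
    # Assumption: the HTTP ports from the configuration files in this post.
    infos = [fetch_root_info(port) for port in (9200, 9201, 9202)]
    for uuid, names in group_by_cluster_uuid(infos).items():
        print(uuid, names)
```

Calling `main()` while the nodes are running prints one line per distinct cluster_uuid, so more than one line means the nodes have not formed a single cluster.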
Then I looked at the logs and found some warnings:
[2021-01-12T12:34:05,308][WARN ][o.e.g.DanglingIndicesState] [MS-FZYRJUKHIOOR] gateway.auto_import_dangling_indices is disabled, dangling indices will not be automatically detected or imported and must be managed manually
...
[2021-01-12T12:34:07,103][WARN ][o.e.d.HandshakingTransportAddressConnector] [MS-FZYRJUKHIOOR] handshake failed for [connectToRemoteMasterNode[127.0.0.1:9301]]
java.lang.IllegalStateException: handshake with [{127.0.0.1:9301}{XLkiFTcWRGabfDE7s5nzrg}{127.0.0.1}{127.0.0.1:9301}] failed: remote cluster name [es-study-cluster] does not match local cluster name [elasticsearch]
...
[2021-01-13T17:21:30,921][WARN ][o.e.h.AbstractHttpServerTransport] [slave-1] caught exception while handling client http traffic, closing connection Netty4HttpChannel{localAddress=/127.0.0.1:9201, remoteAddress=/127.0.0.1:49243}
java.io.IOException: An established connection was aborted by the software in your host machine.
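Judging from the logs, it looks as though the second node tried several times to join the cluster but was rejected for some reason I could not work out. So I went searching, and on the ever-helpful Stack Overflow I found a suggestion: add one line to slave-2's configuration file. (The setting below uses the pre-7.0 name; 7.x still accepts it but deprecates it in favor of discovery.seed_hosts.)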
discovery.zen.ping.unicast.hosts: ["127.0.0.1:9300", "127.0.0.1:9301"]
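And then...
slave-2 happily joined the gang.
Another thing I don't understand: I started the nodes in the order master, slave-1, slave-2, yet according to elasticsearch-head, slave-1 seems to have seized power and become the master node. I don't know why.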
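One thing that is safe to say here: node.name is only a label, so calling a node "master" does not by itself make it the elected master. Which node currently holds that role can be checked without elasticsearch-head through the `_cat/nodes` API, whose `master` column shows `*` for the elected master. The sketch below is only an illustration, assuming a node answers on 127.0.0.1:9200; the helper names `fetch_cat_nodes` and `elected_master` are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_cat_nodes(port=9200, host="127.0.0.1", timeout=3.0):
    """Fetch name and master columns from the _cat/nodes API as JSON rows."""
    url = f"http://{host}:{port}/_cat/nodes?format=json&h=name,master"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def elected_master(rows):
    """Return the name of the row whose master column is '*', else None."""
    for row in rows:
        if row.get("master") == "*":
            return row["name"]
    return None
```

With the cluster running, `elected_master(fetch_cat_nodes())` returns the name of whichever node the cluster has elected, whatever the nodes happen to be called.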

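As a beginner, I have one more question: why do the shards in the header image carry different numbers, while mine are all 0?
Does that mean I'm learning Elasticsearch from zero?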
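A likely explanation for the shard numbers: since 7.0 a new index gets a single primary shard by default, and shards are numbered from 0, so every index shows only shard 0 unless it was created with more. A minimal sketch of creating an index with several primary shards follows. It assumes a node on 127.0.0.1:9200; the helper names and the index name `shard-demo` are invented for this example.

```python
import json
from urllib.request import Request, urlopen


def index_settings_body(shards, replicas=1):
    """Build the request body for creating an index with explicit shard counts."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }


def create_index(name, shards, replicas=1, port=9200, host="127.0.0.1"):
    """Send PUT /<name> with the settings body and return the parsed response."""
    body = json.dumps(index_settings_body(shards, replicas)).encode("utf-8")
    req = Request(
        f"http://{host}:{port}/{name}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req, timeout=5.0) as resp:
        return json.load(resp)
```

For example, `create_index("shard-demo", shards=3)` should produce an index whose primary shards are numbered 0, 1 and 2 in elasticsearch-head.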