Elasticsearch Performance Optimization

Use routing to specify the shard location for data

Elasticsearch Routing Feature

Reasons for Performance Degradation

Your clients are simply sending too many queries too quickly in a fast burst, overwhelming the queue. You can monitor this with Node Stats over time to see if it’s bursty or smooth.
You’ve got some very slow queries which get “stuck” for a long time, eating up threads and causing the queue to back up. You can enable the slow log to see if there are queries that are taking an exceptionally long time, then try to tune those.
There may potentially be “unending” scripts written in Groovy or something. E.g. a loop that never exits, causing the thread to spin forever.
Your hardware may be under-provisioned for your workload, and bottlenecking on some resource (disk, cpu, etc).
A temporary hiccup from your iSCSI target, which causes all the in-flight operations to block waiting for the disks to come back. It wouldn’t take a big latency hiccup to seriously back up a busy cluster… ES generally expects disks to always be available.
Heavy garbage collections could cause problems too. Check Node Stats to see if there are many/long old gen GCs running.

In short, the list above boils down to two things: the machine is underpowered, or you are querying too much. Experience from tuning MySQL applies to ES as well.
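The Node Stats check mentioned in the list can be sketched roughly as follows. This is a minimal illustration, assuming the JSON returned by GET /_nodes/stats/thread_pool has already been fetched and parsed; the function name find_backed_up_pools and the sample values are hypothetical.

```python
def find_backed_up_pools(node_stats: dict, min_queue: int = 1) -> list:
    """Return (node_name, pool, queue, rejected) for pools that queue or reject work."""
    findings = []
    for node in node_stats.get("nodes", {}).values():
        node_name = node.get("name", "?")
        for pool, stats in node.get("thread_pool", {}).items():
            queue = stats.get("queue", 0)
            rejected = stats.get("rejected", 0)
            # A growing queue or any rejection hints at bursts or stuck queries.
            if rejected > 0 or queue >= min_queue:
                findings.append((node_name, pool, queue, rejected))
    return sorted(findings)


# Hypothetical, heavily trimmed Node Stats response (values are made up).
sample_stats = {
    "nodes": {
        "id-a": {
            "name": "es-1",
            "thread_pool": {
                "search": {"queue": 120, "rejected": 35},
                "bulk": {"queue": 0, "rejected": 0},
            },
        },
        "id-b": {
            "name": "es-2",
            "thread_pool": {
                "search": {"queue": 0, "rejected": 0},
                "write": {"queue": 0, "rejected": 4},
            },
        },
    }
}

for finding in find_backed_up_pools(sample_stats):
    print(finding)
```

Running this periodically and comparing the rejected counters between runs gives a rough idea of whether the load is bursty or steady.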

Node Selection

If you are just experimenting, do not run that many nodes. Elasticsearch is a memory hog.

If resources are tight, coordinating, data, and ingest can be combined.

coordinating

Coordinating nodes are the entry point for requests.

node.master: false
node.data: false
node.ingest: false

master

Master-eligible nodes take part in master election; the elected master decides where shards are placed.

node.master: true
node.data: false
node.ingest: false

data

Data nodes store the shards.

node.master: false
node.data: true
node.ingest: false

ingest

Ingest nodes run the ingest pipelines.

node.master: false
node.data: false
node.ingest: true

System Configuration Optimization

thread_pool:
  bulk:
    queue_size: 2000
  search:
    queue_size: 2000
indices:
  query:
    bool:
      max_clause_count: 50000
  recovery:
    max_bytes_per_sec:

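queue_size caps how many requests may wait in a thread pool's queue once all of its threads are busy; anything beyond that is rejected. For the search pool the default is 1000. Setting names differ slightly between versions (for example, the bulk pool was renamed write in 6.3). Thread pool settings can also be passed directly as startup arguments, which I find less of a hassle than mounting a config file.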
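As a rough sketch of the startup-argument approach, the snippet below flattens a nested settings dict into the -Ekey=value form that the elasticsearch launcher accepts. The helper name to_e_flags is hypothetical, and the values are the ones from the config above.

```python
def to_e_flags(settings: dict, prefix: str = "") -> list:
    """Flatten nested settings into ['-Ea.b.c=value', ...] startup arguments."""
    flags = []
    for key, value in settings.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sections, extending the dotted path.
            flags.extend(to_e_flags(value, path))
        else:
            flags.append(f"-E{path}={value}")
    return flags


thread_pool_settings = {
    "thread_pool": {
        "bulk": {"queue_size": 2000},
        "search": {"queue_size": 2000},
    }
}

# Prints a command line such as: bin/elasticsearch -Ethread_pool.bulk.queue_size=2000 ...
print("bin/elasticsearch " + " ".join(to_e_flags(thread_pool_settings)))
```

The same list of flags can be appended to a container's command instead of mounting elasticsearch.yml.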

Index Configuration Optimization

shard

On small nodes, keep each shard under 30 GB. On higher-spec nodes, keep each shard under 50 GB.

For log analysis or very large indices, keep each shard under 100 GB.

The number of shards (replicas included) should match the number of nodes as closely as possible: either equal to it or an integer multiple of it.

We usually recommend no more than 5 shards of the same index on a single node.
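The sizing rules above can be turned into a small calculation. This is only a sketch under the stated rules of thumb; the function name plan_shards and the example sizes are hypothetical.

```python
import math


def plan_shards(index_gb: float, max_shard_gb: float, nodes: int, replicas: int = 1) -> tuple:
    """Return (primaries, shards_per_node) following the rules of thumb above.

    Picks the smallest primary count that keeps each shard under max_shard_gb
    and makes the total shard count (primaries plus replicas) a multiple of
    the node count.
    """
    primaries = max(1, math.ceil(index_gb / max_shard_gb))
    copies = 1 + replicas
    # Terminates at the latest when primaries is itself a multiple of nodes.
    while (primaries * copies) % nodes != 0:
        primaries += 1
    return primaries, (primaries * copies) // nodes


# 120 GB index, 30 GB shard limit, 3 nodes, 1 replica.
primaries, per_node = plan_shards(120, 30, 3, replicas=1)
print(primaries, per_node)
if per_node > 5:
    print("more than 5 shards of this index per node; consider adding nodes")
```

If shards_per_node comes out above 5, the rules conflict for that cluster size, and adding nodes or splitting the index is probably the better answer.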

Query Optimization

If your application does not need the _all field, we usually recommend disabling it or adding fields to it selectively. (_all was deprecated in 6.0 and removed in 7.0.)

Only Select Required Fields

Just as you would avoid SELECT * in a relational database, fetch only the fields you need.

GET /product/goods/109524071?filter_path=_source.zdid
{
  "_source" : {
    "zdid" : 48
  }
}

The _source parameters work in a similar way, but unlike filter_path, the response still includes the document's own metadata fields (_index, _id, and so on).

GET /product/goods/109524071?_source_include=zdid
{
  "_index" : "product",
  "_type" : "goods",
  "_id" : "109524071",
  "_version" : 4,
  "found" : true,
  "_source" : {
    "zdid" : 48
  }
}

_source=false
_source_include=zdid
_source_exclude

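Note: _source and filter_path reportedly cannot be used together on this request. The official documentation does show the two combined for _search (for example filter_path=hits.hits._source&_source=title), so check the behavior on your version. From 7.0 on, the parameters are spelled _source_includes and _source_excludes.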
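A small helper can keep these query parameters consistent across callers. This is a sketch only: the function doc_url and its legacy_names flag are hypothetical, and the index, type, and ID are the ones from the examples above.

```python
from urllib.parse import urlencode


def doc_url(index, doc_type, doc_id, fields=None, filter_path=None, legacy_names=True):
    """Build a GET path that asks only for the listed fields."""
    params = {}
    if fields:
        # 7.0+ spells the parameter _source_includes; older versions use _source_include.
        key = "_source_include" if legacy_names else "_source_includes"
        params[key] = ",".join(fields)
    if filter_path:
        params["filter_path"] = ",".join(filter_path)
    path = f"/{index}/{doc_type}/{doc_id}"
    query = urlencode(params, safe=",")
    return f"{path}?{query}" if query else path


print(doc_url("product", "goods", "109524071", fields=["zdid"]))
print(doc_url("product", "goods", "109524071", filter_path=["_source.zdid"]))
```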

Disable Automatic Mapping When Creating New Indices
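One way to do this is the dynamic mapping setting, which accepts true, false, or "strict". The sketch below builds a create-index body with it; the helper name index_body and the integer type for zdid are assumptions, and on 6.x the mapping has to be nested under the type name (goods in the examples above).

```python
import json


def index_body(properties: dict, dynamic="strict", doc_type=None) -> dict:
    """Build a create-index body whose mapping does not grow automatically.

    dynamic="strict" rejects documents with unmapped fields;
    dynamic=False accepts them but leaves the extra fields unindexed.
    """
    mapping = {"dynamic": dynamic, "properties": properties}
    if doc_type is not None:
        # Versions with mapping types (6.x and earlier) nest the mapping under the type.
        return {"mappings": {doc_type: mapping}}
    return {"mappings": mapping}


# Hypothetical field definition; send the result as the body of PUT /product.
body = index_body({"zdid": {"type": "integer"}}, doc_type="goods")
print(json.dumps(body, indent=2))
```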

index aliases
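A typical use of aliases is to let clients query a stable name while the index behind it is swapped. The sketch below builds the body for POST /_aliases; the helper name swap_alias and the index names are hypothetical.

```python
def swap_alias(alias: str, old_index: str, new_index: str) -> dict:
    """Body for POST /_aliases that moves an alias from one index to another."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical index names: clients keep searching "product".
print(swap_alias("product", "product_v1", "product_v2"))
```

Both actions go in one request, so searches against the alias should not hit a moment where it points nowhere.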

Other Experience

In practice, most Elasticsearch workloads index far less often than they search, so it makes more sense to optimize for search.

Best Practices for Logs

If losing logs does not matter, store them on a single node with 0 replicas.

Name log indices xx-<date>, so that cleaning up old logs is just a matter of deleting whole indices.
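With date-suffixed names, finding the indices to drop is a simple string comparison. The sketch below assumes a Logstash-style suffix such as tracing-2019.01.01; the function name expired_indices, the date format, and the sample names are all hypothetical.

```python
from datetime import date, datetime, timedelta


def expired_indices(names, prefix, today, keep_days, fmt="%Y.%m.%d"):
    """Return the index names whose date suffix is older than keep_days."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], fmt).date()
        except ValueError:
            # Suffix is not a date (for example "tracing-latest"); leave it alone.
            continue
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


names = [
    "tracing-2019.01.01",
    "tracing-2019.03.31",
    "tracing-latest",
    "other-2018.01.01",
]
# Each returned name can then be removed with DELETE /<name>.
print(expired_indices(names, "tracing-", date(2019, 4, 2), keep_days=90))
```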

I feel like dying every time I use delete by query…

POST /tracing/_delete_by_query?conflicts=proceed
{
	"query": {
		"range": {
			"@timestamp": {
				"lt": "now-90d",
				"format": "epoch_millis"
			}
		}
	}
}

GET /_tasks?actions=*delete*

Troubleshooting and Maintenance

Unassigned Shards

Solution: create a new index with number_of_replicas set to 0, then copy the data into it with _reindex. Once the migration finishes, set number_of_replicas back to its previous value. reindex also takes a size setting (the scroll batch size is source.size, 1000 by default); tuning it may speed things up.

Note: you can watch the related tasks with GET _tasks?detailed=true&actions=*reindex
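The request bodies for that procedure might look like the following sketch. The helper names and the index names are hypothetical; source.size is the reindex batch size.

```python
def new_index_body(replicas: int = 0) -> dict:
    """Body for PUT /<new_index>: start without replicas to speed up the copy."""
    return {"settings": {"index": {"number_of_replicas": replicas}}}


def reindex_body(source_index: str, dest_index: str, batch_size: int = 1000) -> dict:
    """Body for POST /_reindex; batch_size is the number of documents per scroll batch."""
    return {
        "source": {"index": source_index, "size": batch_size},
        "dest": {"index": dest_index},
    }


def restore_replicas_body(replicas: int) -> dict:
    """Body for PUT /<new_index>/_settings once the copy has finished."""
    return {"index": {"number_of_replicas": replicas}}


# Hypothetical index names.
print(new_index_body())
print(reindex_body("geonames", "geonames_v2", batch_size=5000))
print(restore_replicas_body(1))
```

Sending the reindex request with wait_for_completion=false returns a task ID that can be polled or cancelled through the tasks API.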

reindex

There are a few tricks to reindexing as well.

# Disable replicas
PUT geonames/_settings
{
  "settings": {
    "index": {
      "number_of_replicas": "0"
    }
  }
}

# Disable refresh; while it is off, _count results do not update
json='{"index":{"refresh_interval":"-1"}}'
curl -XPUT 0.0.0.0:9200/geonames/_settings -H 'Content-Type: application/json' -d "$json"

# Cancel midway if needed (use the task ID reported by the tasks API)
curl -XPOST 0.0.0.0:9200/_tasks/mHCg6HqYTqqd12nIDFDk1w:2977/_cancel

# Restore the default refresh interval
json='{"index":{"refresh_interval":null}}'
curl -XPUT 0.0.0.0:9200/geonames/_settings -H 'Content-Type: application/json' -d "$json"

gc overhead

[2019-01-04T08:41:09,538][INFO ][o.e.m.j.JvmGcMonitorService] [elasticsearch-onekey-3] [gc][159] overhead, spent [276ms] collecting in the last [1s]

Root cause: the cluster was overloaded and went down.

Index yellow for a long time

Solution: set number_of_replicas to 0, then set it back to its original value. This manually triggers replica synchronization.

PUT geonames/_settings
{
  "settings": {
    "index": {
      "number_of_replicas": "0"
    }
  }
}

Rolling restart

_rolling_restarts

Slow log analysis

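There are two kinds of slow log: search and indexing. The thresholds are index settings, so they can be set on a single index or, with PUT _settings as below, on every index in the cluster at once.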

PUT _settings
{
        "index.indexing.slowlog.threshold.index.debug" : "10ms",
        "index.indexing.slowlog.threshold.index.info" : "50ms",
        "index.indexing.slowlog.threshold.index.warn" : "100ms",
        "index.search.slowlog.threshold.fetch.debug" : "100ms",
        "index.search.slowlog.threshold.fetch.info" : "200ms",
        "index.search.slowlog.threshold.fetch.warn" : "500ms",
        "index.search.slowlog.threshold.query.debug" : "100ms",
        "index.search.slowlog.threshold.query.info" : "200ms",
        "index.search.slowlog.threshold.query.warn" : "1s"
}
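To see how the thresholds interact, the sketch below works out which level a query of a given duration would be logged at. It assumes the most severe level whose threshold is exceeded wins; the function names are hypothetical and only the ms, s, and m units are handled.

```python
import re

_UNIT_MS = {"ms": 1, "s": 1000, "m": 60000}


def to_ms(value: str) -> int:
    """Convert a threshold such as '200ms' or '1s' to milliseconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m)", value)
    if match is None:
        raise ValueError(f"unsupported time value: {value}")
    return int(match.group(1)) * _UNIT_MS[match.group(2)]


def slowlog_level(took_ms: int, thresholds: dict):
    """Return the level a request would be logged at, or None if it is fast enough."""
    for level in ("warn", "info", "debug", "trace"):
        limit = thresholds.get(level)
        if limit is not None and took_ms > to_ms(limit):
            return level
    return None


# The query thresholds from the settings above.
query_thresholds = {"debug": "100ms", "info": "200ms", "warn": "1s"}

for took in (50, 150, 250, 1500):
    print(took, slowlog_level(took, query_thresholds))
```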

No alive nodes found in your cluster

The cause varies from case to case, so start by checking the ES logs. One possibility is hitting the limit of 1000 concurrent connections.

Reference Tools

elasticHQ
