Elasticsearch Data Migration

Archiving data on a machine relies on the reindex API.

1. Step one

Before migrating the data, create the target index and its mapping in ES.

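For example (this uses the typeless mapping syntax of Elasticsearch 7.x and later, where fields sit under "properties"; earlier versions additionally nest "properties" under a mapping type name):

PUT /my_index
{
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
    },
    "mappings": {
        "properties": {
            "time": {
                "type": "integer"
            },
            "uid": {
                "type": "text"
            },
            ...
        }
    }
}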
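
The same index creation can be scripted. The following is a minimal sketch using only the Python standard library; the helper name build_create_index_request and the base URL are assumptions for illustration, the mapping assumes the typeless 7.x+ syntax, and the request is only built, not sent.

```python
import json
import urllib.request


def build_create_index_request(base_url, index, shards=1, replicas=0):
    """Build (but do not send) the PUT request that creates the index."""
    body = {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
        "mappings": {
            "properties": {
                "time": {"type": "integer"},
                "uid": {"type": "text"},
            }
        },
    }
    return urllib.request.Request(
        url=f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_create_index_request("http://127.0.0.1:9200", "my_index")
# To create the index on a running cluster, send the request:
# urllib.request.urlopen(req)
```

Building the request separately from sending it makes the body easy to inspect before it touches the cluster.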

To speed up the migration, set refresh_interval to -1 and number_of_replicas to 0:

curl -XPUT -H 'Content-Type: application/json' '{some ip}:9200/{index}/_settings' -d '{
    "index": {
        "refresh_interval": -1,
        "number_of_replicas": 0
    }
}'

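refresh_interval: controls automatic refresh. Changes to documents are not immediately visible to search; they become visible only after a refresh. A value of -1 disables automatic refresh.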

number_of_replicas: the number of replica shards, which serve as redundant copies of the data. Setting it to 0 during the import saves time.
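
The two settings above can also be toggled from a script. The following is a minimal sketch using only the Python standard library; the helper name build_settings_request and the base URL are assumptions for illustration, and the request is only built, not sent.

```python
import json
import urllib.request


def build_settings_request(base_url, index, refresh_interval, replicas):
    """Build (but do not send) a PUT /{index}/_settings request."""
    body = {
        "index": {
            "refresh_interval": refresh_interval,
            "number_of_replicas": replicas,
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Settings for the duration of the import: no automatic refresh, no replicas.
fast = build_settings_request("http://127.0.0.1:9200", "my_index", "-1", 0)
# urllib.request.urlopen(fast)  # send only against a running cluster
```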

2. Step two

Use reindex-from-remote to synchronize the data:

curl -XPOST -H 'Content-Type: application/json' 'http://127.0.0.1:9200/_reindex?pretty' -d '{
    "conflicts": "proceed",
    "source": {
        "remote": {
            "host": "http://127.0.0.1:9200"
        },
        "index": "produce_record",
        "query": {
            "match": {
                "test": "data"
            }
        },
        "size": 1000
    },
    "dest": {
        "index": "produce_record_v1",
        "version_type": "external"
    }
}'

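conflicts: when set to "proceed", version conflicts between source and destination documents are counted and skipped instead of aborting the import.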

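With version_type set to "external", the version of each source document is preserved. Missing documents are created, and a destination document with the same _id is kept unless the source document has a newer version, in which case it is updated.

"internal" (the default) overwrites any destination document with the same _id.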

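The interaction of conflicts and version_type can be illustrated with a small in-memory model. This is a simplified sketch of the behavior described above, not Elasticsearch code; the function reindex_model and the (version, doc) tuple layout are hypothetical.

```python
def reindex_model(source, dest, version_type="internal", conflicts="abort"):
    """Simplified model: source/dest map _id -> (version, doc). Mutates dest."""
    stats = {"created": 0, "updated": 0, "version_conflicts": 0}
    for doc_id, (version, doc) in source.items():
        if doc_id not in dest:
            # Missing documents are always created.
            new_version = version if version_type == "external" else 1
            dest[doc_id] = (new_version, doc)
            stats["created"] += 1
        elif version_type == "internal":
            # internal: overwrite regardless of the stored version.
            dest[doc_id] = (dest[doc_id][0] + 1, doc)
            stats["updated"] += 1
        elif version > dest[doc_id][0]:
            # external: update only when the source version is newer.
            dest[doc_id] = (version, doc)
            stats["updated"] += 1
        elif conflicts == "proceed":
            # Count the conflict and keep the existing destination document.
            stats["version_conflicts"] += 1
        else:
            raise RuntimeError(f"version conflict on {doc_id}")
    return stats


source = {"a": (3, {"test": "data"}), "b": (1, {"test": "data"})}
dest = {"b": (2, {"test": "old"})}
stats = reindex_model(source, dest, version_type="external", conflicts="proceed")
print(stats)
```

Here "a" is created, while "b" is left untouched and counted as a version conflict because the destination already holds a newer version.
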
3. Step three

Set refresh_interval and number_of_replicas back to "30s" and 1, respectively:

curl -XPUT -H 'Content-Type: application/json' '{some ip}:9200/{index}/_settings' -d '{
    "index": {
        "refresh_interval": "30s",
        "number_of_replicas": 1
    }
}'
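
Rather than hard-coding "30s" and 1, the previous values could be recorded with GET /{index}/_settings before the migration and restored afterwards. The following is a rough sketch under that assumption; restore_body is a hypothetical helper, and the sample response is hand-written to mimic the shape of a settings response, in which values come back as strings and a setting that was never set explicitly is absent.

```python
import json


def restore_body(settings_response, index):
    """Build a _settings body that restores the values recorded earlier."""
    index_settings = settings_response[index]["settings"]["index"]
    return {
        "index": {
            # A missing key becomes None, which serializes to JSON null;
            # null resets a setting to its default value.
            "refresh_interval": index_settings.get("refresh_interval"),
            "number_of_replicas": index_settings.get("number_of_replicas"),
        }
    }


# Hand-written sample of a settings response taken before the migration.
before = {
    "produce_record_v1": {
        "settings": {
            "index": {"number_of_replicas": "1", "number_of_shards": "1"}
        }
    }
}
body = restore_body(before, "produce_record_v1")
print(json.dumps(body))
```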

Elasticsearch cluster scheme:

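Use a multi-node cluster with hot/cold data separation, which separates write I/O from read I/O: indexing is handled by the hot nodes that hold the newest data, while the bulk of the read traffic is spread across the other nodes in the cluster.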
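
One common way to realize hot/cold separation is shard allocation filtering: nodes are tagged with a custom attribute, and each index is pinned to a tier through its settings. The sketch below builds such a settings body; the attribute name box_type and the values "hot" and "cold" are assumptions (a widely used convention, set per node as node.attr.box_type in elasticsearch.yml), and allocation_body is a hypothetical helper.

```python
import json


def allocation_body(tier, attribute="box_type"):
    """Settings body that requires an index's shards to live on one tier."""
    return {f"index.routing.allocation.require.{attribute}": tier}


# New indices would be pinned to hot nodes; older ones are moved to cold
# nodes by sending this body to PUT /{index}/_settings.
cold = allocation_body("cold")
print(json.dumps(cold))
```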

Elasticsearch cluster hot/cold separation: hands-on practice

elasticsearch 5.x series, part 8: hot/cold data separation scheme
