Elasticsearch - Dumping documents from multi-node to single node


Elasticsearch three-node cluster:

Elasticsearch is running as a three-node cluster. The task is to copy its data and restore it on a single-node cluster.

node 1 : "http://node1:9300, http://node1:9200"
node 2 : "http://node2:9300, http://node2:9200"
node 3 : "http://node3:9300, http://node3:9200"
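
To see how the shards are spread over these nodes before copying anything, the output of _cat/shards can be summarised per node. The following is a minimal sketch, not part of the original procedure: the helper name shards_per_node is hypothetical, and it assumes the default _cat/shards column order (index, shard, prirep, state, docs, store, ip, node), grouping by the ip column.

```shell
#!/bin/sh
# Reads `_cat/shards` output on stdin and prints "ip count" for every node,
# counting only shards in state STARTED. Column 7 (ip) is used as the key
# because node names may contain spaces.
shards_per_node() {
  awk '$4 == "STARTED" { count[$7]++ } END { for (ip in count) print ip, count[ip] }' | sort
}

# Usage against the three-node cluster (assumes node1 answers on port 9200):
#   curl -s -X GET "node1:9200/_cat/shards" | shards_per_node
```

If every node reports fewer shards than the total, none of them holds the complete data on its own.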

Because the shards are distributed between the nodes, no single node holds the complete data. When we manually copy the data folder of one node and restore it to a single-node instance, the shards that lived on the other nodes will be unassigned. Follow these steps to restore from multi-node to single node.

  • Create the single node cluster using the following elasticsearch-conf.yml and docker-compose.yml files

elasticsearch-conf.yml:

# cluster.name should match the source cluster, because the data folder
# is laid out as <cluster_name>/nodes/0/indices
cluster.name: jinnabalu_cluster
#node.name: "node-one"
#index.number_of_shards: 1
#index.number_of_replicas: 0
network.bind_host: 0.0.0.0
#network.host: 0.0.0.0
#discovery.zen.ping.multicast.enabled: false
cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.flood_stage: 200mb
cluster.routing.allocation.disk.watermark.low: 500mb
cluster.routing.allocation.disk.watermark.high: 300mb

docker-compose.yml:

version: '2'
services:
    jinnabalu_cluster-elasticsearch:
        container_name: jinnabalu_cluster-elasticsearch
        image: elasticsearch:2.4.1
        environment:
            - "ES_JAVA_OPTS=-Xms1g -Xmx1g"
        volumes:
            - /var/db/elasticsearch/data:/usr/share/elasticsearch/data
            - ./elasticsearch-conf.yml:/usr/share/elasticsearch/config/elasticsearch.yml
        ports:
            - 9200:9200
            - 9300:9300

  • Copy the data folder from node 1 into the host directory mounted in docker-compose.yml, with scp -r /var/db/node1/elasticsearch/** /var/db/elasticsearch/data
  • Start the cluster using docker-compose up -d
  • By default, Elasticsearch re-assigns shards to nodes dynamically, but the shards whose data was not on node 1 stay unassigned.
  • Check the shards with curl -X GET "localhost:9200/_cat/shards"
  • The original cluster had three nodes, so replicas could be placed on another node. The single node has no extra node to hold the replicas, so the replica shards stay unassigned; together with the missing primary shards, this turned the cluster health red. Update the index settings and set number_of_replicas to 0.
curl -X PUT "localhost:9200/_settings?pretty" -H 'Content-Type: application/json' -d'
{
  "index" : {
    "number_of_replicas" : 0
  }
}
'
  • Check the shards again and note down which shards are still unassigned
  • Manually copy the unassigned shards from node 2 or node 3
  • For example, copy shards 2 and 4 of the index client
scp -r <path_multinode_data_folder>/<cluster_name>/nodes/0/indices/client/2 <path_singlenode_data_folder>/<cluster_name>/nodes/0/indices/client/

scp -r <path_multinode_data_folder>/<cluster_name>/nodes/0/indices/client/4 <path_singlenode_data_folder>/<cluster_name>/nodes/0/indices/client/
  • Restart Elasticsearch with docker-compose down and docker-compose up -d
  • Check the shards again; if any unassigned shards remain, repeat the steps above.
  • When all shards are restored, the cluster health turns green
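
The "check the shards and note down the unassigned ones" steps above can be sketched as a small filter. This is only an illustration: the helper name unassigned_shards is hypothetical, and it assumes the default _cat/shards column order, in which the fourth column is the shard state.

```shell
#!/bin/sh
# Reads `_cat/shards` output on stdin and prints "index shard prirep"
# for every shard whose state column is UNASSIGNED.
unassigned_shards() {
  awk '$4 == "UNASSIGNED" { print $1, $2, $3 }'
}

# Usage against the single node (assumes it listens on localhost:9200):
#   curl -s -X GET "localhost:9200/_cat/shards" | unassigned_shards
```

Each printed line names an index and shard number, which is the folder to copy from node 2 or node 3 in the next round.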

Missing shards can be copied manually to the folder. However, if you’ve disabled shard allocation (perhaps you did a rolling restart and forgot to re-enable it), you can re-enable shard allocation.

# v0.90.x and earlier
curl -X PUT "localhost:9200/_settings?pretty" -H 'Content-Type: application/json' -d'
{
    "index.routing.allocation.disable_allocation": false
}'

# v1.0+
curl -X PUT "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
    "transient" : {
        "cluster.routing.allocation.enable" : "all"
    }
}'

Elasticsearch will then reassign shards as normal. This can be slow; consider raising indices.recovery.max_bytes_per_sec and cluster.routing.allocation.node_concurrent_recoveries to speed it up.
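
Raising the two recovery settings could look roughly like the sketch below. The helper name recovery_settings_body is hypothetical, and the values 100mb and 4 in the usage comment are illustrative assumptions rather than recommendations; pick values that suit your disks and network.

```shell
#!/bin/sh
# Prints a transient cluster-settings body that sets both recovery limits.
# $1: value for indices.recovery.max_bytes_per_sec (for example 100mb)
# $2: value for cluster.routing.allocation.node_concurrent_recoveries (for example 4)
recovery_settings_body() {
  printf '{"transient":{"indices.recovery.max_bytes_per_sec":"%s","cluster.routing.allocation.node_concurrent_recoveries":%s}}\n' "$1" "$2"
}

# Usage (assumes the node listens on localhost:9200):
#   curl -X PUT "localhost:9200/_cluster/settings?pretty" \
#     -H 'Content-Type: application/json' -d "$(recovery_settings_body 100mb 4)"
```

Because the settings are sent as transient, they are dropped again after a full cluster restart.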

If you’re still seeing issues, something else is probably wrong, so look in your Elasticsearch logs for errors. If you see EsRejectedExecutionException your thread pools may be too small.
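
One way to look for these errors is to count them in the container logs. This is a minimal sketch: the helper name count_rejections is hypothetical, and the usage comment assumes the container name from the docker-compose.yml file above.

```shell
#!/bin/sh
# Reads log text on stdin and prints how many lines mention
# EsRejectedExecutionException. `|| true` keeps the exit status zero
# when grep finds no match (it still prints 0).
count_rejections() {
  grep -c 'EsRejectedExecutionException' || true
}

# Usage (assumes the container name from docker-compose.yml):
#   docker logs jinnabalu_cluster-elasticsearch 2>&1 | count_rejections
# The thread pool counters can also be inspected directly:
#   curl -s -X GET "localhost:9200/_cat/thread_pool?v"
```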

Finally, you can explicitly reassign a shard to a node with the reroute API.

# Suppose shard 4 of index "my-index" is unassigned, so you want to
# assign it to node search03:
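# Note: allow_primary lets the command allocate a primary shard,
# which might result in data loss.
curl -XPOST "localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d'
{
    "commands": [{
        "allocate": {
            "index": "my-index",
            "shard": 4,
            "node": "search03",
            "allow_primary": true
        }
    }]
}'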
