Elasticsearch Migration 2.x to 6.x using S3


When you need to migrate an Elasticsearch cluster across major versions, say from 2.x to 6.x, the officially supported path is to upgrade one major version at a time:

  1. Migrate 2.x to 5.x
  2. Migrate 5.x to 6.x

This is a cumbersome, time-consuming process: you need a temporary Elasticsearch 5.x cluster just as a stepping stone, and reindexing everything twice is slow.

Instead, I decided to do this with elasticdump, using the steps below for backup and restore. There are two prerequisites: elasticdump and awscli. Make sure Node.js is installed and is version 8 or above. elasticdump can also transform records while taking a backup: write a simple JavaScript function, save it to a file, and pass the file path via the --transform command line option. elasticdump rocks!
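As a sketch of that transform hook (the field name below is hypothetical, and this assumes elasticdump's `--transform=@<file>` module form, where the module exports a function that mutates each document in place):

```shell
# Sketch: write a transform module that elasticdump can load via --transform=@./transform.js
cat > transform.js <<'EOF'
module.exports = function (doc, options) {
  // e.g. drop a field the 6.x mapping no longer accepts
  // ("obsolete_field" is a hypothetical name)
  delete doc._source.obsolete_field;
};
EOF

# Then reference it during the dump (illustration only; needs a running cluster):
# elasticdump --input=${ES_DOMAIN}/${ES_INDEX} --output=${ES_INDEX}.json \
#             --type=data --transform=@./transform.js
```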

npm i elasticdump -g
pip install awscli
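To gate the scripts on the Node requirement, a small helper can parse the output of `node --version` (a sketch; the version strings below are only examples):

```shell
# Succeeds when the given `node --version` string is v8 or newer
node_ok() {
    local major=${1#v}      # strip the leading "v"
    major=${major%%.*}      # keep only the major component
    [ "$major" -ge 8 ]
}

# usage in a script:
# node_ok "$(node --version)" || { echo "node 8+ required"; exit 1; }
node_ok "v8.11.3" && echo "node version is fine"
```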

Backup

  1. Run elasticdump to export the index as JSON
  2. Compress the JSON file
  3. Upload it to S3

#!/bin/bash

if [ $# -ne 1 ]; then
    echo "Usage: ./backup.sh <file-name>"
    exit 1
fi

# Modify the config values here
BUCKET=
ES_DOMAIN=
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
export AWS_DEFAULT_REGION=

backup_index(){
    ES_INDEX=$1

    echo "Dumping ${ES_INDEX} data..."
    # can add --transform if needed below
    elasticdump --quiet --limit 1000 --input="${ES_DOMAIN}/${ES_INDEX}" --output="${ES_INDEX}.json" --type=data
    sleep 5s

    echo "Compressing ${ES_INDEX} data..."
    gzip "${ES_INDEX}.json"

    echo "Uploading ${ES_INDEX} backup to S3 bucket: ${BUCKET}"
    aws s3 cp "${ES_INDEX}.json.gz" "s3://${BUCKET}/"

    echo "Cleaning up"
    rm "${ES_INDEX}.json.gz"
}


while read -r name
do
    backup_index "$name"
done < "$1"

To run this, create a file with one index name per line. Save the script as backup.sh and the index list as indices.txt.

The script can be run as

./backup.sh indices.txt
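For example, indices.txt might look like this (the index names are hypothetical):

```shell
# Create a sample index list; backup.sh reads one name per line
cat > indices.txt <<'EOF'
users
orders
products
EOF

# ./backup.sh indices.txt   # would back up all three indices in turn
```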

Restore

  1. Download the compressed JSON file from S3
  2. Uncompress it
  3. Create the index with the required mapping
  4. Run elasticdump to put the data back

#!/bin/bash

if [ $# -ne 1 ]; then
    echo "Usage: ./restore.sh <index-name>"
    exit 1
fi

# Modify the config values here
BUCKET=
ES_DOMAIN=
ES_INDEX=$1
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
export AWS_DEFAULT_REGION=

echo "Fetching data from S3 for index ${ES_INDEX}..."
aws s3 cp "s3://${BUCKET}/${ES_INDEX}.json.gz" "${ES_INDEX}.json.gz"

echo "Uncompressing data for index ${ES_INDEX}..."
gunzip "${ES_INDEX}.json.gz"

echo "Creating index ${ES_INDEX} and putting in the mapping"
curl -XPUT --header "content-type: application/json" "${ES_DOMAIN}/${ES_INDEX}" --data @mapping.json
sleep 1s
echo

echo "Restoring data into index ${ES_INDEX}..."
elasticdump --quiet --limit 1000 --input="${ES_INDEX}.json" --output="${ES_DOMAIN}/${ES_INDEX}" --type=data

echo "Cleaning up"
rm "${ES_INDEX}.json"

Save the script as restore.sh. Let's say we need to restore index abcd; we can run the script as

./restore.sh abcd
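The restore step assumes a mapping.json in the working directory. A minimal 6.x-style mapping could look like this (the field names and settings are hypothetical; 6.x allows only a single mapping type per index, and the type name should match the `_type` of your dumped documents):

```shell
# Write a minimal single-type mapping; adjust the type name and fields to your data
cat > mapping.json <<'EOF'
{
  "settings": { "number_of_shards": 1 },
  "mappings": {
    "_doc": {
      "properties": {
        "name":       { "type": "keyword" },
        "created_at": { "type": "date" }
      }
    }
  }
}
EOF
```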
