Elasticsearch Migration 2.x to 6.x using S3
Suppose you are stuck with a migration problem in Elasticsearch: the cluster currently runs version 2.x and has to move to 6.x. Because Elasticsearch can only upgrade indices created in the previous major version, the official path is:
- Migrate 2.x to 5.x
- Migrate 5.x to 6.x
This is a cumbersome and time-consuming process: you need a temporary Elasticsearch cluster running version 5, and the data has to be reindexed twice.
I decided to do this with elasticdump instead, using the steps below for backup and restore. There are two prerequisites for this process: elasticdump and awscli. elasticdump runs on Node.js, so node must be installed and its version should be above 8 (check with node --version). Install both tools:
npm i elasticdump -g
pip install awscli
We can also transform records while taking the backup: write a simple js function, save it to a file, and pass the file path to the --transform command line option of elasticdump. elasticdump rocks!
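As a sketch of what such a transform file might look like, the snippet below creates a module that recent versions of elasticdump can load with --transform=@/full/path/to/transform.js (the @ prefix tells elasticdump to load a file rather than an inline script; verify this option against your installed version). The field names legacy_field and new_field are made up for illustration:
cat <<'EOF' > transform.js
// elasticdump calls the exported function once per document;
// mutate doc._source in place before it is written to the output.
module.exports = function (doc, options) {
  // Hypothetical example: rename legacy_field to new_field
  if (doc._source.legacy_field !== undefined) {
    doc._source.new_field = doc._source.legacy_field;
    delete doc._source.legacy_field;
  }
};
EOF
You would then append --transform=@$(pwd)/transform.js to the elasticdump command in the backup script below.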
Backup
- Run elasticdump to dump the index to a JSON file
- Compress the JSON file
- Upload it to S3
#!/bin/bash
if [ $# -ne 1 ]; then
echo "Usage: ./backup.sh <file-name>"
exit 1
fi
# Modify the config values here
BUCKET=
ES_DOMAIN=
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
export AWS_DEFAULT_REGION=
backup_index(){
  ES_INDEX=$1
  echo "Dumping ${ES_INDEX} data..."
  # Add --transform=@/full/path/to/transform.js below if records need reshaping
  elasticdump --quiet --limit 1000 --input="${ES_DOMAIN}/${ES_INDEX}" --output="${ES_INDEX}.json" --type=data
  sleep 5
  echo "Compressing ${ES_INDEX} data..."
  gzip "${ES_INDEX}.json"
  echo "Uploading ${ES_INDEX} backup to S3 bucket: ${BUCKET}"
  aws s3 cp "${ES_INDEX}.json.gz" "s3://${BUCKET}/"
  echo "Cleaning up"
  rm "${ES_INDEX}.json.gz"
}
# Back up every index named in the file passed as the first argument (one per line)
while read -r name
do
  backup_index "$name"
done < "$1"
To run this, create a file containing one index name per line. Save the script as backup.sh and the index list as indices.txt. The script can then be run as
./backup.sh indices.txt
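If you would rather not type the index list by hand, one possible way to generate it is the _cat API, assuming the source cluster is reachable at the endpoint you put in ES_DOMAIN:
# List index names only, dropping dot-prefixed system indices such as .kibana
curl -s "${ES_DOMAIN}/_cat/indices?h=index" | grep -v '^\.' > indices.txt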
Restore
- Download the compressed JSON file from S3
- Uncompress it
- Create the index with the required mapping (see the note on preparing mapping.json after the script)
- Run elasticdump to put the data back
#!/bin/bash
if [ $# -ne 1 ]; then
echo "Usage: ./restore.sh <index-name>"
exit 1
fi
# Modify the config values here
BUCKET=
ES_DOMAIN=
ES_INDEX=$1
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
export AWS_DEFAULT_REGION=
echo "fetching data from s3 for index ${ES_INDEX}..."
aws s3 cp s3://${BUCKET}/${ES_INDEX}.json.gz ${ES_INDEX}.json.gz
echo "uncompressing data for index ${ES_INDEX}..."
gunzip ${ES_INDEX}.json.gz
echo "create index ${ES_INDEX} and putting in mapping"
curl -XPUT --header "content-type: application/JSON" http://localhost:9200/${ES_INDEX} --data @mapping.json
sleep 1s
echo
echo "restoring data into index ${ES_INDEX}..."
elasticdump --quiet --limit 1000 --input=${ES_INDEX}.json --output=${ES_DOMAIN}/${ES_INDEX} --type=data
echo "Cleaning up"
rm ${ES_INDEX}.json
echo "Cleaning up"
rm ${ES_INDEX}.json
Save the script as restore.sh. Let's say we need to restore the index abcd; we can run the script as
./restore.sh abcd
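The one piece the restore script assumes is mapping.json. A possible way to prepare it, assuming the old 2.x cluster is still reachable (OLD_ES_DOMAIN below is a placeholder for its endpoint), is to pull the existing mapping and edit it by hand for 6.x, since 6.x allows only a single mapping type per index and replaces the string type with text and keyword:
# Fetch the 2.x mapping as a starting point
curl -s "${OLD_ES_DOMAIN}/abcd/_mapping" > mapping-2x.json
# Hand-edit this into mapping.json so the body has the create-index shape
# {"mappings": {...}}, keeping a single type and converting "string" fields
# to "text" (analyzed) or "keyword" (exact match)
Once the restore finishes, a quick sanity check is to compare document counts between the old and new clusters:
curl -s "http://localhost:9200/abcd/_count"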