
Backup, load and restore CouchDB clusters

Project description

Create an archive of a running CouchDB node, saving the CouchDB files data/.shards, data/_dbs.couch and data/shards, in that order. To allow backing up a running CouchDB, the files are copied before the archive is created.
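
The copy-then-archive approach can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not Couchcopy's implementation: the function name make_backup is hypothetical, and so is the assumption that couchdb_root is the directory containing data/. Adjust the paths to your installation's layout.

```python
import os
import shutil
import tarfile
import tempfile

# Archive order taken from the description above.
BACKUP_ORDER = ["data/.shards", "data/_dbs.couch", "data/shards"]


def make_backup(couchdb_root: str, archive_path: str) -> None:
    """Hypothetical sketch: copy the CouchDB files, then tar the copies in order.

    Assumes `couchdb_root` is the directory that contains `data/`.
    Entries that do not exist (e.g. no data/.shards yet) are skipped.
    """
    present = [rel for rel in BACKUP_ORDER
               if os.path.exists(os.path.join(couchdb_root, rel))]
    with tempfile.TemporaryDirectory() as staging:
        # 1. Copy first, so the archive is built from the copies rather
        #    than from the files the running CouchDB keeps using.
        for rel in present:
            src = os.path.join(couchdb_root, rel)
            dst = os.path.join(staging, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            if os.path.isdir(src):
                shutil.copytree(src, dst)
            else:
                shutil.copy2(src, dst)
        # 2. Add the copies to a gzip-compressed tarball, in BACKUP_ORDER.
        with tarfile.open(archive_path, "w:gz") as tar:
            for rel in present:
                tar.add(os.path.join(staging, rel), arcname=rel)
```

Because tar members are written in the order they are added, the archive lists data/.shards first, then data/_dbs.couch, then data/shards.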

Restore an archive of a CouchDB node to a new CouchDB, which can be a cluster of multiple nodes. The new CouchDB must already be configured before running Couchcopy; note, however, that all of its existing data will be deleted. During restoration, CouchDB is stopped and restarted on each cluster node.

Limitations

Tested at least with CouchDB 3.1.1 and 3.3.3.

To restore an archive, Couchcopy needs to stop and start CouchDB, and it assumes that CouchDB is controlled by systemd. If you don’t use systemd, you can change the --couchdb-start and --couchdb-stop parameters.
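
For illustration, the per-node stop/start step amounts to running a shell command on each machine over SSH. The command strings below are ordinary systemd invocations chosen as an assumption; they are not necessarily Couchcopy's actual defaults for --couchdb-stop and --couchdb-start, and the helper names are hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence

# Assumed systemd commands; not necessarily Couchcopy's defaults.
STOP_CMD = "sudo systemctl stop couchdb"
START_CMD = "sudo systemctl start couchdb"


def ssh_command(host: str, command: str) -> List[str]:
    """Build an argv that runs `command` on `host` through ssh."""
    return ["ssh", host, command]


def run_on_nodes(hosts: Sequence[str], command: str,
                 run: Callable[..., object] = subprocess.run) -> None:
    """Run the same shell command on every node, in order.

    With the default runner, check=True makes subprocess.run raise
    CalledProcessError on the first node whose command fails.
    """
    for host in hosts:
        run(ssh_command(host, command), check=True)
```

On a machine without systemd, the same idea applies with a different command string, for example sudo service couchdb stop.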

Your CouchDB n value (the number of replicas) should be greater than or equal to the number of nodes in your CouchDB cluster. Otherwise, the shards saved from a single node would not be enough to save and restore all databases. See the CouchDB documentation for more details on replicas and nodes.
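
One way to check this condition before taking a backup is to compare a database's n value with the number of cluster nodes. The sketch below relies on the standard CouchDB HTTP API: GET /{db} returns a "cluster" object containing n and q, and GET /_membership lists the "cluster_nodes". The helper names get_json and one_node_holds_everything are hypothetical.

```python
import base64
import json
import urllib.request


def get_json(url: str, user: str, password: str) -> dict:
    """GET a CouchDB endpoint with HTTP basic authentication and decode the JSON body."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def one_node_holds_everything(db_info: dict, membership: dict) -> bool:
    """True if n >= number of cluster nodes, i.e. every node keeps a full replica.

    `db_info` is the body of GET /{db}; `membership` is the body of GET /_membership.
    """
    n = db_info["cluster"]["n"]
    return n >= len(membership["cluster_nodes"])
```

For example, you could pass the results of get_json("http://localhost:5984/mydb", "admin", "password") and get_json("http://localhost:5984/_membership", "admin", "password") to one_node_holds_everything; mydb is a placeholder database name.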

The number of shards per database, i.e. the value of q, should be the same for the origin CouchDB and the destination CouchDB. Otherwise, the data/shards directory tree differs between them.
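
The reason is that each subdirectory of data/shards is named after the hash range it covers, and those ranges depend on q. The sketch below reproduces, to the best of our understanding of CouchDB's mem3 module, how the 32-bit hash ring is split into q ranges and how the directories are named; treat it as an illustration rather than a reference implementation.

```python
from typing import List

RING_TOP = 2 ** 32  # document IDs are hashed onto a 32-bit ring


def shard_ranges(q: int) -> List[str]:
    """Return the data/shards subdirectory names for a database with q shards."""
    increment = RING_TOP // q
    ranges = []
    start = 0
    while start < RING_TOP:
        # The last range absorbs any remainder so the ring ends at ffffffff.
        if start + 2 * increment > RING_TOP:
            end = RING_TOP - 1
        else:
            end = start + increment - 1
        ranges.append(f"{start:08x}-{end:08x}")
        start = end + 1
    return ranges
```

For example, shard_ranges(2) gives 00000000-7fffffff and 80000000-ffffffff, while shard_ranges(4) gives four other names, so with q=2 on the origin and q=4 on the destination no directory would match.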

Couchcopy assumes you have read and write permissions on CouchDB data directories. If you don’t have them, you can try to use the --use-sudo option.

Get started

Install Couchcopy:

pip install --user couchcopy

Make a backup to backup.tar.gz from the machine old-server, whose CouchDB data is in /var/lib/couchdb:

couchcopy backup old-server,/var/lib/couchdb backup.tar.gz

Restore backup.tar.gz to a 3-node CouchDB cluster whose machines are accessible via SSH at cluster_vm1, cluster_vm2 and cluster_vm3:

couchcopy restore backup.tar.gz admin:password@cluster_vm1,/var/lib/couchdb \
    admin:password@cluster_vm2,/var/lib/couchdb \
    admin:password@cluster_vm3,/var/lib/couchdb
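
Each node argument in these examples has the shape [user:password@]host,data_dir. A hypothetical parser illustrating that split follows; it is not Couchcopy's own code, and reading user:password as CouchDB admin credentials (presumably used during restore to reach the CouchDB HTTP API) is our assumption.

```python
from typing import NamedTuple, Optional


class NodeSpec(NamedTuple):
    user: Optional[str]      # assumed: CouchDB admin user
    password: Optional[str]  # assumed: CouchDB admin password
    host: str                # SSH host name
    data_dir: str            # CouchDB data directory on that host


def parse_node_spec(spec: str) -> NodeSpec:
    """Split "[user:password@]host,data_dir" as written in the examples above."""
    creds, _, location = spec.rpartition("@")  # no "@" -> creds == ""
    host, sep, data_dir = location.partition(",")
    if not sep or not host or not data_dir:
        raise ValueError(f"expected [user:password@]host,data_dir, got {spec!r}")
    if creds:
        user, _, password = creds.partition(":")
        return NodeSpec(user, password, host, data_dir)
    return NodeSpec(None, None, host, data_dir)
```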

Quickly access the data in a backup by spawning a CouchDB instance:

couchcopy load backup.tar.gz

Reduce the loading time of couchcopy load by preconfiguring the CouchDB metadata, so that the Updating CouchDB metadata... step is not needed:

couchcopy unbrand slow_backup.tar.gz quick_backup.tar.gz

For more options:

couchcopy -h
couchcopy backup -h
couchcopy unbrand -h
couchcopy load -h
couchcopy restore -h

On Fedora, CouchDB can be installed and configured with the following:

sudo dnf copr enable -y adrienverge/couchdb
sudo dnf install couchdb
sudo sh -c 'echo "admin = password" >> /etc/couchdb/local.ini'
sudo systemctl restart couchdb

If you work with remote machines, CouchDB needs to accept connections from remote IPs on each machine. You can enable this with the following command, then restart CouchDB (for security, revert the change afterwards):

sudo sed -i 's/;bind_address = 127.0.0.1/bind_address = 0.0.0.0/g' /etc/couchdb/local.ini
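
After the change and a restart, a quick way to confirm that a node is reachable from another machine is to request CouchDB's root URL, which answers with a JSON welcome message that includes the server version. This sketch assumes the default port 5984 and that the root endpoint is readable without authentication (the CouchDB default); the function names are hypothetical.

```python
import json
import urllib.request

TESTED_VERSIONS = {"3.1.1", "3.3.3"}  # versions listed under Limitations


def parse_welcome(body: str) -> str:
    """Return the version from CouchDB's root endpoint response."""
    info = json.loads(body)
    if info.get("couchdb") != "Welcome":
        raise ValueError("not a CouchDB welcome message")
    return info["version"]


def remote_version(host: str, port: int = 5984, timeout: float = 5.0) -> str:
    """Fetch http://host:port/ to confirm that CouchDB answers remotely."""
    with urllib.request.urlopen(f"http://{host}:{port}/", timeout=timeout) as resp:
        return parse_welcome(resp.read().decode("utf-8"))
```

For example, remote_version("cluster_vm1") in TESTED_VERSIONS tells you both that the node is reachable and whether it runs one of the versions listed under Limitations.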

Implementation details

During restoration, if the node names of the new CouchDB differ from those of the old CouchDB, they are updated using the CouchDB /_node/_local/_dbs endpoint. See the CouchDB documentation of the /_node/_local/_dbs endpoint.
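
In CouchDB 3, each document of the node-local _dbs database is a shard map whose by_node, by_range and changelog fields contain node names such as couchdb@old-server. Renaming a node therefore means rewriting those three fields and saving the document back (for example with PUT /_node/_local/_dbs/{db}, keeping its _rev). The function below is a hypothetical sketch of the rewriting step only: it handles one-to-one renames, not placing ranges on additional nodes, and it is not Couchcopy's code.

```python
import copy
from typing import Dict


def rename_nodes(shard_map: dict, mapping: Dict[str, str]) -> dict:
    """Return a copy of a _dbs shard map with node names replaced via `mapping`.

    Names absent from `mapping` are kept; other fields such as _id and _rev
    are copied unchanged so the document can be saved back.
    """
    doc = copy.deepcopy(shard_map)

    def rename(node: str) -> str:
        return mapping.get(node, node)

    if "by_node" in doc:
        doc["by_node"] = {rename(node): ranges
                          for node, ranges in doc["by_node"].items()}
    if "by_range" in doc:
        doc["by_range"] = {rng: [rename(node) for node in nodes]
                           for rng, nodes in doc["by_range"].items()}
    if "changelog" in doc:
        # Entries look like ["add", "00000000-7fffffff", "couchdb@host"].
        doc["changelog"] = [[entry[0], entry[1], rename(entry[2]), *entry[3:]]
                            for entry in doc["changelog"]]
    return doc
```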

During restoration, Couchcopy first updates the metadata of one CouchDB node (i.e. the list of node names), then lets CouchDB itself synchronize that metadata to the other nodes. Couchcopy exits once synchronization has finished on all nodes; it monitors synchronization through the undocumented CouchDB /_dbs endpoint. You can skip this final wait if you want: it is safe to exit Couchcopy once the following log line is displayed: [Waiting for CouchDB cluster synchronization...]. For a CouchDB with 10^5 databases, updating the first node's metadata takes 35 minutes, and synchronizing it to the other nodes then takes 6 minutes. For a CouchDB with only 100 databases, both operations are nearly instantaneous.
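
If you exit Couchcopy early and still want to know when the cluster has caught up, the waiting step boils down to polling until every node's copy of the metadata no longer mentions old node names. This is a hypothetical sketch of that idea: fetching each node's shard maps (for instance from /_node/_local/_dbs on each machine) is left out, and Couchcopy's own check, which relies on the /_dbs endpoint, may differ.

```python
import time
from typing import Callable, Iterable, Set


def node_is_synced(shard_maps: Iterable[dict], new_nodes: Set[str]) -> bool:
    """True if none of this node's shard maps lists a node outside new_nodes."""
    return all(set(doc.get("by_node", {})) <= new_nodes for doc in shard_maps)


def wait_until(predicate: Callable[[], bool], timeout: float,
               interval: float = 5.0) -> bool:
    """Poll predicate every `interval` seconds; return False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller would combine the two, e.g. wait_until(lambda: all(node_is_synced(fetch(n), new_nodes) for n in nodes), timeout=3600), where fetch is a hypothetical function returning one node's shard map documents.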

Developer notes

To speed up CouchDB nodes synchronization it is possible to:

Build and publish

python setup.py sdist
twine upload dist/*

License

This program is licensed under the GNU General Public License version 3.



Download files


Source Distribution

couchcopy-0.2.4.tar.gz (24.6 kB)


File details

Details for the file couchcopy-0.2.4.tar.gz.

File metadata

  • Download URL: couchcopy-0.2.4.tar.gz
  • Upload date:
  • Size: 24.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.12.5

File hashes

Hashes for couchcopy-0.2.4.tar.gz
Algorithm Hash digest
SHA256 cd6d7ddd47b79133f4cca9bdc610afcbd489785f77a2500d3e72bde0b8779d58
MD5 5042e08f7df7fdcb08313157fe31bf6a
BLAKE2b-256 e01c4b79c8c9f0d7281af06afe303349471563ca1aad8f65854ce7b33cdbb892

