CLI tool to support migration from DSW 2.14 to DSW 3.0

dsw2to3

CLI tool to support data migration from DSW 2.14 to DSW 3.0

Usage

Prerequisites

  • DSW 3.0
  • MongoDB (with DSW 2.14 data)
  • PostgreSQL (with initial DSW 3.0 structure)
  • S3 storage (e.g. Minio)
  • Python 3.6+ (a virtual environment is recommended)
  • postgresql-devel (libpq-dev in Debian/Ubuntu, libpq-devel on others)

The machine where you are going to execute the migration tool must have access to MongoDB, PostgreSQL, and S3 storage. See examples/docker-compose.yml for reference.
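One quick way to confirm that access before running the tool is a plain TCP reachability check. The following is only a minimal sketch: the hosts and ports are placeholders (the usual default ports for MongoDB, PostgreSQL, and Minio are assumed), the function name is made up for illustration, and a successful connection says nothing about credentials or permissions.

```python
import socket


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


# Placeholder endpoints (assumptions); replace them with your own setup.
SERVICES = {
    "MongoDB": ("localhost", 27017),
    "PostgreSQL": ("localhost", 5432),
    "S3 (Minio)": ("localhost", 9000),
}

if __name__ == "__main__":
    for name, (host, port) in SERVICES.items():
        status = "reachable" if is_reachable(host, port) else "NOT reachable"
        print(f"{name} at {host}:{port}: {status}")
```

If any service is reported as not reachable, check the network setup (e.g. published ports in your Docker Compose file) before going further.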

You need to run DSW 3.0 at least once before the data migration so that it initializes your PostgreSQL database (it creates the tables and the initial data). You can try to log in with the default user to check that the database was initialized correctly.

Don't hesitate to contact us if anything is unclear.

Installation

You can install the tool from PyPI:

$ python -m venv env
$ . env/bin/activate
(env) $ pip install wheel
(env) $ pip install dsw2to3
...
(env) $ dsw2to3 --help

Or using this repository:

$ git clone https://github.com/ds-wizard/dsw2to3.git
$ cd dsw2to3
$ python -m venv env
$ . env/bin/activate
(env) $ pip install wheel
(env) $ pip install .
...
(env) $ dsw2to3 --help

Important notes

  • The migration tool must have access to the MongoDB database (data source) and to the PostgreSQL database and S3 storage (targets). All of them need to be configured in config.yml. The data must not change during the migration (e.g. through DSW or another tool); otherwise the migrated data may be inconsistent.
  • The migration tool does not make any changes in MongoDB; it only reads data from there.
  • The migration tool checks whether the target PostgreSQL database is in the expected state (as after a fresh installation of DSW 3.0).
  • Before migrating, the migration tool deletes all data from the PostgreSQL database to avoid duplication and inconsistency (in regular use, this removes only the default data, e.g. default users).
  • The migration tool also deletes all objects in the configured S3 bucket before migrating. If the bucket does not exist, it tries to create it.
  • The migration tool migrates data from MongoDB to PostgreSQL in the form that DSW expects, and from MongoDB (GridFS) to S3 storage.
  • You can run the tool with --dry-run to check what it will do. During a dry run, nothing is deleted, changed, or added (no SQL transactions are committed).
  • Your MongoDB database may contain inconsistent data (violating integrity). With --fix-integrity, you can resolve that by skipping such data. You should first check what the affected data are, and then decide whether to fix them manually in MongoDB or to migrate without them.
  • This tool may be improved based on feedback; check for a new version and update using pip install -U dsw2to3 if needed.

Steps

  1. Prepare config.yml for the migration based on your setup (see config.example.yml)
  2. Stop DSW in order to prevent changes in data during the migration
  3. Archive data from MongoDB (e.g. using mongodump)
  4. Run dsw2to3 -c path/to/config.yml --dry-run to see how it will work with your configuration
  5. Run dsw2to3 -c path/to/config.yml (see dsw2to3 --help for more options)
  6. After migration, run DSW 3.0 and check the migrated data
  7. Clean up your deployment (get rid of unused services and configuration files)

If an error occurs during the migration, check the logs for details. You can also run the tool with the --best-effort flag, which skips errors and only logs them.
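Steps 3 to 5 can also be scripted so that the backup always happens before the migration. The sketch below is an illustration, not part of the tool: the function names, the MongoDB URI, and the backup directory are hypothetical, mongodump must be installed separately, and it assumes that dsw2to3 exits with a non-zero status when it fails.

```python
import subprocess


def build_commands(config_path, mongo_uri, dump_dir):
    """Return the command lines for steps 3 to 5, in order."""
    return [
        # Step 3: archive the MongoDB data before touching anything.
        ["mongodump", f"--uri={mongo_uri}", f"--out={dump_dir}"],
        # Step 4: dry run; nothing is deleted, changed, or added.
        ["dsw2to3", "-c", config_path, "--dry-run"],
        # Step 5: the actual migration.
        ["dsw2to3", "-c", config_path],
    ]


def run_migration(config_path, mongo_uri, dump_dir):
    """Run each step in order and stop at the first failing command."""
    for cmd in build_commands(config_path, mongo_uri, dump_dir):
        print("Running:", " ".join(cmd))
        # check=True raises CalledProcessError on a non-zero exit status,
        # so the migration never starts after a failed backup or dry run.
        subprocess.run(cmd, check=True)
```

With placeholder values, it could be called as run_migration("path/to/config.yml", "mongodb://localhost:27017", "./mongo-backup"). Review the dry-run output yourself as well; a clean exit status alone may not show everything the tool intends to do.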

Questions and Discussion

If anything is unclear or you need help, let us know by opening an issue in this repository.

License

This project is licensed under the Apache License v2.0 - see the LICENSE file for more details.
