aind-data-migration-utils

Installation

pip install aind-data-migration-utils

Usage

To use the Migrator object, you need to create a DocDB query and a callback. The callback takes a full metadata record as input and returns the same record with any modifications applied. Note that the record only contains the core metadata files you explicitly request with Migrator(files: List[str]).

Two arguments control how the Migrator class runs:

  • Migrator(test_mode: bool) controls whether the migrator runs over all records matching the query or over just a single record. This is useful when you are running a large migration and want to first modify a single record in production.
  • .run(full_run: bool) controls whether records are actually modified on the DocDB server. When it is False, the run is a dry run.
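
The two flags combine into four run modes. The sketch below enumerates them with a plain function. `describe_run` is a hypothetical helper written for illustration, not part of aind-data-migration-utils, and its wording only restates the behavior described above.

```python
def describe_run(test_mode: bool, full_run: bool) -> str:
    """Hypothetical helper: summarize what a flag combination does."""
    scope = "a single record" if test_mode else "all records matching the query"
    action = "write changes to DocDB for" if full_run else "dry run (no writes) over"
    return f"{action} {scope}"


# Print every combination of the two flags.
for test_mode in (True, False):
    for full_run in (False, True):
        summary = describe_run(test_mode, full_run)
        print(f"test_mode={test_mode!s:<5} full_run={full_run!s:<5} -> {summary}")
```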

A dry run stores a hash that records what the dry run covered. You cannot start a full run until a dry run has completed and produced its hash.
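
To illustrate the gating idea, here is a minimal sketch of how a dry run could record a hash that a later full run checks. This is not the library's implementation: the function names, the choice of SHA-256 over the JSON-serialized query, and the hash file location are all assumptions made for this example.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def query_hash(query: dict) -> str:
    """Hash a query deterministically (assumed scheme: SHA-256 of sorted JSON)."""
    payload = json.dumps(query, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def record_dry_run(query: dict, hash_file: Path) -> None:
    """Pretend a dry run succeeded and store the hash of what it covered."""
    hash_file.write_text(query_hash(query))


def can_full_run(query: dict, hash_file: Path) -> bool:
    """Allow a full run only if a dry run was recorded for this exact query."""
    return hash_file.exists() and hash_file.read_text() == query_hash(query)


with tempfile.TemporaryDirectory() as tmp:
    hash_file = Path(tmp) / "dry_run.hash"  # hypothetical location
    query = {"name": "your-asset-name"}

    before = can_full_run(query, hash_file)  # no dry run recorded yet
    record_dry_run(query, hash_file)
    after = can_full_run(query, hash_file)  # same query as the dry run
    changed = can_full_run({"name": "another-asset"}, hash_file)  # different query

print(before, after, changed)
```

Under a scheme like this, changing the query after the dry run invalidates the stored hash, so the dry run has to be repeated before a full run is allowed.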

The full process of running a migration is:

  1. Define your query and callback. Use logging to clearly explain what happened to each record, and use the files parameter to limit your request to just the core files you are modifying.
  2. Run your dry run. This generates the hash file that allows you to run your full run.
  3. Open your PR and get confirmation that your code works properly.
  4. Run your full run.
  5. Merge the PR.

If your code modifies a large number of records, split step 4 into three parts: (a) re-run the dry run with the --test flag so that it covers only a single record, (b) run the full run with the --test flag and check at metadata-portal.allenneuraldynamics.org/view?name=<your-asset-name> that the record was modified properly, (c) re-run the dry run and then the full run without the --test flag, over all records.
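
The staged rollout corresponds to a fixed sequence of command-line invocations. The sketch below rebuilds the `--full-run` and `--test` flags used by the script in the Example section and shows how each stage parses. The `stages` list and its labels are illustrative only.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--full-run", action=argparse.BooleanOptionalAction, default=False)
parser.add_argument("--test", action=argparse.BooleanOptionalAction, default=False)

# Illustrative labels for the staged rollout, with the arguments passed at each stage.
stages = [
    ("(a) dry run, single record", ["--test"]),
    ("(b) full run, single record", ["--full-run", "--test"]),
    ("(c) dry run, all records", []),
    ("(c) full run, all records", ["--full-run"]),
]

parsed = [(label, parser.parse_args(argv)) for label, argv in stages]
for label, args in parsed:
    print(f"{label}: test_mode={args.test}, full_run={args.full_run}")
```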

Example

from aind_data_migration_utils.migrate import Migrator
import argparse
import logging

# Create a docdb query
query = {
    "_id": "your-id-to-fix"
}

def your_callback(record: dict) -> dict:
    """ Make changes to a record """

    # For example, convert a subject ID that wasn't a string to a string
    if not isinstance(record["subject"]["subject_id"], str):
        original_type = type(record["subject"]["subject_id"])
        record["subject"]["subject_id"] = str(record["subject"]["subject_id"])
        logging.info(f"Modified type of subject_id field for record {record['name']} from {original_type} to str")

    # Note: raising Exceptions inside a callback will log errors in the results.csv file

    return record


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--full-run", action=argparse.BooleanOptionalAction, required=False, default=False)
    parser.add_argument("--test", action=argparse.BooleanOptionalAction, required=False, default=False)
    args = parser.parse_args()

    migrator = Migrator(
        query=query,
        migration_callback=your_callback,
        test_mode=args.test,
        files=["subject"],
        prod=True,
    )
    migrator.run(full_run=args.full_run)
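
Because the callback is a plain function from dict to dict, it can be exercised on a hand-written record before any dry run. The sketch below repeats the subject ID conversion from the example and checks it against a made-up record. The sample record is invented for illustration and does not reflect a real DocDB document.

```python
import copy
import logging


def your_callback(record: dict) -> dict:
    """Convert a non-string subject_id to a string, as in the example above."""
    subject_id = record["subject"]["subject_id"]
    if not isinstance(subject_id, str):
        record["subject"]["subject_id"] = str(subject_id)
        logging.info("Converted subject_id for record %s to str", record["name"])
    return record


# Made-up record containing only the fields the callback touches.
sample = {"name": "example-asset", "subject": {"subject_id": 123456}}

# Work on a copy so the original sample stays available for comparison.
result = your_callback(copy.deepcopy(sample))
print(result["subject"]["subject_id"])

# Running the callback again should leave an already-fixed record unchanged.
assert your_callback(copy.deepcopy(result)) == result
```

A check like this catches key errors and type mistakes locally, before the callback touches any real records.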

Call your code to run the dry run. You can run multiple dry runs as needed.

python run.py

After completing a dry run for your specific query, pass the --full-run argument to push changes to DocDB.
