Skip to main content

A tool for cleaning up external DataJoint stores

Project description

DataJoint-Cleaner

Test Black Mypy codecov Build PyPI version

DataJoint-Cleaner is a tool used to clean external DataJoint stores.

How It Works

The user provides information pointing to a particular external table and external store combination. Using this information DataJoint-Cleaner will delete all objects in the external store that have no corresponding entry in the external table, thus freeing storage space.

Important Considerations

The creation of a backup is highly recommended before using DataJoint-Cleaner to avoid a potential loss of data due to an user error or a bug.

DataJoint-Cleaner should not be used to clean external stores that are used by multiple database servers or multiple storage protocols (e.g. file & s3). If done so it could potentially delete objects that are still referenced in the database.

Installation

Recommended installation method

To avoid messing up the system Python environment, the most recommended way to install DataJoint-Cleaner is via pipx:

pipx install datajoint-cleaner

Other installation methods

Install DataJoint-Cleaner into user site with pip:

pip install --user datajoint-cleaner

Configuration

DataJoint-Cleaner will look for a TOML file called datajoint-cleaner.toml in the current working directory (by default) to configure itself. The configuration file must have two top-level tables called database_servers and storage_servers and an array of tables called cleaning_runs.

Specifying Database Servers

Database servers are specified in the top-level database_servers table. Each key in the table corresponds to a distinct database server. The value of each key must be a table that contains the following keys: host, user and password.

The values of the host, user and password keys correspond to the host name of the database server, the name of a user present on said server and the password of said user, respectively.

Example:

[database_servers.my_db_server]
host = "192.156.3.65"
user = "me"
password = "mypassword"

Specifying Storage Servers

Storage servers are specified in a sub-table of the storage_servers table based on their kind. Currently only MinIO servers are supported which are specified in the minio sub table. The keys necessary to specify a MinIO server are endpoint, access_key, secret_key and secure.

The values of these keys correspond to the endpoint of the MinIO server, your access and private key and whether a secure connection should be established or not, respectively.

Example:

[storage_servers.minio.my_minio_server]
endpoint = "192.543.5.61"
access_key = "my_access_key"
secret_key = "my_secret_key"
secure = true

Specifying Cleaning Runs

Individual cleaning runs are specified in the top-level array of tables called cleaning_runs. Each table in the array corresponds to a distinct cleaning run and must have the following keys:

  • database_server: Name of a database server specified in the database_servers table
  • schema: Name of a schema on said database server
  • store: Name of a DataJoint store for which an external table exists in said schema
  • storage_server: A storage server specified in the storage_servers table in the <kind>.<name> format
  • bucket: Name of a bucket on said MinIO server
  • location: Location of externally stored objects in said bucket

Example:

[[cleaning_runs]]
database_server = "my_db_server"
schema = "my_schema"
store = "my_store"
storage_server = "minio.my_minio_server"
bucket = "my_bucket"
location = "my_location"

Usage

The cleaning process can be started like so:

dj-cleaner

The command above will execute all cleaning runs defined in the configuration file. The --config-file option can be used to pass a custom path to a configuration file to DataJoint-Cleaner:

dj-cleaner --config-file /path/to/config/file

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

datajoint-cleaner-0.1.tar.gz (21.5 kB view hashes)

Uploaded source

Built Distribution

datajoint_cleaner-0.1-py3-none-any.whl (26.0 kB view hashes)

Uploaded py3

Supported by

AWS AWS Cloud computing Datadog Datadog Monitoring Facebook / Instagram Facebook / Instagram PSF Sponsor Fastly Fastly CDN Google Google Object Storage and Download Analytics Huawei Huawei PSF Sponsor Microsoft Microsoft PSF Sponsor NVIDIA NVIDIA PSF Sponsor Pingdom Pingdom Monitoring Salesforce Salesforce PSF Sponsor Sentry Sentry Error logging StatusPage StatusPage Status page