A tool for cleaning up external DataJoint stores

DataJoint-Cleaner

DataJoint-Cleaner is a tool used to clean external DataJoint stores.

How It Works

The user provides information pointing to a particular combination of external table and external store. Using this information, DataJoint-Cleaner deletes all objects in the external store that have no corresponding entry in the external table, thus freeing storage space.
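
The core idea can be illustrated as a set difference. The following is a minimal sketch, not DataJoint-Cleaner's actual implementation: the function name `find_orphaned_objects` and the object names are hypothetical, and a real external store holds object paths rather than short labels.

```python
def find_orphaned_objects(store_objects, referenced_objects):
    """Return objects that exist in the store but have no entry in the external table."""
    return set(store_objects) - set(referenced_objects)


# Hypothetical object names, used only for illustration.
in_store = {"a1", "b2", "c3", "d4"}   # objects found in the external store
in_table = {"a1", "c3"}               # objects referenced by the external table

orphans = find_orphaned_objects(in_store, in_table)
print(sorted(orphans))  # ['b2', 'd4']
```

Only the objects in `orphans` would be candidates for deletion; everything referenced by the external table is left alone.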

Important Considerations

Creating a backup before using DataJoint-Cleaner is highly recommended to avoid data loss caused by user error or a bug.

DataJoint-Cleaner should not be used to clean external stores that are used by multiple database servers or accessed through multiple storage protocols (e.g. file and s3). Doing so could delete objects that are still referenced in the database.

Installation

Recommended installation method

The recommended way to install DataJoint-Cleaner is via pipx, which keeps it isolated from the system Python environment:

pipx install datajoint-cleaner

Other installation methods

Alternatively, install DataJoint-Cleaner into the user site-packages directory with pip:

pip install --user datajoint-cleaner

Configuration

DataJoint-Cleaner configures itself from a TOML file. By default it looks for a file called datajoint-cleaner.toml in the current working directory. The configuration file must have two top-level tables called database_servers and storage_servers and an array of tables called cleaning_runs.

Specifying Database Servers

Database servers are specified in the top-level database_servers table. Each key in the table corresponds to a distinct database server. The value of each key must be a table that contains the following keys: host, user and password.

The values of the host, user and password keys correspond to the host name of the database server, the name of a user present on said server and the password of said user, respectively.

Example:

[database_servers.my_db_server]
host = "192.156.3.65"
user = "me"
password = "mypassword"

Specifying Storage Servers

Storage servers are specified in a sub-table of the storage_servers table, based on their kind. Currently only MinIO servers are supported; they are specified in the minio sub-table. The keys needed to specify a MinIO server are endpoint, access_key, secret_key and secure.

The values of these keys correspond to the endpoint of the MinIO server, the access key, the secret key and whether a secure connection should be established, respectively.

Example:

[storage_servers.minio.my_minio_server]
endpoint = "192.168.5.61"
access_key = "my_access_key"
secret_key = "my_secret_key"
secure = true

Specifying Cleaning Runs

Individual cleaning runs are specified in the top-level array of tables called cleaning_runs. Each table in the array corresponds to a distinct cleaning run and must have the following keys:

  • database_server: Name of a database server specified in the database_servers table
  • schema: Name of a schema on said database server
  • store: Name of a DataJoint store for which an external table exists in said schema
  • storage_server: A storage server specified in the storage_servers table in the <kind>.<name> format
  • bucket: Name of a bucket on said MinIO server
  • location: Location of externally stored objects in said bucket

Example:

[[cleaning_runs]]
database_server = "my_db_server"
schema = "my_schema"
store = "my_store"
storage_server = "minio.my_minio_server"
bucket = "my_bucket"
location = "my_location"
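
The way a cleaning run refers to the other two tables can be sketched as follows. This is an illustration under assumptions, not the tool's own code: `resolve_run` is a hypothetical helper, and the dictionary mirrors what a TOML parser would return for a configuration like the examples in this section (host and endpoint values are placeholders).

```python
def resolve_run(config, run):
    """Look up the servers a cleaning run refers to.

    Raises KeyError if a referenced server is not defined in the configuration.
    """
    db_server = config["database_servers"][run["database_server"]]
    # storage_server uses the <kind>.<name> format, e.g. "minio.my_minio_server".
    kind, name = run["storage_server"].split(".", 1)
    storage_server = config["storage_servers"][kind][name]
    return db_server, kind, storage_server


# Parsed form of a configuration; all values are placeholders.
config = {
    "database_servers": {
        "my_db_server": {"host": "db.example.com", "user": "me", "password": "mypassword"},
    },
    "storage_servers": {
        "minio": {
            "my_minio_server": {
                "endpoint": "minio.example.com",
                "access_key": "my_access_key",
                "secret_key": "my_secret_key",
                "secure": True,
            },
        },
    },
    "cleaning_runs": [
        {
            "database_server": "my_db_server",
            "schema": "my_schema",
            "store": "my_store",
            "storage_server": "minio.my_minio_server",
            "bucket": "my_bucket",
            "location": "my_location",
        },
    ],
}

for run in config["cleaning_runs"]:
    db_server, kind, storage_server = resolve_run(config, run)
    print(db_server["host"], kind, storage_server["endpoint"])
```

A misspelled server name in a cleaning run shows up here as a `KeyError`, which makes this kind of lookup a cheap sanity check before any objects are deleted.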

Usage

The cleaning process can be started like so:

dj-cleaner

The command above will execute all cleaning runs defined in the configuration file. The --config-file option can be used to pass a custom path to a configuration file to DataJoint-Cleaner:

dj-cleaner --config-file /path/to/config/file
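
When the cleaner is started from a script, for example as part of a scheduled job, the two invocations above could be wrapped as in the sketch below. The helpers `build_command` and `run_cleaner` are hypothetical. The sketch assumes dj-cleaner is on the PATH, and it further assumes that the tool reports failure through a non-zero exit code, which is not stated in this document.

```python
import subprocess


def build_command(config_file=None):
    """Build the dj-cleaner argument list, optionally with --config-file."""
    command = ["dj-cleaner"]
    if config_file is not None:
        command += ["--config-file", str(config_file)]
    return command


def run_cleaner(config_file=None):
    """Run dj-cleaner; raises CalledProcessError on a non-zero exit code (assumed behaviour)."""
    subprocess.run(build_command(config_file), check=True)


# Only the argument lists are printed here; nothing is executed.
print(build_command())
print(build_command("/path/to/config/file"))
```

Passing the arguments as a list avoids shell quoting problems when the configuration path contains spaces.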
