A tool for cleaning up external DataJoint stores
DataJoint-Cleaner
DataJoint-Cleaner is a tool used to clean external DataJoint stores.
How It Works
The user specifies a combination of an external table and an external store. DataJoint-Cleaner then deletes every object in the external store that has no corresponding entry in the external table, freeing storage space.
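The comparison described above amounts to a set difference. The following is a minimal sketch of that idea only, not DataJoint-Cleaner's actual implementation; the function name `find_orphaned_objects` and the object names are hypothetical.

```python
def find_orphaned_objects(store_objects, referenced_objects):
    """Return names that exist in the store but have no entry in the external table."""
    return sorted(set(store_objects) - set(referenced_objects))


# Hypothetical data: four objects in the store, two of them still referenced.
store_objects = ["a1", "b2", "c3", "d4"]
referenced_objects = ["a1", "c3"]

orphaned = find_orphaned_objects(store_objects, referenced_objects)
print(orphaned)  # prints ['b2', 'd4']
```

Only the objects in `orphaned` would be candidates for deletion; everything that still has an entry in the external table is left alone.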
Important Considerations
Creating a backup before using DataJoint-Cleaner is highly recommended to avoid data loss caused by user error or a bug.
DataJoint-Cleaner should not be used to clean external stores that are used by multiple database servers or multiple storage protocols (e.g. file & s3). Doing so could delete objects that are still referenced in a database.
Installation
Recommended installation method
To avoid polluting the system Python environment, the recommended way to install DataJoint-Cleaner is via pipx:
pipx install datajoint-cleaner
Other installation methods
Install DataJoint-Cleaner into the user site-packages directory with pip:
pip install --user datajoint-cleaner
Configuration
By default, DataJoint-Cleaner reads its configuration from a TOML file called datajoint-cleaner.toml in the current working directory. The configuration file must have two top-level tables called database_servers and storage_servers and an array of tables called cleaning_runs.
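To make the required top-level layout concrete, here is a small sketch that checks an already parsed configuration for it. This is not part of DataJoint-Cleaner; the helper names `check_top_level` and `load_config` are hypothetical, and `load_config` assumes Python 3.11 or later, where `tomllib` is in the standard library.

```python
def check_top_level(config):
    """Return a list of problems found in the top-level layout of a parsed config."""
    problems = []
    # database_servers and storage_servers must both be TOML tables (dicts).
    for name in ("database_servers", "storage_servers"):
        if not isinstance(config.get(name), dict):
            problems.append(f"'{name}' must be a table")
    # cleaning_runs must be an array of tables (a list of dicts).
    runs = config.get("cleaning_runs")
    if not isinstance(runs, list) or not all(isinstance(r, dict) for r in runs):
        problems.append("'cleaning_runs' must be an array of tables")
    return problems


def load_config(path="datajoint-cleaner.toml"):
    """Parse the TOML configuration file at the given path."""
    import tomllib  # standard library from Python 3.11 onwards

    with open(path, "rb") as f:  # tomllib requires a binary file handle
        return tomllib.load(f)


# An empty dict stands in for a config file that is missing everything.
for problem in check_top_level({}):
    print(problem)
```

Calling `check_top_level(load_config())` in a directory that contains datajoint-cleaner.toml would report any of the three top-level entries that are missing or of the wrong type.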
Specifying Database Servers
Database servers are specified in the top-level database_servers table. Each key in the table corresponds to a distinct database server. The value of each key must be a table that contains the following keys: host, user and password.
The values of the host, user and password keys correspond to the host name of the database server, the name of a user present on said server and the password of said user, respectively.
Example:
[database_servers.my_db_server]
host = "192.156.3.65"
user = "me"
password = "mypassword"
Specifying Storage Servers
Storage servers are specified in a sub-table of the storage_servers table according to their kind. Currently only MinIO servers are supported; they are specified in the minio sub-table. The keys required to specify a MinIO server are endpoint, access_key, secret_key and secure.
The values of these keys correspond to the endpoint of the MinIO server, the access key, the secret key, and whether a secure connection should be established, respectively.
Example:
[storage_servers.minio.my_minio_server]
endpoint = "192.543.5.61"
access_key = "my_access_key"
secret_key = "my_secret_key"
secure = true
Specifying Cleaning Runs
Individual cleaning runs are specified in the top-level array of tables called cleaning_runs. Each table in the array corresponds to a distinct cleaning run and must have the following keys:
- database_server: Name of a database server specified in the database_servers table
- schema: Name of a schema on said database server
- store: Name of a DataJoint store for which an external table exists in said schema
- storage_server: A storage server specified in the storage_servers table in the <kind>.<name> format
- bucket: Name of a bucket on said MinIO server
- location: Location of externally stored objects in said bucket
Example:
[[cleaning_runs]]
database_server = "my_db_server"
schema = "my_schema"
store = "my_store"
storage_server = "minio.my_minio_server"
bucket = "my_bucket"
location = "my_location"
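The way a cleaning run refers to the other two tables can be illustrated with a short sketch. It mirrors the example values from this page as a plain Python dict, as a TOML parser would produce, and resolves the database_server name and the <kind>.<name> storage_server reference. The function `resolve_run` is hypothetical and only illustrates the lookup; it is not DataJoint-Cleaner's own code.

```python
def resolve_run(config, run):
    """Look up the database and storage server entries a cleaning run refers to."""
    db_server = config["database_servers"][run["database_server"]]
    # storage_server uses the <kind>.<name> format, e.g. "minio.my_minio_server".
    kind, name = run["storage_server"].split(".", 1)
    storage_server = config["storage_servers"][kind][name]
    return db_server, kind, storage_server


# The example configuration from this page, as a parsed dict.
config = {
    "database_servers": {
        "my_db_server": {"host": "192.156.3.65", "user": "me", "password": "mypassword"},
    },
    "storage_servers": {
        "minio": {
            "my_minio_server": {
                "endpoint": "192.543.5.61",
                "access_key": "my_access_key",
                "secret_key": "my_secret_key",
                "secure": True,
            },
        },
    },
    "cleaning_runs": [
        {
            "database_server": "my_db_server",
            "schema": "my_schema",
            "store": "my_store",
            "storage_server": "minio.my_minio_server",
            "bucket": "my_bucket",
            "location": "my_location",
        },
    ],
}

for run in config["cleaning_runs"]:
    db_server, kind, storage_server = resolve_run(config, run)
    print(run["schema"], run["store"], db_server["host"], kind, storage_server["endpoint"])
```

A run that names a server missing from either table raises a `KeyError` in this sketch, which is one simple way to surface a misspelled reference before any cleaning starts.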
Usage
The cleaning process can be started like so:
dj-cleaner
The command above will execute all cleaning runs defined in the configuration file. The --config-file option can be used to pass a custom path to a configuration file to DataJoint-Cleaner:
dj-cleaner --config-file /path/to/config/file
File details
Details for the file datajoint-cleaner-0.1.tar.gz.
File metadata
- Download URL: datajoint-cleaner-0.1.tar.gz
- Upload date:
- Size: 21.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/53.0.0 requests-toolbelt/0.9.1 tqdm/4.57.0 CPython/3.9.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c30d8611cb081a5117f77deb710fbbd9e0469508f5002a12c9ac01ec26cbbc71 |
| MD5 | ec7d8e7aa0b059b5fa20bba81d87bbb4 |
| BLAKE2b-256 | 4cfa6a2a1f4dbeffadee8b717375474fa2eb03a9b834b24bdb74647a823ccc13 |
File details
Details for the file datajoint_cleaner-0.1-py3-none-any.whl.
File metadata
- Download URL: datajoint_cleaner-0.1-py3-none-any.whl
- Upload date:
- Size: 26.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/53.0.0 requests-toolbelt/0.9.1 tqdm/4.57.0 CPython/3.9.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2b1839181331eabd487866df300412042e40685114cb8e72bcb83e38de759fab |
| MD5 | 2ba3e8eda317ae91aff24fd7dbadf529 |
| BLAKE2b-256 | 4e7612f25a4c98933885507afc217979945fe502f76d7d5a4c3f35b2123cf5a5 |