A tool for cleaning up external DataJoint stores
DataJoint-Cleaner
DataJoint-Cleaner is a tool used to clean external DataJoint stores.
How It Works
The user provides information pointing to a particular combination of external table and external store. Using this information, DataJoint-Cleaner deletes every object in the external store that has no corresponding entry in the external table, thus freeing storage space.
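The core idea is a set difference: objects in the store minus objects referenced by the external table. The sketch below shows it as a dry-run "plan". The helper name plan_cleanup and the object names are hypothetical, and this is not DataJoint-Cleaner's actual implementation. A real run would list the bucket contents and query the external table rather than use in-memory literals.

```python
from typing import Iterable, Mapping


def plan_cleanup(stored: Mapping[str, int], referenced: Iterable[str]) -> tuple[list[str], int]:
    """Return (orphaned object names, bytes that deleting them would free).

    `stored` maps each object name in the external store to its size in bytes;
    `referenced` holds the object names recorded in the external table.
    """
    referenced_set = set(referenced)
    # An object is orphaned if the external table has no entry for it
    orphans = sorted(name for name in stored if name not in referenced_set)
    freed = sum(stored[name] for name in orphans)
    return orphans, freed


stored = {"my_location/a": 100, "my_location/b": 250, "my_location/c": 40}
referenced = ["my_location/a"]
print(plan_cleanup(stored, referenced))  # (['my_location/b', 'my_location/c'], 290)
```

Computing the plan before deleting anything makes it easy to review what would be removed. This matters because the deletion itself cannot be undone without a backup.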
Important Considerations
Creating a backup before using DataJoint-Cleaner is highly recommended to avoid losing data due to a user error or a bug.
DataJoint-Cleaner should not be used to clean external stores that are used by multiple database servers or with multiple storage protocols (e.g. file & s3). Because each cleaning run checks only a single external table, doing so could delete objects that are still referenced in the database.
Installation
Recommended installation method
To avoid cluttering the system Python environment, the recommended way to install DataJoint-Cleaner is via pipx:
pipx install datajoint-cleaner
Other installation methods
Install DataJoint-Cleaner into the user site with pip:
pip install --user datajoint-cleaner
Configuration
DataJoint-Cleaner will look for a TOML file called datajoint-cleaner.toml in the current working directory (by default) to configure itself. The configuration file must have two top-level tables called database_servers and storage_servers and an array of tables called cleaning_runs.
Specifying Database Servers
Database servers are specified in the top-level database_servers table. Each key in the table corresponds to a distinct database server. The value of each key must be a table that contains the following keys: host, user and password.
The values of the host, user and password keys correspond to the host name of the database server, the name of a user present on said server and the password of said user, respectively.
Example:
[database_servers.my_db_server]
host = "192.156.3.65"
user = "me"
password = "mypassword"
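For orientation, DataJoint's Python package reads its connection settings from dj.config under the keys database.host, database.user and database.password. The hypothetical helper to_datajoint_config below maps one database_servers entry onto those keys. This README does not state that DataJoint-Cleaner connects this way, so treat the mapping as an assumption.

```python
def to_datajoint_config(server: dict) -> dict:
    """Map one database_servers entry onto DataJoint's dj.config keys (hypothetical helper)."""
    missing = sorted({"host", "user", "password"} - set(server))
    if missing:
        raise KeyError(f"database server entry is missing: {missing}")
    return {
        "database.host": server["host"],
        "database.user": server["user"],
        "database.password": server["password"],
    }


entry = {"host": "192.156.3.65", "user": "me", "password": "mypassword"}
settings = to_datajoint_config(entry)
print(settings["database.host"])  # 192.156.3.65
```

With DataJoint installed, each pair could then be assigned with dj.config[key] = value before connecting.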
Specifying Storage Servers
Storage servers are specified in a sub-table of the storage_servers table based on their kind. Currently, only MinIO servers are supported; they are specified in the minio sub-table. The keys necessary to specify a MinIO server are endpoint, access_key, secret_key and secure.
The values of these keys correspond to the endpoint of the MinIO server, your access key, your secret key and whether a secure connection should be established, respectively.
Example:
[storage_servers.minio.my_minio_server]
endpoint = "192.543.5.61"
access_key = "my_access_key"
secret_key = "my_secret_key"
secure = true
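The four keys line up with the endpoint, access_key, secret_key and secure arguments of the Minio client class in the MinIO Python SDK. The hypothetical helper below turns one entry into keyword arguments. It also rejects a common mistake: writing secure = "true" (a string) instead of the TOML boolean true. It deliberately does not import the SDK, so it runs without it. Whether DataJoint-Cleaner itself uses that SDK is an assumption.

```python
REQUIRED_MINIO_KEYS = ("endpoint", "access_key", "secret_key", "secure")


def minio_client_kwargs(server: dict) -> dict:
    """Turn one storage_servers.minio entry into keyword arguments (hypothetical helper)."""
    missing = [key for key in REQUIRED_MINIO_KEYS if key not in server]
    if missing:
        raise KeyError(f"MinIO server entry is missing: {missing}")
    # `secure` must be a TOML boolean; a quoted "true" parses as a string
    if not isinstance(server["secure"], bool):
        raise TypeError("secure must be true or false, not a string")
    return {key: server[key] for key in REQUIRED_MINIO_KEYS}


kwargs = minio_client_kwargs({
    "endpoint": "192.543.5.61",
    "access_key": "my_access_key",
    "secret_key": "my_secret_key",
    "secure": True,
})
# With the MinIO SDK installed: from minio import Minio; client = Minio(**kwargs)
```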
Specifying Cleaning Runs
Individual cleaning runs are specified in the top-level array of tables called cleaning_runs. Each table in the array corresponds to a distinct cleaning run and must have the following keys:
- database_server: Name of a database server specified in the database_servers table
- schema: Name of a schema on said database server
- store: Name of a DataJoint store for which an external table exists in said schema
- storage_server: A storage server specified in the storage_servers table in the <kind>.<name> format
- bucket: Name of a bucket on said MinIO server
- location: Location of externally stored objects in said bucket
Example:
[[cleaning_runs]]
database_server = "my_db_server"
schema = "my_schema"
store = "my_store"
storage_server = "minio.my_minio_server"
bucket = "my_bucket"
location = "my_location"
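Each cleaning run refers to servers by name, and storage_server uses the <kind>.<name> format. The sketch below shows how such references could be resolved against the parsed configuration. The helper resolve_run is hypothetical, and this is not DataJoint-Cleaner's own code. An unknown server name surfaces as a KeyError from the dictionary lookup.

```python
REQUIRED_RUN_KEYS = ("database_server", "schema", "store", "storage_server", "bucket", "location")


def resolve_run(config: dict, run: dict) -> tuple[dict, dict]:
    """Return the (database server, storage server) entries a cleaning run names."""
    missing = [key for key in REQUIRED_RUN_KEYS if key not in run]
    if missing:
        raise KeyError(f"cleaning run is missing: {missing}")
    database = config["database_servers"][run["database_server"]]
    # "minio.my_minio_server" -> kind "minio", name "my_minio_server"
    kind, _, name = run["storage_server"].partition(".")
    if not name:
        raise ValueError("storage_server must use the <kind>.<name> format")
    storage = config["storage_servers"][kind][name]
    return database, storage


config = {
    "database_servers": {
        "my_db_server": {"host": "192.156.3.65", "user": "me", "password": "mypassword"},
    },
    "storage_servers": {
        "minio": {
            "my_minio_server": {
                "endpoint": "192.543.5.61",
                "access_key": "my_access_key",
                "secret_key": "my_secret_key",
                "secure": True,
            },
        },
    },
}
run = {
    "database_server": "my_db_server",
    "schema": "my_schema",
    "store": "my_store",
    "storage_server": "minio.my_minio_server",
    "bucket": "my_bucket",
    "location": "my_location",
}
database, storage = resolve_run(config, run)
print(database["host"], storage["endpoint"])  # 192.156.3.65 192.543.5.61
```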
Usage
The cleaning process can be started like so:
dj-cleaner
The command above will execute all cleaning runs defined in the configuration file. The --config-file option can be used to pass a custom path to a configuration file to DataJoint-Cleaner:
dj-cleaner --config-file /path/to/config/file
Download files
Source Distribution: datajoint-cleaner-0.1.tar.gz
Built Distribution: datajoint_cleaner-0.1-py3-none-any.whl
File details
Details for the file datajoint-cleaner-0.1.tar.gz.
File metadata
- Download URL: datajoint-cleaner-0.1.tar.gz
- Upload date:
- Size: 21.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/53.0.0 requests-toolbelt/0.9.1 tqdm/4.57.0 CPython/3.9.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | c30d8611cb081a5117f77deb710fbbd9e0469508f5002a12c9ac01ec26cbbc71
MD5 | ec7d8e7aa0b059b5fa20bba81d87bbb4
BLAKE2b-256 | 4cfa6a2a1f4dbeffadee8b717375474fa2eb03a9b834b24bdb74647a823ccc13
File details
Details for the file datajoint_cleaner-0.1-py3-none-any.whl.
File metadata
- Download URL: datajoint_cleaner-0.1-py3-none-any.whl
- Upload date:
- Size: 26.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/53.0.0 requests-toolbelt/0.9.1 tqdm/4.57.0 CPython/3.9.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2b1839181331eabd487866df300412042e40685114cb8e72bcb83e38de759fab
MD5 | 2ba3e8eda317ae91aff24fd7dbadf529
BLAKE2b-256 | 4e7612f25a4c98933885507afc217979945fe502f76d7d5a4c3f35b2123cf5a5