A solution for research data management
The CADET-Research Data Management toolbox
Getting started
Installation
CADET-RDM can be installed using pip:
pip install cadetrdm
Initialize Project Repository
Create a new project repository or convert an existing repository into a CADET-RDM repo:
cadet-rdm initialize-repo <path-to-repo> <output-folder-name>
or from Python:
from cadetrdm import initialize_repo
initialize_repo(path_to_repo, output_folder_name)
The output_folder_name argument is optional. It defaults to output.
Use CADET-RDM in Python
Tracking Results
from cadetrdm import ProjectRepo

"""
Your imports and function declarations
e.g. generate_data(), write_data_to_file(), analyse_data() and plot_analysis_results()
"""

if __name__ == '__main__':
    # Instantiate CADET-RDM ProjectRepo handler
    repo = ProjectRepo()

    # If you've made changes to the code, commit the changes
    repo.commit("Add code to generate and analyse example data")

    # Everything written to the output_folder within this context manager gets tracked
    # The method repo.output_data() generates full paths to within your output_folder
    with repo.track_results(results_commit_message="Generate and analyse example data"):
        data = generate_data()
        output_filepath = repo.output_data(sub_path="raw_data/data.csv")
        write_data_to_file(data, output_filepath)

        analysis_results = analyse_data(data)
        figure_path = repo.output_data("analysis/regression.png")
        plot_analysis_results(analysis_results, figure_path)
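The helper functions named in the example above are placeholders for your own code and are not part of CADET-RDM. To make the example easier to try out, here is a minimal sketch of what three of them might look like, using only the Python standard library. The function bodies, the synthetic data and the returned dictionary keys are all assumptions made for illustration. plot_analysis_results() is left out because it would need a plotting library of your choice.

```python
import csv
from pathlib import Path


def generate_data(n_points=10):
    """Return deterministic (x, y) pairs on the line y = 2x + 1 (hypothetical data)."""
    return [(float(x), 2.0 * x + 1.0) for x in range(n_points)]


def write_data_to_file(data, filepath):
    """Write the (x, y) pairs to a CSV file, creating parent folders if needed."""
    filepath = Path(filepath)
    filepath.parent.mkdir(parents=True, exist_ok=True)
    with filepath.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["x", "y"])
        writer.writerows(data)


def analyse_data(data):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}
```

With these definitions in place of the docstring placeholder, the tracked block writes raw_data/data.csv into the output folder and analyse_data() returns the fitted line parameters.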
Sharing Results
To share your project code and results with others, create remote repositories on a hosting service such as GitHub or GitLab. You need one remote for the project repo and one for the results repo.
Once created, add the remotes to the local repositories:
cadet-cli add-remote-to-repo <path_to_repo> git@<my_git_server.foo>:<project>.git
cadet-cli add-remote-to-repo <path_to_repo/output_folder> git@<my_git_server.foo>:<project>_output.git
or in Python:
repo = ProjectRepo()
repo.add_remote("git@<my_git_server.foo>:<project>.git")
repo.output_repo.add_remote("git@<my_git_server.foo>:<project>_output.git")
Once the remotes are configured, you can push all changes to both the project repo and the results repo with a single command:
# push all changes to the Project and Output repositories with one command:
repo.push()
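The remote URLs in the examples above pair a project remote with an output remote whose name carries an _output suffix. That pairing is only the naming used in this README, not a requirement of CADET-RDM. If you want to follow it, a small hypothetical helper (remote_urls is not part of the cadetrdm package) can derive both URLs from a server and a project name:

```python
def remote_urls(server, project):
    """Build SSH remote URLs for a project repo and its matching output repo.

    Assumes the '<project>_output' naming used in the examples; adjust as needed.
    """
    project_url = f"git@{server}:{project}.git"
    output_url = f"git@{server}:{project}_output.git"
    return project_url, output_url


# Hypothetical server and project name, for illustration only
project_url, output_url = remote_urls("git.example.org", "group/demo")
```

The two strings can then be passed to repo.add_remote() and repo.output_repo.add_remote() as shown earlier.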
Re-using results from previous iterations
Each result stored with CADET-RDM is given a unique branch name, formatted as:
<timestamp>_<output_folder>_"from"_<active_project_branch>_<project_repo_hash[:7]>
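To make the naming scheme concrete, the following sketch composes and splits such a branch name. It is an illustration only and not CADET-RDM's own implementation. It assumes that the quoted "from" in the pattern stands for the literal word from, and the timestamp string used here is a made-up example, since the exact timestamp format is not stated above. Both function names are hypothetical.

```python
def compose_results_branch_name(timestamp, output_folder, project_branch, project_hash):
    """Join the parts of a results branch name, keeping only 7 hash characters."""
    short_hash = project_hash[:7]
    return f"{timestamp}_{output_folder}_from_{project_branch}_{short_hash}"


def split_results_branch_name(branch_name):
    """Split a results branch name into (prefix, project_branch, short_hash).

    The prefix holds '<timestamp>_<output_folder>'; it is not split further
    because the timestamp itself may contain underscores.
    """
    prefix, _, rest = branch_name.partition("_from_")
    project_branch, _, short_hash = rest.rpartition("_")
    return prefix, project_branch, short_hash


# Hypothetical values, for illustration only
name = compose_results_branch_name(
    "2023-12-06_14-30-00", "output", "main", "a1b2c3d4e5f6"
)
parts = split_results_branch_name(name)
```

Splitting on the first _from_ works as long as neither the timestamp nor the output folder name contains that substring.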
With this branch name, previously generated data can be loaded as input data for further calculations:
cached_array_path = repo.input_data(branch_name=branch_name, file_path="raw_data/data.csv")
Alternatively, CADET-RDM can use the auto-generated cache of previous results to infer the correct branch name from the path to the file within the cache:
cached_array_path = repo.input_data(file_path="output_cached/<branch_name>/raw_data/data.csv")
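The idea behind this inference can be sketched in a few lines: the first folder below the cache folder is the branch name, and the remainder is the file path within that branch. This is an illustration of the principle, not the code CADET-RDM runs. The function name split_cache_path is hypothetical, and the default cache folder name output_cached is taken from the example path above.

```python
from pathlib import PurePosixPath


def split_cache_path(file_path, cache_folder="output_cached"):
    """Split '<cache_folder>/<branch_name>/<path>' into (branch_name, path)."""
    parts = PurePosixPath(file_path).parts
    if len(parts) < 3 or parts[0] != cache_folder:
        raise ValueError(f"Not a path inside {cache_folder!r}: {file_path!r}")
    branch_name = parts[1]
    relative_path = str(PurePosixPath(*parts[2:]))
    return branch_name, relative_path


# Hypothetical branch name, for illustration only
branch_name, relative_path = split_cache_path(
    "output_cached/2023-12-06_14-30-00_output_from_main_a1b2c3d/raw_data/data.csv"
)
```

The resulting pair corresponds to the branch_name and file_path arguments of the explicit repo.input_data() call shown earlier.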