Methods to help track the scripts and datafiles in a project.
Datatracker
Datatracker is a basic logging package for Python that keeps track of the files and code in a project. Each script is logged as an entry, and its input and output datafiles are recorded in order. Datatracker manages versioning of both files and scripts and can identify the most up-to-date version of each.
This package is still in alpha, and I may make breaking changes to both the UI and the file format.
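To make "most up-to-date version" concrete, here is a minimal sketch in plain Python. It does not use Datatracker itself. It assumes version labels of the form tag_X.Y.Z (the shape of 'import-array_0.1.6' in the filter example further down), and the helper names parse_tag_version and latest_versions are hypothetical. The point it illustrates is that versions need to be compared numerically, since a plain string comparison would rank 0.1.6 above 0.1.10.

```python
import re


def parse_tag_version(tag_version):
    """Split 'name_1.2.3' into ('name', (1, 2, 3)). Hypothetical helper."""
    tag, _, version = tag_version.rpartition("_")
    if not tag or not re.fullmatch(r"\d+(\.\d+)*", version):
        raise ValueError(f"unexpected tag_version: {tag_version!r}")
    return tag, tuple(int(part) for part in version.split("."))


def latest_versions(tag_versions):
    """Return {tag: newest label}, comparing version numbers as integer tuples."""
    newest = {}
    for label in tag_versions:
        tag, version = parse_tag_version(label)
        if tag not in newest or version > newest[tag][0]:
            newest[tag] = (version, label)
    return {tag: label for tag, (_, label) in newest.items()}


# Assumed label format; the real storage format may differ.
logged = [
    "import-array_0.1.6",
    "import-array_0.1.10",
    "filter-common-variants_0.1.0",
]
latest = latest_versions(logged)
```

How Datatracker actually decides which version is newest is not documented here, so treat this only as an illustration of the idea.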
Installation
To install, run the following command:
pip install git+ssh://git@github.com/TarjinderSingh/datatracker
Usage
New entries
For an entry,
- tag: a unique identifier for the script. It should make the general purpose and output of the script clear (for example, a bare Merge is not what we want to see here).
- description: one or two sentences, equivalent to a Git commit message, that thoroughly describe the general purpose and output of the script.
- category: the general step of the analysis that the script belongs to.
- module: the sub-category that the script belongs to. Type category_template in interactive Python to see the appropriate categories and modules.
For an InputFile or OutputFile,
- tag: a unique identifier for the file. It should make the general purpose of the file clear (again, a bare Merge is not what we want to see here).
- description: one or two sentences, equivalent to a Git commit message, that thoroughly describe the general purpose of the file.
import os

from datatracker import *

tr = Tracker()
# Set the version for this run.
os.environ['VERSION'] = '0.1.0'

# Describe the script itself.
entry = Entry(tag='filter-common-variants',
              description='Filtering common variants in new GWAS data set.',
              category='Processing',
              module='Variant QC')

# Register the input and output files in order.
infile = entry.add(
    InputFile(tag='raw-plink-file',
              path='gs://bucket/raw-plink-file.bed',
              description='Raw PLINK file.'))
outfile = entry.add(
    OutputFile(tag='filt-plink-file',
               path='gs://bucket/filt-plink-file.bed',
               description='Filtered PLINK file.'))

# Save the entry to the tracker.
tr.save(entry)
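The example above sets a VERSION environment variable, and the filter example later refers to labels such as 'import-array_0.1.6'. One plausible reading is that an entry's tag and the current VERSION are joined with an underscore. The sketch below shows that reading in plain Python. The function make_tag_version is hypothetical, and whether Datatracker builds its labels this way is an assumption.

```python
import os


def make_tag_version(tag, env=os.environ):
    """Join a script tag with the VERSION environment variable (assumed format)."""
    version = env.get("VERSION")
    if not version:
        raise RuntimeError("set VERSION before creating an entry")
    return f"{tag}_{version}"


os.environ["VERSION"] = "0.1.0"
label = make_tag_version("filter-common-variants")
```

Failing loudly when VERSION is unset is a design choice of this sketch, so that an entry is never logged without a version.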
View existing entries
from datatracker import *
tr = Tracker()
tr.table
Use existing entries for pipeline
infile = entry.add(InputFile(entry_tag='filter-common-variants', tag='raw-plink-file', database=tr))
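The call above passes an entry tag, a file tag, and the tracker instead of a path, which suggests that the path is looked up from an earlier entry. The following plain-Python sketch shows what such a lookup could look like. The record layout, the resolve_path helper, and the second bucket path are all hypothetical; picking the newest version is also an assumption about the behaviour.

```python
def version_key(version):
    """Turn '0.1.10' into (0, 1, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve_path(records, entry_tag, file_tag):
    """Return the path of file_tag from the newest record of entry_tag."""
    candidates = [
        r for r in records
        if r["entry_tag"] == entry_tag and file_tag in r["files"]
    ]
    if not candidates:
        raise KeyError(f"no file {file_tag!r} logged under entry {entry_tag!r}")
    newest = max(candidates, key=lambda r: version_key(r["version"]))
    return newest["files"][file_tag]


# Hypothetical stand-in for the tracker's contents.
records = [
    {"entry_tag": "filter-common-variants", "version": "0.1.0",
     "files": {"raw-plink-file": "gs://bucket/raw-plink-file.bed"}},
    {"entry_tag": "filter-common-variants", "version": "0.2.0",
     "files": {"raw-plink-file": "gs://bucket/v2/raw-plink-file.bed"}},
]
path = resolve_path(records, "filter-common-variants", "raw-plink-file")
```

The benefit of this pattern is that downstream scripts never hard-code a path: they name the entry and file they depend on, and the tracker supplies the location.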
Filter and remove
# filter to entry
tr.filter(tr.entry.tag_version == 'import-array_0.1.6')
# remove entry
tr.remove(tr.entry.tag_version == 'import-array_0.1.6')
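Going by the comments above, filter narrows the tracker to the matching entry and remove drops it. A minimal plain-Python sketch of that pair of operations follows, using a list of dicts as a stand-in for the tracker's table. The helpers keep and drop are hypothetical, and whether the real methods return a new object or modify the tracker in place is not stated here.

```python
def keep(records, predicate):
    """Return only the records that match the predicate."""
    return [r for r in records if predicate(r)]


def drop(records, predicate):
    """Return the records that do not match the predicate."""
    return [r for r in records if not predicate(r)]


# Hypothetical stand-in rows; only the tag_version field is modelled.
records = [
    {"tag_version": "import-array_0.1.6"},
    {"tag_version": "import-array_0.1.7"},
    {"tag_version": "filter-common-variants_0.1.0"},
]


def is_target(record):
    return record["tag_version"] == "import-array_0.1.6"


kept = keep(records, is_target)
remaining = drop(records, is_target)
```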
Pandas and Excel
df = tr.explode()
df = tr.explode('filt-plink-file')
df = tr.to_pandas()
df = tr.table
df.to_excel('spreadsheet.xlsx')
Data artifacts
infile = entry.add(InputFile(path='gs://checkpoint-cache/tmp/1.bed'))
License
MIT License (see repository)