Pre-processing data tool for NHP Lab @ CMU
Project description
nhp-prep
This is a CLI tool created to pre-process historical data that has been collected across multiple sites, including data collected at the Seneca Zoo and the Mellon Institute.
Requirements
This package requires Python 3.
Installing
To install this CLI tool, run the following command:
pip3 install nhp-prep
Updating
If you already have this tool installed, you can update it to the latest stable release by using the following command:
pip3 install -U nhp-prep
Alternatively, you can clone this repo and run the following command from within the repository folder:
python3 setup.py install
Another way to install this solution is by running the following command from within the repository folder:
pip install -e .
Both of the above commands install the package globally, and nhp-prep will be available on your system.
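Optionally, if you want to keep the tool isolated from other Python packages, you can install it inside a virtual environment. A minimal sketch, where the .venv directory name is just an example:
python3 -m venv .venv
source .venv/bin/activate
pip3 install nhp-prep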
How to use
The general invocation pattern is:
nhp-prep COMMAND [OPTIONS]
You can use this tool for the following use cases (commands):
- Mapping columns from prior to current format (reorder-columns):
nhp-prep reorder-columns -i <directory_with_files_to_reorder_columns_OR_unique_CSV_file> -o <output_directory> -r <file_with_reference_columns>
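For example, assuming a folder of raw CSV files named raw_data, a reference file named reference_columns.csv, and an output folder named reordered (all hypothetical names), the call would look like:
nhp-prep reorder-columns -i ./raw_data -o ./reordered -r ./reference_columns.csv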
- Rename the files to follow the current standard (rename):
nhp-prep rename -i <directory_files_to_rename> -o <output_directory>
The current filename format is: YYYY-MM-DD_HHmmh_<experiment_name>_<Subject_name>_<Researcher_name_or_initials>_data.csv
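As an illustration (with made-up experiment, subject, and researcher names), a session run on 2023-06-12 at 09:30 would be named something like:
2023-06-12_0930h_ColorMatch_Max_AB_data.csv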
- Estimate timestamps for trials in historical data files based on an existing column (timestamp-estimate):
nhp-prep timestamp-estimate -i <directory_with_files_OR_unique_CSV_file> -o <output_directory>
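For example, with hypothetical folder names, the following estimates timestamps for every CSV file in renamed and writes the results to with_timestamps:
nhp-prep timestamp-estimate -i ./renamed -o ./with_timestamps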
Since v0.3.0
Since the previous three steps are common across the different datasets collected, the dev team decided to merge them into a single command (preparation-steps):
nhp-prep preparation-steps -i <input_directory> -o <output_directory>
The previous command runs steps 1 to 3 sequentially. The only command left outside of the bundle is #4 (sub-rename), since it applies only to the baboons' data and requires an additional reference file.
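As an illustration with hypothetical folder names, the following runs steps 1 to 3 in sequence over raw_data and writes the processed files to prepared:
nhp-prep preparation-steps -i ./raw_data -o ./prepared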
- Renaming of subjects according to a logs file, which is required (sub-rename):
nhp-prep sub-rename -r <file_with_columns_and_reference_subject_names> -i <directory_with_files_OR_unique_CSV_file> -o <output_directory>
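For example, using a hypothetical subject-name log subjects_log.csv:
nhp-prep sub-rename -r ./subjects_log.csv -i ./prepared -o ./subjects_renamed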
- Merge multiple CSV files into a single file (merge-csv). Both the input and the output should be directories:
nhp-prep merge-csv -i <directory_with_files_OR_unique_CSV_file> -o <output_directory>
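For example, to merge every CSV file in a hypothetical subjects_renamed folder and write the result to merged:
nhp-prep merge-csv -i ./subjects_renamed -o ./merged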
- Perform a data-cleaning process on the merged CSV file from step #5 (data-cleaning), based on hardcoded rules such as the Experiment name, the Date, or the Researcher name:
nhp-prep data-cleaning -i <merged_csv_file> -o <output_directory>
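For example, given a hypothetical merged file merged/merged_data.csv:
nhp-prep data-cleaning -i ./merged/merged_data.csv -o ./cleaned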
Using the help sub-command
You can also run nhp-prep --help to see the available commands and their corresponding usage.
If you want to know all the options available for a specific command, run the following:
nhp-prep COMMAND --help
Example:
nhp-prep sub-rename --help
Feedback
Please feel free to leave feedback in issues/PRs.