A tool to assist in the automatic conversion of HCA metadata to SCEA metadata MAGE-TAB files.
hca2scea
A tool to assist in the automatic conversion of an HCA metadata spreadsheet to SCEA metadata MAGE-TAB files.
Installation
pip install hca2scea
Description
The tool takes as input an HCA metadata spreadsheet and converts the metadata to SCEA MAGE-TAB files which are then saved to an output directory.
Usage
To run it as a package, after installing it via pip:
$ hca2scea -h
usage: hca2scea [-h] -s SPREADSHEET -id PROJECT_UUID -study STUDY [-name {cs_name,cs_id,sp_name,sp_id,other}] -ac
ACCESSION_NUMBER -c CURATORS [CURATORS ...] -et {baseline,differential} [-facs] -f EXPERIMENTAL_FACTORS
[EXPERIMENTAL_FACTORS ...] -pd PUBLIC_RELEASE_DATE -hd HCA_UPDATE_DATE [-o OUTPUT_DIR] [-zip]
run hca -> scea tool
optional arguments:
-h, --help show this help message and exit
-s SPREADSHEET, --spreadsheet SPREADSHEET
Please provide a path to the HCA project spreadsheet.
-id PROJECT_UUID, --project_uuid PROJECT_UUID
Please provide an HCA ingest project submission id.
-study STUDY Please provide the SRA or ENA study accession.
-name {cs_name,cs_id,sp_name,sp_id,other}
Please indicate which field to use as the sample name. cs=cell suspension, sp = specimen.
-ac ACCESSION_NUMBER, --accession_number ACCESSION_NUMBER
Provide an E-HCAD accession number. Please find the next suitable accession number by checking
the google tracker sheet.
-c CURATORS [CURATORS ...], --curators CURATORS [CURATORS ...]
space separated names of curators
-et {baseline,differential}, --experiment_type {baseline,differential}
Please indicate whether this is a baseline or differential experimental design
-facs Please specify if FACS was used to isolate single cells.
-f EXPERIMENTAL_FACTORS [EXPERIMENTAL_FACTORS ...], --experimental_factors EXPERIMENTAL_FACTORS [EXPERIMENTAL_FACTORS ...]
space separated list of experimental factors
-pd PUBLIC_RELEASE_DATE, --public_release_date PUBLIC_RELEASE_DATE
Please enter the public release date in this format: YYYY-MM-DD
-hd HCA_UPDATE_DATE, --hca_update_date HCA_UPDATE_DATE
Please enter the last time the HCA project submission was updated in this format: YYYY-MM-DD
-o OUTPUT_DIR, --output_dir OUTPUT_DIR
Provide full path to preferred output dir
-zip, --zip_format Please indicate whether you would like the script to output all txt files separately or together
in 1 zip file.
To run it as a python module:
```shell
cd /path-to/hca2scea
python -m hca-to-scea-tools.hca_to_scea.hca2scea -h
```
Arguments
Argument | Argument name | Description | Required? |
---|---|---|---|
-s | HCA spreadsheet | Path to HCA spreadsheet (.xlsx) | yes |
-id | HCA project uuid | This is added to the 'secondary accessions' field in the IDF file | yes |
-c | Curator initials | HCA Curator initials. Space-separated list. | yes |
-ac | accession number | Provide an SCEA accession number (integer). | yes |
-et | Experiment type | Must be 1 of [differential,baseline] | yes |
-f | Factor value | A space-separated list of user-defined factor values e.g. age disease | yes |
-pd | Dataset publication date | Provide in YYYY-MM-DD format, e.g. from GEO | yes |
-hd | HCA last update date | The last time the HCA project was updated in the ingest UI (production). Provide in YYYY-MM-DD format. | yes |
-study | study accession (SRPxxx) | The study accession will be used to find the paths to the fastq files for the given runs | yes |
-name | HCA name field | Which HCA field to use for the biomaterial names columns. Must be 1 of [cs_name, cs_id, sp_name, sp_id, other], where cs indicates cell suspension and sp indicates specimen from organism. Default is cs_name. | no |
-facs | optional argument | If FACS was used as a single cell isolation method, indicate this by adding the -facs argument. | no |
-o | optional argument | An output dir path can optionally be provided. If it does not exist, it will be created. | no |
-zip | optional argument | Indicate if you would like the resulting output files to be output in a single zip file. | no |
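Both -pd and -hd expect dates in YYYY-MM-DD format. As a minimal sketch of how such a value could be checked before it is passed to the tool, the helper below uses only the Python standard library. The name `parse_iso_date` is hypothetical and is not part of hca2scea; how the tool itself validates dates is not documented here.

```python
from datetime import date, datetime


def parse_iso_date(value: str) -> date:
    """Return a date parsed from a YYYY-MM-DD string, or raise ValueError."""
    # strptime accepts unpadded forms such as 2021-6-29, so the length check
    # enforces the zero-padded, 10-character form named in the table.
    if len(value) != 10:
        raise ValueError(f"expected YYYY-MM-DD, got {value!r}")
    # strptime raises ValueError for impossible dates such as 2021-02-30.
    return datetime.strptime(value, "%Y-%m-%d").date()


if __name__ == "__main__":
    print(parse_iso_date("2021-06-29"))
```

A check like this could be run on both dates before building the command line, so that a malformed date is caught early.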
Definitions
Factor values
A factor value is a chosen experimental characteristic which can be used to group or differentiate samples. Multiple factor values can be entered and should be chosen from the following list.
- Known disease(s)
- Development stage
- Organ
- Organ part
- Selected cell type(s)
- Individual
There must be at least 1 factor value. If you cannot identify a factor value, i.e. all donors and samples share the same metadata with respect to the above list of factor values, then enter 'Individual'.
Datasets with more than 1 technology type are not eligible for SCEA. Therefore, technology type is not a valid factor value.
Experiment type
An experiment with samples which can be grouped or differentiated by a factor value is classified as 'differential'. The list of possible factor values can be found above.
If 1 or more factor values other than 'Individual' are identified, then the experiment type should be 'differential'. If the only factor value is 'Individual', then the experiment type should be 'baseline'.
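The rule above can be written as a small function. This is an illustrative sketch only: the function name `experiment_type` is hypothetical, and the tool expects the curator to pass the experiment type explicitly via -et rather than deriving it.

```python
from typing import Iterable


def experiment_type(factor_values: Iterable[str]) -> str:
    """Classify an experiment from its factor values, following the rule above."""
    # Normalise case and surrounding whitespace so that 'Individual' and
    # ' individual ' compare equal; empty strings are ignored.
    factors = {f.strip().lower() for f in factor_values if f.strip()}
    if not factors:
        raise ValueError("at least 1 factor value is required")
    # Any factor value other than 'individual' makes the design differential.
    if factors - {"individual"}:
        return "differential"
    return "baseline"


if __name__ == "__main__":
    print(experiment_type(["Individual"]))
    print(experiment_type(["Organ", "Known disease(s)"]))
```

The returned strings match the two choices accepted by the -et argument.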
Related E-HCAD-ID
If the project has been split into two separate E-HCAD datasets, due to different technologies being used in the same project, or any other reason, then enter the E-HCAD-ID for the other dataset here. Example: E-HCAD-50.
Examples
Required arguments only
python3 hca2scea.py -s /home/aday/GSE111976-endometrium_MC_SCEA.xlsx -id 379ed69e-be05-48bc-af5e-a7fc589709bf -study SRP135922 -ac 50 -c AD -et differential -f menstrual cycle day -pd 2021-06-29 -hd 2021-02-12
Specify optional name argument
python3 hca2scea.py -s /home/aday/GSE111976-endometrium_MC_SCEA.xlsx -id 379ed69e-be05-48bc-af5e-a7fc589709bf -study SRP135922 -name cs_name -ac 50 -c AD -et differential -f menstrual cycle day -pd 2021-06-29 -hd 2021-02-12
Specify that FACS was used
python3 hca2scea.py -s /home/aday/GSE111976-endometrium_MC_SCEA.xlsx -id 379ed69e-be05-48bc-af5e-a7fc589709bf -study SRP135922 -ac 50 -c AD -et differential -f menstrual cycle day -pd 2021-06-29 -hd 2021-02-12 -facs
Specify optional output dir
python3 hca2scea.py -s /home/aday/GSE111976-endometrium_MC_SCEA.xlsx -id 379ed69e-be05-48bc-af5e-a7fc589709bf -study SRP135922 -ac 50 -c AD -et differential -f menstrual cycle day -pd 2021-06-29 -hd 2021-02-12 -o my_output_dir
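When the tool is driven from another Python script, the same command lines can be assembled programmatically. The sketch below builds an argument list using only the flags shown in the usage text. The function name `build_hca2scea_command` and the spreadsheet path are hypothetical, and it assumes the `hca2scea` entry point installed by pip is on the PATH.

```python
import shlex
from typing import List, Optional, Sequence


def build_hca2scea_command(
    spreadsheet: str,
    project_uuid: str,
    study: str,
    accession_number: int,
    curators: Sequence[str],
    experiment_type: str,
    experimental_factors: Sequence[str],
    public_release_date: str,
    hca_update_date: str,
    output_dir: Optional[str] = None,
    facs: bool = False,
) -> List[str]:
    """Assemble an argument list from the flags documented in the usage text."""
    cmd = [
        "hca2scea",
        "-s", spreadsheet,
        "-id", project_uuid,
        "-study", study,
        "-ac", str(accession_number),
        "-c", *curators,              # one token per curator
        "-et", experiment_type,
        "-f", *experimental_factors,  # one token per factor
        "-pd", public_release_date,
        "-hd", hca_update_date,
    ]
    if output_dir is not None:
        cmd += ["-o", output_dir]
    if facs:
        cmd.append("-facs")
    return cmd


if __name__ == "__main__":
    command = build_hca2scea_command(
        spreadsheet="/path/to/spreadsheet.xlsx",  # placeholder path
        project_uuid="379ed69e-be05-48bc-af5e-a7fc589709bf",
        study="SRP135922",
        accession_number=50,
        curators=["AD"],
        experiment_type="differential",
        experimental_factors=["age", "disease"],
        public_release_date="2021-06-29",
        hca_update_date="2021-02-12",
    )
    # Print a copy-pastable shell line; the list itself can be handed to
    # subprocess.run(command, check=True) to execute the tool.
    print(shlex.join(command))
```

Passing a list rather than a single string avoids shell quoting problems with paths that contain spaces.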
Developer Notes
Developing Code in Editable Mode
Using pip's editable mode, projects using hca-to-scea as a dependency can refer to the latest code in this repository directly without installing it through PyPI. This can be done either by manually cloning the code base:
pip install -e path/to/hca2scea
or by having pip do it automatically by providing a reference to this repository:
pip install -e \
git+https://github.com/ebi-ait/hca-to-scea-tools.git\
#egg=hca2scea
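After either install, it can be useful to confirm which version of the package the current environment sees. The following is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+). The helper name `installed_version` is hypothetical, and the distribution name `hca2scea` is assumed from the pip commands above.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str = "hca2scea") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


if __name__ == "__main__":
    print(installed_version())
```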
Publish to PyPI
- Create a PyPI account through the registration page. Take note that PyPI requires email addresses to be verified before publishing.
- Package the project for distribution.
  python setup.py sdist
- Install Twine.
  pip install twine
- Upload the distribution package to PyPI.
  twine upload dist/*
Running python setup.py sdist will create a package in the dist directory of the project base directory.