geo-to-hca

A tool to assist in the automatic conversion of GEO metadata to the HCA metadata standard.

Installation

pip install geo-to-hca

Description

The tool takes as input a single GEO accession or a list of GEO accessions, together with a template HCA metadata Excel spreadsheet. It returns a pre-filled HCA metadata spreadsheet for each accession. Each spreadsheet can then be used as an intermediate file to be completed by manual curation. Optionally, an output log file can also be generated, which lists the availability of an SRA study accession and fastq file names for each GEO accession given as input.

Usage

To run it as a package, after installing it via pip:

$ geo-to-hca -h                                                            
usage: geo-to-hca [-h] [--accession ACCESSION]
                  [--accession_list ACCESSION_LIST] [--input_file INPUT_FILE]
                  [--nthreads NTHREADS] [--template TEMPLATE]
                  [--header_row HEADER_ROW] [--input_row1 INPUT_ROW1]
                  [--output_dir OUTPUT_DIR] [--output_log OUTPUT_LOG]

optional arguments:
  -h, --help            show this help message and exit
  --accession ACCESSION
                        accession (str): either GEO or SRA accession
  --accession_list ACCESSION_LIST
                        accession list (comma separated)
  --input_file INPUT_FILE
                        optional path to tab-delimited input .txt file
  --nthreads NTHREADS   number of multiprocessing processes to use
  --template TEMPLATE   path to an HCA spreadsheet template (xlsx)
  --header_row HEADER_ROW
                        header row with HCA programmatic names
  --input_row1 INPUT_ROW1
                        HCA metadata input start row
  --output_dir OUTPUT_DIR
                        path to output directory; if it does not exist, the
                        directory will be created
  --output_log OUTPUT_LOG
                        True/False: should the output result log be created

To run it as a python module:

cd /path-to/geo_to_hca
python -m geo_to_hca.geo_to_hca -h

Basic arguments: exactly one of these options must be given.

Option (1): Get the HCA metadata for 1 GEO accession

Example command:

geo-to-hca --accession GSE97168

Option (2): Get the HCA metadata for a comma-separated list of GEO accessions

Example command:

geo-to-hca --accession_list GSE97168,GSE124872,GSE126030

Option (3): Get the HCA metadata for the accessions listed in a file. N.B. the file must have "accession" as the column name in its header. For example, an input file named accessions.txt should look like:

accession
GSE97168
GSE124872
GSE126030
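
A file in this format can also be written from a script. The following is a minimal sketch using only the Python standard library; the helper name write_accession_file is hypothetical and is not part of geo-to-hca.

```python
from pathlib import Path


def write_accession_file(accessions, path="accessions.txt"):
    """Write one accession per line under an 'accession' header line."""
    lines = ["accession", *accessions]
    out = Path(path)
    out.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return out
```

Calling write_accession_file(["GSE97168", "GSE124872", "GSE126030"]) would produce the example file shown above in the current directory.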

Example command:

geo-to-hca --input_file <path>/accessions.txt
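
When the tool is driven from another Python script, the three input options above can be wrapped in a small helper that enforces the "exactly one option" rule before the command is run. This is only a sketch: build_command and run are hypothetical names, and the only flags used are those listed in the help output above.

```python
import subprocess


def build_command(accession=None, accession_list=None, input_file=None,
                  output_dir=None):
    """Return the geo-to-hca argument list for exactly one input option."""
    given = [v for v in (accession, accession_list, input_file) if v is not None]
    if len(given) != 1:
        raise ValueError(
            "give exactly one of accession, accession_list, input_file"
        )
    cmd = ["geo-to-hca"]
    if accession is not None:
        cmd += ["--accession", accession]
    elif accession_list is not None:
        # The CLI expects a single comma-separated string.
        cmd += ["--accession_list", ",".join(accession_list)]
    else:
        cmd += ["--input_file", str(input_file)]
    if output_dir is not None:
        cmd += ["--output_dir", str(output_dir)]
    return cmd


def run(cmd):
    """Run the command; requires geo-to-hca to be installed and on PATH."""
    return subprocess.run(cmd, check=True)
```

For example, run(build_command(accession="GSE97168")) would be equivalent to the Option (1) command above.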

Other optional arguments:

(1)

--template,default="template/hca_template.xlsx"

The default template is an empty HCA metadata spreadsheet in Excel format, with the relevant HCA metadata headers in rows 1-5. The default header row with programmatic names is row 4; the default start input row is row 6. It is not necessary to specify this argument unless the HCA spreadsheet format changes.

(2)

--header_row,type=int,default=4

The default header row with programmatic names is row 4. It is not necessary to specify this argument unless the HCA spreadsheet format changes.

(3)

--input_row1,type=int,default=6

The default start input row is row 6. It is not necessary to specify this argument unless the HCA spreadsheet format changes.
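
To make the row numbering concrete, the sketch below shows how the 1-based --header_row and --input_row1 values could map onto a sheet that has been read into a list of rows. It illustrates the layout described above and is not the tool's own code; split_template_rows and the field names in the usage note are hypothetical placeholders.

```python
def split_template_rows(rows, header_row=4, input_row1=6):
    """Return (programmatic_names, data_rows) from a list of sheet rows.

    Row numbers are 1-based, as in a spreadsheet, so they are shifted
    by one to index the Python list.
    """
    names = rows[header_row - 1]
    data = rows[input_row1 - 1:]
    return names, data
```

With the defaults, a sheet whose fourth row is ["example.field_a", "example.field_b"] yields those two names as the header, and every row from the sixth onwards is treated as metadata input.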

(4)

--output_dir,default='spreadsheets/'

An output directory can be specified by its path. If the path does not already exist, it will be created. If this argument is not given, the default output directory is 'spreadsheets/'.
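
The "create if missing" behaviour described here is the usual pathlib idiom. A minimal sketch, assuming nothing about how geo-to-hca implements it internally (ensure_output_dir is a hypothetical helper):

```python
from pathlib import Path


def ensure_output_dir(path="spreadsheets/"):
    """Create the directory (and any missing parents) if it does not exist."""
    out = Path(path)
    # exist_ok=True makes the call a no-op when the directory is already there.
    out.mkdir(parents=True, exist_ok=True)
    return out
```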

(5)

--output_log,type=bool,default=True

An optional argument to generate an output log file stating whether an SRA study id and fastq file names were available for each GEO accession given as input.
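
The signature above lists type=bool. If the option is declared that way with argparse (an assumption; the tool's source is not shown here), any non-empty value, including the string False, would be parsed as True, because bool("False") is True in Python. The sketch below demonstrates this general argparse behaviour alongside an explicit converter. It does not reproduce geo-to-hca's code, and str_to_bool is a hypothetical helper.

```python
import argparse


def str_to_bool(value):
    """Parse common spellings of true/false; reject anything else."""
    lowered = value.strip().lower()
    if lowered in {"true", "t", "yes", "1"}:
        return True
    if lowered in {"false", "f", "no", "0"}:
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


# Naive declaration: bool("False") is True, because the string is non-empty.
naive = argparse.ArgumentParser()
naive.add_argument("--output_log", type=bool, default=True)

# Explicit converter: the string "False" is parsed as False.
strict = argparse.ArgumentParser()
strict.add_argument("--output_log", type=str_to_bool, default=True)

naive_value = naive.parse_args(["--output_log", "False"]).output_log
strict_value = strict.parse_args(["--output_log", "False"]).output_log
```

If the log file still appears after passing --output_log False, this parsing behaviour is one possible explanation worth checking.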

Developer Notes

Requirements

Requirements for this project are listed in 2 files: requirements.txt and requirements-dev.txt. The requirements-dev.txt file contains dependencies specific to development.

The requirement files (requirements.txt, requirements-dev.txt) are generated using pip-compile from pip-tools:

pip-compile requirements.in
pip-compile requirements-dev.in

The direct dependencies are listed in the input files requirements.in and requirements-dev.in.

Install dependencies

Install the dependencies with pip-sync from pip-tools:

pip-sync requirements.txt requirements-dev.txt

or with plain pip install:

pip install -r requirements.txt
pip install -r requirements-dev.txt

Upgrade dependencies

To update all packages, periodically re-run pip-compile --upgrade.

To update a specific package to the latest or a specific version use the --upgrade-package or -P flag:

pip-compile --upgrade-package requests

See more options in the pip-compile documentation.

Developing Code in Editable Mode

Using pip's editable mode, projects using geo_to_hca as a dependency can refer to the latest code in this repository directly without installing it through PyPI. This can be done either by installing from a manually cloned copy of the code base:

pip install -e path/to/geo_to_hca

or by having pip do it automatically by providing a reference to this repository:

pip install -e "git+https://github.com/ebi-ait/geo_to_hca.git#egg=geo-to-hca"

Publish to PyPI

Create a PyPI account through the registration page. Note that PyPI requires email addresses to be verified before publishing.

Package the project for distribution. Running python setup.py sdist creates a package in the dist directory under the project base directory.
```
python setup.py sdist
```
Install Twine.
```
pip install twine
```
Upload the distribution package to PyPI.
```
twine upload dist/*
```

Release history

This version

1.0.21

Oct 10, 2022

1.0.20

Oct 3, 2022

1.0.19

Aug 30, 2022

1.0.18

Aug 29, 2022

1.0.17

Aug 29, 2022

1.0.16

Aug 26, 2022

1.0.15

Aug 19, 2022

1.0.14

Aug 19, 2022

1.0.13

Aug 19, 2022

1.0.12

Aug 18, 2022

1.0.11

Aug 18, 2022

1.0.10

Mar 24, 2022

1.0.10rc1 pre-release

Mar 29, 2022

1.0.9

Mar 24, 2022

1.0.9rc1 pre-release

Mar 23, 2022

1.0.8

Mar 18, 2022

1.0.7

Mar 15, 2022

1.0.6

Mar 1, 2022

1.0.5

Jan 23, 2022

1.0.4

Jan 23, 2022

1.0.3

Jan 20, 2022

1.0.2 yanked

Jan 20, 2022

Reason this release was yanked:

broken installation

1.0.1 yanked

Jan 20, 2022

Reason this release was yanked:

broken installation

1.0.0 yanked

Jan 20, 2022

Reason this release was yanked:

broken installation

Download files

Source Distribution

geo-to-hca-1.0.21.tar.gz (94.2 kB)

Uploaded Oct 10, 2022 Source

File details

Details for the file geo-to-hca-1.0.21.tar.gz.

File metadata

Download URL: geo-to-hca-1.0.21.tar.gz
Upload date: Oct 10, 2022
Size: 94.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.4

File hashes

Hashes for geo-to-hca-1.0.21.tar.gz
Algorithm	Hash digest
SHA256	`1bc80dc0859db68664a59cc3cd514911348e63e21c1396d7a49a5d4860a716d7`
MD5	`6905c9d7cd6c1dcc237cd422e8ed431b`
BLAKE2b-256	`77e6cf9563210ade27284bde9830d8158c26ad18824eb336cc42fb3adb678d88`
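
A downloaded copy of the source distribution can be checked against the SHA256 digest listed above. This is a minimal sketch using the standard library's hashlib; sha256_of and matches_expected are hypothetical helper names, and the local file path is whatever location the archive was saved to.

```python
import hashlib

# SHA256 digest for geo-to-hca-1.0.21.tar.gz, copied from the table above.
EXPECTED_SHA256 = "1bc80dc0859db68664a59cc3cd514911348e63e21c1396d7a49a5d4860a716d7"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's digest equals the expected digest."""
    return sha256_of(path) == expected
```

For example, matches_expected("geo-to-hca-1.0.21.tar.gz") should return True for an intact download saved under that name.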
