Collect information from NCBI for the https://github.com/HelikarLab/FastqToGeneCounts project
GEO Collector
Description
GEOcollector is a Python package for collecting metadata about gene expression datasets from the NCBI Gene Expression Omnibus (GEO) database. It will convert a list of GSM accession numbers and cell types into the information required for FastqToGeneCounts to process the raw RNA-seq data.
Long story short, given an input file like this:
GSM,cell_type
GSM3785334,baso
GSM3898581,baso
GEOcollector will output a file like this (columns are aligned here for readability; the actual file is not padded):
GSE ,GSM ,SRR ,Rename ,Strand ,Prep Method ,Platform Code ,Platform Name ,Source ,Cell Characteristics ,Replicate Name ,Strategy ,Publication ,Extra Notes
GSE131525 ,GSM3785334 ,SRR9097791 ,baso_S1R1 ,SE ,total ,GPL16791 ,Illumina HiSeq 2500 (Homo sapiens) ,B ,subject - disease status: Screened Healthy Control;subject: HC3;age at draw: 55;Sex: Female;median cv coverage: 0.763618;fastq total reads: 6241803;unpaired reads examined: 5663490;unpaired read duplicates: 1597507;primary race: White; ,lib3945 ,RNA-Seq ,31671072 ,
GSE133028 ,GSM3898581 ,SRR9328889 ,baso_S2R1 ,PE ,total ,GPL20301 ,Illumina HiSeq 4000 (Homo sapiens) ,peripheral blood ,cell type: peripheral blood B cells; ,Patient 2 IgD-CD27- double negative B cells from the peripheral blood ,RNA-Seq ,32859762 ,
GSE133028 ,GSM3898591 ,SRR9328899 ,baso_S2R2 ,PE ,total ,GPL20301 ,Illumina HiSeq 4000 (Homo sapiens) ,peripheral blood ,cell type: peripheral blood B cells; ,Patient 3 IgD-CD27- double negative B cells from the peripheral blood ,RNA-Seq ,32859762 ,
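For downstream scripts, the output can be read with Python's standard `csv` module. The sketch below is not part of GEOcollector: the helper name `read_output_rows` is hypothetical, and the sample text is truncated to the first five columns of the example above. Whitespace is stripped from every field, so it works whether or not the columns are padded.

```python
import csv
import io


def read_output_rows(text):
    """Parse GEOcollector-style CSV text into a list of dicts.

    Hypothetical helper: header names and values are stripped of
    surrounding whitespace so padded and unpadded files parse alike.
    """
    reader = csv.reader(io.StringIO(text))
    header = [name.strip() for name in next(reader)]
    rows = []
    for record in reader:
        if not record:  # skip blank lines
            continue
        values = [value.strip() for value in record]
        rows.append(dict(zip(header, values)))
    return rows


# Sample truncated to the first five columns of the example output.
sample = (
    "GSE ,GSM ,SRR ,Rename ,Strand\n"
    "GSE131525 ,GSM3785334 ,SRR9097791 ,baso_S1R1 ,SE\n"
    "GSE133028 ,GSM3898581 ,SRR9328889 ,baso_S2R1 ,PE\n"
)
rows = read_output_rows(sample)
print(rows[0]["SRR"], rows[1]["Strand"])
```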
Installation
To install GEOcollector, you can use pip:
pip install GEOcollector
Usage
The following sections describe the command-line parameters for GEOcollector.
Command Line Interface
To run GEOcollector, call it from the command line with the relevant parameters:
geocollector --api-key APIKEY --input-file /home/user/input.csv --verbose
geocollector --input-file /home/user/input.csv --quiet
geocollector --api-key APIKEY --input-file /home/user/input.csv
To view help for GEOcollector, run the following command:
geocollector --help
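If GEOcollector needs to be driven from a Python script, the invocations above can be assembled as an argument list and passed to `subprocess`. This is only a sketch: `build_command` and `run_geocollector` are hypothetical helpers, only the flags shown above are used, and treating `--verbose` and `--quiet` as mutually exclusive is an assumption rather than documented behavior.

```python
import subprocess


def build_command(input_file, api_key=None, verbose=False, quiet=False):
    """Build the argument list for a geocollector call (hypothetical helper)."""
    if verbose and quiet:
        # Assumption: the two verbosity flags are not meant to be combined.
        raise ValueError("choose at most one of verbose or quiet")
    cmd = ["geocollector", "--input-file", str(input_file)]
    if api_key:
        cmd += ["--api-key", api_key]
    if verbose:
        cmd.append("--verbose")
    if quiet:
        cmd.append("--quiet")
    return cmd


def run_geocollector(input_file, **options):
    """Run geocollector and raise CalledProcessError on a non-zero exit."""
    return subprocess.run(build_command(input_file, **options), check=True)
```

For example, `build_command("/home/user/input.csv", quiet=True)` yields the argument list for the second command shown above.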
API Key
Without an API key, NCBI limits requests to 3 per second. With an API key, the limit increases to 10 requests per second. To obtain an API key, follow the steps below:
- Access NCBI's website
- Click "Log In" in the top right corner
- If you do not have an account, create one now
- Click your username in the top right corner
- Click "Account settings" in the dropdown menu
- Scroll down to the "API Key" section
- Click "Create API Key"
- Copy the API key that has been created
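To illustrate what the two rate limits mean in practice, here is a minimal sketch of a request throttle that spaces calls at least 1/3 s apart without a key and 1/10 s apart with one. It is not GEOcollector's implementation; the `Throttle` class and its parameters are hypothetical.

```python
import time


class Throttle:
    """Enforce a minimum interval between calls (hypothetical sketch).

    3 requests/second without an API key, 10 requests/second with one,
    matching the NCBI limits described above.
    """

    def __init__(self, has_api_key, clock=time.monotonic, sleep=time.sleep):
        requests_per_second = 10 if has_api_key else 3
        self.min_interval = 1.0 / requests_per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` immediately before each request keeps a sequential client under the limit; the `clock` and `sleep` parameters exist only so the class can be exercised without real delays.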
Input file
The input file should be a CSV file in the following format. Multiple GSMs can be associated with a single cell type.
GSM,cell_type
GSM_1,cell_type_1
GSM_2,cell_type_1
GSM_3,cell_type_1
GSM_4,cell_type_2
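As a quick sanity check before running the tool, an input file in this format can be read and grouped by cell type with the standard library. The function name `group_by_cell_type` is hypothetical and this is not how GEOcollector itself reads the file; it is only a sketch that relies on the `GSM` and `cell_type` header names shown above.

```python
import csv
import io
from collections import defaultdict


def group_by_cell_type(text):
    """Map each cell type to its list of GSM accessions, in file order."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        groups[row["cell_type"].strip()].append(row["GSM"].strip())
    return dict(groups)


example = (
    "GSM,cell_type\n"
    "GSM_1,cell_type_1\n"
    "GSM_2,cell_type_1\n"
    "GSM_3,cell_type_1\n"
    "GSM_4,cell_type_2\n"
)
print(group_by_cell_type(example))
```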
Verbosity
If you would like to show debug information on the command line, pass the flag --verbose. If you would like to silence all output (except warnings), pass the flag --quiet. If neither flag is passed, standard "info" messages will be shown.
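The three verbosity modes correspond naturally to Python's standard logging levels. The sketch below shows one way such flags could be mapped; it assumes (without confirmation from the GEOcollector source) that the flags are mutually exclusive and that `logging` is used, and `parse_log_level` is a hypothetical name.

```python
import argparse
import logging


def parse_log_level(argv):
    """Return a logging level for the given flags (hypothetical sketch).

    --verbose -> DEBUG, --quiet -> WARNING, neither -> INFO.
    """
    parser = argparse.ArgumentParser(prog="geocollector-sketch")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--verbose", action="store_true")
    group.add_argument("--quiet", action="store_true")
    args = parser.parse_args(argv)
    if args.verbose:
        return logging.DEBUG
    if args.quiet:
        return logging.WARNING
    return logging.INFO


logging.basicConfig(level=parse_log_level(["--quiet"]))
logging.info("hidden when --quiet is passed")
logging.warning("warnings are still shown")
```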
If you have problems, please create a new issue.
Download files
File details
Details for the file geocollector-1.1.3.tar.gz.
File metadata
- Download URL: geocollector-1.1.3.tar.gz
- Upload date:
- Size: 9.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.6.1 CPython/3.11.0 Linux/6.2.0-1012-azure
File hashes
Algorithm | Hash digest
---|---
SHA256 | 354ff8ca456853f51238baa4724f9fdc03fb08107c68f3b593a842ec4d4e0269
MD5 | c941bd2e1ce141f05a408ed828ad1ddb
BLAKE2b-256 | 76f1b1d3e4e55675cf41a264454520d229a49aad856e9e33461db856242c1c67
File details
Details for the file geocollector-1.1.3-py3-none-any.whl.
File metadata
- Download URL: geocollector-1.1.3-py3-none-any.whl
- Upload date:
- Size: 10.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.6.1 CPython/3.11.0 Linux/6.2.0-1012-azure
File hashes
Algorithm | Hash digest
---|---
SHA256 | b0596ee0c7007209deba9c5a0efc7f561e415c789984ba9c68e6e12aba413daf
MD5 | 827d4e2a23a4d655594f9a3f5979fb5a
BLAKE2b-256 | 96eb5e8a4568adb8c37886b8de0643e7b8423641e832be5054f4fb1f299f8f5c