
Download FASTQ files from GEO-NCBI with ease.

Project description

Please note that geoDL is in beta, so expect bugs.


geoDL is a Python program to download FASTQ files from GEO-NCBI. It takes a GEO accession number as input and searches the EMBL-EBI ENA website to gather the metadata and download the FASTQ files. The metadata are used to rename the samples after their experimental sample names rather than their SRR run accessions.
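The renaming step can be pictured as pairing every FASTQ URL in the metadata with a file name built from the sample name. This is a minimal sketch, not geoDL's own code: plan_downloads is a hypothetical helper, the column names are the ones listed as required under Usage, and it assumes that paired-end runs list their URLs separated by semicolons (as ENA does) and that mates are numbered _1 and _2.

```python
from typing import Dict, List, Tuple


def plan_downloads(rows: List[Dict[str, str]],
                   url_col: str = "Fastq files (ftp)",
                   name_col: str = "Submitter's sample name") -> List[Tuple[str, str]]:
    """Pair each FASTQ URL with a file name derived from the sample name.

    Hypothetical helper: the naming scheme below is an assumption, not geoDL's.
    """
    plan = []
    for row in rows:
        # Replace spaces so the sample name is safe to use as a file name
        sample = row[name_col].strip().replace(" ", "_")
        # Assumption: paired-end runs list several URLs separated by ';'
        urls = [u for u in row[url_col].split(";") if u]
        for i, url in enumerate(urls, start=1):
            # Number the mates only when a run has more than one file
            suffix = "_{}".format(i) if len(urls) > 1 else ""
            plan.append((url, "{}{}.fastq.gz".format(sample, suffix)))
    return plan


if __name__ == "__main__":
    # Made-up example row, for illustration only
    example = [{
        "Fastq files (ftp)": "host/SRR0000001_1.fastq.gz;host/SRR0000001_2.fastq.gz",
        "Submitter's sample name": "WT rep1",
    }]
    for url, target in plan_downloads(example):
        print(url, "->", target)
```

With the made-up row above, the two mates would be saved as WT_rep1_1.fastq.gz and WT_rep1_2.fastq.gz instead of under their SRR names.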

Dependencies

  • geoDL should work with both Python 3 and Python 2, but tests still have to be run

  • The Python packages beautifulsoup4, colorama and six are required

  • wget is used internally and is therefore also a dependency of geoDL
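Before a first run, it can help to check that these dependencies are present. A minimal sketch, assuming Python 3 (importlib.util.find_spec and shutil.which do not exist in Python 2); missing_dependencies is a hypothetical helper and not part of geoDL.

```python
import importlib.util
import shutil
import sys


def missing_dependencies():
    """Return the dependencies listed above that cannot be found."""
    missing = []
    # Import names: beautifulsoup4 is imported as 'bs4'
    for module in ("bs4", "colorama", "six"):
        if importlib.util.find_spec(module) is None:
            missing.append(module)
    # wget must be an executable somewhere on PATH
    if shutil.which("wget") is None:
        missing.append("wget")
    return missing


if __name__ == "__main__":
    absent = missing_dependencies()
    if absent:
        print("Missing: " + ", ".join(absent))
        sys.exit(1)
    print("All dependencies found")
```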

Install

On Linux and macOS:

$ pip install --user geoDL

Note that the --pre flag may be needed to install the beta version.

Usage

  usage: geoDL.py [-h] [--dry] [--samples [SAMPLES [SAMPLES ...]]] [--colname COLNAME]
                  {geo,meta,ena} GSE|metadata|ENA

  positional arguments:
    {geo,meta,ena}        Specify which type of input
    GSE|metadata|ENA      geo:  GSE accession number, eg: GSE13373
                                Map the GSE accession to the ENA study accession and fetch the metadata

                          meta: Use a metadata file instead of fetching it from the ENA website (bypass GEO)
                                Metadata should include at minimum the following columns: ['Fastq files
                                (ftp)', 'Submitter's sample name']

                          ena:  ENA study accession number, eg: PRJEB13373
                                Fetch the metadata directly from the ENA website

  optional arguments:
    -h, --help            show this help message and exit
    --dry                 Don't actually download anything, just print the wget
                          cmds
    --samples [SAMPLES [SAMPLES ...]]
                          Space separated list of GSM samples to download. For
                          ENA mode, subset the metadata
    --colname COLNAME     Name of the column to use in the metadata file to name
                          the samples
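The --dry option prints the wget commands instead of running them. A rough sketch of that behaviour, assuming Python 3.5+ for subprocess.run; wget_command and fetch are hypothetical names, the exact wget flags geoDL passes are not documented here, and prefixing ftp:// assumes the URLs come from ENA, which lists FTP locations without a scheme.

```python
import shlex
import subprocess
from typing import List


def wget_command(url: str, target: str) -> List[str]:
    """Build a wget call that saves url under the chosen file name."""
    # Assumption: ENA lists FTP locations without a scheme, so add one
    if "://" not in url:
        url = "ftp://" + url
    # -O names the output file (standard wget option)
    return ["wget", "-O", target, url]


def fetch(url: str, target: str, dry: bool = False) -> int:
    """Download url to target, or only print the command when dry is True."""
    cmd = wget_command(url, target)
    if dry:
        # Quote each argument so the printed command can be pasted in a shell
        print(" ".join(shlex.quote(part) for part in cmd))
        return 0
    # Passing a list avoids shell interpretation of the URL or file name
    return subprocess.run(cmd).returncode
```

Building the command once and either printing or executing it keeps the dry run faithful to what a real run would do.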

Example

Download the metadata and all the samples of the series GSE13373, and rename them after their sample names:

$ geoDL geo GSE13373

Download only some samples:

$ geoDL geo GSE13373 --samples GSM00001 GSM00003
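Selecting samples with --samples amounts to filtering the metadata rows. A minimal sketch: subset_samples is hypothetical, and it assumes each row carries the GSM identifier in a column called sample_alias, which is an assumption about how GEO-submitted samples appear in ENA metadata rather than documented geoDL behaviour. Returning the identifiers that matched nothing avoids silently skipping a mistyped sample.

```python
from typing import Dict, List, Sequence, Tuple


def subset_samples(rows: List[Dict[str, str]],
                   samples: Sequence[str],
                   key: str = "sample_alias") -> Tuple[List[Dict[str, str]], List[str]]:
    """Keep rows whose key column is one of samples; also return unmatched IDs."""
    wanted = set(samples)
    kept = [row for row in rows if row.get(key) in wanted]
    found = {row.get(key) for row in kept}
    # Preserve the order in which the user listed the samples
    missing = [s for s in samples if s not in found]
    return kept, missing


if __name__ == "__main__":
    rows = [{"sample_alias": "GSM00001"}, {"sample_alias": "GSM00002"},
            {"sample_alias": "GSM00003"}]
    kept, missing = subset_samples(rows, ["GSM00001", "GSM00003", "GSM00009"])
    print(len(kept), missing)  # 2 ['GSM00009']
```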

Use a previously downloaded metadata file and name the samples after its run_alias column:

$ geoDL meta my_metadata.txt --colname run_alias
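When a metadata file is supplied, it is worth checking up front that it contains the minimum columns listed in the usage text plus the one given to --colname. A minimal sketch, assuming a tab-separated file whose first row is the header (the file format is not specified above); read_header and missing_columns are hypothetical helpers.

```python
import csv
from typing import Iterable, List, Optional

# Columns listed as the minimum in the usage text
REQUIRED_COLUMNS = ["Fastq files (ftp)", "Submitter's sample name"]


def read_header(path: str, delimiter: str = "\t") -> List[str]:
    """Return the first row of the file, assumed to be the header."""
    with open(path, newline="") as handle:
        return next(csv.reader(handle, delimiter=delimiter), [])


def missing_columns(header: Iterable[str], colname: Optional[str] = None) -> List[str]:
    """List required columns (and colname, if given) absent from header."""
    required = list(REQUIRED_COLUMNS)
    if colname is not None and colname not in required:
        required.append(colname)
    present = set(header)
    return [col for col in required if col not in present]
```

For example, missing_columns(read_header("my_metadata.txt"), colname="run_alias") returns an empty list when the file has everything needed.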

Use an ENA study accession instead of a GSE accession:

$ geoDL ena PRJEB13373
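For reference, run-level metadata for an ENA study can also be retrieved from the filereport endpoint of ENA's Portal API. This is a different route from the website search geoDL describes and is shown only as a sketch; the requested fields and the helper names are assumptions, and fetch_run_table needs network access.

```python
import csv
import io
import urllib.parse
import urllib.request
from typing import Dict, List, Sequence

ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession: str,
                   fields: Sequence[str] = ("run_accession", "fastq_ftp", "sample_alias")) -> str:
    """Build a filereport query listing the read runs of an accession."""
    query = urllib.parse.urlencode({
        "accession": accession,
        "result": "read_run",
        "fields": ",".join(fields),
    })
    return ENA_FILEREPORT + "?" + query


def fetch_run_table(accession: str) -> List[Dict[str, str]]:
    """Download the tab-separated report and parse it into one dict per run."""
    with urllib.request.urlopen(filereport_url(accession), timeout=60) as response:
        text = response.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))
```

Calling fetch_run_table("PRJEB13373") would return one dictionary per sequencing run, keyed by the requested field names.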

Beta test

  • Test Python 2 support

  • Test handling of wget




Download files


Source Distribution

geoDL-1.0b21.tar.gz (9.0 kB)


File details

Details for the file geoDL-1.0b21.tar.gz.

File metadata

  • Download URL: geoDL-1.0b21.tar.gz
  • Upload date:
  • Size: 9.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/53.0.0 requests-toolbelt/0.9.1 tqdm/4.58.0 CPython/3.9.2

File hashes

Hashes for geoDL-1.0b21.tar.gz
Algorithm    Hash digest
SHA256       47859747a56788b760d3e439a0fce723722744e94a8a2a50dbc261d3d95f5adf
MD5          b09cd1fb55c170827f67eb96e7d4015f
BLAKE2b-256  cd7ceeed55e7fd8f3890ae08d85207f6866d971e64661a5ca75c11c06784e14a
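A downloaded copy of the source archive can be checked against the SHA256 digest listed for it. A minimal sketch using Python's hashlib; the file is read in chunks so memory use stays constant, and the helper names are hypothetical.

```python
import hashlib
import sys
from typing import Iterable

# SHA256 digest published for geoDL-1.0b21.tar.gz
EXPECTED_SHA256 = "47859747a56788b760d3e439a0fce723722744e94a8a2a50dbc261d3d95f5adf"


def sha256_hex(chunks: Iterable[bytes]) -> str:
    """Hash a sequence of byte chunks and return the hex digest."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file without loading it into memory at once."""
    with open(path, "rb") as handle:
        return sha256_hex(iter(lambda: handle.read(chunk_size), b""))


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "geoDL-1.0b21.tar.gz"
    ok = sha256_of_file(path) == EXPECTED_SHA256
    print("OK" if ok else "MISMATCH")
```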

