Query Ensembl for genes using free form search words (gget search) or fetch FTP download links by species (gget FetchTP).

Project description

gget

Installation

pip install gget

For use in Jupyter Lab:

from gget import gget, fetchtp

gget FetchTP usage

Fetch GTF, DNA, and cDNA FTP links for a specific species

Jupyter Lab:

fetchtp("genus_species")

Terminal:

gget fetchtp -sp genus_species

where genus_species defines the species for which the FTPs are fetched, e.g. homo_sapiens.

This returns a JSON object containing the GTF, DNA, and cDNA links, their respective release dates and times, and the Ensembl release from which the links were fetched, in the format:

{
    species: {
        "transcriptome": {
            "ftp": cDNA FTP download URL,
            "ensembl_release": Ensembl release #,
            "release_date": Day-Month-Year,
            "release_time": HH:MM,
            "bytes": cDNA FTP file size in bytes
        },
        "genome": {
            "ftp": DNA FTP download URL,
            "ensembl_release": Ensembl release #,
            "release_date": Day-Month-Year,
            "release_time": HH:MM,
            "bytes": DNA FTP file size in bytes
        },
        "annotation": {
            "ftp": GTF FTP download URL,
            "ensembl_release": Ensembl release #,
            "release_date": Day-Month-Year,
            "release_time": HH:MM,
            "bytes": GTF FTP file size in bytes
        }
    }
}
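As a rough illustration of working with this structure, the sketch below walks a dictionary of the documented shape and collects one row per file. The sample values (URLs, release number, dates, sizes) are placeholders invented for the example, and the helper name summarize_links is hypothetical; the exact value types returned by gget (for example, whether "bytes" is a number or a string) are an assumption here.

```python
import json

# Placeholder data in the documented shape. None of these values come from Ensembl.
sample = json.loads("""
{
  "homo_sapiens": {
    "transcriptome": {
      "ftp": "http://example.org/cdna.fa.gz",
      "ensembl_release": 104,
      "release_date": "01-01-2021",
      "release_time": "12:00",
      "bytes": 100
    },
    "genome": {
      "ftp": "http://example.org/dna.fa.gz",
      "ensembl_release": 104,
      "release_date": "01-01-2021",
      "release_time": "12:00",
      "bytes": 200
    },
    "annotation": {
      "ftp": "http://example.org/annotation.gtf.gz",
      "ensembl_release": 104,
      "release_date": "01-01-2021",
      "release_time": "12:00",
      "bytes": 300
    }
  }
}
""")


def summarize_links(result):
    """Return one (species, category, url, size) tuple per file entry."""
    rows = []
    for species, categories in result.items():
        for category, info in categories.items():
            rows.append((species, category, info["ftp"], info["bytes"]))
    return rows


for row in summarize_links(sample):
    print(row)
```

In Jupyter Lab, the dictionary passed to summarize_links would presumably be the value returned by fetchtp("genus_species") rather than the placeholder above.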

Fetch GTF, DNA, and cDNA FTP links for a specific species from a specific Ensembl release, e.g. release 104

Jupyter Lab:

fetchtp("genus_species", release=104)

Terminal:

gget fetchtp -sp genus_species -r 104

where the parameter release / -r defines the Ensembl release from which the FTPs are fetched. By default, the latest release is used.

Fetch only the GTF link for a specific species

Jupyter Lab:

fetchtp("genus_species", return_val="gtf")

Terminal:

gget fetchtp -sp genus_species -rv gtf

where return_val="gtf" / -rv gtf changes the return value from the default JSON so that only the annotation (GTF) download link for the given species is returned. Other accepted values for return_val / -rv are dna and cdna, which return only the genome (DNA) or the transcriptome (cDNA) download link, respectively.
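The relationship between the return_val options and the keys of the default JSON can be sketched as a small lookup. This is only an illustration of the mapping described above, not gget's own implementation; the helper name pick_link and the sample URLs are hypothetical.

```python
# Mapping from return_val / -rv options to the keys of the default JSON output.
RETURN_VAL_TO_KEY = {
    "gtf": "annotation",
    "dna": "genome",
    "cdna": "transcriptome",
}


def pick_link(result, species, return_val):
    """Return the single FTP link that the given return_val option refers to."""
    if return_val not in RETURN_VAL_TO_KEY:
        raise ValueError(f"return_val must be one of {sorted(RETURN_VAL_TO_KEY)}")
    return result[species][RETURN_VAL_TO_KEY[return_val]]["ftp"]


# Placeholder result in the documented shape (URLs are invented for the example).
result = {
    "homo_sapiens": {
        "transcriptome": {"ftp": "http://example.org/cdna.fa.gz"},
        "genome": {"ftp": "http://example.org/dna.fa.gz"},
        "annotation": {"ftp": "http://example.org/annotation.gtf.gz"},
    }
}

print(pick_link(result, "homo_sapiens", "gtf"))
```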

This functionality can be combined with single-cell RNA-seq data pre-processing tools such as kallisto bustools or cellranger to build a transcriptome index by automatically fetching the latest FTP links from Ensembl:

# kb ref
kb ref \
-i INDEX \
-g T2G \
-f1 FASTA \
$(gget fetchtp -sp homo_sapiens -rv dna) \
$(gget fetchtp -sp homo_sapiens -rv gtf)

# cellranger mkref
cellranger mkref \
--genome=output_genome \
--fasta=$(gget fetchtp -sp genus_species -rv dna) \
--genes=$(gget fetchtp -sp genus_species -rv gtf)
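The same kb ref call could also be assembled from Python, for instance inside a notebook. The sketch below only builds and prints the argument list; the helper name build_kb_ref_command is hypothetical, the URLs are placeholders, and INDEX, T2G, and FASTA are the same placeholder output names used in the shell example.

```python
import shlex


def build_kb_ref_command(dna_url, gtf_url, index="INDEX", t2g="T2G", fasta="FASTA"):
    """Assemble the kb ref argument list from a genome link and an annotation link."""
    return ["kb", "ref", "-i", index, "-g", t2g, "-f1", fasta, dna_url, gtf_url]


# Placeholder links. In Jupyter Lab these would presumably come from
# fetchtp("homo_sapiens", return_val="dna") and fetchtp("homo_sapiens", return_val="gtf").
dna_url = "http://example.org/dna.fa.gz"
gtf_url = "http://example.org/annotation.gtf.gz"

cmd = build_kb_ref_command(dna_url, gtf_url)
print(shlex.join(cmd))

# To actually run the command (requires kb to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting issues if a link contains unusual characters.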

gget search usage

Warning: gget search currently only supports genes listed in the Ensembl core API, which includes limited external references. Searching the Ensembl website might yield more results.

Query Ensembl for genes from a specific species using multiple searchwords

Jupyter Lab:

gget(["searchword1", "searchword2", "searchword3"], "genus_species")

Terminal:

gget search -sw searchword1 searchword2 searchword3 -sp genus_species

Query Ensembl for genes from a specific species using a single searchword

Jupyter Lab:

gget("searchword1", "genus_species")

Terminal:

gget search -sw searchword1 -sp genus_species

Query Ensembl for genes from a specific species using multiple searchwords while limiting the number of returned search results

For example, limiting the number of results to 10:

Jupyter Lab:

gget(["searchword1", "searchword2", "searchword3"], "genus_species", limit=10)

Terminal:

gget search -sw searchword1 searchword2 searchword3 -sp genus_species -l 10

Query Ensembl for genes from any of the 236 available species databases

For example, for the database "nothobranchius_furzeri_core_105_2":

Jupyter Lab:

gget("searchword1", "nothobranchius_furzeri_core_105_2")

Terminal:

gget search -sw searchword1 -sp nothobranchius_furzeri_core_105_2 

Note:
gget search supports the following species abbreviations:
"homo_sapiens" -> "human"
"mus_musculus" -> "mouse"
"taeniopygia_guttata" -> "zebra finch"
"caenorhabditis_elegans" -> "roundworm"
All other species have to be called using their specific database, as shown in the example above.
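One way to picture the abbreviation handling is a lookup that turns a common name into the corresponding species name and passes everything else through unchanged. This is a minimal sketch under that assumption; the helper name resolve_species is hypothetical, and how gget resolves a species name to a specific database internally is not shown here.

```python
# Common-name abbreviations from the note above, keyed by abbreviation.
ABBREVIATIONS = {
    "human": "homo_sapiens",
    "mouse": "mus_musculus",
    "zebra finch": "taeniopygia_guttata",
    "roundworm": "caenorhabditis_elegans",
}


def resolve_species(name):
    """Map a supported abbreviation to its species name; return other input unchanged."""
    return ABBREVIATIONS.get(name.strip().lower(), name)


print(resolve_species("human"))
print(resolve_species("nothobranchius_furzeri_core_105_2"))
```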

