Gene Fetch: High-throughput NCBI Sequence Retrieval Tool

Gene Fetch

Gene Fetch enables high-throughput retrieval of sequence data from NCBI databases based on taxonomy IDs (taxids) or taxonomic hierarchies. It can retrieve protein sequences, nucleotide sequences, or both for a variety of genes, including protein-coding genes (e.g., cox1, cytb, rbcl, matk) and rRNA genes (e.g., 16S, 18S).

Installation:

Install from PyPI

pip install gene-fetch

Post-installation testing:

The Gene Fetch package includes a test suite, divided into basic tests (which don't require external API access) and integration tests (which may require NCBI API credentials). Install pytest:

pip install pytest

Run basic tests:

pytest

This runs 65 tests across 8 test modules (tests/test_*.py) and takes a few minutes. Expect 1 warning about API credentials, because the basic tests do not provide them.

Usage:

python gene_fetch.py -g/--gene <gene_name> --type <sequence_type> -i/--in <samples.csv> -o/--out <output_directory>

-h/--help: Show help and exit.

Required arguments:

-g/--gene: Name of the gene to search for in the NCBI GenBank database (e.g., cox1/16s/rbcl).
--type: Sequence type to fetch: 'protein', 'nucleotide', or 'both'. With 'both', the tool first searches for and fetches a protein sequence, then fetches the corresponding nucleotide CDS for that protein sequence.
-i/--in: Path to the input CSV file containing sample IDs and taxids (see Input section below).
-i2/--in2: Path to an alternative input CSV file containing sample IDs and taxonomic information for each sample (see Input section below).
-o/--out: Path to the output directory. The directory will be created if it does not exist.
-e/--email and -k/--api-key: Email address and associated API key for your NCBI account. An NCBI account is required to run this tool because API limits are otherwise strict. See NCBI's documentation for how to create an account and find your API key.
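The required arguments above can be assembled programmatically, for example when launching Gene Fetch from a pipeline. The following is a minimal sketch: the helper name `build_gene_fetch_command` is hypothetical, the `python gene_fetch.py` entry point is taken from the Usage line, and the rule that exactly one of -i/--in or -i2/--in2 is supplied is an assumption based on -i2 being described as an alternative input.

```python
import shlex


def build_gene_fetch_command(gene, seq_type, out_dir, email, api_key,
                             input_csv=None, input_taxonomy_csv=None):
    """Return a Gene Fetch command line as a list of arguments.

    Hypothetical helper: it only arranges the flags listed under
    'Required arguments' and does not run anything.
    """
    if seq_type not in {"protein", "nucleotide", "both"}:
        raise ValueError("seq_type must be 'protein', 'nucleotide', or 'both'")
    # Assumption: -i and -i2 are alternatives, so exactly one is given.
    if (input_csv is None) == (input_taxonomy_csv is None):
        raise ValueError("provide exactly one of input_csv or input_taxonomy_csv")

    cmd = ["python", "gene_fetch.py", "--gene", gene, "--type", seq_type]
    if input_csv is not None:
        cmd += ["--in", input_csv]
    else:
        cmd += ["--in2", input_taxonomy_csv]
    cmd += ["--out", out_dir, "--email", email, "--api-key", api_key]
    return cmd


if __name__ == "__main__":
    command = build_gene_fetch_command(
        gene="cox1",
        seq_type="both",
        out_dir="results",
        email="user@example.org",   # placeholder address
        api_key="YOUR_API_KEY",     # placeholder key
        input_csv="samples.csv",
    )
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(command))
```

The returned list can be passed to `subprocess.run(command, check=True)` once real credentials are filled in.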

Optional arguments:

--protein-size: Minimum protein sequence length filter. Applies to 'batch' and 'single' search modes (default: 500).
--nucleotide-size: Minimum nucleotide sequence length filter. Applies to 'batch' and 'single' search modes (default: 1500).
-s/--single: Taxonomic ID for 'single' sequence search mode (-i and -i2 are ignored when run with -s). 'single' mode fetches all target gene or protein sequences on GenBank for a specific taxonomic ID, or N sequences if --max-sequences is specified.
--max-sequences: Maximum number of sequences to fetch for a specific taxonomic ID (only applies in 'single' mode).
-b/--genbank: Saves GenBank (.gb) files for fetched nucleotide and/or protein sequences to genbank/ (applies in 'batch' or 'single' mode).
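To make the effect of the length and count options concrete, here is a small conceptual sketch. It is not Gene Fetch's own code: the function `filter_sequences` is hypothetical, and treating a sequence whose length equals the minimum as passing is an assumption. Only the default values (500 and 1500) come from the option descriptions above.

```python
DEFAULT_PROTEIN_SIZE = 500       # default of --protein-size
DEFAULT_NUCLEOTIDE_SIZE = 1500   # default of --nucleotide-size


def filter_sequences(sequences, min_length, max_sequences=None):
    """Keep sequences of at least min_length, optionally capped at max_sequences.

    Hypothetical illustration of a minimum-length filter combined with a
    --max-sequences style cap; the input order is preserved.
    """
    kept = [seq for seq in sequences if len(seq) >= min_length]
    if max_sequences is not None:
        kept = kept[:max_sequences]
    return kept


if __name__ == "__main__":
    # Toy protein strings of length 600, 450, and 500.
    proteins = ["M" * 600, "M" * 450, "M" * 500]
    passing = filter_sequences(proteins, DEFAULT_PROTEIN_SIZE)
    print([len(seq) for seq in passing])
```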

Input:

Example 'samples.csv' input file (-i/--in)

ID	taxid
sample-1	177658
sample-2	177627
sample-3	3084599
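An input file in this shape can be generated from existing records. The sketch below assumes the file is comma-separated with the header `ID,taxid` (the example above shows the columns only as a rendered table), and the helper name `samples_to_csv` is hypothetical.

```python
import csv
import io


def samples_to_csv(samples):
    """Render (sample ID, taxid) pairs as CSV text with an 'ID,taxid' header.

    Assumption: comma-separated columns. Taxids are validated as positive
    integers before being written.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["ID", "taxid"])
    for sample_id, taxid in samples:
        taxid = int(taxid)  # raises ValueError for non-numeric taxids
        if taxid <= 0:
            raise ValueError(f"taxid for {sample_id!r} must be positive")
        writer.writerow([sample_id, taxid])
    return buffer.getvalue()


if __name__ == "__main__":
    text = samples_to_csv([
        ("sample-1", 177658),
        ("sample-2", 177627),
        ("sample-3", 3084599),
    ])
    print(text, end="")
```

Writing the returned text to `samples.csv` gives a file that can be passed to -i/--in.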

Example 'samples_taxonomy.csv' input file (-i2/--in2)

ID	phylum	class	order	family	genus	species
sample-1	Arthropoda	Insecta	Diptera	Acroceridae	Astomella
sample-2	Arthropoda	Insecta	Hemiptera	Cicadellidae	Psammotettix	Psammotettix sabulicola
sample-3	Arthropoda	Insecta	Trichoptera	Limnephilidae	Dicosmoecus	Dicosmoecus palatus

Leave a field blank if the taxonomic information is not known or not needed (as with the species of sample-1 above).
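Because fields may be left blank, any code that reads this file has to cope with missing ranks. The following sketch reads such a file and reports the most specific rank that is filled in for each sample. It illustrates handling of blank fields only: how Gene Fetch itself resolves taxonomy to a taxid is not described here, the comma delimiter is an assumption, and `most_specific_rank` is a hypothetical helper.

```python
import csv
import io

# Rank columns from the example header, ordered from broadest to most specific.
RANKS = ["phylum", "class", "order", "family", "genus", "species"]


def most_specific_rank(row):
    """Return (rank, name) for the most specific non-blank rank, or None."""
    for rank in reversed(RANKS):
        value = (row.get(rank) or "").strip()
        if value:
            return rank, value
    return None


if __name__ == "__main__":
    # Assumed comma-separated rendering of the example table above.
    example = (
        "ID,phylum,class,order,family,genus,species\n"
        "sample-1,Arthropoda,Insecta,Diptera,Acroceridae,Astomella,\n"
        "sample-2,Arthropoda,Insecta,Hemiptera,Cicadellidae,Psammotettix,"
        "Psammotettix sabulicola\n"
    )
    for row in csv.DictReader(io.StringIO(example)):
        print(row["ID"], most_specific_rank(row))
```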

Authored by Dan Parsons and Ben Price @ NHMUK (2025).

gene-fetch 1.0.6