
A Python tool for fetching metadata for bacterial genomes.


FetchM: Metadata Fetching and Analysis Tool

Overview

FetchM is a Python tool for fetching and analyzing genomic metadata from NCBI BioSample records. The ncbi_dataset.tsv file downloaded from the NCBI genome database does not include metadata fields such as 'Collection Date', 'Host', 'Geographic Location', and 'Isolation Source'. FetchM takes that file as input, retrieves the missing metadata for each BioSample ID from NCBI, filters the genomes by quality thresholds, and generates visualizations to help interpret the results. It can optionally download the sequences of the genomes that pass the filters.
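As an illustration of the lookup described above, the following is a minimal sketch of how one BioSample record could be retrieved and reduced to the four fields. It is not FetchM's actual implementation. It assumes the NCBI E-utilities efetch endpoint with db=biosample, and it assumes the fields appear as Attribute elements with the harmonized names collection_date, host, geo_loc_name, and isolation_source. The function names and the sample record (including the accession SAMN00000000) are made up for the example.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# NCBI E-utilities efetch endpoint (assumed to be the service queried).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Assumed mapping from BioSample harmonized attribute names to column labels.
FIELDS = {
    "collection_date": "Collection Date",
    "host": "Host",
    "geo_loc_name": "Geographic Location",
    "isolation_source": "Isolation Source",
}


def parse_biosample_xml(xml_text):
    """Return the fields of interest from a BioSample XML document.

    Fields that are absent from the record are returned as None.
    """
    root = ET.fromstring(xml_text)
    record = {label: None for label in FIELDS.values()}
    for attr in root.iter("Attribute"):
        name = attr.get("harmonized_name") or attr.get("attribute_name")
        if name in FIELDS and attr.text:
            record[FIELDS[name]] = attr.text.strip()
    return record


def fetch_biosample(accession, timeout=30):
    """Download and parse one BioSample record (requires network access)."""
    query = urllib.parse.urlencode(
        {"db": "biosample", "id": accession, "retmode": "xml"}
    )
    with urllib.request.urlopen(f"{EFETCH_URL}?{query}", timeout=timeout) as resp:
        return parse_biosample_xml(resp.read().decode("utf-8"))


# Made-up record used to exercise the parser without network access.
SAMPLE_XML = """<BioSampleSet>
  <BioSample accession="SAMN00000000">
    <Attributes>
      <Attribute attribute_name="collection date"
                 harmonized_name="collection_date">2015-03-12</Attribute>
      <Attribute attribute_name="host"
                 harmonized_name="host">Homo sapiens</Attribute>
      <Attribute attribute_name="geographic location"
                 harmonized_name="geo_loc_name">Bangladesh: Dhaka</Attribute>
    </Attributes>
  </BioSample>
</BioSampleSet>"""

if __name__ == "__main__":
    print(parse_biosample_xml(SAMPLE_XML))
```

Keeping the parser separate from the network call makes it possible to test the field extraction offline. When querying many BioSample IDs, NCBI's request rate limits would also need to be respected.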

Features

  • Fetch metadata from the NCBI BioSample API.
  • Filter genomes based on CheckM completeness and ANI check status.
  • Generate metadata summaries and annotation statistics.
  • Create various visualizations for geographic distribution, collection dates, and gene counts.
  • Download genome sequences (optional).
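The quality filter listed above can be pictured with a short sketch. This is an illustration under stated assumptions, not FetchM's code: the column names "CheckM completeness" and "ANI Check status", the passing value "OK", and the function name filter_genomes are assumptions, and the accessions in the sample table are made up.

```python
import csv
import io


def filter_genomes(rows, min_completeness=95.0, ani_ok="OK"):
    """Keep rows whose completeness meets the threshold and whose ANI check passed.

    Rows with a missing or non-numeric completeness value are dropped.
    """
    kept = []
    for row in rows:
        try:
            completeness = float(row.get("CheckM completeness") or "")
        except (TypeError, ValueError):
            continue
        if completeness >= min_completeness and row.get("ANI Check status") == ani_ok:
            kept.append(row)
    return kept


# Made-up table with the assumed column names.
SAMPLE_TSV = (
    "Assembly Accession\tCheckM completeness\tANI Check status\n"
    "GCF_000000001.1\t99.2\tOK\n"
    "GCF_000000002.1\t91.0\tOK\n"
    "GCF_000000003.1\t98.5\tInconclusive\n"
    "GCF_000000004.1\t\tOK\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))
kept = filter_genomes(rows)

if __name__ == "__main__":
    for row in kept:
        print(row["Assembly Accession"])
```

Only the first row passes here: the second falls below the threshold, the third fails the ANI check, and the fourth has no completeness value.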

Installation

Using Conda

You can install FetchM in a Conda environment:

conda create -n fetchM_env python=3.8
conda activate fetchM_env
conda install -c conda-forge pandas requests xmltodict matplotlib seaborn scipy tqdm

Using pip

Ensure you have Python 3 installed. Install dependencies with:

pip install -r requirements.txt

Usage

Run FetchM with the following command:

fetchM --input input.tsv --outdir results/

Additional Options:

  • --checkm 95 (Set CheckM completeness threshold, default: 95)
  • --seq (Enable sequence download mode)
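To make the option semantics concrete, here is a hypothetical argparse sketch of how the documented flags could be declared. It is not FetchM's actual parser. The flag names and the default of 95 come from the text above; treating --input and --outdir as required, and parsing --checkm as a float, are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="fetchM",
        description="Fetch and analyze metadata for NCBI genome records.",
    )
    parser.add_argument("--input", required=True, help="Path to ncbi_dataset.tsv")
    parser.add_argument("--outdir", required=True, help="Directory for results")
    parser.add_argument(
        "--checkm",
        type=float,
        default=95.0,
        help="CheckM completeness threshold (default: 95)",
    )
    parser.add_argument(
        "--seq",
        action="store_true",
        help="Enable sequence download mode",
    )
    return parser


# Parse an example command line instead of sys.argv.
args = build_parser().parse_args(
    ["--input", "ncbi_dataset.tsv", "--outdir", "results/", "--checkm", "90", "--seq"]
)

if __name__ == "__main__":
    print(args)
```

With this layout, omitting --checkm leaves the threshold at 95 and omitting --seq leaves sequence download disabled.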

Output

FetchM creates multiple output files inside the results/ directory:

  • Metadata summaries in metadata_output/
  • Figures in figures/
  • Filtered datasets for further analysis

Visualizations

Annotation Distributions

  • Annotation Count Gene Protein-coding
  • Annotation Count Gene Pseudogene
  • Annotation Count Gene Total

Assembly Statistics

  • Assembly Sequence Length

Metadata Summaries

  • Collection Date Distribution
  • Geographic Location Distribution
  • Host Distribution

Scatter Plots

  • Gene Protein Coding vs Collection Date
  • Gene Total vs Collection Date
  • Sequence Length vs Collection Date
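The collection-date figures depend on turning free-text date values into something that can be counted or placed on an axis. The sketch below shows one way to do that by extracting a four-digit year and tallying it. It is an assumption about the preprocessing, not FetchM's documented behaviour; the function name collection_year and the sample values are made up for the example.

```python
import re
from collections import Counter

# Matches a standalone four-digit year between 1800 and 2099.
YEAR_RE = re.compile(r"\b(1[89]\d{2}|20\d{2})\b")


def collection_year(value):
    """Return the first four-digit year found in a date string, or None."""
    if not value:
        return None
    match = YEAR_RE.search(value)
    return int(match.group(1)) if match else None


# Made-up values covering full dates, partial dates, and placeholders.
raw_dates = ["2015-03-12", "2015", "2018-07", "missing", "not collected", ""]

years = [collection_year(v) for v in raw_dates]
counts = Counter(y for y in years if y is not None)
unparsed = sum(1 for y in years if y is None)

if __name__ == "__main__":
    for year in sorted(counts):
        print(year, counts[year])
    print("unparsed:", unparsed)
```

The per-year counts are the data behind a distribution plot, and the same year values could serve as the x-axis of the scatter plots. Values without a recognizable year are counted separately rather than silently discarded.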

License

This project is licensed under the MIT License.

Author

Developed by Tasnimul Arabi Anik.

Contributions

Contributions and improvements are welcome! Feel free to submit a pull request or report issues.
