A Python tool for fetching metadata for bacterial genomes.
FetchM: Metadata Fetching and Analysis Tool
Overview
FetchM is a Python-based tool for fetching and analyzing genomic metadata from NCBI BioSample records. The ncbi_dataset.tsv file downloaded from the NCBI genome database lacks metadata fields such as 'Collection Date', 'Host', 'Geographic Location', and 'Isolation Source'. FetchM takes this file as input, retrieves the missing metadata for each BioSample ID from NCBI, filters the data based on quality thresholds, and generates visualizations to help interpret the results. It can also download the sequences of the filtered genomes.
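To make the metadata lookup concrete, here is a minimal sketch of fetching one BioSample record and pulling out its attributes. It is not FetchM's actual code: it assumes the NCBI E-utilities `efetch` endpoint accepts the BioSample accession as `id`, and that attributes appear as `<Attribute>` elements with `harmonized_name` or `attribute_name`. The function names and the hand-written `SAMPLE_XML` record (with a placeholder accession) are hypothetical and exist only for illustration.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# NCBI E-utilities efetch endpoint (assumed here; FetchM may query NCBI differently).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(biosample_id: str) -> str:
    """Build an efetch URL that requests one BioSample record as XML."""
    query = urllib.parse.urlencode(
        {"db": "biosample", "id": biosample_id, "retmode": "xml"}
    )
    return f"{EFETCH_URL}?{query}"


def parse_biosample_xml(xml_text: str) -> dict:
    """Collect every <Attribute> value in the record, keyed by attribute name."""
    root = ET.fromstring(xml_text.strip())
    metadata = {}
    for attr in root.iter("Attribute"):
        # Prefer the harmonized name when present; fall back to the raw name.
        name = attr.get("harmonized_name") or attr.get("attribute_name")
        if name:
            metadata[name] = (attr.text or "").strip()
    return metadata


def fetch_biosample_metadata(biosample_id: str) -> dict:
    """Download and parse one record (requires network access)."""
    url = build_efetch_url(biosample_id)
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_biosample_xml(response.read().decode("utf-8"))


# Hand-written illustrative record, not real NCBI data.
SAMPLE_XML = """
<BioSampleSet>
  <BioSample accession="SAMN00000000">
    <Attributes>
      <Attribute attribute_name="collection date" harmonized_name="collection_date">2015</Attribute>
      <Attribute attribute_name="host" harmonized_name="host">Homo sapiens</Attribute>
      <Attribute attribute_name="geographic location" harmonized_name="geo_loc_name">Bangladesh</Attribute>
      <Attribute attribute_name="lab code">X1</Attribute>
    </Attributes>
  </BioSample>
</BioSampleSet>
"""

if __name__ == "__main__":
    print(parse_biosample_xml(SAMPLE_XML))
```

Keeping the parser separate from the network call means the parsing logic can be checked offline against a saved record.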
Features
- Fetch metadata from the NCBI BioSample API.
- Filter genomes based on CheckM completeness and ANI check status.
- Generate metadata summaries and annotation statistics.
- Create various visualizations for geographic distribution, collection dates, and gene counts.
- Download genome sequences (optional).
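The quality filter listed above can be sketched as follows. This is an illustration rather than FetchM's implementation: it assumes the input TSV has columns named `CheckM completeness` and `ANI Check status`, that a passing ANI check is recorded as `OK`, and that the completeness threshold is inclusive. The function name and the example rows are hypothetical.

```python
import csv
import io


def filter_genomes(tsv_text: str, min_completeness: float = 95.0) -> list:
    """Keep rows that meet the completeness threshold and pass the ANI check."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        try:
            completeness = float(row["CheckM completeness"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric completeness value
        # Assumed column name and passing value for the ANI check.
        if completeness >= min_completeness and row.get("ANI Check status") == "OK":
            kept.append(row)
    return kept


# Made-up rows for illustration only.
EXAMPLE_TSV = (
    "Assembly Accession\tCheckM completeness\tANI Check status\n"
    "GCF_A\t99.1\tOK\n"
    "GCF_B\t90.0\tOK\n"
    "GCF_C\t98.5\tFailed\n"
    "GCF_D\t\tOK\n"
)

if __name__ == "__main__":
    for row in filter_genomes(EXAMPLE_TSV):
        print(row["Assembly Accession"])
```

With the default threshold of 95, only the first example row survives: the second falls below the threshold, the third fails the ANI check, and the fourth has no completeness value.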
Installation
Using Conda
You can install FetchM in a Conda environment:
conda create -n fetchM_env python=3.8
conda activate fetchM_env
conda install -c conda-forge pandas requests xmltodict matplotlib seaborn scipy tqdm
Using pip
Ensure you have Python 3 installed. Install dependencies with:
pip install -r requirements.txt
Usage
Run FetchM with the following command:
fetchM --input input.tsv --outdir results/
Additional Options:
- --checkm 95 (Set CheckM completeness threshold, default: 95)
- --seq (Enable sequence download mode)
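For readers who want to see how these options fit together, the sketch below models the command line with `argparse`. It is a hypothetical reconstruction from the options documented here, not FetchM's actual parser: treating `--input` and `--outdir` as required and `--checkm` as a float are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Model the documented fetchM options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="fetchM",
        description="Fetch and analyze metadata for genomes listed in an NCBI TSV file.",
    )
    # Assumed to be required, since the usage example always passes both.
    parser.add_argument("--input", required=True, help="Input TSV (ncbi_dataset.tsv)")
    parser.add_argument("--outdir", required=True, help="Directory for output files")
    parser.add_argument(
        "--checkm",
        type=float,
        default=95,
        help="CheckM completeness threshold (default: 95)",
    )
    parser.add_argument(
        "--seq",
        action="store_true",
        help="Enable sequence download mode",
    )
    return parser


if __name__ == "__main__":
    # Same arguments as the usage example, with sequence download switched on.
    args = build_parser().parse_args(
        ["--input", "input.tsv", "--outdir", "results/", "--seq"]
    )
    print(args.input, args.outdir, args.checkm, args.seq)
```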
Output
FetchM creates multiple output files inside the results/ directory:
- Metadata summaries in metadata_output/
- Figures in figures/
- Filtered datasets for further analysis
Visualizations
Annotation Distributions
Assembly Statistics
Metadata Summaries
Scatter Plots
License
This project is licensed under the MIT License.
Author
Developed by Tasnimul Arabi Anik.
Contributions
Contributions and improvements are welcome! Feel free to submit a pull request or report issues.
File details
Details for the file fetchm-0.1.0.tar.gz.
File metadata
- Download URL: fetchm-0.1.0.tar.gz
- Upload date:
- Size: 8.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.21
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 90258cfd7f4c5585f6907415ac2df03569a34e1e169bf945d8199e3ec46d6a3e |
| MD5 | 8fef6d0c34f34d54ffb0c17229c65a18 |
| BLAKE2b-256 | b45a7bde8f5320f9e2e4bd57024fd6c32d4e7e49f6dd62c53fe3fa9f44a76bba |
File details
Details for the file fetchm-0.1.0-py3-none-any.whl.
File metadata
- Download URL: fetchm-0.1.0-py3-none-any.whl
- Upload date:
- Size: 8.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.21
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 14a830280ceb92a1096e613f8c9ebb474e0483f6c9bc82cad6643e4d36b7ddd0 |
| MD5 | 0df6990333d2f341274c642089ba74fe |
| BLAKE2b-256 | ffddd900782e1f454d4a10046a5df3d382733aabf48cf0e8a3598df9378ee20f |