
A simple tool to determine the genome size of an organism



sizemeup

sizemeup is a simple tool that retrieves the genome size for a given species name. It combines known genome sizes from NCBI's Assembly Reports with user-provided genome sizes that may not be available from NCBI.

Contributing

If you have a species of interest that is not available in the NCBI Assembly Reports, please consider submitting an issue so that we can get it added to sizemeup. If you have ideas for improving sizemeup, please feel free to contribute!

Installation

You can install sizemeup using conda:

conda create -n sizemeup -c conda-forge -c bioconda sizemeup
conda activate sizemeup
sizemeup --help

Available Commands

sizemeup

sizemeup is the main tool that outputs the known genome size for a given species name.

Usage

sizemeup --help

 Usage: sizemeup [OPTIONS]

 sizemeup - A simple tool to determine the genome size of an organism

╭─ Required Options ────────────────────────────────────────────────────────────────────╮
│ *  --species  -s  TEXT  The species to determine the size of [required]               │
│ *  --sizes    -z  TEXT  The built in sizes file to use [required]                     │
╰───────────────────────────────────────────────────────────────────────────────────────╯
╭─ Additional Options ──────────────────────────────────────────────────────────────────╮
│ --outdir   -o  PATH  Directory to write output [default: ./]                          │
│ --prefix   -p  TEXT  Prefix to use for output files [default: sizemeup]               │
│ --silent             Only critical errors will be printed                             │
│ --verbose            Increase the verbosity of output                                 │
│ --version  -V        Show the version and exit.                                       │
│ --help               Show this message and exit.                                      │
╰───────────────────────────────────────────────────────────────────────────────────────╯

Example

sizemeup --species "Staphylococcus aureus" --silent
                           Query Result
┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Name                   TaxID  Size     Source  Method       ┃
┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━┩
│ Staphylococcus aureus  1280   2800000  ncbi    manually-set │
└───────────────────────┴───────┴─────────┴────────┴──────────────┘
Writing the genome size to sizemeup-sizemeup.txt

sizemeup --species "Escherichia coli" --silent
                         Query Result
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Name              TaxID  Size     Source  Method       ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━┩
│ Escherichia coli  562    5150000  ncbi    manually-set │
└──────────────────┴───────┴─────────┴────────┴──────────────┘
Writing the genome size to sizemeup-sizemeup.txt

sizemeup --species "escherichia coli" --silent
                         Query Result
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Name              TaxID  Size     Source  Method       ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━┩
│ escherichia coli  562    5150000  ncbi    manually-set │
└──────────────────┴───────┴─────────┴────────┴──────────────┘
Writing the genome size to sizemeup-sizemeup.txt

If the --species value is found, a table is printed to STDOUT and also written to a file named {PREFIX}-sizemeup.txt, where {PREFIX} is the value of the --prefix option (default: sizemeup).

Here is an example of the output file:

name	tax_id	size	source	method
Escherichia coli	562	5150000	ncbi	manually-set
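
If you want to pick up the result in a script, the output file can be read with Python's standard csv module. The following is only a minimal sketch: the function name parse_sizemeup_output is made up for this example, and the sample text is copied from the output file shown above rather than read from disk.

```python
import csv
import io

# Sample contents copied from the example output file above.
SAMPLE = (
    "name\ttax_id\tsize\tsource\tmethod\n"
    "Escherichia coli\t562\t5150000\tncbi\tmanually-set\n"
)


def parse_sizemeup_output(handle):
    """Parse a tab-delimited sizemeup output file that has a header row."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        row["size"] = int(row["size"])  # genome size in base pairs
        rows.append(row)
    return rows


rows = parse_sizemeup_output(io.StringIO(SAMPLE))
print(rows[0]["name"], rows[0]["size"])  # Escherichia coli 5150000
```

To read a real result, pass an open file instead of the StringIO object, for example open("sizemeup-sizemeup.txt", newline="").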

However, if a species is not found, the following message is printed to STDOUT:

sizemeup --species "escherichia colis" --silent
2024-09-29 20:24:17 ERROR    2024-09-29 20:24:17:root:ERROR - Could not find 'escherichia colis' in the sizes file,      sizemeup.py:138
                             please consider creating an issue at https://github.com/rpetit3/sizemeup/issues to report
                             this
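
The examples above suggest that the species name is matched without regard to case, and that a name with no match is reported as an error. The sketch below illustrates that behavior under the assumption of an exact, case-insensitive comparison. It is not sizemeup's actual code, and find_size is a hypothetical helper that returns the stored row rather than echoing the query.

```python
def find_size(sizes, species):
    """Return the row whose name matches `species` case-insensitively, or None."""
    wanted = species.strip().lower()
    for row in sizes:
        if row["name"].lower() == wanted:
            return row
    return None


# Values taken from the example queries above.
sizes = [
    {"name": "Staphylococcus aureus", "tax_id": "1280", "size": 2800000},
    {"name": "Escherichia coli", "tax_id": "562", "size": 5150000},
]

print(find_size(sizes, "escherichia coli"))   # matches despite the lowercase "e"
print(find_size(sizes, "escherichia colis"))  # None, since there is no such entry
```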

sizemeup-build

sizemeup-build is a helper tool used to build the genome size database for sizemeup. To do this, it:

  1. Downloads the latest NCBI Assembly Reports
  2. Determines species names based on tax id using the NCBI Datasets API
  3. Merges any user provided genome sizes not available from NCBI

Note: Most users won't need this tool. It is just a simple way to update the database on your own or at new releases of sizemeup.

In the end, it produces a TSV file with the following columns:

  • name - the species name
  • tax_id - the NCBI tax id
  • size - the genome size in base pairs
  • source - the source of the genome size (e.g. ncbi, user)
  • method - the method used to determine the genome size (e.g. automatic, manual)
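
The merge in step 3 could look roughly like the sketch below. This is an illustration only, not how sizemeup-build is actually implemented: merge_sizes is a hypothetical function, keying on the lowercased species name is an assumption, and the user rows are placeholders rather than real genome sizes.

```python
def merge_sizes(ncbi_rows, user_rows):
    """Combine both sources, adding user rows only where NCBI has no entry."""
    # Key on the lowercased species name (an assumption; tax_id would also work).
    merged = {row["name"].lower(): row for row in ncbi_rows}
    for row in user_rows:
        # setdefault leaves an existing NCBI entry untouched.
        merged.setdefault(row["name"].lower(), row)
    return sorted(merged.values(), key=lambda row: row["name"].lower())


COLUMNS = ("name", "tax_id", "size", "source", "method")

ncbi_rows = [
    {"name": "Escherichia coli", "tax_id": "562", "size": 5150000,
     "source": "ncbi", "method": "manually-set"},
]
user_rows = [
    # Placeholder entries, not real genome sizes.
    {"name": "Examplea hypothetica", "tax_id": "0", "size": 1000000,
     "source": "user", "method": "manual"},
    {"name": "escherichia coli", "tax_id": "562", "size": 1,
     "source": "user", "method": "manual"},
]

# Print the merged table as TSV, one row per species.
print("\t".join(COLUMNS))
for row in merge_sizes(ncbi_rows, user_rows):
    print("\t".join(str(row[col]) for col in COLUMNS))
```

Here the user row for "escherichia coli" is dropped because NCBI already covers that species, while the placeholder species is added.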

Citing sizemeup

If you make use of sizemeup in your analysis, please cite the following:

Motivation and Naming

This came out of a conversation with Taylor. We have a workflow in Bactopia called teton for human read scrubbing and taxonomic classification. After running teton, the idea was to run Bactopia to analyze the samples. However, Bactopia requires a genome size for each sample in order to calculate coverage and a few other metrics. Sure, we could manually look up the genome size for each sample, but that would be tedious and time-consuming. So we decided to develop sizemeup to handle looking up genome sizes for us. In addition, this paves the way for users of Bactopia to use teton + sizemeup to easily mix species within their runs. In other words, sizemeup was built to support the Bactopia workflow (but you can use it for whatever!).

As for the name, I wanted something fun and catchy. It's a simple tool to retrieve the genome size of a given species name, so I thought "sizemeup" would work!

Funding

Support for this project came (in part) from the Wyoming Public Health Division.

