Check eukaryotic genomes or MAGs for completeness and contamination

EukCC

EukCC is a completeness and contamination estimator for metagenome-assembled microbial eukaryotic genomes.

Documentation

Head over to https://eukcc.readthedocs.io/ to check out the documentation.

Run

Download EukCC2 database from FTP

# create a folder in which to keep the database
mkdir eukccdb
cd eukccdb
wget http://ftp.ebi.ac.uk/pub/databases/metagenomics/eukcc/eukcc2_db_ver_1.2.tar.gz
tar -xzvf eukcc2_db_ver_1.2.tar.gz
export EUKCC2_DB=$(realpath eukcc2_db_ver_1.2)
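
Before starting a long run it can help to confirm that the database path is usable. The following is a minimal sketch, not part of EukCC itself: the helper name `check_eukcc_db` is hypothetical, and it only checks that the folder exists, not that its contents are complete.

```shell
# Hypothetical helper (not shipped with EukCC): check that a database
# folder is set and exists. Falls back to $EUKCC2_DB if no path is given.
check_eukcc_db() {
    db_dir="${1:-${EUKCC2_DB:-}}"
    if [ -z "$db_dir" ]; then
        echo "EUKCC2_DB is not set" >&2
        return 1
    fi
    if [ ! -d "$db_dir" ]; then
        echo "database folder not found: $db_dir" >&2
        return 1
    fi
    echo "database folder found: $db_dir"
}

# Usage, after the export above:
#   check_eukcc_db
```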

Quickstart using container

Get EukCC quickly by fetching the container.

The container is hosted at the following location and is built automatically from the master branch: https://quay.io/repository/microbiome-informatics/eukcc

docker pull quay.io/microbiome-informatics/eukcc
singularity pull docker://quay.io/microbiome-informatics/eukcc

Bioconda / pip

Alternatively you can install EukCC using conda or pip.

In addition, you need to install the mandatory requirements:

  • metaeuk=4.a0f584d
  • pplacer
  • epa-ng=0.3.8
  • hmmer=3.3
  • minimap2
  • bwa
  • samtools
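
To see which of these requirements are still missing from your environment, something like the following sketch may be useful. The helper name `missing_tools` is hypothetical, and the executable names in the usage line are assumptions: in particular, `hmmsearch` is used as a stand-in for the HMMER suite, which installs several binaries rather than one named `hmmer`.

```shell
# Hypothetical helper: print every given executable that is NOT on PATH.
# Prints nothing if all tools are found.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Usage (executable names are assumptions, see the text above):
#   missing_tools metaeuk pplacer epa-ng hmmsearch minimap2 bwa samtools
```

Note that this only checks for presence, not for the pinned versions listed above.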

Outputs explanation

  • eukcc.log - log of execution

eukcc single

  • eukcc.csv - table with estimated completeness, contamination and taxonomy lineage

eukcc folder

  • eukcc.csv - table with estimated completeness, contamination and taxonomy lineage for good quality bins
  • merged_bins.csv - table of merged refined bins
  • bad_quality.csv - table with estimated completeness, contamination and taxonomy lineage for bad quality bins (the chosen marker gene set is supported by less than half of the alignments)
  • missing_marker_genes.txt - newline-separated list of bins for which no marker gene set could be defined
  • merged_bins - folder with merged bins sequences
  • refine_workdir - working directory with the results of intermediate steps
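
After a run, a quick way to see which of the outputs listed above were actually produced is sketched below. The helper name `report_eukcc_outputs` is hypothetical, and the sketch assumes that all outputs sit directly inside the output folder you passed to EukCC. Depending on the mode (single or folder), some entries are expected to be missing.

```shell
# Hypothetical helper: report which of the documented output files and
# folders exist in a given output directory.
# Assumption: all outputs are written directly into that directory.
report_eukcc_outputs() {
    out_dir="$1"
    for name in eukcc.log eukcc.csv merged_bins.csv bad_quality.csv \
                missing_marker_genes.txt merged_bins refine_workdir; do
        if [ -e "$out_dir/$name" ]; then
            echo "present $name"
        else
            echo "missing $name"
        fi
    done
}

# Usage:
#   report_eukcc_outputs path/to/output_folder
```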

Don't use EukCC on already published data

Or at least not without thinking about it:

You should not use EukCC on already published genomes if they were used during training of the marker gene sets. To make sure, you can look up all accessions used in the database file db_base/backbone/base_taxinfo.csv.
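
One way to perform this check is a plain text search of that file. The sketch below is an illustration only: the helper name `accession_in_training_set` is hypothetical, the file is assumed to live under the folder that `EUKCC2_DB` points to, and no assumption is made about its column layout, so the search is a simple substring match over whole lines.

```shell
# Hypothetical helper: succeed (exit 0) if the accession string occurs
# anywhere in base_taxinfo.csv, fail with 1 if it does not, and with 2
# if the file cannot be found.
# Assumption: the file sits below the folder $EUKCC2_DB points to.
# Note: this is a substring match, so a short query can match longer IDs.
accession_in_training_set() {
    accession="$1"
    taxinfo="${2:-${EUKCC2_DB:-.}/db_base/backbone/base_taxinfo.csv}"
    if [ ! -f "$taxinfo" ]; then
        echo "file not found: $taxinfo" >&2
        return 2
    fi
    grep -qF -- "$accession" "$taxinfo"
}

# Usage:
#   if accession_in_training_set "$my_accession"; then
#       echo "genome was used for training"
#   fi
```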

Cite

If you use EukCC make sure to cite:

Saary, Paul, Alex L. Mitchell, and Robert D. Finn.
"Estimating the quality of eukaryotic genomes recovered from metagenomic analysis with EukCC."
Genome Biology 21.1 (2020): 1-21.

EukCC also uses metaEUK, hmmer, pplacer, ete3 and epa-ng.

Changes compared to EukCC 1

Note: Version 2 of EukCC should provide a better experience than version 1. Version 2 is not compatible with previous versions: most command-line arguments have changed, so it is not a drop-in replacement.

  • Users can set the prevalence threshold for marker sets. In EukCC 1 this was fixed at 98% single-copy prevalence. Users can now make the threshold stricter; we find that marker sets with 100% single-copy prevalence can often be found.

Issues and bugs

Please report any bugs and issues on GitHub. Make sure to include the debug log (run eukcc with the --debug flag).

Exit codes used

  • 200: File not found
  • 201: No marker gene set could be defined
  • 202: No database provided
  • 203: Corrupted file
  • 204: Predicted zero proteins
  • 222: Invalid settings
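
In a wrapper script or pipeline, these codes can be turned back into readable messages. The sketch below is one way to do that; the function name `explain_eukcc_exit_code` is hypothetical, and treating 0 as success follows the usual shell convention rather than anything stated above.

```shell
# Hypothetical helper: translate an EukCC exit code into the message
# from the list above. Code 0 as "Success" is an assumption based on
# the usual shell convention.
explain_eukcc_exit_code() {
    case "$1" in
        0)   echo "Success" ;;
        200) echo "File not found" ;;
        201) echo "No marker gene set could be defined" ;;
        202) echo "No database provided" ;;
        203) echo "Corrupted file" ;;
        204) echo "Predicted zero proteins" ;;
        222) echo "Invalid settings" ;;
        *)   echo "Unknown exit code: $1" ;;
    esac
}

# Usage, directly after an eukcc command has finished:
#   explain_eukcc_exit_code "$?"
```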
