Check eukaryotic genomes or MAGs for completeness and contamination
EukCC
EukCC is a completeness and contamination estimator for metagenome-assembled microbial eukaryotic genomes.
Documentation
Head over to https://eukcc.readthedocs.io/ to check out the documentation.
Run
Download EukCC2 database from FTP
# create a folder in which to keep the database
mkdir eukccdb
cd eukccdb
wget http://ftp.ebi.ac.uk/pub/databases/metagenomics/eukcc/eukcc2_db_ver_1.2.tar.gz
tar -xzvf eukcc2_db_ver_1.2.tar.gz
export EUKCC2_DB=$(realpath eukcc2_db_ver_1.2)
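Before running anything, it can help to confirm that the exported variable really points at the unpacked database. The following is a minimal sketch; `check_eukcc_db` is a hypothetical helper and not part of EukCC, and it only tests that the directory exists, not that its contents are complete.

```shell
# Hypothetical helper (not part of EukCC): check that the database path,
# given as an argument or taken from EUKCC2_DB, is an existing directory.
check_eukcc_db() {
    db_path="${1:-${EUKCC2_DB:-}}"
    if [ -z "$db_path" ]; then
        echo "EUKCC2_DB is not set" >&2
        return 1
    fi
    if [ ! -d "$db_path" ]; then
        echo "database directory not found: $db_path" >&2
        return 1
    fi
    echo "using EukCC database at $db_path"
}
```

Calling `check_eukcc_db` without arguments in the shell where `EUKCC2_DB` was exported prints the resolved path, or returns a non-zero status if the variable is unset or the folder is missing.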
Quickstart using container
Get EukCC quickly by fetching the container.
The container is built automatically from the master branch and hosted here: https://quay.io/repository/microbiome-informatics/eukcc
docker pull quay.io/microbiome-informatics/eukcc
singularity pull docker://quay.io/microbiome-informatics/eukcc
Bioconda / pip
Alternatively, you can install EukCC using conda or pip.
In addition, you need to install the following mandatory requirements:
- metaeuk=4.a0f584d
- pplacer
- epa-ng=0.3.8
- hmmer=3.3
- minimap2
- bwa
- samtools
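When installing with pip, a quick way to see which requirements are still missing is to look for their executables on the PATH. This is a sketch under assumptions: `missing_tools` is a hypothetical helper, the executable names are guessed from the package names above (for example, hmmer is checked through its `hmmsearch` binary), and the pinned versions are not verified.

```shell
# Hypothetical helper: print every given executable that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Assumed executable names for the requirements listed above.
EUKCC_TOOLS="metaeuk pplacer epa-ng hmmsearch minimap2 bwa samtools"
```

Running `missing_tools $EUKCC_TOOLS` prints one line per missing executable and prints nothing when all of them are found.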
Outputs explanation
- eukcc.log - log of execution
eukcc single
- eukcc.csv - table with estimated completeness, contamination and taxonomy lineage
eukcc folder
- eukcc.csv - table with estimated completeness, contamination and taxonomy lineage for good quality bins
- merged_bins.csv - table of merged refined bins
- bad_quality.csv - table with estimated completeness, contamination and taxonomy lineage for bad quality bins (the chosen marker gene set is supported by fewer than half of the alignments)
- missing_marker_genes.txt - line-separated list of bins for which no marker gene set could be defined
- merged_bins - folder with the sequences of the merged bins
- refine_workdir - working directory with the results of intermediate steps
Don't use EukCC on already published data
Or at least not without thinking about it:
You should not use EukCC on already published genomes if they were used during training of the marker
gene sets. To check, you can look up all accessions used in the database file db_base/backbone/base_taxinfo.csv.
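A lookup of this kind can be sketched with `grep`. The helper name `accession_in_training_set` is hypothetical, and the sketch assumes that accessions appear as plain text in the file; the column layout of base_taxinfo.csv is not described here, so the search matches the string anywhere on a line.

```shell
# Hypothetical helper: succeed if the accession string occurs anywhere in
# the given taxinfo file. This is a fixed-string substring match, so a
# shorter accession also matches a longer one that contains it.
accession_in_training_set() {
    accession="$1"
    taxinfo="$2"
    grep -qF -- "$accession" "$taxinfo"
}
```

The function takes the accession as its first argument and the path to base_taxinfo.csv inside your database folder as its second. It returns 0 when the accession is found and 1 when it is not.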
Cite
If you use EukCC make sure to cite:
Saary, Paul, Alex L. Mitchell, and Robert D. Finn.
"Estimating the quality of eukaryotic genomes recovered from metagenomic analysis with EukCC."
Genome biology 21.1 (2020): 1-21.
EukCC also uses metaEUK, hmmer, pplacer, ete3 and epa-ng.
Changes compared to EukCC 1
Note: With version 2, EukCC should provide a better experience than version 1. Version 2 is not compatible with previous versions, and most command-line arguments have changed, so version 2 is not a drop-in replacement.
- Users can set the prevalence threshold for marker sets. In EukCC 1 this was fixed at 98% single-copy prevalence. Users can now make it stricter; we find that marker sets with 100% single-copy prevalence can often be found.
Issues and bugs
Please report any bugs and issues here on GitHub. Make sure to
include the debug log (run eukcc with the --debug flag).
Exit codes used
- 200: File not found
- 201: No Marker gene set could be defined
- 202: No database provided
- 203: Corrupted file
- 204: Predicted zero proteins
- 222: Invalid settings
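In a pipeline, the codes listed above can be turned into readable messages with a small `case` statement. This is a sketch: `describe_eukcc_exit` is a hypothetical helper, and treating 0 as success follows the usual shell convention rather than anything stated in this list.

```shell
# Hypothetical helper: map an EukCC exit code to the description listed above.
describe_eukcc_exit() {
    case "$1" in
        0)   echo "Success" ;;  # assumption: usual shell convention
        200) echo "File not found" ;;
        201) echo "No Marker gene set could be defined" ;;
        202) echo "No database provided" ;;
        203) echo "Corrupted file" ;;
        204) echo "Predicted zero proteins" ;;
        222) echo "Invalid settings" ;;
        *)   echo "Unknown exit code: $1" ;;
    esac
}
```

After an eukcc run, `describe_eukcc_exit "$?"` prints the matching description, which is handy when logging failures across many bins.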