CheckU: UNI56 marker completeness profiling for microbial genomes.
CheckU
CheckU evaluates bacterial and archaeal genomes with the UNI56 universal single-copy marker set. The program reads amino acid FASTA files or nucleotide assemblies, calls genes with Pyrodigal when needed, and scores markers with PyHMMER. Results include completeness, contamination, and per-marker hit tables.
Requirements
- Linux x86_64 with enough CPU and RAM for HMMER searches
- FASTA inputs in plain or gzip form (.faa, .fa, .fna, and similar extensions)
Installation
Option 1: pip (PyPI)
pip install checku
Option 2: Pixi (development)
pixi install
Quick Check
checku --help
If you are running from the repository with Pixi:
pixi run python -m checku --help
You should see the command line help without errors.
Input Rules
- Provide either a single FASTA file or a directory of FASTA files.
- Protein files are used as-is. Nucleotide files trigger Pyrodigal gene calls.
- Compressed files (.gz) are supported; they are unpacked into the run workspace.
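The input rules above can be illustrated with a small standard-library sketch. It is not CheckU's own code: the extension list, the function names, and the 90% nucleotide-letter threshold are all assumptions chosen for illustration, and CheckU's actual detection logic may differ.

```python
import gzip
from pathlib import Path

# Assumed extension list; the README only names .faa, .fa, and .fna explicitly.
FASTA_SUFFIXES = {".faa", ".fa", ".fna", ".fasta"}


def collect_fasta(path):
    """Return FASTA files for a single file or a directory (non-recursive)."""
    path = Path(path)
    candidates = [path] if path.is_file() else sorted(path.iterdir())
    picked = []
    for p in candidates:
        # Look at the extension underneath an optional .gz suffix.
        name = p.name[:-3] if p.name.endswith(".gz") else p.name
        if p.is_file() and Path(name).suffix.lower() in FASTA_SUFFIXES:
            picked.append(p)
    return picked


def open_text(path):
    """Open a plain or gzip-compressed file in text mode."""
    path = Path(path)
    if path.name.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def guess_sequence_type(path, max_residues=10000, threshold=0.9):
    """Hypothetical heuristic: call a file nucleotide if at least
    `threshold` of the sampled letters are A, C, G, T, U, or N."""
    nucleotide_letters = set("ACGTUN")
    total = nuc = 0
    with open_text(path) as handle:
        for line in handle:
            if line.startswith(">"):
                continue  # skip FASTA header lines
            for ch in line.strip().upper():
                if ch.isalpha():
                    total += 1
                    nuc += ch in nucleotide_letters
            if total >= max_residues:
                break
    if total == 0:
        raise ValueError(f"no sequence data in {path}")
    return "nucleotide" if nuc / total >= threshold else "protein"
```

A helper like this can be useful for checking a directory before a long run, for example to spot files that would unexpectedly trigger gene calling.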
Running The Pipeline
If you are running from the repository with Pixi, replace checku below with pixi run python -m checku.
Pipeline Overview
The diagram below shows the main stages executed by CheckU.
graph TD
A([Start run]) --> B[Collect FASTA inputs from file or directory]
B --> C[Materialize gzipped files under `work/` when needed]
C --> D{Detect sequence type}
D -->|Protein| E[Use supplied protein FASTA]
D -->|Nucleotide| F[Predict proteins with Pyrodigal]
F --> E
E --> G[Search UNI56 HMMs with pyhmmer]
G --> H[Aggregate marker hits and completeness statistics]
H --> I[Write `checku_summary.tsv`]
H --> J[Write `details/checku_presence.tsv`]
H --> K[Write raw hit tables in `details/hits/`]
H --> L[Update checkpoint data and logs]
H -.-> M[Optional: delete predicted proteins when `--clean-intermediate`]
I --> N([Pipeline complete])
J --> N
K --> N
L --> N
M --> N
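To make the "aggregate marker hits" stage more concrete, here is a minimal sketch of how completeness and contamination are commonly derived from single-copy marker counts. The formulas are an assumption based on the usual convention for such marker sets (fraction of markers found, and extra copies relative to the marker count); this README does not state the exact formulas CheckU uses, and the function and key names are hypothetical.

```python
def summarize_markers(copy_counts, markers):
    """Summarize per-marker copy counts for one genome.

    copy_counts: dict mapping marker name -> number of hits found.
    markers: list of all marker names in the set (e.g. the UNI56 profiles).
    Formulas are illustrative assumptions, not CheckU's documented method.
    """
    n = len(markers)
    if n == 0:
        raise ValueError("marker list is empty")
    counts = [copy_counts.get(m, 0) for m in markers]
    present = sum(1 for c in counts if c >= 1)
    duplicated = sum(1 for c in counts if c > 1)
    extra_copies = sum(c - 1 for c in counts if c > 1)
    return {
        "completeness": 100.0 * present / n,
        "contamination": 100.0 * extra_copies / n,
        "duplicated_markers": duplicated,
    }
```

With four markers of which two are found and one of those appears twice, this sketch reports 50% completeness and 25% contamination.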
Single Proteome
checku run \
data/test_genomes/faa/IMG2140918011.faa \
--output-dir tmp/proteome_example \
--cpus 4
Directory Of Proteomes
checku run \
data/test_genomes/faa \
--output-dir tmp/proteome_batch \
--cpus 8
Single Assembly
checku run \
data/test_genomes/fna/IMG2140918011.fna \
--output-dir tmp/assembly_example \
--cpus 4 \
--clean-intermediate
Use --clean-intermediate if you do not need the predicted protein FASTA after the run.
Custom Marker Sets
- The default marker file ships with CheckU (UNI56).
- Point --hmm to a different GA-calibrated .hmm file or to a directory that holds .hmm or .hmm.gz profiles.
- Every profile must define GA cutoffs. The run stops early if a profile is missing them or if names are duplicated.
Example:
checku run \
/path/to/genomes \
--hmm /path/to/custom_markers.hmm \
--output-dir tmp/custom_markers \
--cpus 8
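Before pointing --hmm at a custom file, it can help to check the two conditions above yourself. The sketch below scans the text of a HMMER3 profile file, where each profile carries a NAME header line, an optional GA line, and ends with a line containing only //. The function name and message wording are hypothetical, and this is not the validation code CheckU runs.

```python
def check_hmm_profiles(lines):
    """Return a list of problems found in HMMER3 text-format profiles.

    lines: any iterable of text lines, e.g. an open file handle
    (use gzip.open(path, "rt") for .hmm.gz files).
    """
    problems = []
    seen = set()
    name = None
    has_ga = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("NAME "):
            name = line.split(None, 1)[1].strip()
        elif line.startswith("GA "):
            has_ga = True
        elif line.strip() == "//":
            # End of one profile record: evaluate it, then reset state.
            label = name if name is not None else "<unnamed>"
            if label in seen:
                problems.append(f"{label}: duplicate profile name")
            seen.add(label)
            if not has_ga:
                problems.append(f"{label}: missing GA cutoffs")
            name = None
            has_ga = False
    return problems
```

An empty list means every profile in the file has a unique name and a GA line.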
Outputs
All outputs live in the chosen --output-dir.
- checku_summary.tsv: per-genome summary with completeness, contamination, duplicate counts, and Pyrodigal gene statistics.
- details/checku_presence.tsv: marker presence/absence matrix.
- details/hits/*.tsv: raw pyhmmer hits with domain scores.
- checkpoint/checku_checkpoint.json: resume data for interrupted runs.
- logs/checku.log: timestamps, command line, and status messages.
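Because the summary is a tab-separated file, it can be loaded with the Python standard library. This sketch makes no assumption about column names, which are not listed in this README; it returns the header as found in the file so you can inspect it first. The function name load_summary is hypothetical.

```python
import csv


def load_summary(path):
    """Read a tab-separated summary file.

    Returns (column_names, rows), where each row is a dict keyed by
    whatever column names the file's header line actually contains.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        rows = list(reader)
        columns = list(reader.fieldnames or [])
    return columns, rows
```

For example, `columns, rows = load_summary("tmp/proteome_batch/checku_summary.tsv")` followed by `print(columns)` shows which fields are available before any filtering.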
Resume And Logging
- Runs resume automatically when --resume is left on (default).
- Use --no-resume to start fresh; the older checkpoint is copied aside.
- Increase --log-level to DEBUG when you need extra detail.
Verification Step
Small test data sets are stored under data/test_genomes/. After installation you can confirm the pipeline by running:
checku run data/test_genomes/faa --output-dir tmp/test_run --cpus 2
The command should finish without errors and produce the summary and presence tables described above.
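The check for the two tables can be scripted. The sketch below only uses the file names listed under Outputs; the function name missing_outputs and the choice to treat empty files as missing are assumptions.

```python
from pathlib import Path

# Relative paths taken from the Outputs section of this README.
EXPECTED_OUTPUTS = ("checku_summary.tsv", "details/checku_presence.tsv")


def missing_outputs(output_dir):
    """Return the expected output files that are absent or empty."""
    root = Path(output_dir)
    missing = []
    for rel in EXPECTED_OUTPUTS:
        target = root / rel
        if not target.is_file() or target.stat().st_size == 0:
            missing.append(rel)
    return missing
```

After the test run, `missing_outputs("tmp/test_run")` should return an empty list.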