Generate cov3 files used in DEMIC
pycov3
A package for generating cov3 files from SAM files, which provide coverage information, and a FASTA file of binned contigs. Cov3 files are the input for the DEMIC R package, which calculates the peak-to-trough ratio (PTR), an estimate of bacterial growth rate.
Installation
PyPI
pip install pycov3
pycov3 -h
Bioconda
conda create -n pycov3 -c conda-forge -c bioconda pycov3
conda activate pycov3
pycov3 -h
DockerHub
docker pull ctbushman/pycov3:latest
docker run --rm --name pycov3 ctbushman/pycov3:latest pycov3 -h
GitHub
git clone https://github.com/Ulthran/pycov3.git
cd pycov3/
pip install .
pycov3 -h
Usage
Use -h to see options for running the CLI.
$ pycov3 -h
The FASTAs should all be in one directory with names of the format {sample}.{bin_name}.fasta (or .fa/.fna), and the SAMs should all be in one directory with names of the format {sample}_{bin_name}.sam. The output COV3 files will be written to a directory with names of the format {sample}.{bin_name}.cov3.
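The naming conventions above can be checked before a run. The following is a minimal sketch, not part of pycov3: the helper names are hypothetical, and it assumes that bin names contain neither "." nor "_" and that SAM files relate to a FASTA through a shared bin name.

```python
from collections import defaultdict
from pathlib import Path

FASTA_SUFFIXES = (".fasta", ".fa", ".fna")


def parse_fasta_name(path: Path) -> tuple[str, str]:
    """Split '{sample}.{bin_name}.fasta' (or .fa/.fna) into (sample, bin_name)."""
    if path.suffix not in FASTA_SUFFIXES:
        raise ValueError(f"Not a FASTA file name: {path.name}")
    # Assumption: the bin name is whatever follows the last "." in the stem
    sample, sep, bin_name = path.stem.rpartition(".")
    if not (sep and sample and bin_name):
        raise ValueError(f"Expected {{sample}}.{{bin_name}}: {path.name}")
    return sample, bin_name


def parse_sam_name(path: Path) -> tuple[str, str]:
    """Split '{sample}_{bin_name}.sam' into (sample, bin_name)."""
    if path.suffix != ".sam":
        raise ValueError(f"Not a SAM file name: {path.name}")
    # Assumption: the bin name is whatever follows the last "_" in the stem
    sample, sep, bin_name = path.stem.rpartition("_")
    if not (sep and sample and bin_name):
        raise ValueError(f"Expected {{sample}}_{{bin_name}}: {path.name}")
    return sample, bin_name


def group_sams_by_bin(sam_paths: list[Path]) -> dict[str, list[str]]:
    """Group SAM file names by bin name (hypothetical pairing rule)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path in sorted(sam_paths):
        _, bin_name = parse_sam_name(path)
        groups[bin_name].append(path.name)
    return dict(groups)


print(parse_fasta_name(Path("max_bin.001.fasta")))
print(group_sams_by_bin([Path("Akk_001.sam"), Path("Bfrag_001.sam")]))
```

A file that does not match either pattern raises a ValueError here, which makes a misnamed input visible before any processing starts.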
You can also use the library in your own code. Create a SAM directory and FASTA directory, set any non-default window or coverage parameters, then create a COV3 directory and use it to generate a COV3 file for each contig set in the FASTA directory.
from pathlib import Path

from pycov3.Directory import Cov3Dir, FastaDir, SamDir

sam_d = SamDir(Path("/path/to/sams/"), False)

# Leave a value as None to fall back to the library default
window_params = {
    "window_size": None,
    "window_step": None,
    "edge_length": sam_d.calculate_edge_length(),
}
coverage_params = {
    "mapq_cutoff": None,
    "mapl_cutoff": None,
    "max_mismatch_ratio": None,
}

# Drop unset values so only non-default parameters are passed on
window_params = {k: v for k, v in window_params.items() if v is not None}
coverage_params = {k: v for k, v in coverage_params.items() if v is not None}

fasta_d = FastaDir(Path("/path/to/fastas/"), False)

cov3_d = Cov3Dir(
    Path("/path/to/output/"),
    False,
    fasta_d.get_filenames(),
    window_params,
    coverage_params,
)

cov3_d.generate(sam_d, fasta_d)
Alternatively, to use the bare application logic and do all the file handling yourself, you can use the Cov3Generator class, which takes a list of generators as SAM inputs and a generator as a FASTA input.
from pathlib import Path

from pycov3.Cov3Generator import Cov3Generator
from pycov3.File import Cov3File

# sam_generators, fasta_generator, sample, bin_name, window_params and
# coverage_params must be defined by the caller before this point
cov3_generator = Cov3Generator(
    sam_generators,
    fasta_generator,
    sample,
    bin_name,
    window_params,
    **coverage_params,
)

cov3_dict = cov3_generator.generate_cov3()

# Write output
cov3_file = Cov3File(Path("/path/to/output/"), "001")
cov3_file.write_generator(cov3_generator.generate_cov3())
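Doing the file handling yourself means producing the input generators. The sketch below shows one way lazy FASTA and SAM readers could be written with the standard library. The function names are hypothetical, and the record shapes they yield (a (header, sequence) tuple and a list of tab-separated SAM fields) are assumptions; this README does not state what Cov3Generator expects, so check the pycov3 source before passing anything like this in.

```python
from typing import Iterable, Iterator, List, Tuple


def iter_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Lazily yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def iter_sam_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Lazily yield SAM alignment lines as lists of fields, skipping '@' headers."""
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue
        yield line.rstrip("\n").split("\t")


fasta_lines = [">contig_1", "ACGT", "AC", ">contig_2", "GG"]
for name, seq in iter_fasta(fasta_lines):
    print(name, len(seq))
```

Because both readers accept any iterable of lines, they work equally on an open file handle, which keeps only one record in memory at a time.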
Resource Requirements
Threads: pycov3 uses multiprocessing to parallelize processing of input fastas. Increasing --thread_num up to the number of input fastas should improve runtime, with no benefit beyond that number.
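The cap on useful threads follows from having one unit of work per fasta. The sketch below illustrates that idea with the standard library's multiprocessing.Pool; it is not pycov3's actual implementation, and effective_workers, process_fasta and run_all are hypothetical names.

```python
from multiprocessing import Pool
from typing import List


def effective_workers(thread_num: int, n_fastas: int) -> int:
    """Workers beyond the number of fastas would sit idle, so cap there."""
    return max(1, min(thread_num, n_fastas))


def process_fasta(name: str) -> str:
    """Stand-in for the per-fasta work (hypothetical)."""
    return f"{name} done"


def run_all(fasta_names: List[str], thread_num: int) -> List[str]:
    """Process each fasta in its own task, using at most one worker per fasta."""
    workers = effective_workers(thread_num, len(fasta_names))
    with Pool(processes=workers) as pool:
        return pool.map(process_fasta, fasta_names)


if __name__ == "__main__":
    # Requesting 8 threads for 2 fastas only ever uses 2 workers
    print(effective_workers(8, 2))
    print(run_all(["a.001.fasta", "a.002.fasta"], thread_num=8))
```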
Memory: pycov3 uses generators as much as possible. The main memory users are the Contig objects, which each hold a contig's sequence and information for each Window over its length. There is also a coverages dictionary that could potentially grow to the size of the largest contig (but that is very unlikely). At a minimum, allocate memory equal to twice the size of the largest contig per thread.
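As a rough worked example of that guideline, the sketch below turns "twice the largest contig per thread" into a number. The function is hypothetical, and the one-byte-per-base default is an assumption that ignores Python object overhead, so treat the result as a lower bound rather than a measurement.

```python
def min_memory_bytes(largest_contig_bp: int, threads: int, bytes_per_base: int = 1) -> int:
    """Lower-bound estimate: 2 x largest contig size, for each thread."""
    return 2 * largest_contig_bp * bytes_per_base * threads


# A 5 Mbp largest contig processed with 4 threads
estimate = min_memory_bytes(5_000_000, threads=4)
print(f"{estimate / 1e6:.0f} MB minimum")
```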
Algorithmic Complexity: Assuming enough threads are provided to have each fasta file processed separately, the time complexity is roughly O(cwsr), where:
c: Number of contigs in fasta
s: Number of sam files
w: Max number of windows per contig
r: Max number of records per sam file
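One way to use this estimate is to compare workloads rather than to predict wall-clock time. The sketch below, with a hypothetical relative_cost helper and made-up input sizes, shows that tripling the number of sam files triples the estimated work when everything else stays fixed.

```python
def relative_cost(contigs: int, max_windows: int, sam_files: int, max_records: int) -> int:
    """Unitless c * w * s * r product; only ratios between workloads are meaningful."""
    return contigs * max_windows * sam_files * max_records


small = relative_cost(contigs=50, max_windows=100, sam_files=2, max_records=1_000_000)
large = relative_cost(contigs=50, max_windows=100, sam_files=6, max_records=1_000_000)

print(large / small)  # 3.0
```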
Help
Please use the Issues on this repo for any problems, questions, or suggestions.
File details
Details for the file pycov3-2.1.0.tar.gz.
File metadata
- Download URL: pycov3-2.1.0.tar.gz
- Upload date:
- Size: 26.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.2 CPython/3.11.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7fcd34f840fb93b0e951eca51f23b4df7548412577d9bc4a6e1fbf855b97329a
MD5 | d20dc993745f512e8a17d9498863dc54
BLAKE2b-256 | 485bca30967e07610672d12fc269edab1e9e0e0735dd47de015c29affac1ea9a
File details
Details for the file pycov3-2.1.0-py3-none-any.whl.
File metadata
- Download URL: pycov3-2.1.0-py3-none-any.whl
- Upload date:
- Size: 12.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.2 CPython/3.11.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 390e889143d74cbf9ec467b0021ec63a1f3c78238a98ae6a7e4208ddc2b9951e
MD5 | eada98be96042bfc5a36abacc2537a6e
BLAKE2b-256 | fa5e5599bce63b14fa0c07d2bf9f5d188b15c15a563e79c43ef6739abe0f43d2