kmerPaPa

Tool to calculate a "k-mer pattern partition" from position specific k-mer counts. This can for instance be used to train a mutation rate model.

Requirements

kmerPaPa requires Python 3.8 or above.

Installation

kmerPaPa can be installed using pip:

pip install kmerpapa

or using pipx:

pipx install kmerpapa

Test data

The test data files used in the usage examples below can be downloaded from the test_data directory in the project's GitHub repository:

wget https://github.com/BesenbacherLab/kmerPaPa/raw/main/test_data/mutated_5mers.txt
wget https://github.com/BesenbacherLab/kmerPaPa/raw/main/test_data/background_5mers.txt

Usage

If we want to train a mutation rate model, the input data should specify the number of times each k-mer is observed mutated and unmutated. One option is to have one file with the mutated k-mer counts (positive) and one file with the counts of k-mers in the whole genome (background). We can then run kmerpapa like this:

kmerpapa --positive mutated_5mers.txt \
         --background background_5mers.txt \
         --penalty_values 3 5 7

The above command will first use cross validation to find the best penalty value among the values 3, 5 and 7. Then it will find the optimal k-mer pattern partition using that penalty value. If both a list of penalty values and a list of pseudo-counts are specified, then all combinations of values will be tested during cross validation:

kmerpapa --positive mutated_5mers.txt \
         --background background_5mers.txt \
         --penalty_values 3 5 6 \
         --pseudo_counts 0.5 1 10
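
To make the size of that grid concrete, the following minimal Python sketch enumerates the combinations implied by the command above. It only illustrates the Cartesian product of the two lists; it does not call kmerpapa, and the variable names are chosen for illustration.

```python
from itertools import product

# Values taken from the command above.
penalty_values = [3, 5, 6]
pseudo_counts = [0.5, 1, 10]

# Every (penalty, pseudo-count) pair is one candidate for cross validation.
grid = list(product(penalty_values, pseudo_counts))

for penalty, pseudo_count in grid:
    print(f"penalty={penalty} pseudo_count={pseudo_count}")

print(len(grid), "combinations")
```

Three penalty values and three pseudo-counts give nine combinations, so the cost of cross validation grows with the product of the two list lengths.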

If only a single combination of penalty_value and pseudo_count is provided, then the default is not to run cross validation unless the "--n_folds" option or the "--CV_only" option is used. The "--CV_only" option can be used together with the "--CVfile" option to parallelize a grid search. For example, using bash:

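# Start one background job per (penalty, pseudo-count) combination.
for c in 3 5 6; do
    for a in 0.5 1 10; do
        kmerpapa --positive mutated_5mers.txt \
                 --background background_5mers.txt \
                 --penalty_values "$c" \
                 --pseudo_counts "$a" \
                 --CV_only --CVfile "CV_results_c${c}_a${a}.txt" &
    done
done
# Block until all background jobs have finished.
wait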
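
The same grid search could also be driven from Python. The sketch below is one possible way to do it, not part of kmerpapa itself: it builds one argument list per combination using only the options shown above, and the helper names build_commands and run_all are hypothetical. Nothing is launched until run_all is called, which assumes kmerpapa is installed and on the PATH.

```python
import subprocess
from itertools import product


def build_commands(penalties, pseudo_counts,
                   positive="mutated_5mers.txt",
                   background="background_5mers.txt"):
    """Return one kmerpapa argument list per (penalty, pseudo-count) pair."""
    commands = []
    for c, a in product(penalties, pseudo_counts):
        commands.append([
            "kmerpapa",
            "--positive", positive,
            "--background", background,
            "--penalty_values", str(c),
            "--pseudo_counts", str(a),
            "--CV_only",
            "--CVfile", f"CV_results_c{c}_a{a}.txt",
        ])
    return commands


def run_all(commands):
    """Start every command at once and wait for all of them to finish."""
    processes = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in processes]


commands = build_commands([3, 5, 6], [0.5, 1, 10])
print(len(commands), "jobs prepared")
```

Calling run_all(commands) would start all nine jobs in parallel and return their exit codes once they have finished.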

Creating input data

Input files with k-mer counts can be created using kmer_counter. Given a file of point mutations that contains the CHROM, POS, REF and ALT columns from a VCF file:

chr1 1000000 G A
chr1 1000100 G A
chr1 1000200 C T
chr1 1000300 C T
chr1 1000400 C T

We can count the 5-mers around each mutation using this command:

kmer_counter snv --radius 2 {genome}.2bit {point_mutations_file} > mutated_5mers.txt
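
A radius of 2 corresponds to 5-mers because the k-mer spans the mutated position plus two bases on each side, so k = 2 * radius + 1. The Python sketch below illustrates that relationship on a made-up sequence. It is not how kmer_counter is implemented; the function name kmer_around is hypothetical, and it assumes 1-based positions as in the POS column of a VCF file.

```python
def kmer_around(sequence, pos, radius):
    """Return the k-mer of length 2*radius + 1 centred on a 1-based position."""
    index = pos - 1  # convert the 1-based position to a 0-based string index
    start, end = index - radius, index + radius + 1
    if start < 0 or end > len(sequence):
        raise ValueError("position is too close to the end of the sequence")
    return sequence[start:end]


toy_sequence = "ACGTGCATTA"  # made-up sequence, not real genome data
print(kmer_around(toy_sequence, pos=5, radius=2))  # prints GTGCA
```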

Given a bed file with regions that are sufficiently covered by sequencing we can count the background 5-mers using this command:

kmer_counter background --bed {regions}.bed --radius 2 {genome}.2bit > background_5mers.txt

The file {genome}.2bit should be a 2bit file of the same reference genome that was used for calling the mutations. 2bit files can be downloaded from: https://hgdownload.cse.ucsc.edu/goldenpath/{genome}/bigZips/{genome}.2bit where {genome} is a valid UCSC genome assembly name (e.g. "hg38").
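
As a small illustration of how the {genome} placeholder is filled in, the Python sketch below builds the download URL for a given assembly name. The function name twobit_url is hypothetical, and the sketch does not check that the assembly exists on the UCSC server.

```python
URL_TEMPLATE = "https://hgdownload.cse.ucsc.edu/goldenpath/{genome}/bigZips/{genome}.2bit"


def twobit_url(genome):
    """Fill the assembly name into both placeholders of the URL pattern."""
    return URL_TEMPLATE.format(genome=genome)


print(twobit_url("hg38"))
# prints https://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/hg38.2bit
```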

Release history

0.2.4 (this version): Sep 25, 2024
0.2.3: Sep 16, 2022
0.2.2: Jan 24, 2022
0.2.1: Nov 11, 2021
0.2.0: Nov 8, 2021
