Tool to calculate a k-mer pattern partition from position specific k-mer counts.

kmerPaPa

Tool to calculate a "k-mer pattern partition" from position-specific k-mer counts. This can, for instance, be used to train a mutation rate model.

Requirements

kmerPaPa requires Python 3.7 or above.
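A quick way to confirm that an interpreter meets this requirement before installing is sketched below. `check_python` is a hypothetical helper (not part of kmerPaPa), and it assumes `python3` is the interpreter that pip or pipx will use.

```bash
# Hypothetical helper (not part of kmerPaPa): exit status 0 if the given
# interpreter (default: python3) is Python 3.7 or newer, non-zero otherwise.
check_python() {
    local py=${1:-python3}
    "$py" -c 'import sys; sys.exit(0 if sys.version_info >= (3, 7) else 1)'
}
```

For example, `check_python && pip install kmerpapa` only attempts the installation when the version check passes.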

Installation

kmerPaPa can be installed using pip:

pip install kmerpapa

or using pipx:

pipx install kmerpapa

Test data

The test data files used in the usage examples below can be downloaded from the test_data directory in the project's GitHub repository:

wget https://github.com/BesenbacherLab/kmerPaPa/raw/main/test_data/mutated_5mers.txt
wget https://github.com/BesenbacherLab/kmerPaPa/raw/main/test_data/background_5mers.txt
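The positive file holds mutated k-mer counts and the background file holds genome-wide k-mer counts, so the number of times a k-mer is observed unmutated is presumably its background count minus its mutated count. The sketch below illustrates only that arithmetic. It assumes (unverified here) that each file has one k-mer and its count per line, separated by whitespace; `unmutated_counts` is a hypothetical helper, not part of kmerPaPa.

```bash
# Hypothetical helper, not part of kmerPaPa. ASSUMPTION: both files contain
# two whitespace-separated columns per line: a k-mer and its count.
# Prints "k-mer mutated unmutated" for every k-mer in the background file;
# k-mers missing from the positive file are treated as never mutated.
unmutated_counts() {
    local positive=$1 background=$2
    awk 'FILENAME == ARGV[1] { pos[$1] = $2; next }   # first file: store mutated counts
         {
             m = ($1 in pos) ? pos[$1] : 0            # mutated count, 0 if absent
             print $1, m, $2 - m                      # unmutated = background - mutated
         }' "$positive" "$background"
}
```

Usage, if the files follow the assumed layout: `unmutated_counts mutated_5mers.txt background_5mers.txt | head`.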

Usage

If we want to train a mutation rate model, the input data should specify the number of times each k-mer is observed mutated and unmutated. One option is to provide one file with the mutated k-mer counts (positive) and one file with the counts of each k-mer in the whole genome (background). We can then run kmerpapa like this:

kmerpapa --positive mutated_5mers.txt \
         --background background_5mers.txt \
         --penalty_values 3 5 7

The above command will first use cross validation to find the best penalty value among 3, 5 and 7. It will then find the optimal k-mer pattern partition using that penalty value. If both a list of penalty values and a list of pseudo-counts are specified, all combinations of values will be tested during cross validation:

kmerpapa --positive mutated_5mers.txt \
         --background background_5mers.txt \
         --penalty_values 3 5 6 \
         --pseudo_counts 0.5 1 10
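With three penalty values and three pseudo-counts, the grid in the command above contains 3 × 3 = 9 combinations. The hypothetical `grid_pairs` function below only enumerates those pairs to make the grid concrete; it does not call kmerPaPa.

```bash
# Hypothetical helper: enumerate the (penalty value, pseudo-count) pairs
# covered when --penalty_values 3 5 6 and --pseudo_counts 0.5 1 10 are
# both given. This only prints the grid; it does not run kmerPaPa.
grid_pairs() {
    for c in 3 5 6; do
        for a in 0.5 1 10; do
            printf '%s %s\n' "$c" "$a"
        done
    done
}
```

`grid_pairs | wc -l` reports 9, one line per combination tested during cross validation.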

If only a single combination of penalty value and pseudo-count is provided, cross validation is not run by default unless the "--n_folds" or "--CV_only" option is used. The "--CV_only" option can be used together with the "--CVfile" option to parallelize the grid search. For example, using bash:

for c in 3 5 6; do
    for a in 0.5 1 10; do
        kmerpapa --positive mutated_5mers.txt \
         --background background_5mers.txt \
         --penalty_values $c \
         --pseudo_counts $a \
         --CV_only --CVfile CV_results_c${c}_a${a}.txt &
    done
done
wait  # block until all background cross validation jobs have finished
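Ending each command with `&` starts all nine jobs at once, which may oversubscribe a smaller machine. A minimal sketch of an alternative that caps concurrency with `xargs -P` is shown below. It uses only the flags shown in the loop above; `run_cv_grid`, `MAX_JOBS` and `KMERPAPA` are hypothetical names introduced for illustration.

```bash
# Hypothetical wrapper, not part of kmerPaPa: runs the same 3 x 3 CV grid as
# the loop above, but with at most MAX_JOBS (default 4) jobs at a time.
# Set KMERPAPA=echo for a dry run that only prints the arguments.
# Note: xargs -P is supported by GNU and BSD xargs but is not POSIX.
run_cv_grid() {
    local max_jobs=${MAX_JOBS:-4}
    local cmd=${KMERPAPA:-kmerpapa}
    for c in 3 5 6; do
        for a in 0.5 1 10; do
            printf '%s %s\n' "$c" "$a"   # one "penalty pseudo_count" pair per line
        done
    done | xargs -P "$max_jobs" -n 2 sh -c '
        # $0 = command, $1 = penalty value, $2 = pseudo-count
        "$0" --positive mutated_5mers.txt \
             --background background_5mers.txt \
             --penalty_values "$1" \
             --pseudo_counts "$2" \
             --CV_only --CVfile "CV_results_c$1_a$2.txt"
    ' "$cmd"
}
```

Because `xargs` waits for all of its child processes before exiting, no separate `wait` is needed; `MAX_JOBS=2 run_cv_grid` would run at most two cross validation jobs at a time.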

Download files

Source Distribution

kmerpapa-0.2.1.tar.gz (18.5 kB)

Uploaded Source

Built Distribution

kmerpapa-0.2.1-py3-none-any.whl (22.3 kB)

Uploaded Python 3

File details

Details for the file kmerpapa-0.2.1.tar.gz.

File metadata

  • Download URL: kmerpapa-0.2.1.tar.gz
  • Upload date:
  • Size: 18.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.6 CPython/3.8.11 Darwin/20.6.0

File hashes

Hashes for kmerpapa-0.2.1.tar.gz
Algorithm    Hash digest
SHA256       d8215eebb304e5ee94165b257c5851be79b71a95be9224c599bf7399fd766e43
MD5          3c0513811366fef511e6fb6ebefc52a8
BLAKE2b-256  a55a549b183fe5f585e392bb0da93c3756b26d526ada9ed6a98b0479c324ff31
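To check a downloaded archive against the SHA256 digest listed above, a small helper such as the hypothetical `verify_sha256` below can be used. It assumes that either `sha256sum` (GNU coreutils) or `shasum` (shipped with Perl, e.g. on macOS) is available.

```bash
# Hypothetical helper: succeed only if FILE's SHA256 digest equals EXPECTED.
# Uses sha256sum when available, otherwise falls back to shasum -a 256.
verify_sha256() {
    local file=$1 expected=$2 actual
    if command -v sha256sum >/dev/null 2>&1; then
        actual=$(sha256sum "$file" | awk '{print $1}')
    else
        actual=$(shasum -a 256 "$file" | awk '{print $1}')
    fi
    [ "$actual" = "$expected" ]
}
```

Usage, after downloading the source distribution:

    verify_sha256 kmerpapa-0.2.1.tar.gz d8215eebb304e5ee94165b257c5851be79b71a95be9224c599bf7399fd766e43 && echo "hash OK"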

File details

Details for the file kmerpapa-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: kmerpapa-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 22.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.6 CPython/3.8.11 Darwin/20.6.0

File hashes

Hashes for kmerpapa-0.2.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       baf00051de18ff1454e8e70abf9780cd9c003ce99bc83eae59416318c3b1aaf3
MD5          cc2ba0609fd95555e6ac07af6c7dc368
BLAKE2b-256  d722eb1fa7e6611dde92572cf85de523efda7d630d6b21f1f95e24d59bcf9fff
