Tools for normalizing isoform counts

Project description

K-mer Counts Merging and Normalization

merge_normalized_isoform_count_TPM.py and merge_normalize_isoform_count_v1.py

These scripts further process the output of the preceding k-mer counting script. They merge the k-mer count data into the original k-mer CSV files and normalize the counts to account for differences in the total number of k-mers and in read counts. merge_normalized_isoform_count_TPM.py produces TPM values and merge_normalize_isoform_count_v1.py produces RPKM values. Normalization of this kind is a common step in bioinformatics workflows that compare genomes or quantify how strongly a sequence is represented.
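The merge step described above can be pictured with a minimal sketch using only the standard library. The column names (kmer, transcript, count) and the choice to record 0 for a k-mer that has no count are assumptions made for illustration; the package's actual CSV layout and merge logic may differ.

```python
import csv
import io


def merge_counts(kmer_rows, count_rows, key="kmer", count_field="count"):
    """Attach a count to every k-mer row; k-mers without a count get 0 (assumption)."""
    counts = {row[key]: int(row[count_field]) for row in count_rows}
    merged = []
    for row in kmer_rows:
        out = dict(row)  # copy so the input rows stay untouched
        out[count_field] = counts.get(row[key], 0)
        merged.append(out)
    return merged


# Tiny in-memory stand-ins for a *_kmers.csv / *_kmer_counts.csv pair.
# The column names are hypothetical.
kmers_csv = "kmer,transcript\nACGT,tx1\nTTGA,tx1\n"
counts_csv = "kmer,count\nACGT,7\n"

kmer_rows = list(csv.DictReader(io.StringIO(kmers_csv)))
count_rows = list(csv.DictReader(io.StringIO(counts_csv)))
merged = merge_counts(kmer_rows, count_rows)
print(merged)
```

Keying the counts by k-mer sequence keeps the merge linear in the number of rows, which matters when the k-mer lists are large.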

Features
Merges k-mer count data with the original k-mer list CSV files.
Normalizes k-mer frequencies using the total k-mer counts and read lengths.
Supports input from gzipped FASTQ files for read count determination.
Efficiently calculates normalization factors and processes large datasets.
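As a rough illustration of the features listed above, the sketch below counts reads in a gzipped FASTQ file and applies the conventional RPKM and TPM definitions. It is not taken from the package source: the function names are hypothetical, the feature lengths are placeholder values, and the description does not say exactly how the scripts combine the read length and k, so that part is left out.

```python
import gzip
import os
import tempfile


def count_fastq_reads(path):
    """Count reads in a gzipped FASTQ file, assuming four lines per record."""
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def rpkm(counts, lengths_bp, total_reads):
    """Conventional RPKM: count per kilobase of feature per million reads."""
    return [c * 1e9 / (total_reads * n) for c, n in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Conventional TPM: length-normalized rates rescaled to sum to one million."""
    rates = [c / n for c, n in zip(counts, lengths_bp)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


# Demo on a two-read FASTQ written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.fastq.gz")
    with gzip.open(demo_path, "wt") as fh:
        fh.write("@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n")
    n_reads = count_fastq_reads(demo_path)

# Placeholder counts and feature lengths (in bases), for illustration only.
rpkm_values = rpkm([10, 20], [1000, 2000], total_reads=1_000_000)
tpm_values = tpm([10, 20], [1000, 2000])
print(n_reads, rpkm_values, tpm_values)
```

The practical difference between the two measures is that TPM values always sum to one million within a sample, whereas RPKM values depend on the total read count supplied.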

Example usage
Both scripts accept command-line arguments that specify the input and output directories, the FASTQ file path, the read length, and the k-mer size.

For TPM

python ./scripts/merge_normalized_isoform_count_TPM.py --directory ./data/input --output_directory ./data/output --read_length 150 --k 50

For RPKM

python ./scripts/merge_normalize_isoform_count_v1.py --directory ./data/input --output_directory ./data/output --read_length 150 --k 50
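The interface used in these commands could be declared with argparse roughly as follows. This is a sketch built from the documented flag names and defaults, not the package's actual source; build_parser is a hypothetical helper and the --fastq path in the demo is a made-up placeholder.

```python
import argparse


def build_parser():
    """Declare the documented flags; required/default settings follow the description."""
    parser = argparse.ArgumentParser(
        description="Merge and normalize k-mer counts (illustrative sketch)."
    )
    parser.add_argument("--directory", required=True,
                        help="directory with *_kmers.csv and *_kmer_counts.csv files")
    parser.add_argument("--output_directory", required=True,
                        help="directory for the merged and normalized CSV files")
    parser.add_argument("--fastq", required=True,
                        help="gzipped FASTQ file the k-mer counts were computed from")
    parser.add_argument("--read_length", type=int, default=150,
                        help="read length used for normalization")
    parser.add_argument("--k", type=int, default=50,
                        help="k-mer length used during counting")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args([
    "--directory", "./data/input",
    "--output_directory", "./data/output",
    "--fastq", "./data/sample.fastq.gz",  # hypothetical path
])
print(args.directory, args.output_directory, args.read_length, args.k)
```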

Command-Line Arguments
--directory: The directory containing the *_kmers.csv files and the corresponding *_kmer_counts.csv files (required). This is the output directory of the previous script (kmer_counting_loop.py).
--output_directory: The directory where the merged and normalized CSV files will be saved (required). This should be a new directory, for use in the subsequent GaussF workflow.
--fastq: The path to the gzipped FASTQ file for which the k-mer counts were computed (required).
--read_length: The length of the reads in the FASTQ file, needed for normalization (default: 150).
--k: The length of the k-mers used during counting (default: 50).

Output

Download files

Source Distribution

count_normalize-0.1.1.tar.gz (5.3 kB)

Uploaded Source

Built Distribution

count_normalize-0.1.1-py3-none-any.whl (6.8 kB)

Uploaded Python 3

File details

Details for the file count_normalize-0.1.1.tar.gz.

File metadata

  • Download URL: count_normalize-0.1.1.tar.gz
  • Size: 5.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.4

File hashes

Hashes for count_normalize-0.1.1.tar.gz
Algorithm Hash digest
SHA256 37eebb909af9f793b9acc82da04ae0e74fff69b3144ae03d187dbbed03829914
MD5 b46f2ab86c046b5cef84c30db6ad0080
BLAKE2b-256 45bac26fe9cf7e863fd63820b9940d6bb98204512161ae60108afad4a7c8717d

File details

Details for the file count_normalize-0.1.1-py3-none-any.whl.

File hashes

Hashes for count_normalize-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 54ac444a21447ee428845b8372891b7acae2052c81b54f20d473d9479ccc7c04
MD5 4660ce05358d5f00330a1904b9988713
BLAKE2b-256 99df5d186b189305b3943d052a2b2ec3b2fdeaebf247ca4900a007fe3add3afb
