Tools for normalizing isoform counts
Project description
K-mer Counts Merging and Normalization
merge_normalized_isoform_count_TPM.py and merge_normalize_isoform_count_v1.py
These scripts further process the output of the previous k-mer counting script (kmer_counting_loop.py). They merge the k-mer count data into the original k-mer CSV files and normalize the counts to account for differences in the total number of k-mers and in read counts. merge_normalized_isoform_count_TPM.py normalizes to TPM, and merge_normalize_isoform_count_v1.py normalizes to RPKM. Normalization of this kind is needed whenever k-mer abundances are compared across samples, for example in comparative genomics or quantitative assessment of sequence representation.
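The two normalizations can be illustrated with their textbook definitions. This is a minimal sketch, not the scripts' actual code: the function names and the `lengths_bp` parameter are hypothetical, and how the scripts derive an effective length from the read length and k is not stated here, so the length is passed in explicitly.

```python
def rpkm(counts, lengths_bp, total_reads):
    """Reads per kilobase per million reads (textbook definition).

    counts      -- raw count per feature (here: per k-mer)
    lengths_bp  -- assumed effective length of each feature, in bases
    total_reads -- total number of reads in the sample
    """
    per_million = total_reads / 1e6
    return [c / (length / 1e3) / per_million
            for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million (textbook definition).

    Each count is first divided by its length in kilobases, then the
    resulting rates are rescaled so that they sum to one million.
    """
    rates = [c / (length / 1e3) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        return [0.0 for _ in rates]
    return [r / total * 1e6 for r in rates]
```

Under these definitions, TPM values always sum to one million within a sample, while RPKM values depend on the sample's total read count.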
Features
Merges k-mer count data with the original k-mer list CSV files.
Normalizes k-mer frequencies using the total k-mer counts and read lengths.
Supports input from gzipped FASTQ files for read count determination.
Efficiently calculates normalization factors and processes large datasets.
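The merge and read-count steps listed above might look roughly like the following. This is a sketch under stated assumptions, not the package's implementation: the function names and the column names `kmer` and `count` are hypothetical, and the FASTQ file is assumed to use the standard four lines per record.

```python
import csv
import gzip


def count_fastq_reads(fastq_gz_path):
    """Count reads in a gzipped FASTQ file, assuming 4 lines per record."""
    with gzip.open(fastq_gz_path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def merge_counts(kmers_csv, counts_csv, key="kmer", count_col="count"):
    """Attach counts to every row of the original k-mer list.

    Column names are assumptions. K-mers that are absent from the counts
    file receive a count of 0, so no row of the k-mer list is dropped.
    """
    with open(counts_csv, newline="") as handle:
        counts = {row[key]: int(row[count_col])
                  for row in csv.DictReader(handle)}

    merged = []
    with open(kmers_csv, newline="") as handle:
        for row in csv.DictReader(handle):
            row[count_col] = counts.get(row[key], 0)
            merged.append(row)
    return merged
```

Keeping every row of the k-mer list, including k-mers with zero counts, means the merged table has the same shape for every sample.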
Example usage
These scripts accept command-line arguments to specify the input and output directories, the FASTQ file path, the read length, and the k-mer size. Run them as follows.
For TPM
python ./scripts/merge_normalized_isoform_count_TPM.py --directory ./data/input --output_directory ./data/output --read_length 150 --k 50
For RPKM
python ./scripts/merge_normalize_isoform_count_v1.py --directory ./data/input --output_directory ./data/output --read_length 150 --k 50
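For orientation, the flags used in these commands, together with the documented `--fastq` option, could be declared with `argparse` roughly as follows. This is an illustrative sketch rather than the scripts' own parser: `build_parser` is a hypothetical name, and `--fastq` is marked required only because the argument description says so.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Merge and normalize k-mer counts (sketch)."
    )
    parser.add_argument("--directory", required=True,
                        help="Directory with *_kmers.csv and *_kmer_counts.csv files.")
    parser.add_argument("--output_directory", required=True,
                        help="Directory for the merged and normalized CSV files.")
    parser.add_argument("--fastq", required=True,
                        help="Gzipped FASTQ file the k-mer counts were computed from.")
    parser.add_argument("--read_length", type=int, default=150,
                        help="Read length used for normalization.")
    parser.add_argument("--k", type=int, default=50,
                        help="K-mer length used during counting.")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```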
Command-Line Arguments
--directory: The directory containing the *_kmers.csv and corresponding *_kmer_counts.csv files (required). This is the output directory of the previous script (kmer_counting_loop.py).
--output_directory: The directory where the merged and normalized CSV files will be saved (required). This should be a new directory, for use in the rest of the GaussF workflow.
--fastq: The path to the gzipped FASTQ file for which k-mer counts were computed (required).
--read_length: The length of the reads in the FASTQ file, needed for normalization (default: 150).
--k: The length of the k-mers used during counting (default: 50).
Output
File details
Details for the file count_normalize-0.1.1.tar.gz.
File metadata
- Download URL: count_normalize-0.1.1.tar.gz
- Upload date:
- Size: 5.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 37eebb909af9f793b9acc82da04ae0e74fff69b3144ae03d187dbbed03829914
MD5 | b46f2ab86c046b5cef84c30db6ad0080
BLAKE2b-256 | 45bac26fe9cf7e863fd63820b9940d6bb98204512161ae60108afad4a7c8717d
File details
Details for the file count_normalize-0.1.1-py3-none-any.whl.
File metadata
- Download URL: count_normalize-0.1.1-py3-none-any.whl
- Upload date:
- Size: 6.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 54ac444a21447ee428845b8372891b7acae2052c81b54f20d473d9479ccc7c04
MD5 | 4660ce05358d5f00330a1904b9988713
BLAKE2b-256 | 99df5d186b189305b3943d052a2b2ec3b2fdeaebf247ca4900a007fe3add3afb