This Python package calculates fragment length ratios from a BAM file, using an input BED file and a reference genome FASTA file. It provides several options for manipulating the input intervals and for applying GC content correction to the coverage analysis.


fragscan_ct

Features

Fragment Ratio Calculation: The script calculates the ratio of short to long fragments based on the input BAM file.
Interval Manipulation: Users can choose to merge or split the intervals in the input BED file, as well as pad the coordinates before binning.
GC Content Correction: The script applies a LOWESS (Locally Weighted Scatterplot Smoothing) algorithm to correct the coverage based on the GC content of the fragments.
Visualization: The script generates plots to visualize the fragment length distribution and the GC-corrected coverage.
Output: The script generates a text file containing the calculated fragment counts, ratios, z-scores, and coverage information.
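
As a rough illustration of the fragment ratio calculation, the sketch below counts short and long fragments from a plain list of fragment lengths and divides one count by the other. It assumes inclusive bounds and uses the default ranges shown in the help output (100 to 150 and 151 to 220). The function name `fragment_ratio` is hypothetical and not part of the package, which reads fragment lengths from a BAM file rather than from a list.

```python
from typing import Iterable, Optional, Tuple


def fragment_ratio(
    lengths: Iterable[int],
    short_range: Tuple[int, int] = (100, 150),
    long_range: Tuple[int, int] = (151, 220),
) -> Tuple[int, int, Optional[float]]:
    """Count short and long fragments and return (short, long, short/long).

    Hypothetical helper: treating the bounds as inclusive is an assumption
    about how the package interprets the ranges.
    """
    short = long = 0
    for length in lengths:
        size = abs(length)  # template lengths in BAM records can be negative
        if short_range[0] <= size <= short_range[1]:
            short += 1
        elif long_range[0] <= size <= long_range[1]:
            long += 1
    # Avoid dividing by zero when there are no long fragments.
    ratio = short / long if long else None
    return short, long, ratio


# Example: two short fragments (120, 145) and three long ones (151, 167, 180).
counts = fragment_ratio([98, 120, 145, 151, 167, -180, 230])
print(counts)
```

Returning `None` for an empty denominator is one possible design choice; the package may report such intervals differently.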

Dependencies

The script requires the following Python 3 libraries:

typer: For command-line interface
pathlib: For handling file paths (part of the Python 3 standard library, so no separate installation is needed)
rich: For progress bar and console output
plotly: For generating interactive plots
pandas: For data manipulation
numpy: For numerical operations
scipy: For statistical functions

Usage

Main

❯ python fragscan_ct --help

 Usage: fragscan_ct [OPTIONS] COMMAND [ARGS]...

 ╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
 │ --install-completion          Install completion for the current shell.                                                                                  │
 │ --show-completion             Show completion for the current shell, to copy it or customize the installation.                                           │
 │ --help                        Show this message and exit.                                                                                                │
 ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
 ╭─ Commands ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
 │ generate-fragment-ratios  The `generate_fragment_ratios` function generates a new TXT file by processing a BED file and calculating fragment length      │
 │                           ratios from a BAM file.                                                                                                        │
 │ plot-fragment-ratios      The `plot_fragment_ratios` function takes in a file of files or a list of input TXT files, reads the data from the files into  │
 │                           a Pandas DataFrame, and plots a line plot of the "Ratio" column against the "Id" column, with different colors for each        │
 │                           "Sample_Id". The resulting plot is saved as an HTML file.                                                                      │
 ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Generate Ratios

The generate_fragment_ratios function calculates fragment ratios from a BAM file using input BED and reference genome files, with options for interval manipulation and GC correction.

❯ python fragscan_ct generate-fragment-ratios --help

Usage: fragscan_ct generate-fragment-ratios [OPTIONS]

╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ *  --reference-file         -r        FILE                  Input reference genome FASTA file to be used while traversing the BAM file [default: None] [required]                                                                                                                  │
│ *  --input-bed              -i        FILE                  Input BED file to be used to traverse the BAM file [default: None] [required]                                                                                                                                          │
│ *  --input-bam              -bam      FILE                  Input BAM file to be used to calculate fragment length [default: None] [required]                                                                                                                                      │
│    --output-txt             -o        TEXT                  Output TXT file after traversing the BAM file [default: fragment_counts.txt]                                                                                                                                           │
│ *  --sample-id              -id       TEXT                  Sample Identifier [default: None] [required]                                                                                                                                                                           │
│    --merge-interval         -m                              Merge interval in the BED file by splitting the 4th column with `:` and using the first value                                                                                                                          │
│    --split-interval         -s                              Split the BED interval based on the BIN size specified in the `bin_size` option.                                                                                                                                       │
│    --short-fragment-length  -sfl      <INTEGER INTEGER>...  Define which fragments should be called as short fragment, provide two integers separated by a comma, the first value in the tuple is the lower bound of the fragment length range for short fragments, and the second │
│                                                             value is the upper bound of the fragment length range for short fragments                                                                                                                                              │
│                                                             [default: 100, 150]                                                                                                                                                                                                    │
│    --long-fragment-length   -lfl      <INTEGER INTEGER>...  Define which fragments should be called as long fragment, provide two integers separated by a comma, the first value in the tuple is the lower bound of the fragment length range for long fragments, and the second   │
│                                                             value is the upper bound of the fragment length range for long fragments                                                                                                                                               │
│                                                             [default: 151, 220]                                                                                                                                                                                                    │
│    --bin-size               -b        INTEGER               Bin size to split the BED file, only used when `split_interval` is True [default: 50]                                                                                                                                  │
│    --pad-size               -p        INTEGER               Pad the coordinates with the given pad size in the BED file, before binning [default: 50]                                                                                                                              │
│    --lowess-fraction        -l        FLOAT                 When running lowess GC correction of coverage, the fraction of the data used when estimating each y-value [default: 0.75]                                                                                              │
│    --help                                                   Show this message and exit.                                                                                                                                                                                            │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

The required inputs are:

--reference-file: The reference genome FASTA file.
--input-bed: The input BED file containing the genomic intervals of interest.
--input-bam: The input BAM file containing the sequencing reads.
--sample-id: The identifier for the sample being processed.

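The optional parameters let you customize the output file, the fragment length ranges, the interval manipulation, and the GC content correction:

--output-txt: The output TXT file (default: fragment_counts.txt).
--merge-interval: Merges the intervals in the BED file by splitting the 4th column on `:` and using the first value.
--split-interval: Splits the intervals in the BED file based on the --bin-size parameter.
--short-fragment-length: The lower and upper bounds of the fragment length range counted as short fragments (default: 100, 150).
--long-fragment-length: The lower and upper bounds of the fragment length range counted as long fragments (default: 151, 220).
--bin-size: The size of the bins, used only when --split-interval is enabled (default: 50).
--pad-size: The size of the padding applied to the coordinates in the BED file before binning (default: 50).
--lowess-fraction: The fraction of the data used when estimating each y-value in the LOWESS GC content correction (default: 0.75).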
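
To make the interval options more concrete, here is a minimal sketch of padding an interval and then splitting it into fixed-size bins, in the spirit of --pad-size, --split-interval and --bin-size. The helper names `pad_interval` and `split_interval` are hypothetical, coordinates are assumed to be 0-based and half-open as in BED, and keeping a shorter final bin is an assumption; the package may handle these edge cases differently.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), BED-style half-open


def pad_interval(interval: Interval, pad_size: int = 50) -> Interval:
    """Extend an interval by pad_size on both sides, clamping the start at 0.

    The end is not clamped here because the chromosome length is unknown.
    """
    chrom, start, end = interval
    return chrom, max(0, start - pad_size), end + pad_size


def split_interval(interval: Interval, bin_size: int = 50) -> List[Interval]:
    """Cut an interval into consecutive bins of at most bin_size bases."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    chrom, start, end = interval
    return [
        (chrom, pos, min(pos + bin_size, end))
        for pos in range(start, end, bin_size)
    ]


# Pad first, then bin, matching the order described for --pad-size.
padded = pad_interval(("chr1", 1000, 1120), pad_size=50)
bins = split_interval(padded, bin_size=50)
for chrom, start, end in bins:
    print(chrom, start, end, sep="\t")
```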

Example Command:

python fragscan_ct generate-fragment-ratios \
    --reference-file=hg38.fa \
    --input-bed=target_regions.bed \
    --input-bam=sample_data.bam \
    --output-txt=fragment_counts.txt \
    --sample-id=sample_1 \
    --short-fragment-length=100,150 \
    --long-fragment-length=151,220 \
    --lowess-fraction=0.75
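
The --lowess-fraction value sets how much of the data each local fit uses. The sketch below shows the general idea with a simplified, dependency-free LOWESS (a single pass of local linear regression with tricube weights, no robustness iterations) fitted to coverage as a function of GC content. The function names `lowess_fit` and `gc_correct` are hypothetical, and subtracting the fitted trend and re-centring on the median is only one common way to apply the correction; the source does not state which formula the package uses.

```python
import math
from statistics import median
from typing import List, Sequence


def lowess_fit(x: Sequence[float], y: Sequence[float], frac: float = 0.75) -> List[float]:
    """Simplified LOWESS: tricube-weighted local linear fit at each x value."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    k = min(n, max(2, math.ceil(frac * n)))  # neighbours used per local fit
    fitted = []
    for x0 in x:
        dists = [abs(xi - x0) for xi in x]
        # Bandwidth is the distance to the k-th nearest point (guard against 0).
        bandwidth = sorted(dists)[k - 1] or 1e-12
        # Tricube kernel: points at or beyond the bandwidth get zero weight.
        w = [(1 - min(d / bandwidth, 1.0) ** 3) ** 3 for d in dists]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def gc_correct(coverage: Sequence[float], gc: Sequence[float], frac: float = 0.75) -> List[float]:
    """Remove the GC trend from coverage and re-centre on the median (assumed formula)."""
    trend = lowess_fit(gc, coverage, frac)
    centre = median(coverage)
    return [c - t + centre for c, t in zip(coverage, trend)]


# Toy data: coverage rises with GC content, plus a little scatter.
gc_values = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60]
coverage = [160.0, 172.0, 178.0, 191.0, 199.0, 212.0, 219.0]
print(gc_correct(coverage, gc_values, frac=0.75))
```

A larger fraction gives a smoother trend, while a smaller one follows local fluctuations more closely.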

Output

The script generates a text file named fragment_counts.txt (or the value specified in the --output-txt option) containing the following information:

Chromosome
Start position
End position
Additional information from the BED file
Strand
Score
Short fragment counts
Long fragment counts
Raw ratio
Coverage for short fragments
Coverage for long fragments
GC content for short fragments
GC content for long fragments
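
The feature list mentions z-scores alongside the ratios, but the source does not say how they are computed. One common approach, sketched here purely as an assumption, is to standardise the raw ratios across all intervals of a sample. The function name `ratio_z_scores` is hypothetical, and the use of the population standard deviation is a choice made for this example.

```python
from statistics import mean, pstdev
from typing import List, Optional, Sequence


def ratio_z_scores(ratios: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Standardise ratios to zero mean and unit variance, skipping missing values."""
    valid = [r for r in ratios if r is not None]
    if not valid:
        return [None] * len(ratios)
    mu = mean(valid)
    sigma = pstdev(valid)  # population standard deviation (assumed choice)
    if sigma == 0:
        # All ratios are identical, so every z-score is zero.
        return [None if r is None else 0.0 for r in ratios]
    return [None if r is None else (r - mu) / sigma for r in ratios]


# Intervals without a ratio (for example, no long fragments) stay as None.
print(ratio_z_scores([0.8, 1.1, None, 0.95, 1.4]))
```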

Plot Ratios

The plot_fragment_ratios function takes in a file of files or a list of input TXT files, reads the data from the files into a Pandas DataFrame, and plots a line plot of the "Ratio" column against the "Id" column, with different colors for each "Sample_Id". The resulting plot is saved as an HTML file.

❯ python fragscan_ct plot-fragment-ratios --help

 Usage: fragscan_ct plot-fragment-ratios
             [OPTIONS]

╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --list           -l      PATH  File of files, List of txt files to be used for plotting [default: None]                                                                                                                                                                            │
│ --input-txt      -i      FILE  Input TXT file that was generated using generate_fragment_counts [default: None]                                                                                                                                                                    │
│ --output-prefix  -o      TEXT  Output HTML file prefix for the line and box plot [default: fragment_counts]                                                                                                                                                                        │
│ --help                         Show this message and exit.                                                                                                                                                                                                                         │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
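
The --list option takes a "file of files". The exact format is not documented in the source; the sketch below assumes a plain text file with one TXT path per line, skips blank lines and lines starting with `#`, and resolves relative entries against the directory of the list file. The function name `read_file_of_files` is hypothetical and all three behaviours are assumptions.

```python
from pathlib import Path
from typing import List


def read_file_of_files(list_path: Path) -> List[Path]:
    """Return the paths listed one per line, skipping blanks and # comments."""
    paths: List[Path] = []
    for line in list_path.read_text().splitlines():
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        path = Path(entry)
        # Resolve relative entries against the directory of the list file.
        paths.append(path if path.is_absolute() else list_path.parent / path)
    return paths
```

Each returned path could then be passed to the plotting step in the same way as a single --input-txt file.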

Output

The script generates the following plots:

Fragment length distribution
GC-corrected coverage

These plots are saved as fragment_length_distribution.html and gc_corrected_coverage.html, respectively.

License

This project is licensed under the GNU General Public License v3 (GPLv3).


Version

0.1.0, released Jul 9, 2024
