fragscan_ct

This Python package calculates fragment length ratios from a BAM file, using an input BED file and a reference genome FASTA file. It provides options for manipulating the input intervals and for applying GC content correction to the coverage analysis.

Features

  1. Fragment Ratio Calculation: The script calculates the ratio of short to long fragments based on the input BAM file.
  2. Interval Manipulation: Users can choose to merge or split the intervals in the input BED file, as well as pad the coordinates before binning.
  3. GC Content Correction: The script applies a LOWESS (Locally Weighted Scatterplot Smoothing) algorithm to correct the coverage based on the GC content of the fragments.
  4. Visualization: The script generates plots to visualize the fragment length distribution and the GC-corrected coverage.
  5. Output: The script generates a text file containing the calculated fragment counts, ratios, z-scores, and coverage information.
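
As an illustration of the fragment ratio in feature 1, the sketch below counts fragments that fall in the default short (100 to 150) and long (151 to 220) length ranges and divides the two counts. It is a minimal example and not the package's own code: the function name `short_long_ratio` is hypothetical, and treating both bounds of each range as inclusive is an assumption.

```python
from typing import Iterable, Tuple


def short_long_ratio(
    fragment_lengths: Iterable[int],
    short_range: Tuple[int, int] = (100, 150),  # defaults of --short-fragment-length
    long_range: Tuple[int, int] = (151, 220),   # defaults of --long-fragment-length
) -> Tuple[int, int, float]:
    """Return (short count, long count, short/long ratio).

    Assumption: both bounds of each range are inclusive.
    """
    short = 0
    long = 0
    for length in fragment_lengths:
        if short_range[0] <= length <= short_range[1]:
            short += 1
        elif long_range[0] <= length <= long_range[1]:
            long += 1
    # Avoid a ZeroDivisionError when a region has no long fragments.
    ratio = short / long if long else float("nan")
    return short, long, ratio


lengths = [95, 120, 148, 150, 151, 167, 180, 219, 250]
print(short_long_ratio(lengths))  # (3, 4, 0.75)
```

In this example three lengths fall in the short range and four in the long range, so the ratio is 0.75. Lengths outside both ranges (95 and 250) are ignored.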

Dependencies

The script requires the following Python 3 libraries:

  • typer: For command-line interface
  • pathlib: For handling file paths (part of the Python standard library, so no separate installation is needed)
  • rich: For progress bar and console output
  • plotly: For generating interactive plots
  • pandas: For data manipulation
  • numpy: For numerical operations
  • scipy: For statistical functions

Usage

Main

 python fragscan_ct --help

 Usage: fragscan_ct [OPTIONS] COMMAND [ARGS]...

 ╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
 │ --install-completion          Install completion for the current shell.                          │
 │ --show-completion             Show completion for the current shell, to copy it or customize     │
 │                               the installation.                                                  │
 │ --help                        Show this message and exit.                                        │
 ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
 ╭─ Commands ───────────────────────────────────────────────────────────────────────────────────────╮
 │ generate-fragment-ratios  The `generate_fragment_ratios` function generates a new TXT file by    │
 │                           processing a BED file and calculating fragment length ratios from a    │
 │                           BAM file.                                                              │
 │ plot-fragment-ratios      The `plot_fragment_ratios` function takes in a file of files or a list │
 │                           of input TXT files, reads the data from the files into a Pandas        │
 │                           DataFrame, and plots a line plot of the "Ratio" column against the     │
 │                           "Id" column, with different colors for each "Sample_Id". The resulting │
 │                           plot is saved as an HTML file.                                         │
 ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯

Generate Ratios

The generate_fragment_ratios function calculates fragment ratios from a BAM file using input BED and reference genome files, with options for interval manipulation and GC correction.

 python fragscan_ct generate-fragment-ratios --help

Usage: fragscan_ct generate-fragment-ratios [OPTIONS]

╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ *  --reference-file         -r        FILE                  Input reference genome FASTA file to be used while traversing the BAM file [default: None] [required]                                                                                                                  │
│ *  --input-bed              -i        FILE                  Input BED file to be used to traverse the BAM file [default: None] [required]                                                                                                                                          │
│ *  --input-bam              -bam      FILE                  Input BAM file to be used to calculate fragment length [default: None] [required]                                                                                                                                      │
│    --output-txt             -o        TEXT                  Output TXT file after traversing the BAM file [default: fragment_counts.txt]                                                                                                                                           │
│ *  --sample-id              -id       TEXT                  Sample Identifier [default: None] [required]                                                                                                                                                                           │
│    --merge-interval         -m                              Merge interval in the BED file by splitting the 4th column with `:` and using the first value                                                                                                                          │
│    --split-interval         -s                              Split the BED interval based on the BIN size specified in the `bin_size` option.                                                                                                                                       │
│    --short-fragment-length  -sfl      <INTEGER INTEGER>...  Define which fragments should be called as short fragment, provide two integers separated by a comma, the first value in the tuple is the lower bound of the fragment length range for short fragments, and the second │
│                                                             value is the upper bound of the fragment length range for short fragments                                                                                                                                              │
│                                                             [default: 100, 150]                                                                                                                                                                                                    │
│    --long-fragment-length   -lfl      <INTEGER INTEGER>...  Define which fragments should be called as long fragment, provide two integers separated by a comma, the first value in the tuple is the lower bound of the fragment length range for long fragments, and the second   │
│                                                             value is the upper bound of the fragment length range for long fragments                                                                                                                                               │
│                                                             [default: 151, 220]                                                                                                                                                                                                    │
│    --bin-size               -b        INTEGER               Bin size to split the BED file, only used when `split_interval` is True [default: 50]                                                                                                                                  │
│    --pad-size               -p        INTEGER               Pad the coordinates with the given pad size in the BED file, before binning [default: 50]                                                                                                                              │
│    --lowess-fraction        -l        FLOAT                 When running lowess GC correction of coverage, the fraction of the data used when estimating each y-value [default: 0.75]                                                                                              │
│    --help                                                   Show this message and exit.                                                                                                                                                                                            │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
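
The interval options in the help text above (--pad-size, --split-interval with --bin-size, and --merge-interval) can be pictured with the following sketch. It is not taken from the package: the helper names are hypothetical, and several details are assumptions, namely that coordinates are 0-based half-open as in BED, that padding is clipped at 0 but not at the chromosome end, that the last bin may be shorter than the bin size, and that "merging" means taking the span from the smallest start to the largest end of all intervals that share a chromosome and the same first `:`-separated value in the 4th column.

```python
from typing import Dict, List, Tuple

# chrom, start, end, name (BED columns 1-4)
Interval = Tuple[str, int, int, str]


def pad_interval(iv: Interval, pad: int = 50) -> Interval:
    """Extend an interval by `pad` bases on both sides (start clipped at 0)."""
    chrom, start, end, name = iv
    return chrom, max(0, start - pad), end + pad, name


def split_interval(iv: Interval, bin_size: int = 50) -> List[Interval]:
    """Cut an interval into consecutive bins; the last bin may be shorter."""
    chrom, start, end, name = iv
    return [
        (chrom, s, min(s + bin_size, end), name)
        for s in range(start, end, bin_size)
    ]


def merge_by_name(intervals: List[Interval]) -> List[Interval]:
    """Merge intervals that share a chromosome and the 4th-column prefix before ':'.

    Assumption: the merged interval spans min(start) to max(end) of the group.
    """
    merged: Dict[Tuple[str, str], Tuple[int, int]] = {}
    for chrom, start, end, name in intervals:
        key = (chrom, name.split(":")[0])
        if key in merged:
            s, e = merged[key]
            merged[key] = (min(s, start), max(e, end))
        else:
            merged[key] = (start, end)
    return [(chrom, s, e, name) for (chrom, name), (s, e) in merged.items()]


exon = ("chr1", 1000, 1120, "GENE1:exon1")
padded = pad_interval(exon, pad=50)           # ("chr1", 950, 1170, "GENE1:exon1")
for bin_ in split_interval(padded, bin_size=50):
    print(bin_)
```

With the default pad and bin size of 50, the 120-base example interval becomes a 220-base interval and is cut into five bins, the last of which is 20 bases long.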

The required inputs are:

  • --reference-file: The reference genome FASTA file.
  • --input-bed: The input BED file containing the genomic intervals of interest.
  • --input-bam: The input BAM file containing the sequencing reads.
  • --sample-id: The identifier for the sample being processed.

The optional parameters allow you to customize the interval manipulation and GC content correction:

  • --merge-interval: Merges the intervals in the BED file by splitting the 4th column on ":" and using the first value.
  • --split-interval: Splits the intervals in the BED file into bins of the size given by --bin-size.
  • --bin-size: The bin size used when --split-interval is enabled (default: 50).
  • --pad-size: The padding applied to the coordinates in the BED file before binning (default: 50).
  • --lowess-fraction: The fraction of the data used to estimate each y-value in the LOWESS GC content correction (default: 0.75).
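
To make the role of --lowess-fraction concrete, here is a small, dependency-free sketch of a LOWESS-style GC correction. It is an illustration under stated assumptions, not the package's implementation: `lowess_fit` and `gc_correct` are hypothetical names, the smoother is a plain local linear fit with tricube weights and no robustness iterations, and the correction step (subtract the fitted GC trend, then add back the median coverage) is an assumed convention.

```python
import math
from statistics import median
from typing import List, Sequence


def lowess_fit(x: Sequence[float], y: Sequence[float], frac: float = 0.75) -> List[float]:
    """Local linear regression with tricube weights, evaluated at each x[i].

    `frac` is the fraction of the data used when estimating each y-value.
    """
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))  # number of points per local window
    fitted = []
    for xi in x:
        dists = sorted(abs(xj - xi) for xj in x)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        w = [(1 - min(abs(xj - xi) / h, 1.0) ** 3) ** 3 for xj in x]
        sw = sum(w)
        mx = sum(wi * xj for wi, xj in zip(w, x)) / sw
        my = sum(wi * yj for wi, yj in zip(w, y)) / sw
        sxx = sum(wi * (xj - mx) ** 2 for wi, xj in zip(w, x))
        sxy = sum(wi * (xj - mx) * (yj - my) for wi, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (xi - mx))
    return fitted


def gc_correct(gc: Sequence[float], coverage: Sequence[float], frac: float = 0.75) -> List[float]:
    """Remove the GC trend from coverage (assumed: residual plus median coverage)."""
    trend = lowess_fit(gc, coverage, frac)
    centre = median(coverage)
    return [c - t + centre for c, t in zip(coverage, trend)]


gc = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65]
coverage = [41.0, 44.5, 51.0, 54.0, 61.5, 64.0, 70.5, 75.0]
print([round(v, 2) for v in gc_correct(gc, coverage, frac=0.75)])
```

A larger fraction uses more bins for each local fit and gives a smoother trend, while a smaller fraction follows local fluctuations more closely.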

Example Command:

python fragscan_ct generate-fragment-ratios \
    --reference-file=hg38.fa \
    --input-bed=target_regions.bed \
    --input-bam=sample_data.bam \
    --output-txt=fragment_counts.txt \
    --sample-id=sample_1 \
    --short-fragment-length=100,150 \
    --long-fragment-length=151,220 \
    --lowess-fraction=0.75

Output

The script generates a text file named fragment_counts.txt (or the value specified in the --output-txt option) containing the following information:

  • Chromosome
  • Start position
  • End position
  • Additional information from the BED file
  • Strand
  • Score
  • Short fragment counts
  • Long fragment counts
  • Raw ratio
  • Coverage for short fragments
  • Coverage for long fragments
  • GC content for short fragments
  • GC content for long fragments

Plot Ratios

The plot_fragment_ratios function takes in a file of files or a list of input TXT files, reads the data from the files into a Pandas DataFrame, and plots a line plot of the "Ratio" column against the "Id" column, with different colors for each "Sample_Id". The resulting plot is saved as an HTML file.

 python fragscan_ct plot-fragment-ratios --help

 Usage: fragscan_ct plot-fragment-ratios
             [OPTIONS]

╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --list           -l      PATH  File of files, List of txt files to be used for plotting [default: None]                                                                                                                                                                            │
│ --input-txt      -i      FILE  Input TXT file that was generated using generate_fragment_ratios [default: None]                                                                                                                                                                    │
│ --output-prefix  -o      TEXT  Output HTML file prefix for the line and box plot [default: fragment_counts]                                                                                                                                                                        │
│ --help                         Show this message and exit.                                                                                                                                                                                                                         │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
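
The plotting step described above can be approximated with the sketch below, which groups the "Ratio" values by "Sample_Id" and draws one line per sample against "Id". The column names come from the command description. Everything else is an assumption for illustration: the function names are hypothetical, the TXT files are assumed to be tab-delimited with a header row, and the figure is built with `plotly.graph_objects` rather than whatever the package uses internally.

```python
import csv
from typing import Dict, Iterable, List, Tuple

Series = Dict[str, List[Tuple[str, float]]]


def load_ratios(paths: Iterable[str]) -> Series:
    """Collect (Id, Ratio) pairs per Sample_Id from tab-delimited TXT files."""
    series: Series = {}
    for path in paths:
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                point = (row["Id"], float(row["Ratio"]))
                series.setdefault(row["Sample_Id"], []).append(point)
    return series


def plot_ratios(series: Series, output_html: str) -> None:
    """Write an interactive line plot with one trace per sample."""
    # Imported here so that load_ratios() can be used without plotly installed.
    import plotly.graph_objects as go

    fig = go.Figure()
    for sample_id, points in series.items():
        ids, ratios = zip(*points)
        fig.add_trace(
            go.Scatter(x=list(ids), y=list(ratios), mode="lines", name=sample_id)
        )
    fig.update_layout(xaxis_title="Id", yaxis_title="Ratio")
    fig.write_html(output_html)


# Example usage (file names are placeholders):
# plot_ratios(load_ratios(["sample_1.txt", "sample_2.txt"]), "fragment_counts.html")
```

Each sample gets its own trace, so plotly assigns the traces different colors by default.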

Output

  • Fragment length distribution
  • GC-corrected coverage

These plots are saved as fragment_length_distribution.html and gc_corrected_coverage.html, respectively.

License

This project is licensed under the GNU General Public License v3 (GPLv3).
