A package to analyze split reads of long read data
Hairloom
Hairloom is a Python package for analyzing split-read alignments from long-read sequencing data. It provides tools to extract read-level split-read tables from BAM files and process these alignments into breakpoint, segment, and translocation tables for downstream analysis.
Features
- Extract and normalize split-read alignments from BAM files.
- Generate tables for:
- Breakpoints
- Genomic Segments
- Translocations
- Designed for compatibility with long-read sequencing data from platforms like ONT and PacBio.
- Simple integration with downstream genomic workflows.
Installation
- Requires python>=3.9 for smooth installation.
- Requires pandas>=1.1.0 to run CLI commands (as per requirements.txt).
pip install hairloom
Usage
Below is an example usage of the Hairloom CLI:
Input
- A BAM file (tests/data/test_reads.bam).
- A specific genomic region (chrom, start, end).
1. Extract Split-Read Alignments
Use the extract command to extract split-read alignments from a BAM file for a specified region.
Command:
hairloom extract <bam> <chrom> <start> <end>
Example:
hairloom extract tests/data/test_reads.bam chr1 50 150
Output:
Split-read alignment data in TSV format, written to STDOUT:
qname chrom start end strand clip1 match clip2 pclip1
read1 chr1 100 200 + 100 100 0 100
2. Generate Breakpoint Table
Use the breakpoints command to generate a table of breakpoints from a BAM file.
Command:
hairloom breakpoints <bam> <chrom> <start> <end>
Example:
hairloom breakpoints tests/data/test_reads.bam chr1 50 150
Output:
A TSV-format STDOUT stream of breakpoints:
chrom pos ori support
chr1 201 + 1
chr2 300 - 1
chr2 400 + 1
chr2 700 - 1
chr2 800 + 1
chr2 900 - 1
3. Generate Segment Table
Use the segments command to generate a table of genomic segments from a BAM file.
Command:
hairloom segments <bam> <chrom> <start> <end>
Example:
hairloom segments tests/data/test_reads.bam chr1 50 150
Output:
A TSV-format STDOUT stream of genomic segments:
chrom pos1 pos2 support
chr2 300 400 1
chr2 700 800 1
4. Generate SV Table
Use the svs command to generate a table of structural variants (breakpoint pairs) from a BAM file.
Command:
hairloom svs <bam> <chrom> <start> <end>
Example:
hairloom svs tests/data/test_reads.bam chr1 50 150
Output:
A TSV-format STDOUT stream of breakpoint pairs:
chrom1 pos1 ori1 chrom2 pos2 ori2 support
chr1 201 + chr2 300 - 1
chr2 400 + chr2 800 + 1
chr2 700 - chr2 900 - 1
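Because every subcommand writes a TSV table to STDOUT, the output can be loaded back into Python with the standard library. The following is a minimal sketch, assuming the output is tab-separated with a single header row as shown above; `parse_tsv` is a hypothetical helper, not part of Hairloom, and the sample text simply mirrors the `hairloom svs` example output.

```python
import csv
import io

def parse_tsv(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)

# Sample text mirroring the `hairloom svs` output above (assumed tab-separated).
sample = (
    "chrom1\tpos1\tori1\tchrom2\tpos2\tori2\tsupport\n"
    "chr1\t201\t+\tchr2\t300\t-\t1\n"
    "chr2\t400\t+\tchr2\t800\t+\t1\n"
    "chr2\t700\t-\tchr2\t900\t-\t1\n"
)

rows = parse_tsv(sample)

# csv yields strings, so convert the numeric columns explicitly.
for row in rows:
    for key in ("pos1", "pos2", "support"):
        row[key] = int(row[key])

for row in rows:
    print(row["chrom1"], row["pos1"], row["ori1"],
          row["chrom2"], row["pos2"], row["ori2"], row["support"])
```

In practice the text would come from the CLI itself, for example via `subprocess.run([...], capture_output=True, text=True).stdout`, assuming the `hairloom` executable is on the PATH.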
Python Workflow
Below is an example workflow using Hairloom:
Input
- A BAM file (tests/data/test_reads.bam).
- A specific genomic region (chrom, start, end).
1. Extract Read-Level Split Read Table
import pysam
from hairloom import extract_read_data
# Open the BAM file
bam_path = "tests/data/test_reads.bam"
bam_file = pysam.AlignmentFile(bam_path, "rb")
# Extract split-read alignment table for a region
chrom, start, end = "chr1", 50, 150
read_table = extract_read_data(bam_file, chrom, start, end)
print("Extracted Read-Level Table:")
print(read_table)
Output: A table of split-read alignments with columns like:
- qname: Read name.
- chrom: Chromosome.
- start: Alignment start position.
- end: Alignment end position.
- strand: Strand information.
- clip1, clip2: Soft/hard clip lengths.
- match: Number of matched bases.
- pclip1: Strand-corrected clip length.
Extracted Read-Level Table:
qname chrom start end strand clip1 match clip2 pclip1
0 read1 chr1 101 201 + 0 100 300 0
1 read1 chr2 300 400 + 100 100 200 100
2 read1 chr2 700 800 - 100 100 200 200
3 read1 chr2 900 1000 + 300 100 0 300
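The relationship between clip1, clip2, and pclip1 can be read off the example table: on the reverse strand, pclip1 equals clip2, and sorting by pclip1 puts the alignments in the order they occur along the read. The sketch below reproduces that with plain Python. The rule in `strand_corrected_clip` is an assumption inferred from the example rows, not Hairloom's actual implementation.

```python
# Rows copied from the example read table above, deliberately shuffled:
# (qname, chrom, start, end, strand, clip1, match, clip2)
alignments = [
    ("read1", "chr2", 900, 1000, "+", 300, 100, 0),
    ("read1", "chr2", 700, 800, "-", 100, 100, 200),
    ("read1", "chr1", 101, 201, "+", 0, 100, 300),
    ("read1", "chr2", 300, 400, "+", 100, 100, 200),
]

def strand_corrected_clip(strand, clip1, clip2):
    """Assumed rule: on the reverse strand, the leading clip in read
    orientation is clip2 rather than clip1."""
    return clip1 if strand == "+" else clip2

# Sorting by the strand-corrected clip recovers the order along the read.
ordered = sorted(
    alignments,
    key=lambda a: strand_corrected_clip(a[4], a[5], a[7]),
)

for qname, chrom, start, end, strand, clip1, match, clip2 in ordered:
    pclip1 = strand_corrected_clip(strand, clip1, clip2)
    print(qname, chrom, start, end, strand, pclip1)

# For a single read, clip1 + match + clip2 is the same for every alignment
# in this example, since each row describes a slice of the same read.
read_lengths = {clip1 + match + clip2 for *_, clip1, match, clip2 in alignments}
print(read_lengths)
```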
2. Generate Breakpoint Table
from hairloom import make_bundle, make_brk_table, make_brk_supports
# Make bundled BreakpointChain from read table
bundle = make_bundle(read_table)
# Create a breakpoint table
breakpoint_table = make_brk_table(bundle)
print(breakpoint_table)
Output: A table of breakpoints with columns:
- chrom: Chromosome.
- pos: Position.
- ori: Orientation (+ or -).
- support: Support count.
chrom pos ori support
0 chr1 201 + 1
1 chr2 300 - 1
2 chr2 400 + 1
3 chr2 700 - 1
4 chr2 800 + 1
5 chr2 900 - 1
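To see where these rows come from, the breakpoint table can be reconstructed by hand from the example read's alignments in read order. The sketch below is an illustration under an assumed convention (an alignment's start coordinate carries orientation "-" and its end coordinate "+"), chosen because it reproduces the table above. It is not Hairloom's implementation, and all names in it are hypothetical.

```python
from collections import Counter

# Alignments of read1 in read order (chrom, start, end, strand),
# taken from the example read table.
fragments = [
    ("chr1", 101, 201, "+"),
    ("chr2", 300, 400, "+"),
    ("chr2", 700, 800, "-"),
    ("chr2", 900, 1000, "+"),
]

support = Counter()
for i, (chrom, start, end, strand) in enumerate(fragments):
    left, right = (chrom, start, "-"), (chrom, end, "+")
    # In read orientation, a forward alignment is entered at its start and
    # left at its end; a reverse alignment is the other way around.
    entry, exit_ = (left, right) if strand == "+" else (right, left)
    if i > 0:
        # The entry of the first fragment is the read start, not a junction.
        support[entry] += 1
    if i < len(fragments) - 1:
        # The exit of the last fragment is the read end, not a junction.
        support[exit_] += 1

for (chrom, pos, ori), count in sorted(support.items()):
    print(chrom, pos, ori, count)
```

With more than one read, the same counter would accumulate support across reads that share a breakpoint.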
3. Generate Segment Table
from hairloom import make_seg_table
# Generate segment table
segment_table = make_seg_table(bundle)
print(segment_table.head())
Output: A table of genomic segments with columns:
- chrom: Chromosome.
- pos1: Start position.
- pos2: End position.
- support: Support count.
chrom pos1 pos2 support
0 chr2 300 400 1
1 chr2 700 800 1
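In the example, the two segments are exactly the alignments that sit in the middle of the read, that is, those bounded by a junction on both sides. A minimal sketch of that observation follows. Treating segments as "interior alignments" is an assumption drawn from this one example, not a statement about how make_seg_table works internally.

```python
from collections import Counter

# Alignments of read1 in read order (chrom, start, end, strand),
# taken from the example read table.
fragments = [
    ("chr1", 101, 201, "+"),
    ("chr2", 300, 400, "+"),
    ("chr2", 700, 800, "-"),
    ("chr2", 900, 1000, "+"),
]

# Drop the first and last alignment: each touches a read end and is
# therefore bounded by a junction on one side only.
segment_support = Counter(
    (chrom, start, end) for chrom, start, end, _strand in fragments[1:-1]
)

for (chrom, pos1, pos2), count in sorted(segment_support.items()):
    print(chrom, pos1, pos2, count)
```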
4. Generate Translocation Table
from hairloom import make_tra_table
# Create a translocation table
translocation_table = make_tra_table(bundle)
print(translocation_table.head())
Output: A table of translocations with columns:
- chrom1, pos1, ori1: First breakpoint information.
- chrom2, pos2, ori2: Second breakpoint information.
- support: Support count.
chrom1 pos1 ori1 chrom2 pos2 ori2 support
0 chr1 201 + chr2 300 - 1
1 chr2 400 + chr2 800 + 1
2 chr2 700 - chr2 900 - 1
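Each row of this table links the point where the read leaves one alignment to the point where it enters the next. The sketch below rebuilds the three pairs from the example read under the same assumed convention as the printed tables (start coordinate has orientation "-", end coordinate "+"). `fragment_ends` is a hypothetical helper written for illustration, not a Hairloom function.

```python
from collections import Counter

# Alignments of read1 in read order (chrom, start, end, strand),
# taken from the example read table.
fragments = [
    ("chr1", 101, 201, "+"),
    ("chr2", 300, 400, "+"),
    ("chr2", 700, 800, "-"),
    ("chr2", 900, 1000, "+"),
]

def fragment_ends(chrom, start, end, strand):
    """Return the (entry, exit) breakpoints of an alignment in read orientation."""
    left, right = (chrom, start, "-"), (chrom, end, "+")
    return (left, right) if strand == "+" else (right, left)

pairs = Counter()
for prev, curr in zip(fragments, fragments[1:]):
    _, exit_ = fragment_ends(*prev)   # where the read leaves the previous alignment
    entry, _ = fragment_ends(*curr)   # where it enters the next one
    # Store each pair with the lower (chrom, pos) breakpoint first.
    pairs[tuple(sorted((exit_, entry)))] += 1

for (bp1, bp2), count in sorted(pairs.items()):
    print(*bp1, *bp2, count)
```

Note that the plain tuple sort used here compares chromosome names as strings, which is sufficient for this example but would place chr10 before chr2.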
Snippets
Full Workflow
import pysam
from hairloom import (
    extract_read_data,
    make_bundle,
    make_brk_table,
    make_seg_table,
    make_tra_table,
)
# Open BAM file
bam_file = pysam.AlignmentFile("tests/data/test_reads.bam", "rb")
# Step 1: Extract read-level split-read table
chrom, start, end = "chr1", 50, 150
read_table = extract_read_data(bam_file, chrom, start, end)
bundle = make_bundle(read_table)
# Step 2: Generate breakpoint table
breakpoint_table = make_brk_table(bundle)
# Step 3: Generate segment table
segment_table = make_seg_table(bundle)
# Step 4: Generate translocation table
translocation_table = make_tra_table(bundle)
# Print results
print("Read Table:")
print(read_table.head())
print("\nBreakpoint Table:")
print(breakpoint_table.head())
print("\nSegment Table:")
print(segment_table.head())
print("\nTranslocation Table:")
print(translocation_table.head())
Normalize an SV Table
The normalize_sv_table function sorts and normalizes the breakpoint pairs in a structural variant (SV) table, ensuring consistent ordering for downstream analyses.
import pandas as pd
from hairloom import normalize_sv_table
# Example SV table
sv_data = {
    "chromosome_1": ["chr1", "chr2", "chr1", "chr3"],
    "position_1": [200, 500, 300, 100],
    "strand_1": ["+", "-", "+", "-"],
    "chromosome_2": ["chr1", "chr2", "chr1", "chr2"],
    "position_2": [100, 400, 400, 200],
    "strand_2": ["-", "+", "-", "+"],
}
sv_table = pd.DataFrame(sv_data)
# Normalize the SV table
normalized_sv = normalize_sv_table(sv_table)
print("Original SV Table:")
print(sv_table)
print("\nNormalized SV Table:")
print(normalized_sv)
Output:
Original SV Table:
chromosome_1 position_1 strand_1 chromosome_2 position_2 strand_2
0 chr1 200 + chr1 100 -
1 chr2 500 - chr2 400 +
2 chr1 300 + chr1 400 -
3 chr3 100 - chr2 200 +
Normalized SV Table:
chromosome_1 position_1 strand_1 chromosome_2 position_2 strand_2
0 chr1 100 - chr1 200 +
1 chr2 400 + chr2 500 -
2 chr1 300 + chr1 400 -
3 chr2 200 + chr3 100 -
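As the output shows, normalization keeps the row order and only swaps the two breakpoints within a row so that the lower (chromosome, position) comes first. The following dependency-free sketch illustrates that idea on single rows. `chrom_key` and `normalize_pair` are hypothetical helpers, and the numeric chromosome ordering is an assumption, since the example alone does not reveal how Hairloom orders chromosome names.

```python
import re

def chrom_key(chrom):
    """Sort key that puts numbered chromosomes in numeric order
    (chr2 before chr10) and all other names after them."""
    m = re.fullmatch(r"(?:chr)?(\d+)", chrom)
    return (0, int(m.group(1)), "") if m else (1, 0, chrom)

COLUMNS = (
    "chromosome_1", "position_1", "strand_1",
    "chromosome_2", "position_2", "strand_2",
)

def normalize_pair(row):
    """Return a copy of one SV row with its breakpoints ordered by
    (chromosome, position)."""
    bp1 = (row["chromosome_1"], row["position_1"], row["strand_1"])
    bp2 = (row["chromosome_2"], row["position_2"], row["strand_2"])
    if (chrom_key(bp2[0]), bp2[1]) < (chrom_key(bp1[0]), bp1[1]):
        bp1, bp2 = bp2, bp1
    return dict(zip(COLUMNS, bp1 + bp2))

# Rows 0 and 3 of the example SV table above.
row0 = dict(zip(COLUMNS, ("chr1", 200, "+", "chr1", 100, "-")))
row3 = dict(zip(COLUMNS, ("chr3", 100, "-", "chr2", 200, "+")))

print(normalize_pair(row0))
print(normalize_pair(row3))
```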
Get Structural Variant Type
The get_svtype function determines the type of structural variant (SV) represented by a BreakpointPair. It supports common SV types: translocation (TRA), inversion (INV), duplication (DUP), and deletion (DEL).
from hairloom import Breakpoint, BreakpointPair, get_svtype
# Example BreakpointPairs
tra1 = BreakpointPair(Breakpoint("chr1", 100, "+"), Breakpoint("chr2", 200, "-"))
inv = BreakpointPair(Breakpoint("chr1", 100, "+"), Breakpoint("chr1", 200, "+"))
del_sv = BreakpointPair(Breakpoint("chr1", 300, "+"), Breakpoint("chr1", 500, "-"))
dup = BreakpointPair(Breakpoint("chr1", 500, "-"), Breakpoint("chr1", 700, "+"))
# Get SV types
print("SV Types:")
print(f"TRA: {get_svtype(tra1)}") # Output: TRA
print(f"INV: {get_svtype(inv)}") # Output: INV
print(f"DEL: {get_svtype(del_sv)}") # Output: DEL
print(f"DUP: {get_svtype(dup)}") # Output: DUP
Output:
SV Types:
TRA: TRA
INV: INV
DEL: DEL
DUP: DUP
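The four examples suggest a simple decision rule: different chromosomes give TRA, equal orientations give INV, and otherwise the orientation of the lower-coordinate breakpoint decides between DEL ("+" then "-") and DUP ("-" then "+"). The sketch below encodes that rule with stand-in types. `Bp` and `classify` are hypothetical names, and the rule is inferred from the examples above rather than taken from the get_svtype source.

```python
from typing import NamedTuple

class Bp(NamedTuple):
    """Stand-in for a breakpoint: chromosome, position, orientation."""
    chrom: str
    pos: int
    ori: str

def classify(bp1, bp2):
    """Classify a breakpoint pair as TRA, INV, DEL, or DUP (assumed rule)."""
    if bp1.chrom != bp2.chrom:
        return "TRA"
    if bp1.ori == bp2.ori:
        return "INV"
    # Same chromosome, opposite orientations: look at the lower breakpoint.
    lower, upper = sorted((bp1, bp2), key=lambda b: b.pos)
    return "DEL" if (lower.ori, upper.ori) == ("+", "-") else "DUP"

# Same coordinates as the BreakpointPair examples above.
print(classify(Bp("chr1", 100, "+"), Bp("chr2", 200, "-")))  # TRA
print(classify(Bp("chr1", 100, "+"), Bp("chr1", 200, "+")))  # INV
print(classify(Bp("chr1", 300, "+"), Bp("chr1", 500, "-")))  # DEL
print(classify(Bp("chr1", 500, "-"), Bp("chr1", 700, "+")))  # DUP
```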
Contributing
Contributions are welcome! Please submit issues and pull requests via the GitHub repository.
License
This project is licensed under the MIT License. See the LICENSE file for details.