A package for haplotype analysis from BAM files

Hasan: Haplotype Analysis from BAM files

Hasan (Haplotype Algorithm for SNP Amplicon Networks) is a Python package for analyzing haplotypes from BAM files using SNP information. It constructs directed acyclic graphs (DAGs) to identify and visualize potential haplotypes based on sequencing data.

Features

  • Read and process SNP information from TSV files
  • Convert VCF files to compatible TSV format
  • Build phasing tables from BAM files
  • Create directed acyclic graphs (DAGs) for haplotype visualization
  • Find and analyze potential haplotypes
  • Interactive graph visualization with draggable nodes
  • Command-line interface with rich output formatting

Installation

pip install hasan

Requirements

  • Python ≥ 3.6
  • pysam
  • pandas
  • networkx
  • matplotlib
  • click
  • rich

Usage

Command Line Interface

The package provides two main commands:

  1. Analyze haplotypes:
hasan analyze <bam_file> <snps_file> [options]

Options:

  • --plot/--no-plot: Enable/disable interactive plot visualization
  • --output/-o: Specify output TSV file for haplotype results
  • --verbose/-v: Print detailed progress information

Example:

hasan analyze sample.bam variants.tsv --plot --output results.tsv --verbose

  2. Convert VCF to TSV:
hasan convert <input_vcf> <output_tsv> [options]

Options:

  • --verbose/-v: Print detailed progress information

Example:

hasan convert variants.vcf variants.tsv --verbose

Input File Formats

SNPs File (TSV format)

CHROM   POS     REF     ALT     QUAL    DP
chr1    1000    A       G       40      20
chr1    1500    C       T       35      15
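
To make the column layout concrete, here is a minimal standard-library sketch that parses a table in this layout into typed records. It is not how `read_snps` is implemented (the package depends on pandas); the helper name `parse_snps` and the choice of types for each column are assumptions made for illustration.

```python
import csv
import io

# Two example rows in the layout shown above, written with explicit tabs.
SNPS_TSV = (
    "CHROM\tPOS\tREF\tALT\tQUAL\tDP\n"
    "chr1\t1000\tA\tG\t40\t20\n"
    "chr1\t1500\tC\tT\t35\t15\n"
)


def parse_snps(handle):
    """Parse a tab-separated SNP table into a list of dicts with typed fields.

    Hypothetical helper: the column types chosen here are assumptions.
    """
    rows = []
    for rec in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            "CHROM": rec["CHROM"],
            "POS": int(rec["POS"]),
            "REF": rec["REF"],
            "ALT": rec["ALT"],
            "QUAL": float(rec["QUAL"]),
            "DP": int(rec["DP"]),
        })
    return rows


snps = parse_snps(io.StringIO(SNPS_TSV))
for snp in snps:
    print(snp["CHROM"], snp["POS"], snp["REF"], snp["ALT"])
```

For a file on disk, the same function can be called with `open("variants.tsv", newline="")` in place of the `io.StringIO` object.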

Note: When converting from VCF, the following filters are applied:

  • Exclude indels (only SNPs are kept)
  • Require minimum quality score (QUAL ≥ 30)
  • Require minimum depth (DP ≥ 10)
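
The three filters can be sketched as a single predicate over one VCF data line. This is an illustration only, not the code behind `hasan convert`: the function name `keep_variant` is hypothetical, and reading the depth from a `DP=` entry in the INFO column is an assumption about where the depth comes from.

```python
BASES = {"A", "C", "G", "T"}


def keep_variant(vcf_line, min_qual=30.0, min_dp=10):
    """Return True if a tab-separated VCF data line passes the three filters.

    Hypothetical helper; assumes depth is stored as DP=<int> in the INFO column.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    # Standard VCF column order: CHROM POS ID REF ALT QUAL FILTER INFO
    ref, alt, qual, info = fields[3], fields[4], fields[5], fields[7]

    # Keep SNPs only: REF and ALT must each be a single base.
    # This also rejects multi-allelic records such as ALT="G,T".
    if ref not in BASES or alt not in BASES:
        return False

    # A missing QUAL is written as "." in VCF; treat it as failing the filter.
    try:
        qual_value = float(qual)
    except ValueError:
        return False

    # Look for a DP=<int> entry among the semicolon-separated INFO fields.
    depth = None
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        if key == "DP" and value.isdigit():
            depth = int(value)

    return depth is not None and qual_value >= min_qual and depth >= min_dp


examples = [
    "chr1\t1000\t.\tA\tG\t40\t.\tDP=20",   # passes all three filters
    "chr1\t1200\t.\tAT\tA\t50\t.\tDP=30",  # indel
    "chr1\t1500\t.\tC\tT\t25\t.\tDP=15",   # QUAL below 30
    "chr1\t1800\t.\tG\tA\t35\t.\tDP=5",    # DP below 10
]
kept = [line for line in examples if keep_variant(line)]
```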

Python API

from hasan import read_snps, build_phasing_table, create_dag, find_haplotypes

# Read SNP information
snps_df = read_snps("variants.tsv")

# Build phasing table
phasing_data = build_phasing_table("sample.bam", snps_df)

# Create graph
G = create_dag(phasing_data, snps_df)

# Find haplotypes
haplotypes = find_haplotypes(G)

Output

The package produces four kinds of output:

  1. Interactive visualization (when using --plot)
  2. Static graph image (haplotype_graph.png)
  3. TSV file with haplotype frequencies (when using --output)
  4. Rich console output showing haplotype proportions
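
As a rough picture of what a haplotype-frequency TSV could look like, the sketch below turns per-haplotype read counts into proportions and writes them as tab-separated text. The column names `haplotype` and `proportion`, the three-decimal formatting, and the helper name `write_haplotype_tsv` are all assumptions; the actual layout written by `--output` may differ.

```python
import csv
import io


def write_haplotype_tsv(haplotype_counts, handle):
    """Write haplotypes and their share of reads as TSV, most frequent first.

    Hypothetical helper with hypothetical column names.
    """
    total = sum(haplotype_counts.values())
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["haplotype", "proportion"])
    if total == 0:
        return  # nothing to report
    ranked = sorted(haplotype_counts.items(), key=lambda item: -item[1])
    for haplotype, count in ranked:
        writer.writerow([haplotype, f"{count / total:.3f}"])


# Made-up counts, used only to exercise the function.
buffer = io.StringIO()
write_haplotype_tsv({"ACG": 6, "GTA": 3, "ATA": 1}, buffer)
tsv_text = buffer.getvalue()
print(tsv_text)
```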

How It Works

  1. SNP Reading: Loads SNP positions and variants from a TSV file.
  2. Phasing Table: Reads the BAM file and counts the bases observed at each SNP position.
  3. Graph Construction: Creates a DAG where:
    • Nodes represent bases at each position
    • Edges represent connections between consecutive positions
    • Edge weights represent proportion of reads supporting the connection
  4. Haplotype Finding: Identifies possible haplotypes by finding paths through the graph.
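
Steps 3 and 4 can be sketched in plain Python without pysam or networkx. This is a simplified illustration, not the package's implementation: reads are represented as hypothetical `{position: base}` dicts, an edge weight is taken to be the fraction of reads covering both positions that carry that base pair, and scoring a path by the product of its edge weights is an assumption (the text above only says that haplotypes are found as paths).

```python
from collections import Counter, defaultdict


def build_edges(reads, positions):
    """Weight links between bases at consecutive SNP positions.

    A node is a (position, base) pair. Weight = reads supporting the link
    divided by reads that cover both positions of the pair.
    """
    counts = Counter()
    totals = Counter()
    for read in reads:
        for left, right in zip(positions, positions[1:]):
            if left in read and right in read:
                counts[(left, read[left]), (right, read[right])] += 1
                totals[left] += 1
    return {edge: n / totals[edge[0][0]] for edge, n in counts.items()}


def enumerate_paths(edges, positions):
    """List every first-to-last path with the product of its edge weights."""
    successors = defaultdict(list)
    for (src, dst), weight in edges.items():
        successors[src].append((dst, weight))

    results = []

    def walk(node, bases, score):
        if node[0] == positions[-1]:
            results.append(("".join(bases), score))
            return
        for nxt, weight in sorted(successors[node]):
            walk(nxt, bases + [nxt[1]], score * weight)

    starts = sorted({src for src, _ in edges if src[0] == positions[0]})
    for start in starts:
        walk(start, [start[1]], 1.0)
    return results


# Made-up observations: each dict is one read's bases at the SNP positions.
positions = [1000, 1500, 2000]
reads = [
    {1000: "A", 1500: "C", 2000: "G"},
    {1000: "A", 1500: "C", 2000: "G"},
    {1000: "G", 1500: "T", 2000: "A"},
    {1000: "G", 1500: "T"},  # read ends before the last SNP
]
edges = build_edges(reads, positions)
haplotypes = enumerate_paths(edges, positions)
for bases, score in haplotypes:
    print(bases, round(score, 3))
```

Because every edge runs from one SNP position to the next, the graph cannot contain a cycle, which is what makes it a DAG and guarantees the path walk terminates.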

Visualization

The interactive visualization allows you to:

  • Drag nodes to rearrange the graph
  • View edge weights representing read proportions
  • Distinguish between reference (green) and alternate (blue) bases
