A package for haplotype analysis from BAM files

Hasan: Haplotype Analysis from BAM files

Hasan (Haplotype Algorithm for SNP Amplicon Networks) is a Python package for analyzing haplotypes from BAM files using SNP information. It constructs directed acyclic graphs (DAGs) to identify and visualize potential haplotypes based on sequencing data.

Features

  • Read and process SNP information from TSV files
  • Convert VCF files to compatible TSV format
  • Build phasing tables from BAM files
  • Create directed acyclic graphs (DAGs) for haplotype visualization
  • Find and analyze potential haplotypes
  • Interactive graph visualization with draggable nodes
  • Command-line interface with rich output formatting

Installation

pip install hasan

Requirements

  • Python ≥ 3.6
  • pysam
  • pandas
  • networkx
  • matplotlib
  • click
  • rich

Usage

Command Line Interface

The package provides two main commands:

  1. Analyze haplotypes:
hasan analyze <bam_file> <snps_file> [options]

Options:

  • --method/-m: Method for calculating path weights ('min' or 'multiply')
  • --plot/--no-plot: Enable/disable interactive plot visualization
  • --output/-o: Specify output TSV file for haplotype results
  • --verbose/-v: Print detailed progress information
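The option list does not spell out what 'min' and 'multiply' compute. One plausible reading, offered as an assumption rather than the package's confirmed behavior, is that 'min' scores a path by its weakest edge and 'multiply' scores it by the product of its edge weights. The helper `path_weight` below is hypothetical and not part of hasan's API.

```python
def path_weight(edge_weights, method="min"):
    """Score one path from its edge weights (proportions of supporting reads).

    Assumed semantics, not taken from hasan's source:
      'min'      -> the smallest edge weight on the path (bottleneck)
      'multiply' -> the product of all edge weights on the path
    """
    if not edge_weights:
        raise ValueError("a path needs at least one edge")
    if method == "min":
        return min(edge_weights)
    if method == "multiply":
        result = 1.0
        for weight in edge_weights:
            result *= weight
        return result
    raise ValueError(f"unknown method: {method!r}")


# Made-up edge weights for a path crossing three edges.
weights = [0.9, 0.5, 0.8]
print("min:", path_weight(weights, "min"))
print("multiply:", path_weight(weights, "multiply"))
```

Under this reading, 'min' depends only on the least-supported link, while 'multiply' shrinks as paths get longer, so the two methods can rank haplotypes differently.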

Example:

hasan analyze sample.bam variants.tsv --method min --plot --output results.tsv --verbose

  2. Convert VCF to TSV:
hasan convert <input_vcf> <output_tsv> [options]

Options:

  • --verbose/-v: Print detailed progress information

Example:

hasan convert variants.vcf variants.tsv --verbose

Input File Formats

SNPs File (TSV format)

CHROM   POS     REF     ALT     QUAL    DP
chr1    1000    A       G       40      20
chr1    1500    C       T       35      15

Note: When converting from VCF, the following filters are applied:

  • Indels are excluded (only SNPs are kept)
  • Variants must have a minimum quality score (QUAL ≥ 30)
  • Variants must have a minimum depth (DP ≥ 10)
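The filters above can be sketched in plain Python. This is an illustration only, not the code behind `hasan convert`: the function name `keep_variant` is hypothetical, and reading depth from the `DP` key of the VCF INFO column is an assumption, since the README does not say where DP is taken from.

```python
def keep_variant(vcf_line, min_qual=30.0, min_depth=10):
    """Return (CHROM, POS, REF, ALT, QUAL, DP) for a passing SNP, else None."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return None  # not a complete VCF data line
    chrom, pos, _id, ref, alt, qual, _filter, info = fields[:8]

    # SNPs only: single-base REF and single-base ALT.
    # Multi-allelic records (ALT like "G,T") are skipped in this sketch.
    bases = {"A", "C", "G", "T"}
    if ref.upper() not in bases or alt.upper() not in bases:
        return None

    # A missing QUAL (".") is treated as failing the quality filter.
    if qual == "." or float(qual) < min_qual:
        return None

    # Assumption: depth is the DP entry of the INFO column.
    info_map = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    try:
        depth = int(info_map["DP"])
    except (KeyError, ValueError):
        return None
    if depth < min_depth:
        return None

    return (chrom, int(pos), ref, alt, float(qual), depth)


# Made-up records: a passing SNP, an indel, and a low-depth SNP.
records = [
    "chr1\t1000\t.\tA\tG\t40\tPASS\tDP=20",
    "chr1\t1200\t.\tAT\tA\t50\tPASS\tDP=30",
    "chr1\t1500\t.\tC\tT\t35\tPASS\tDP=5",
]
kept = [row for row in map(keep_variant, records) if row is not None]
print(kept)
```

Only the first record survives here: the second is an indel and the third falls below the depth threshold.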

Python API

from hasan import read_snps, build_phasing_table, create_dag, find_haplotypes

# Read SNP information
snps_df = read_snps("variants.tsv")

# Build phasing table
phasing_data = build_phasing_table("sample.bam", snps_df)

# Create graph
G = create_dag(phasing_data, snps_df)

# Find haplotypes
haplotypes = find_haplotypes(G, method='min')

Output

The package provides multiple output formats:

  1. Interactive visualization (when using --plot)
  2. Static graph image (haplotype_graph.png)
  3. TSV file with haplotype frequencies (when using --output)
  4. Rich console output showing:
    • Haplotype sequences
    • Proportions for each haplotype
    • Progress information (in verbose mode)

How It Works

  1. SNP Reading: Loads SNP positions and variants from a TSV file.
  2. Phasing Table: Processes the BAM file to count base occurrences at SNP positions.
  3. Graph Construction: Creates a DAG where:
    • Nodes represent bases at each position
    • Edges represent connections between consecutive positions
    • Edge weights represent proportion of reads supporting the connection
  4. Haplotype Finding: Identifies possible haplotypes by finding paths through the graph.
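The steps above can be sketched with the standard library alone. This is a toy model under stated assumptions, not hasan's implementation: reads are represented as plain dicts mapping SNP position to observed base (in practice these would come from the BAM file), edge weights are normalized by the number of reads spanning both positions, and paths are scored by their smallest edge weight. The names `build_edges` and `enumerate_haplotypes` are hypothetical.

```python
from collections import Counter


def build_edges(reads, positions):
    """Return {(pos_a, base_a, pos_b, base_b): proportion} for consecutive SNPs."""
    edges = {}
    for pos_a, pos_b in zip(positions, positions[1:]):
        # Count base pairs seen on reads that cover both positions.
        counts = Counter(
            (read[pos_a], read[pos_b])
            for read in reads
            if pos_a in read and pos_b in read
        )
        total = sum(counts.values())
        for (base_a, base_b), n in counts.items():
            edges[(pos_a, base_a, pos_b, base_b)] = n / total
    return edges


def enumerate_haplotypes(edges, positions):
    """Return (sequence, weight) pairs, highest weight first.

    The weight is the smallest edge weight along the path (an assumption).
    """
    first = positions[0]
    start_bases = sorted({a for (pa, a, _pb, _b) in edges if pa == first})
    paths = [(base, 1.0) for base in start_bases]
    for pos_a in positions[:-1]:
        extended = []
        for seq, weight in paths:
            for (pa, a, _pb, b), w in edges.items():
                # Extend the path along edges leaving its last node.
                if pa == pos_a and a == seq[-1]:
                    extended.append((seq + b, min(weight, w)))
        paths = extended
    return sorted(paths, key=lambda p: p[1], reverse=True)


# Made-up reads, each covering two neighbouring SNP positions.
positions = [1000, 1500, 2000]
reads = (
    [{1000: "A", 1500: "C"}] * 3
    + [{1000: "G", 1500: "T"}]
    + [{1500: "C", 2000: "G"}] * 2
    + [{1500: "T", 2000: "A"}] * 2
)
edges = build_edges(reads, positions)
haplotypes = enumerate_haplotypes(edges, positions)
print(haplotypes)
```

Because edges only ever point from one SNP position to the next, the graph cannot contain cycles, which is why a DAG is a natural fit for this problem.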

Visualization

The interactive visualization allows you to:

  • Drag nodes to rearrange the graph
  • View edge weights representing read proportions
  • Distinguish between reference (green) and alternate (blue) bases
