Python bindings for the mehari variant annotator

mehari

Python bindings for the mehari Rust library.

Features

  • Single variants: Annotate a single variant using a format string (chr:pos:ref:alt) or keyword arguments.
  • Multiple variants (experimental): Evaluate the compound effect of multiple variants.
  • DataFrames: Process batches of variants by passing a polars.DataFrame.
  • LazyFrames: Support for polars.LazyFrame to process large datasets (like Parquet files) without loading everything into memory.

Usage

Initialize SeqvarsAnnotator with your transcript database (see mehari-data-tx) and a reference genome in uncompressed FASTA format with an accompanying index.

from mehari import SeqvarsAnnotator

annotator = SeqvarsAnnotator(
    transcript_db_paths=["path/to/txs.bin.zst"],
    reference_path="path/to/reference.fa"
)

To annotate a single variant, use either a colon-separated format string or keyword arguments:

result1 = annotator.annotate("17:41197701:G:C")
result2 = annotator.annotate(chromosome="3", position=193332511, reference="G", alternative="T")
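The two call styles above carry the same four fields. The sketch below shows one way to convert the string form into the keyword form, for example to validate input before calling the annotator. `Variant` and `parse_variant` are hypothetical helpers written for this illustration; they are not part of mehari, and mehari's own parsing of the format string may differ.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical container for the four fields; not a mehari type."""

    chromosome: str
    position: int
    reference: str
    alternative: str


def parse_variant(text: str) -> Variant:
    """Split a 'chr:pos:ref:alt' string into its four fields."""
    parts = text.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected chr:pos:ref:alt, got {text!r}")
    chromosome, position, reference, alternative = parts
    # int() raises ValueError if the position is not a whole number.
    return Variant(chromosome, int(position), reference, alternative)


variant = parse_variant("17:41197701:G:C")
kwargs = variant._asdict()
# kwargs now uses the keyword names shown above, so it could be passed on as
# annotator.annotate(**kwargs) once an annotator has been constructed.
```

Because the helper fails early on malformed strings, it can serve as a cheap input check in front of a batch job.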

To annotate multiple phased variants together as a single compound event (experimental):

Note: Mehari does not infer phasing. When using annotate_multiple, mehari assumes all provided variants are on the same chromosome, exist on the same haplotype, and do not overlap.

result1 = annotator.annotate_multiple(["1:37799635:TA:A", "1:37799639:C:CG"])

result2 = annotator.annotate_multiple([
    {"chromosome": "1", "position": 37799635, "reference": "TA", "alternative": "A"},
    {"chromosome": "1", "position": 37799639, "reference": "C", "alternative": "CG"}
])
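Since mehari relies on the caller to meet these assumptions, it can help to check the ones that are checkable before calling annotate_multiple. The following is a minimal sketch, not part of mehari: `check_compound_inputs` is a hypothetical helper, and it assumes 1-based positions and that a variant occupies the reference bases from `position` to `position + len(reference) - 1`. mehari's own definition of overlap may differ, and phase cannot be verified from these fields at all.

```python
def check_compound_inputs(variants):
    """Validate and sort variant dicts before a compound annotation call.

    Hypothetical helper, not part of mehari. Assumes 1-based positions and
    that each variant covers [position, position + len(reference) - 1].
    """
    if not variants:
        raise ValueError("no variants given")

    chromosomes = {v["chromosome"] for v in variants}
    if len(chromosomes) != 1:
        raise ValueError(f"variants span several chromosomes: {sorted(chromosomes)}")

    ordered = sorted(variants, key=lambda v: v["position"])
    for left, right in zip(ordered, ordered[1:]):
        left_end = left["position"] + len(left["reference"]) - 1
        if right["position"] <= left_end:
            raise ValueError(
                f"variants at {left['position']} and {right['position']} overlap"
            )
    return ordered


variants = [
    {"chromosome": "1", "position": 37799639, "reference": "C", "alternative": "CG"},
    {"chromosome": "1", "position": 37799635, "reference": "TA", "alternative": "A"},
]
ordered = check_compound_inputs(variants)
# ordered could then be passed to annotator.annotate_multiple(ordered).
```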

To annotate a batch of variants, pass a polars.DataFrame or polars.LazyFrame.

import polars as pl

df = pl.DataFrame(
    {
        "chromosome": ["17", "3"],
        "position": [41197701, 193332511],
        "reference": ["G", "G"],
        "alternative": ["C", "T"],
    },
    schema={
        "chromosome": pl.Categorical, "position": pl.Int32,
        "reference": pl.String, "alternative": pl.String
    }
)

annotated_df = annotator.annotate(df)

Schemas and types

Enums

Mehari exports its internal enums to Python so you can use them for filtering or comparisons:

from mehari import ConsequenceEnum, ImpactEnum

DataFrame Schema

When annotating a DataFrame or LazyFrame, mehari appends an "annotation" column. This column is a polars List(Struct) with the following fields:

  • allele: String
  • consequences: List(ConsequenceEnum)
  • putative_impact: ImpactEnum
  • gene_symbol: String
  • gene_id: String
  • feature_type: String
  • feature_id: String
  • feature_biotype: List(String)
  • feature_tags: List(String)
  • rank: Struct(ord: Int32, total: Int32)
  • cdna_pos: Struct(ord: Int32, total: Int32)
  • cds_pos: Struct(ord: Int32, total: Int32)
  • protein_pos: Struct(ord: Int32, total: Int32)
  • hgvs_g: String
  • hgvs_n: String
  • hgvs_c: String
  • hgvs_p: String
  • distance: Int32
  • strand: Int32
  • messages: List(String)
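To make the List(Struct) shape concrete, the sketch below flattens it in plain Python: each input row holds a list of annotation records, and each record becomes one output row. The rows are made-up placeholders that mimic what converting a frame to Python dicts might yield, with only a few of the fields listed above. In particular, the string values used for putative_impact are an assumption, since the exact Python representation of ImpactEnum is not described here.

```python
# Made-up placeholder rows; real output contains all fields listed above.
rows = [
    {
        "chromosome": "1",
        "position": 100,
        "annotation": [
            {"feature_id": "TX_A", "putative_impact": "HIGH",
             "rank": {"ord": 2, "total": 10}},
            {"feature_id": "TX_B", "putative_impact": "MODIFIER",
             "rank": {"ord": None, "total": None}},
        ],
    },
    # A variant without any annotation records.
    {"chromosome": "1", "position": 200, "annotation": []},
]


def flatten(rows):
    """Yield one flat dict per (variant, annotation record) pair."""
    for row in rows:
        variant = {k: v for k, v in row.items() if k != "annotation"}
        for record in row["annotation"] or []:
            yield {**variant, **record}


flat = list(flatten(rows))
# Keep only records whose impact matches the assumed "HIGH" label.
high_impact_features = [
    r["feature_id"] for r in flat if r["putative_impact"] == "HIGH"
]
```

In polars itself, the same reshaping is usually done with `explode("annotation")` followed by `unnest("annotation")`, which keeps the work inside the query engine instead of Python loops.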

Building a transcript database

To build a transcript database, use the build_transcript_db function:

from mehari import build_transcript_db

build_transcript_db(
    assembly="grch38",
    annotation=["grch38.gff.gz"],
    transcript_sequences="grch38.fasta",
    transcript_source="ensembl",
    output="grch38.bin.zst"
)

This may take a while (several minutes for GRCh38 + Ensembl).
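Because the build is slow, scripts that run repeatedly may want to skip it when the output file already exists. The sketch below shows that pattern with a hypothetical wrapper, `ensure_transcript_db`, which is not part of mehari. It takes the build step as a callable, so the demonstration uses a stand-in that only creates an empty file.

```python
import tempfile
from pathlib import Path
from typing import Callable


def ensure_transcript_db(output: str, build: Callable[[], None]) -> bool:
    """Run build() only if output is missing; return True if it ran."""
    path = Path(output)
    if path.exists():
        return False
    build()
    if not path.exists():
        raise FileNotFoundError(f"build finished but {output} was not created")
    return True


# Demonstration with a stand-in build step instead of a real database build.
with tempfile.TemporaryDirectory() as tmp:
    target = str(Path(tmp) / "demo.bin.zst")

    def fake_build() -> None:
        Path(target).write_bytes(b"")

    first = ensure_transcript_db(target, fake_build)   # file missing: builds
    second = ensure_transcript_db(target, fake_build)  # file present: skips
```

In real use, the callable would wrap the build_transcript_db call shown above, for example as a lambda that passes the same `output` path. Note that this check only looks at file existence, so a stale or partially written file would still be reused.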
