
Python bindings for the mehari variant annotator


mehari

Python bindings for the mehari Rust library.

Features

  • Single variants: Annotate a single variant using a format string (chr:pos:ref:alt) or keyword arguments.
  • Multiple variants (experimental): Evaluate the compound effect of multiple variants.
  • DataFrames: Process batches of variants by passing a polars.DataFrame.
  • LazyFrames: Support for polars.LazyFrame to process large datasets (like Parquet files) without loading everything into memory.

Usage

Initialize SeqvarsAnnotator with your transcript database (see mehari-data-tx) and a reference genome (FASTA, uncompressed, with index).

from mehari import SeqvarsAnnotator

annotator = SeqvarsAnnotator(
    transcript_db_paths=["path/to/txs.bin.zst"],
    reference_path="path/to/reference.fa"
)
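Because the annotator expects an uncompressed, indexed FASTA, it can help to check the reference before constructing SeqvarsAnnotator. The sketch below is a hypothetical helper (check_reference is not part of mehari). It assumes the index is a samtools-style `.fai` file stored next to the FASTA (for example `reference.fa.fai`), and it detects compression only by the gzip magic bytes, which also covers bgzip.

```python
import os

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip (including bgzip) file


def check_reference(reference_path: str) -> None:
    """Hypothetical pre-flight check; raises ValueError if the FASTA looks unusable.

    Assumes the index is a samtools-style ``.fai`` file next to the FASTA.
    """
    if not os.path.isfile(reference_path):
        raise ValueError(f"reference not found: {reference_path}")
    # Reject compressed input: mehari expects an uncompressed FASTA.
    with open(reference_path, "rb") as fh:
        if fh.read(2) == GZIP_MAGIC:
            raise ValueError("reference must be uncompressed FASTA, not gzip/bgzip")
    # Require the index file alongside the FASTA.
    if not os.path.isfile(reference_path + ".fai"):
        raise ValueError(f"missing FASTA index: {reference_path}.fai")
```

Calling check_reference("path/to/reference.fa") before SeqvarsAnnotator(...) turns a missing index or a compressed file into an explicit error up front.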

To annotate a single variant, use either a colon-separated format string or keyword arguments:

result1 = annotator.annotate("17:41197701:G:C")
result2 = annotator.annotate(chromosome="3", position=193332511, reference="G", alternative="T")
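The two call styles carry the same four fields. As an illustration, the hypothetical parse_variant helper below (not part of mehari) splits a format string into the keyword arguments used by the second call. It does no chromosome-name normalization or allele validation.

```python
from typing import Dict, Union


def parse_variant(spec: str) -> Dict[str, Union[str, int]]:
    """Split a 'chr:pos:ref:alt' string into keyword arguments.

    Hypothetical helper for illustration only; it does not normalize
    chromosome names (e.g. 'chr17' vs '17') or check alleles.
    """
    parts = spec.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected chr:pos:ref:alt, got {spec!r}")
    chromosome, position, reference, alternative = parts
    return {
        "chromosome": chromosome,
        "position": int(position),  # keyword form takes an integer position
        "reference": reference,
        "alternative": alternative,
    }
```

Under that assumption, annotator.annotate(**parse_variant("17:41197701:G:C")) should be equivalent to annotator.annotate("17:41197701:G:C").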

To annotate multiple phased variants together as a single compound event (experimental), pass them to annotate_multiple either as format strings or as dictionaries of keyword arguments:

Note: Mehari does not infer phasing. When using annotate_multiple, mehari assumes all provided variants are on the same chromosome, exist on the same haplotype, and do not overlap.

result1 = annotator.annotate_multiple(["1:37799635:TA:A", "1:37799639:C:CG"])

result2 = annotator.annotate_multiple([
    {"chromosome": "1", "position": 37799635, "reference": "TA", "alternative": "A"},
    {"chromosome": "1", "position": 37799639, "reference": "C", "alternative": "CG"}
])
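Since annotate_multiple trusts its input, a client-side check of the assumptions that can be verified without phasing information (one chromosome, no overlap) may be worthwhile. This is a hypothetical sketch, not mehari's own validation. It takes the dictionary form shown above and assumes VCF-style coordinates, in which the reference allele starts at position and spans len(reference) bases. Same-haplotype phasing cannot be checked this way and has to come from upstream.

```python
from typing import Any, Dict, List


def check_compound(variants: List[Dict[str, Any]]) -> None:
    """Hypothetical pre-check for annotate_multiple inputs.

    Raises ValueError if the variants span several chromosomes or if their
    reference alleles overlap. Assumes VCF-style coordinates: the REF allele
    occupies positions [position, position + len(reference) - 1].
    """
    chromosomes = {v["chromosome"] for v in variants}
    if len(chromosomes) > 1:
        raise ValueError(f"variants span several chromosomes: {sorted(chromosomes)}")
    # Sort reference spans by start; any overlap then shows up between neighbours.
    spans = sorted(
        (v["position"], v["position"] + len(v["reference"]) - 1) for v in variants
    )
    for (_, end_a), (start_b, _) in zip(spans, spans[1:]):
        if start_b <= end_a:
            raise ValueError(f"reference alleles overlap at position {start_b}")
```

With the example above, the TA deletion covers positions 37799635 to 37799636 and the C>CG insertion is anchored at 37799639, so the check passes.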

To annotate a batch of variants, pass a polars.DataFrame or polars.LazyFrame to annotate():

import polars as pl

df = pl.DataFrame(
    {
        "chromosome": ["17", "3"],
        "position": [41197701, 193332511],
        "reference": ["G", "G"],
        "alternative": ["C", "T"],
    },
    schema={
        "chromosome": pl.Categorical, "position": pl.Int32,
        "reference": pl.String, "alternative": pl.String
    }
)

annotated_df = annotator.annotate(df)
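If the variants start out as format strings, they can be turned into the column-oriented dictionary that pl.DataFrame accepts. The helper below is a hypothetical, dependency-free sketch (variants_to_columns is not part of mehari or polars); its output can be passed as the first argument of pl.DataFrame together with the schema used above.

```python
from typing import Dict, Iterable, List, Union

Column = List[Union[str, int]]


def variants_to_columns(specs: Iterable[str]) -> Dict[str, Column]:
    """Convert 'chr:pos:ref:alt' strings into one list per DataFrame column."""
    columns: Dict[str, Column] = {
        "chromosome": [], "position": [], "reference": [], "alternative": []
    }
    for spec in specs:
        parts = spec.split(":")
        if len(parts) != 4:
            raise ValueError(f"expected chr:pos:ref:alt, got {spec!r}")
        chromosome, position, reference, alternative = parts
        columns["chromosome"].append(chromosome)
        columns["position"].append(int(position))
        columns["reference"].append(reference)
        columns["alternative"].append(alternative)
    return columns
```

For the two variants above, variants_to_columns(["17:41197701:G:C", "3:193332511:G:T"]) yields the same dictionary that is passed to pl.DataFrame in the example.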

Schemas and types

Enums

Mehari exports its internal enums to Python so you can use them for filtering or comparisons:

from mehari import ConsequenceEnum, ImpactEnum

DataFrame Schema

When annotating a DataFrame or LazyFrame, mehari appends an "annotation" column. This column is a polars List(Struct) with the following fields:

  • allele: String
  • consequences: List(ConsequenceEnum)
  • putative_impact: ImpactEnum
  • gene_symbol: String
  • gene_id: String
  • feature_type: String
  • feature_id: String
  • feature_biotype: List(String)
  • feature_tags: List(String)
  • rank: Struct(ord: Int32, total: Int32)
  • cdna_pos: Struct(ord: Int32, total: Int32)
  • cds_pos: Struct(ord: Int32, total: Int32)
  • protein_pos: Struct(ord: Int32, total: Int32)
  • hgvs_g: String
  • hgvs_n: String
  • hgvs_c: String
  • hgvs_p: String
  • distance: Int32
  • strand: Int32
  • messages: List(String)
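Each row's annotation value is a list of structs, one per annotated feature. Converted to Python (for example with polars' DataFrame.to_dicts() or Series.to_list()), each struct becomes a dict keyed by the field names above. The sketch below assumes that representation and also assumes the ImpactEnum values come through as plain strings such as "HIGH". It keeps the entries whose impact is in a chosen set and renders the ord/total structs as "ord/total". summarize_annotations and format_ord_total are hypothetical helpers, not part of mehari.

```python
from typing import Any, Dict, Iterable, List, Optional


def format_ord_total(value: Optional[Dict[str, Any]]) -> str:
    """Render a Struct(ord, total) field such as rank as 'ord/total'; '' if null."""
    if value is None or value.get("ord") is None:
        return ""
    if value.get("total") is None:
        return str(value["ord"])
    return f"{value['ord']}/{value['total']}"


def summarize_annotations(
    annotations: Iterable[Dict[str, Any]], impacts: Iterable[str]
) -> List[str]:
    """Return one tab-separated line per annotation whose impact is in `impacts`.

    Assumes putative_impact arrives as a string (e.g. "HIGH"); adjust the
    comparison if your enum values are represented differently.
    """
    wanted = set(impacts)
    lines = []
    for ann in annotations:
        if ann.get("putative_impact") not in wanted:
            continue
        fields = [
            ann.get("gene_symbol") or "",
            ann.get("feature_id") or "",
            ann.get("hgvs_c") or "",
            ann.get("hgvs_p") or "",
            format_ord_total(ann.get("rank")),
        ]
        lines.append("\t".join(fields))
    return lines
```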

Building a transcript database

To build a transcript database, you can use the build_transcript_db function:

from mehari import build_transcript_db
build_transcript_db(
    assembly="grch38",
    annotation=["grch38.gff.gz"],
    transcript_sequences="grch38.fasta",
    transcript_source="ensembl",
    output="grch38.bin.zst"
)

This may take a while (several minutes for GRCh38 + Ensembl).
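Because a build can take minutes, it may be worth skipping it when the output is already up to date. A minimal make-style sketch, assuming file modification times are a good enough staleness signal; needs_rebuild is a hypothetical helper, not part of mehari.

```python
import os
from typing import Iterable


def needs_rebuild(output: str, inputs: Iterable[str]) -> bool:
    """Hypothetical make-style check.

    Returns True if the output file is missing or older than any input file.
    """
    if not os.path.exists(output):
        return True
    out_mtime = os.path.getmtime(output)
    return any(os.path.getmtime(path) > out_mtime for path in inputs)
```

For example, the build_transcript_db(...) call above could be guarded with if needs_rebuild("grch38.bin.zst", ["grch38.gff.gz", "grch38.fasta"]):.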


Download files

Download the file for your platform.

Source Distribution

mehari-0.43.2.tar.gz (600.1 kB)

Built Distributions

mehari-0.43.2-cp313-cp313-manylinux_2_28_x86_64.whl (7.8 MB)

CPython 3.13, manylinux (glibc 2.28+), x86-64

mehari-0.43.2-cp312-cp312-manylinux_2_28_x86_64.whl (7.8 MB)

CPython 3.12, manylinux (glibc 2.28+), x86-64

mehari-0.43.2-cp311-cp311-manylinux_2_28_x86_64.whl (7.8 MB)

CPython 3.11, manylinux (glibc 2.28+), x86-64

File details

Details for the file mehari-0.43.2.tar.gz.

File metadata

  • Download URL: mehari-0.43.2.tar.gz
  • Size: 600.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.13.1

File hashes

Hashes for mehari-0.43.2.tar.gz
Algorithm Hash digest
SHA256 9095c87edfffc88ce1294b5e6a28bdd8e1fbfe45f59e1621c7a3cbe323dca6d9
MD5 3b3968bba5e60da3abba4b0b71b2a27f
BLAKE2b-256 52ca717afcc5b44ca5ee756d09f396a4b59f19982226ca45561af5a764be43df
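The published digests can be used to confirm that a downloaded file is intact. A minimal sketch using Python's standard hashlib module; sha256_matches is a hypothetical helper name. It streams the file in chunks so large wheels are not read into memory at once.

```python
import hashlib


def sha256_matches(path: str, expected_hex: str, chunk_size: int = 1 << 20) -> bool:
    """Stream the file through SHA-256 and compare with the published digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

For the source distribution this would be sha256_matches("mehari-0.43.2.tar.gz", "9095c87edfffc88ce1294b5e6a28bdd8e1fbfe45f59e1621c7a3cbe323dca6d9"). pip can enforce the same check at install time through a requirements-file entry such as mehari==0.43.2 --hash=sha256:<digest>, which switches pip into hash-checking mode.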

File details

Details for the file mehari-0.43.2-cp313-cp313-manylinux_2_28_x86_64.whl.

File hashes

Hashes for mehari-0.43.2-cp313-cp313-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 405a369a0fd1060715dfd5d6920dc4771f66b73543a550afd51a733844238369
MD5 d8e6a4dd1e7f2197d07df383c39d9842
BLAKE2b-256 23d7a46101c418b3ad223eb427d0c800a02c5648a24f56bc4de5215d8e5232f6

File details

Details for the file mehari-0.43.2-cp312-cp312-manylinux_2_28_x86_64.whl.

File hashes

Hashes for mehari-0.43.2-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 308279e571911de049f397c75f10863171735a739b97c354c8f14c1d5b7f3872
MD5 ed9182a9c035691aadb56a323916ce9e
BLAKE2b-256 da42f3e38af74012e1b9dcb494df5c41dcece5188aad23a8c367b56614d6d98c

File details

Details for the file mehari-0.43.2-cp311-cp311-manylinux_2_28_x86_64.whl.

File hashes

Hashes for mehari-0.43.2-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 4cac47ee8c22f058c01e5c9eb4c0817cea973c5a4ebde2daf0d0d29aab9d0e3a
MD5 5fccc031d823d84ded9834c1ba71e425
BLAKE2b-256 2ae1312da24cca8ffdbeb8c8661acb58c05c27e2194b03dced7f12ef6f4826a4
