Python bindings for the mehari variant annotator

mehari

Python bindings for the mehari Rust library.

Features

  • Single variants: Annotate a single variant using a format string (chr:pos:ref:alt) or keyword arguments.
  • Multiple variants (experimental): Evaluate the compound effect of multiple variants.
  • DataFrames: Process batches of variants by passing a polars.DataFrame.
  • LazyFrames: Support for polars.LazyFrame to process large datasets (like Parquet files) without loading everything into memory.

Usage

Initialize SeqvarsAnnotator with one or more transcript database files (see mehari-data-tx) and a reference genome (an uncompressed, indexed FASTA file).

from mehari import SeqvarsAnnotator

annotator = SeqvarsAnnotator(
    transcript_db_paths=["path/to/txs.bin.zst"],
    reference_path="path/to/reference.fa"
)

To annotate a single variant, use either a colon-separated format string (chr:pos:ref:alt) or keyword arguments:

result1 = annotator.annotate("17:41197701:G:C")
result2 = annotator.annotate(chromosome="3", position=193332511, reference="G", alternative="T")
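
The format string and the keyword form carry the same four fields. The sketch below shows one way to convert the first into the second. `parse_variant` is a hypothetical helper written for illustration and is not part of mehari; it assumes the string has exactly four colon-separated fields and does not try to reproduce mehari's own parsing rules.

```python
def parse_variant(spec: str) -> dict:
    """Split a 'chr:pos:ref:alt' string into keyword arguments.

    Hypothetical helper for illustration; not part of the mehari API.
    """
    parts = spec.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected chr:pos:ref:alt, got {spec!r}")
    chromosome, position, reference, alternative = parts
    return {
        "chromosome": chromosome,
        "position": int(position),  # raises ValueError if not an integer
        "reference": reference,
        "alternative": alternative,
    }


kwargs = parse_variant("17:41197701:G:C")
```

With an initialized annotator, the resulting dictionary could then be passed on as `annotator.annotate(**kwargs)`, which uses the keyword-argument form shown above.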

To annotate multiple phased variants together as a single compound event (experimental), pass a list of format strings or a list of dictionaries to annotate_multiple:

Note: Mehari does not infer phasing. When using annotate_multiple, mehari assumes all provided variants are on the same chromosome, exist on the same haplotype, and do not overlap.

result1 = annotator.annotate_multiple(["1:37799635:TA:A", "1:37799639:C:CG"])

result2 = annotator.annotate_multiple([
    {"chromosome": "1", "position": 37799635, "reference": "TA", "alternative": "A"},
    {"chromosome": "1", "position": 37799639, "reference": "C", "alternative": "CG"}
])
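
Because mehari assumes rather than verifies these preconditions, it can help to check the input before calling annotate_multiple. The following is a minimal sketch of such a pre-check. `check_compound_input` is a hypothetical helper, not part of mehari. It assumes 1-based positions and treats a variant as occupying `position` through `position + len(reference) - 1`; mehari's own notion of overlap may differ. Phasing cannot be checked from these fields and remains the caller's responsibility.

```python
def check_compound_input(variants: list[dict]) -> None:
    """Raise ValueError if the variants span chromosomes or overlap.

    Hypothetical pre-check for illustration; not part of the mehari API.
    """
    if not variants:
        raise ValueError("no variants given")

    chromosomes = {v["chromosome"] for v in variants}
    if len(chromosomes) != 1:
        raise ValueError(f"variants span several chromosomes: {sorted(chromosomes)}")

    # Compare each variant with its successor in position order.
    ordered = sorted(variants, key=lambda v: v["position"])
    for left, right in zip(ordered, ordered[1:]):
        # Assumption: the reference allele covers position .. left_end (inclusive).
        left_end = left["position"] + len(left["reference"]) - 1
        if right["position"] <= left_end:
            raise ValueError(f"variants overlap at position {right['position']}")


variants = [
    {"chromosome": "1", "position": 37799635, "reference": "TA", "alternative": "A"},
    {"chromosome": "1", "position": 37799639, "reference": "C", "alternative": "CG"},
]
check_compound_input(variants)  # passes silently for the example above
```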

To annotate a batch of variants, pass a polars.DataFrame or polars.LazyFrame to annotate. The example below uses the columns chromosome, position, reference, and alternative:

import polars as pl

df = pl.DataFrame(
    {
        "chromosome": ["17", "3"],
        "position": [41197701, 193332511],
        "reference": ["G", "G"],
        "alternative": ["C", "T"],
    },
    schema={
        "chromosome": pl.Categorical, "position": pl.Int32,
        "reference": pl.String, "alternative": pl.String
    }
)

annotated_df = annotator.annotate(df)

Schemas and types

Enums

Mehari exports its internal enums to Python so you can use them for filtering or comparisons:

from mehari import ConsequenceEnum, ImpactEnum

DataFrame Schema

When annotating a DataFrame or LazyFrame, mehari appends an "annotation" column. This column is a polars List(Struct) with the following fields:

  • allele: String
  • consequences: List(ConsequenceEnum)
  • putative_impact: ImpactEnum
  • gene_symbol: String
  • gene_id: String
  • feature_type: String
  • feature_id: String
  • feature_biotype: List(String)
  • feature_tags: List(String)
  • rank: Struct(ord: Int32, total: Int32)
  • cdna_pos: Struct(ord: Int32, total: Int32)
  • cds_pos: Struct(ord: Int32, total: Int32)
  • protein_pos: Struct(ord: Int32, total: Int32)
  • hgvs_g: String
  • hgvs_n: String
  • hgvs_c: String
  • hgvs_p: String
  • distance: Int32
  • strand: Int32
  • messages: List(String)
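
When rows are converted to plain Python objects (for example with polars' `DataFrame.to_dicts()`), a List(Struct) column becomes a list of dictionaries keyed by the field names above. The sketch below walks such a row and prints one summary line per annotation entry. `summarize_annotations` is a hypothetical helper, the sample row is made up for illustration, and the handling of a missing `rank` is an assumption rather than documented mehari behavior.

```python
def summarize_annotations(row: dict) -> list[str]:
    """Return one 'gene_symbol feature_id rank' line per annotation entry."""
    lines = []
    for ann in row.get("annotation") or []:  # tolerate a null or empty column
        rank = ann.get("rank")
        # Assumption: rank may be absent; render it as "-" in that case.
        rank_text = f"{rank['ord']}/{rank['total']}" if rank else "-"
        lines.append(f"{ann['gene_symbol']} {ann['feature_id']} {rank_text}")
    return lines


# Made-up row for illustration only; real rows would come from
# annotator.annotate(df).to_dicts() and carry all fields listed above.
example_row = {
    "chromosome": "1",
    "position": 100,
    "reference": "A",
    "alternative": "T",
    "annotation": [
        {"gene_symbol": "GENE1", "feature_id": "TX1", "rank": {"ord": 3, "total": 12}},
        {"gene_symbol": "GENE1", "feature_id": "TX2", "rank": None},
    ],
}

summary = summarize_annotations(example_row)
print(summary)
```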

Building a transcript database

To build a transcript database, you can use the build_transcript_db function:

from mehari import build_transcript_db

build_transcript_db(
    assembly="grch38",
    annotation=["grch38.gff.gz"],
    transcript_sequences="grch38.fasta",
    transcript_source="ensembl",
    output="grch38.bin.zst"
)

This may take a while (several minutes for GRCh38 + Ensembl).
