Python bindings for the mehari variant annotator

mehari

Python bindings for the mehari Rust library.

Features

  • Single variants: Annotate a single variant using a format string (chr:pos:ref:alt) or keyword arguments.
  • DataFrames: Process batches of variants by passing a polars.DataFrame.
  • LazyFrames: Support for polars.LazyFrame to process large datasets (like Parquet files) without loading everything into memory.

Usage

Initialize SeqvarsAnnotator with the paths to your transcript database (see mehari-data-tx) and a reference genome. The reference must be an uncompressed FASTA file with an index.

from mehari import SeqvarsAnnotator

annotator = SeqvarsAnnotator(
    transcript_db_paths=["path/to/txs.bin.zst"],
    reference_path="path/to/reference.fa"
)

To annotate a single variant, use either a colon-separated format string (chr:pos:ref:alt) or keyword arguments:

result1 = annotator.annotate("17:41197701:G:C")
result2 = annotator.annotate(chromosome="3", position=193332511, reference="G", alternative="T")
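The two call forms above carry the same four fields. As a minimal sketch, the helper below builds the colon-separated string from individual fields. The name format_variant is hypothetical and not part of mehari, and the checks (non-empty fields, no embedded colons, positive position) are assumptions about what a well-formed string looks like, not mehari's own validation rules.

```python
def format_variant(chromosome: str, position: int, reference: str, alternative: str) -> str:
    """Join the four variant fields into a chr:pos:ref:alt string.

    Hypothetical helper, not part of mehari.
    """
    # Assumption: positions are positive integers.
    if position < 1:
        raise ValueError(f"position must be positive, got {position}")
    fields = {
        "chromosome": chromosome,
        "reference": reference,
        "alternative": alternative,
    }
    for name, value in fields.items():
        # A colon inside a field would make the joined string ambiguous.
        if not value or ":" in value:
            raise ValueError(f"invalid {name}: {value!r}")
    return f"{chromosome}:{position}:{reference}:{alternative}"


spec = format_variant("3", 193332511, "G", "T")
print(spec)  # 3:193332511:G:T
```

The resulting string can then be passed to annotator.annotate() in place of the keyword arguments.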

To annotate a batch of variants, pass a polars.DataFrame or polars.LazyFrame with chromosome, position, reference, and alternative columns:

import polars as pl

df = pl.DataFrame(
    {
        "chromosome": ["17", "3"],
        "position": [41197701, 193332511],
        "reference": ["G", "G"],
        "alternative": ["C", "T"],
    },
    schema={
        "chromosome": pl.Categorical, "position": pl.Int32,
        "reference": pl.String, "alternative": pl.String
    }
)

annotated_df = annotator.annotate(df)
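If your variants start out as chr:pos:ref:alt strings, the column layout used above can be produced with a small conversion step. The sketch below uses only the standard library. variants_to_columns is a hypothetical helper, not part of mehari, and it assumes every string has exactly four colon-separated parts.

```python
def variants_to_columns(specs):
    """Split chr:pos:ref:alt strings into column-oriented lists.

    Hypothetical helper, not part of mehari.
    """
    columns = {"chromosome": [], "position": [], "reference": [], "alternative": []}
    for spec in specs:
        parts = spec.split(":")
        if len(parts) != 4:
            raise ValueError(f"expected chr:pos:ref:alt, got {spec!r}")
        chromosome, position, reference, alternative = parts
        columns["chromosome"].append(chromosome)
        columns["position"].append(int(position))  # raises ValueError if not numeric
        columns["reference"].append(reference)
        columns["alternative"].append(alternative)
    return columns


columns = variants_to_columns(["17:41197701:G:C", "3:193332511:G:T"])
print(columns["position"])  # [41197701, 193332511]
```

The returned dictionary has the same shape as the data argument in the pl.DataFrame example, so it can be passed there together with the same schema.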

Schemas and types

Enums

Mehari exports its internal enums to Python so you can use them for filtering or comparisons:

from mehari import ConsequenceEnum, ImpactEnum

DataFrame Schema

When annotating a DataFrame or LazyFrame, mehari appends an "annotation" column. This column is a polars List(Struct) with the following fields:

  • allele: String
  • consequences: List(ConsequenceEnum)
  • putative_impact: ImpactEnum
  • gene_symbol: String
  • gene_id: String
  • feature_type: String
  • feature_id: String
  • feature_biotype: List(String)
  • feature_tags: List(String)
  • rank: Struct(ord: Int32, total: Int32)
  • cdna_pos: Struct(ord: Int32, total: Int32)
  • cds_pos: Struct(ord: Int32, total: Int32)
  • protein_pos: Struct(ord: Int32, total: Int32)
  • hgvs_g: String
  • hgvs_n: String
  • hgvs_c: String
  • hgvs_p: String
  • distance: Int32
  • strand: Int32
  • messages: List(String)
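Because "annotation" is a list of structs, each variant row can carry several annotation entries. In polars, DataFrame.explode("annotation") followed by unnest("annotation") should turn this into one row per entry. The sketch below shows the same reshaping in plain Python on row dictionaries, such as those returned by DataFrame.to_dicts(). flatten_annotations is a hypothetical helper, and the row contents are made-up placeholders covering only a few of the fields, not real mehari output.

```python
def flatten_annotations(rows):
    """Yield one flat record per (variant, annotation entry) pair.

    Hypothetical helper, not part of mehari.
    """
    for row in rows:
        # Everything except the nested list describes the variant itself.
        variant = {key: value for key, value in row.items() if key != "annotation"}
        for entry in row.get("annotation") or []:
            yield {**variant, **entry}


# Placeholder data with invented gene and transcript identifiers.
rows = [
    {
        "chromosome": "17",
        "position": 41197701,
        "reference": "G",
        "alternative": "C",
        "annotation": [
            {"gene_symbol": "GENE_A", "feature_id": "TX_1", "rank": {"ord": 3, "total": 10}},
            {"gene_symbol": "GENE_A", "feature_id": "TX_2", "rank": {"ord": 2, "total": 9}},
        ],
    }
]

flat = list(flatten_annotations(rows))
for record in flat:
    rank = record["rank"]
    print(record["feature_id"], f'{rank["ord"]}/{rank["total"]}')
```

Variants with an empty or missing annotation list produce no records here. Adjust that if you need to keep unannotated variants.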

Download files

Source Distribution

mehari-0.41.0.tar.gz (585.1 kB)

  • Tags: Source
  • Uploaded using Trusted Publishing: Yes
  • Uploaded via: maturin/1.12.6
  • SHA256: 6b96b5d1f801d0345a3cf4b165cab584261a07bce31bab7dcb72c4eb083d936d
  • MD5: 16d4a11a081d6c2bcaee5e71ad0f48d4
  • BLAKE2b-256: d92ce65979df772501b459d31895ebede358ac090a80cf15db9afe7b50f5d233

Built Distributions

mehari-0.41.0-cp312-cp312-manylinux_2_28_x86_64.whl (6.8 MB)

  • Tags: CPython 3.12, manylinux: glibc 2.28+ x86-64
  • SHA256: f8f73618466b8c6d12267288d1a738e10ba9da46dd70ae60243370bc68a757e4
  • MD5: e6a378c3998c5dcbb254ccd141826f3e
  • BLAKE2b-256: d0e7598841f283842aeb33e0ce29ff382fcff8f04c4e664b868b2140f792e428

mehari-0.41.0-cp311-cp311-manylinux_2_28_x86_64.whl (6.8 MB)

  • Tags: CPython 3.11, manylinux: glibc 2.28+ x86-64
  • SHA256: 392da3abff15e37776b962253abd8adceb9e8c32de789e5b255a960b983436e6
  • MD5: f261f54daafd76965c1ebbaca82f5d81
  • BLAKE2b-256: 37c628bf8f7d12454d7560f2ddecdb46497723dfb613f29c29d8068aa89c56c4