Python bindings for the mehari variant annotator
mehari
Python bindings for the mehari Rust library.
Features
- Single variants: Annotate a single variant using a format string (chr:pos:ref:alt) or keyword arguments.
- Multiple variants (experimental): Evaluate the compound effect of multiple variants.
- DataFrames: Process batches of variants by passing a polars.DataFrame.
- LazyFrames: Support for polars.LazyFrame to process large datasets (like Parquet files) without loading everything into memory.
Usage
Initialize SeqvarsAnnotator with your transcript database (see
mehari-data-tx) and a reference genome (FASTA, uncompressed,
with index).
from mehari import SeqvarsAnnotator

annotator = SeqvarsAnnotator(
    transcript_db_paths=["path/to/txs.bin.zst"],
    reference_path="path/to/reference.fa",
)
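Because the annotator expects an uncompressed FASTA with an index, a small pre-flight check can catch path mistakes before initialization. The following is only a sketch and not part of mehari: `check_reference` is a hypothetical helper, and it assumes the index is a samtools-style `.fai` file stored next to the FASTA, which may not match where mehari actually looks.

```python
from pathlib import Path


def check_reference(reference_path: str) -> Path:
    """Return the FASTA path if it looks usable, otherwise raise.

    Assumption: the index is a samtools-style ``<fasta>.fai`` file located
    next to the FASTA. mehari itself may resolve the index differently.
    """
    fasta = Path(reference_path)
    # Reject obviously compressed inputs based on the file extension only.
    if fasta.suffix in {".gz", ".bgz", ".zst"}:
        raise ValueError(f"{fasta} looks compressed; an uncompressed FASTA is expected")
    if not fasta.is_file():
        raise FileNotFoundError(fasta)
    # Build "<name>.fai" by appending to the full file name.
    index = fasta.with_name(fasta.name + ".fai")
    if not index.is_file():
        raise FileNotFoundError(f"missing index: {index}")
    return fasta
```

The returned path can be converted with `str(...)` and passed as `reference_path`.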
To annotate a single variant, use either a colon-separated format string or keyword arguments:
result1 = annotator.annotate("17:41197701:G:C")
result2 = annotator.annotate(chromosome="3", position=193332511, reference="G", alternative="T")
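The two call styles carry the same four fields. As an illustration of how they correspond, the sketch below splits a `chr:pos:ref:alt` string into the keyword names used above. `parse_variant` is a hypothetical helper written for this example; how mehari parses the string internally is not shown here.

```python
def parse_variant(spec: str) -> dict:
    """Split 'chr:pos:ref:alt' into annotate()-style keyword arguments."""
    parts = spec.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected chr:pos:ref:alt, got {spec!r}")
    chromosome, position, reference, alternative = parts
    return {
        "chromosome": chromosome,
        "position": int(position),  # positions are integers in the keyword form
        "reference": reference,
        "alternative": alternative,
    }


kwargs = parse_variant("17:41197701:G:C")
```

With an annotator in scope, `annotator.annotate(**kwargs)` would then be the keyword form of the string call.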
To annotate multiple phased variants together as a single compound event (Experimental):
Note: Mehari does not infer phasing. When using annotate_multiple, mehari assumes all provided variants are on the same chromosome, exist on the same haplotype, and do not overlap.
result1 = annotator.annotate_multiple(["1:37799635:TA:A", "1:37799639:C:CG"])
result2 = annotator.annotate_multiple([
    {"chromosome": "1", "position": 37799635, "reference": "TA", "alternative": "A"},
    {"chromosome": "1", "position": 37799639, "reference": "C", "alternative": "CG"},
])
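Since these assumptions are left to the caller, it can help to verify the two that are checkable before calling annotate_multiple. The sketch below is not part of mehari: `check_compound_input` is a hypothetical helper, and it assumes 1-based positions where each variant covers `position` through `position + len(reference) - 1`. Phasing cannot be verified from these fields.

```python
def check_compound_input(variants: list[dict]) -> None:
    """Raise ValueError if variants span chromosomes or overlap.

    Assumption: positions are 1-based and a variant covers the reference
    interval position .. position + len(reference) - 1.
    """
    if not variants:
        raise ValueError("no variants given")
    chromosomes = {v["chromosome"] for v in variants}
    if len(chromosomes) > 1:
        raise ValueError(f"variants span several chromosomes: {sorted(chromosomes)}")
    # After sorting by start, checking adjacent pairs is enough to find any overlap.
    ordered = sorted(variants, key=lambda v: v["position"])
    for left, right in zip(ordered, ordered[1:]):
        left_end = left["position"] + len(left["reference"]) - 1
        if right["position"] <= left_end:
            raise ValueError(f"variants overlap at position {right['position']}")


# The two variants from the example above pass the check.
check_compound_input([
    {"chromosome": "1", "position": 37799635, "reference": "TA", "alternative": "A"},
    {"chromosome": "1", "position": 37799639, "reference": "C", "alternative": "CG"},
])
```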
To annotate a batch of variants, pass a polars.DataFrame or polars.LazyFrame.
import polars as pl

df = pl.DataFrame(
    {
        "chromosome": ["17", "3"],
        "position": [41197701, 193332511],
        "reference": ["G", "G"],
        "alternative": ["C", "T"],
    },
    schema={
        "chromosome": pl.Categorical,
        "position": pl.Int32,
        "reference": pl.String,
        "alternative": pl.String,
    },
)

annotated_df = annotator.annotate(df)
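If the variants start out as `chr:pos:ref:alt` strings, they first need to be turned into the four columns shown above. A minimal sketch using only the standard library follows; `variants_to_columns` is a hypothetical helper, not a mehari function.

```python
def variants_to_columns(specs: list[str]) -> dict[str, list]:
    """Convert 'chr:pos:ref:alt' strings into column-oriented data."""
    columns = {"chromosome": [], "position": [], "reference": [], "alternative": []}
    for spec in specs:
        # Unpacking raises ValueError if a string does not have four fields.
        chromosome, position, reference, alternative = spec.split(":")
        columns["chromosome"].append(chromosome)
        columns["position"].append(int(position))
        columns["reference"].append(reference)
        columns["alternative"].append(alternative)
    return columns


columns = variants_to_columns(["17:41197701:G:C", "3:193332511:G:T"])
```

The resulting dict has the same shape as the data argument in the example above, so it could be passed to `pl.DataFrame` together with the same schema.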
Schemas and types
Enums
Mehari exports its internal enums to Python so you can use them for filtering or comparisons:
from mehari import ConsequenceEnum, ImpactEnum
DataFrame Schema
When annotating a DataFrame or LazyFrame, mehari appends an "annotation" column.
This column is a polars List(Struct) with the following fields:
- allele: String
- consequences: List(ConsequenceEnum)
- putative_impact: ImpactEnum
- gene_symbol: String
- gene_id: String
- feature_type: String
- feature_id: String
- feature_biotype: List(String)
- feature_tags: List(String)
- rank: Struct(ord: Int32, total: Int32)
- cdna_pos: Struct(ord: Int32, total: Int32)
- cds_pos: Struct(ord: Int32, total: Int32)
- protein_pos: Struct(ord: Int32, total: Int32)
- hgvs_g: String
- hgvs_n: String
- hgvs_c: String
- hgvs_p: String
- distance: Int32
- strand: Int32
- messages: List(String)
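Once an annotated frame is converted to plain Python objects (for example with polars' `DataFrame.to_dicts()`), each row's "annotation" value is a list of dicts, one per struct. The sketch below shows how that nested shape can be walked. `annotations_for_gene` is a hypothetical helper, and the rows are placeholders invented for illustration, not real mehari output.

```python
def annotations_for_gene(rows: list[dict], gene_symbol: str) -> list[dict]:
    """Collect annotation structs whose gene_symbol matches."""
    hits = []
    for row in rows:
        # "annotation" is List(Struct): a list of dicts per row. Treating a
        # missing or None value as "no annotations" is an assumption.
        for ann in row.get("annotation") or []:
            if ann.get("gene_symbol") == gene_symbol:
                hits.append(ann)
    return hits


# Placeholder rows, NOT real mehari output; only two struct fields are filled in.
rows = [
    {"chromosome": "17", "annotation": [
        {"gene_symbol": "GENE_A", "feature_id": "TX_1"},
        {"gene_symbol": "GENE_B", "feature_id": "TX_2"},
    ]},
    {"chromosome": "3", "annotation": None},
]
hits = annotations_for_gene(rows, "GENE_A")
```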
Building a transcript database
To build a transcript database, you can use the build_transcript_db function:
from mehari import build_transcript_db

build_transcript_db(
    assembly="grch38",
    annotation=["grch38.gff.gz"],
    transcript_sequences="grch38.fasta",
    transcript_source="ensembl",
    output="grch38.bin.zst",
)
This may take a while (several minutes for GRCh38 + Ensembl).
Download files
File details
Details for the file mehari-0.43.2.tar.gz.
File metadata
- Download URL: mehari-0.43.2.tar.gz
- Upload date:
- Size: 600.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9095c87edfffc88ce1294b5e6a28bdd8e1fbfe45f59e1621c7a3cbe323dca6d9 |
| MD5 | 3b3968bba5e60da3abba4b0b71b2a27f |
| BLAKE2b-256 | 52ca717afcc5b44ca5ee756d09f396a4b59f19982226ca45561af5a764be43df |
File details
Details for the file mehari-0.43.2-cp313-cp313-manylinux_2_28_x86_64.whl.
File metadata
- Download URL: mehari-0.43.2-cp313-cp313-manylinux_2_28_x86_64.whl
- Upload date:
- Size: 7.8 MB
- Tags: CPython 3.13, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 405a369a0fd1060715dfd5d6920dc4771f66b73543a550afd51a733844238369 |
| MD5 | d8e6a4dd1e7f2197d07df383c39d9842 |
| BLAKE2b-256 | 23d7a46101c418b3ad223eb427d0c800a02c5648a24f56bc4de5215d8e5232f6 |
File details
Details for the file mehari-0.43.2-cp312-cp312-manylinux_2_28_x86_64.whl.
File metadata
- Download URL: mehari-0.43.2-cp312-cp312-manylinux_2_28_x86_64.whl
- Upload date:
- Size: 7.8 MB
- Tags: CPython 3.12, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 308279e571911de049f397c75f10863171735a739b97c354c8f14c1d5b7f3872 |
| MD5 | ed9182a9c035691aadb56a323916ce9e |
| BLAKE2b-256 | da42f3e38af74012e1b9dcb494df5c41dcece5188aad23a8c367b56614d6d98c |
File details
Details for the file mehari-0.43.2-cp311-cp311-manylinux_2_28_x86_64.whl.
File metadata
- Download URL: mehari-0.43.2-cp311-cp311-manylinux_2_28_x86_64.whl
- Upload date:
- Size: 7.8 MB
- Tags: CPython 3.11, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4cac47ee8c22f058c01e5c9eb4c0817cea973c5a4ebde2daf0d0d29aab9d0e3a |
| MD5 | 5fccc031d823d84ded9834c1ba71e425 |
| BLAKE2b-256 | 2ae1312da24cca8ffdbeb8c8661acb58c05c27e2194b03dced7f12ef6f4826a4 |