A Python wrapper for minimap2-rs

Python bindings for the Rust FFI minimap2 library. In development! Feedback appreciated!

Why?

PyO3 makes it very easy to create Python libraries via Rust. Further, we can use Polars to export results as a dataframe (which can be used as-is or converted to Pandas). Python allows for faster experimentation with novel algorithms and integration into machine learning pipelines, and it gives those unfamiliar with Rust or C/C++ a way to use minimap2.

Current State

Very early alpha. Please try it, and open an issue for any missing features you need or any bugs you find.

How to use

Requirements

Polars and PyArrow. Both should be installed automatically when you install minimappers2.

Creating an Aligner Instance

from minimappers2 import map_ont, map_hifi, Sequence  # import path assumed from the package name
aligner = map_ont()
aligner.threads(4)

To get base-level alignments rather than just approximate mapping locations, enable .cigar():

aligner = map_hifi()
aligner.cigar()

Please note, at this time the following syntax is NOT supported:

aligner = map_ont().threads(4).cigar()
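
Because chained calls are not supported, each setting has to be applied in its own statement. A minimal sketch of a helper that does this, assuming only the .threads() and .cigar() methods shown above; the name configure_aligner is hypothetical and is not part of minimappers2:

```python
def configure_aligner(aligner, threads=None, cigar=False):
    """Apply settings one call at a time and return the same aligner.

    Hypothetical helper: it relies only on the .threads() and .cigar()
    methods shown in this README and never chains them.
    """
    if threads is not None:
        aligner.threads(threads)
    if cigar:
        aligner.cigar()
    return aligner
```

With it, the unsupported one-liner could be written as aligner = configure_aligner(map_ont(), threads=4, cigar=True).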

Creating an index

aligner.index("ref.fa")

To save a built index for future use:

aligner.index_and_save("ref.fa", "ref.mmi")

Next time, loading the saved index is faster than rebuilding it from the reference:

aligner.load_index("ref.mmi")
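
The build-once, load-afterwards pattern can be wrapped in a small function. This is a sketch under the assumption that index_and_save and load_index behave as shown above; load_or_build_index is a hypothetical name, and the function does not notice when the reference file has changed since the index was saved.

```python
from pathlib import Path


def load_or_build_index(aligner, reference, index_path):
    """Load a saved index if it exists; otherwise build and save it.

    Returns "loaded" or "built" so the caller can log which path ran.
    Hypothetical helper built on the README's index methods.
    """
    if Path(index_path).exists():
        aligner.load_index(str(index_path))
        return "loaded"
    aligner.index_and_save(str(reference), str(index_path))
    return "built"
```

For example, load_or_build_index(aligner, "ref.fa", "ref.mmi") would build the index on the first run and load ref.mmi on later runs.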

Aligning a Single Sequence

query = Sequence(seq_name, seq)
aligner.map1(query)

# Example
seq = "CCAGAACGTACAAGGAAATATCCTCAAATTATCCCAAGAATTGTCCGCAGGAAATGGGGATAATTTCAGAAATGAGAG"
result = aligner.map1(Sequence("MySeq", seq))

Here, seq_name and seq are both strings. The output is a Polars DataFrame.

Aligning Multiple Sequences

seqs = [Sequence("name of seq 1", seq1),
        Sequence("name of seq 2", seq2)]
result = aligner.map(seqs)

Example Notebook

Please see the example notebook for more examples.

Mapping a file

Please open an issue if you need to map files from this API.
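
Until file mapping is available, one possible workaround is to read the sequences in Python and pass them to aligner.map as a list. The sketch below is a plain FASTA reader using only the standard library; read_fasta is a hypothetical helper, not part of minimappers2. It does not handle gzip-compressed or FASTQ input.

```python
def read_fasta(path):
    """Yield (name, sequence) tuples from a plain-text FASTA file.

    The name is the header text up to the first whitespace; sequence
    lines belonging to one record are joined into a single string.
    """
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                header = line[1:].split()
                name, chunks = (header[0] if header else ""), []
            elif name is not None:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)
```

The records could then be wrapped as seqs = [Sequence(name, seq) for name, seq in read_fasta("reads.fa")] and mapped with aligner.map(seqs). This holds every sequence in memory, so very large files may need to be processed in batches.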

Results

All results are returned as Polars dataframes. You can convert a Polars dataframe to a Pandas dataframe with .to_pandas().

  • Polars is among the fastest dataframe libraries in the Python ecosystem.
  • Polars provides a nice data bridge between Rust and Python.

For more information, please see the Polars User Guide or the Polars Guide for Pandas users.

Example of Results

Here is an image of the resulting dataframe: [image: Resulting Dataframe]

NOTE: Mapq, Cigar, and other alignment columns will not show up unless .cigar() is enabled on the aligner itself.
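
A quick way to confirm that the alignment columns are present is to inspect the dataframe's columns attribute, which both Polars and Pandas dataframes provide. The helper below is a hypothetical sketch; the column names in the usage note are assumptions, so check result.columns for the exact spelling used by your version.

```python
def missing_columns(df, required):
    """Return the names in `required` that are absent from df.columns.

    Works with any object exposing a `columns` attribute, which includes
    Polars and Pandas dataframes.
    """
    present = set(df.columns)
    return [name for name in required if name not in present]
```

For instance, missing_columns(result, ["mapq", "cigar"]) returning a non-empty list would suggest that .cigar() was not enabled before mapping (assuming those column names).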

Errors

As this is a very early-stage library, error checking is not yet implemented. When things crash, you will likely need to restart your Python interpreter (or Jupyter kernel). Please open an issue describing what happened, and I will get to it.
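
One way to keep a crash from taking down your main interpreter is to run the mapping in a child process. The following is a sketch using only the standard library, not a feature of minimappers2; run_isolated is a hypothetical name. It uses the "fork" start method, so it is limited to POSIX systems, and the return value must be picklable.

```python
import multiprocessing as mp


def _child(conn, func, args):
    """Run func(*args) in the child process and report the outcome."""
    try:
        conn.send(("ok", func(*args)))
    except Exception as exc:
        # Ordinary Python exceptions are reported, not treated as crashes.
        conn.send(("error", repr(exc)))
    finally:
        conn.close()


def run_isolated(func, *args, timeout=60.0):
    """Run func(*args) in a forked child and return (status, payload).

    status is "ok", "error", "crashed" (the child died without replying),
    or "timeout".
    """
    ctx = mp.get_context("fork")
    receiver, sender = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_child, args=(sender, func, args))
    proc.start()
    sender.close()  # the parent keeps only the receiving end
    try:
        if receiver.poll(timeout):
            outcome = receiver.recv()
        else:
            outcome = ("timeout", None)
    except EOFError:
        # The pipe closed with no message: the child exited abnormally.
        outcome = ("crashed", None)
    finally:
        receiver.close()
    proc.join(timeout=1.0)
    if proc.is_alive():
        proc.terminate()
        proc.join()
    return outcome
```

It is probably safest to create the aligner and load the index inside func itself, so that no native state or thread pool is shared across the fork. A "crashed" status then tells you the child died while the parent session stays usable.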

Compatibility

  • Windows: Unlikely

  • Linux: Likely

  • Mac: Unknown

  • x86_64: Likely

  • aarch64: Unknown

  • neon: No (Open an issue)

  • Google Colab: No, not sure why though.

Performance

Effort has been made to make this as performant as possible, but if you need more performance, please use minimap2 directly and import the results.
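
If you run minimap2 directly (for example, minimap2 -x map-ont ref.fa reads.fq > out.paf), its default PAF output can be read back into Python. The sketch below parses the 12 mandatory PAF columns with the standard library; read_paf and the field names are my own choices for illustration, not names used by minimappers2.

```python
# The 12 mandatory PAF columns, in order, with the type of each.
PAF_FIELDS = [
    ("query_name", str), ("query_length", int), ("query_start", int),
    ("query_end", int), ("strand", str), ("target_name", str),
    ("target_length", int), ("target_start", int), ("target_end", int),
    ("matches", int), ("block_length", int), ("mapq", int),
]


def read_paf(path):
    """Parse a PAF file into a list of dicts, one per alignment line.

    Optional SAM-style tags after column 12 are kept as raw strings
    under the "tags" key.
    """
    rows = []
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < len(PAF_FIELDS):
                continue  # skip blank or truncated lines
            row = {
                name: cast(value)
                for (name, cast), value in zip(PAF_FIELDS, fields)
            }
            row["tags"] = fields[len(PAF_FIELDS):]
            rows.append(row)
    return rows
```

The resulting list of dicts can be handed to a dataframe constructor such as polars.DataFrame(rows) or pandas.DataFrame(rows) for further analysis.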

Citation

You should cite the minimap2 papers if you use this in your work.

Li, H. (2018). Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 34:3094-3100. doi:10.1093/bioinformatics/bty191

and/or:

Li, H. (2021). New strategies to improve minimap2 alignment accuracy. Bioinformatics, 37:4572-4574. doi:10.1093/bioinformatics/btab705

Changelog

0.1.0

  • Initial Functions implemented
  • Return results as Polars dfs

Funding

Genomics Aotearoa

