Blazing fast genomic operations on large Python dataframes

Project description

polars-bio - Next-gen Python DataFrame operations for genomics!

polars-bio is a Python library for genomics built on top of Polars, Apache Arrow and Apache DataFusion. It provides a DataFrame API for genomics data and is designed to be fast, memory-efficient and easy to use.
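
To make the kind of operation concrete, here is a minimal pure-Python sketch of an overlap query between two sets of genomic intervals, one of the interval operations named in the citation abstract further down. It is an illustration only: the function name `naive_overlap`, the `(chrom, start, end)` tuple layout, and the half-open coordinate convention are assumptions made for this example. It does not use or reproduce the polars-bio API or its algorithm.

```python
from collections import defaultdict


def naive_overlap(left, right):
    """Return all pairs of intervals that share a chromosome and overlap.

    Intervals are hypothetical (chrom, start, end) tuples with half-open
    coordinates [start, end); this convention is an assumption of the sketch.
    """
    # Group the right-hand intervals by chromosome so each left interval
    # is only compared against intervals on the same chromosome.
    by_chrom = defaultdict(list)
    for chrom, start, end in right:
        by_chrom[chrom].append((start, end))

    pairs = []
    for chrom, start, end in left:
        for r_start, r_end in by_chrom[chrom]:
            # Two half-open intervals overlap when each one starts
            # before the other one ends.
            if start < r_end and r_start < end:
                pairs.append(((chrom, start, end), (chrom, r_start, r_end)))
    return pairs


# Made-up example data, not taken from any benchmark.
reads = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
genes = [("chr1", 150, 300), ("chr1", 600, 700), ("chr2", 10, 60)]
result = naive_overlap(reads, genes)
```

This brute-force comparison is quadratic per chromosome, so it only serves to show what an overlap query returns; it is not suitable for datasets with millions of intervals.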

Key Features

Performance benchmarks

For developers: See benchmarks/README_BENCHMARKS.md for information about running performance benchmarks via GitHub Actions.

Citing

If you use polars-bio in your work, please cite:

@article{10.1093/bioinformatics/btaf640,
    author = {Wiewiórka, Marek and Khamutou, Pavel and Zbysiński, Marek and Gambin, Tomasz},
    title = {polars-bio—fast, scalable and out-of-core operations on large genomic interval datasets},
    journal = {Bioinformatics},
    pages = {btaf640},
    year = {2025},
    month = {12},
    abstract = {Genomic studies very often rely on computationally intensive analyses of relationships between features, which are typically represented as intervals along a one-dimensional coordinate system (such as positions on a chromosome). In this context, the Python programming language is extensively used for manipulating and analyzing data stored in a tabular form of rows and columns, called a DataFrame. Pandas is the most widely used Python DataFrame package and has been criticized for inefficiencies and scalability issues, which its modern alternative, Polars, aims to address with a native backend written in the Rust programming language. polars-bio is a Python library that enables fast, parallel and out-of-core operations on large genomic interval datasets. Its main components are implemented in Rust, using the Apache DataFusion query engine and Apache Arrow for efficient data representation. It is compatible with Polars and Pandas DataFrame formats. In a real-world comparison ($10^7$ vs. $1.2 \times 10^6$ intervals), our library runs overlap queries 6.5x, nearest queries 15.5x, count\_overlaps queries 38x, and coverage queries 15x faster than Bioframe. On equally-sized synthetic sets ($10^7$ vs. $10^7$), the corresponding speedups are 1.6x, 5.5x, 6x, and 6x. In streaming mode, on real and synthetic interval pairs, our implementation uses 90x and 15x less memory for overlap, 4.5x and 6.5x less for nearest, 60x and 12x less for count\_overlaps, and 34x and 7x less for coverage than Bioframe. Multi-threaded benchmarks show good scalability characteristics. To the best of our knowledge, polars-bio is the most efficient single-node library for genomic interval DataFrames in Python. polars-bio is an open-source Python package distributed under the Apache License available for major platforms, including Linux, macOS, and Windows in the PyPI registry.
The online documentation is https://biodatageeks.org/polars-bio/ and the source code is available on GitHub: https://github.com/biodatageeks/polars-bio and Zenodo: https://doi.org/10.5281/zenodo.16374290. Supplementary Materials are available at Bioinformatics online.},
    issn = {1367-4811},
    doi = {10.1093/bioinformatics/btaf640},
    url = {https://doi.org/10.1093/bioinformatics/btaf640},
    eprint = {https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btaf640/65667510/btaf640.pdf},
}

Read the documentation

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

polars_bio-0.30.0.tar.gz (42.4 MB)

Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

polars_bio-0.30.0-cp39-abi3-win_amd64.whl (65.5 MB)

CPython 3.9+, Windows x86-64

polars_bio-0.30.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (73.2 MB)

CPython 3.9+, manylinux: glibc 2.17+, x86-64

polars_bio-0.30.0-cp39-abi3-macosx_11_0_arm64.whl (65.7 MB)

CPython 3.9+, macOS 11.0+, ARM64

polars_bio-0.30.0-cp39-abi3-macosx_10_12_x86_64.whl (69.1 MB)

CPython 3.9+, macOS 10.12+, x86-64
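
The wheel file names listed above follow the standard wheel naming convention, `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`. As a rough sketch of how to read them, the helper below splits a file name into those fields. The function name `parse_wheel_filename` and the returned dictionary layout are hypothetical and chosen only for this example; it is not part of polars-bio or of any packaging tool.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its naming-convention fields.

    Hypothetical helper for illustration; it does no validation beyond
    counting the dash-separated fields.
    """
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError(f"not a wheel file name: {filename!r}")

    parts = filename[: -len(suffix)].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the python tag.
        name, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")

    return {
        "name": name,
        "version": version,
        "build_tag": build_tag,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        # Several compatible platforms may be joined with dots.
        "platform_tags": platform_tag.split("."),
    }


info = parse_wheel_filename(
    "polars_bio-0.30.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
```

Here `cp39` together with `abi3` marks a wheel built against the CPython stable ABI, which is why a single file per platform is listed as covering CPython 3.9+.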

File details

Details for the file polars_bio-0.30.0.tar.gz.

File metadata

  • Download URL: polars_bio-0.30.0.tar.gz
  • Upload date:
  • Size: 42.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.13.1

File hashes

Hashes for polars_bio-0.30.0.tar.gz
Algorithm Hash digest
SHA256 977c954f1c1cdbf3f5a68f46921b8a29c110533f539e192142554b867a855dc7
MD5 ed8992734bcc48aa6ecc12bfa9154210
BLAKE2b-256 feb56f823bd00fdf205d68e4f01c91c681bbebc2f7c792b46fce50cf80bdc908

See more details on using hashes here.
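
One way to use the published digests is to recompute the SHA256 of a downloaded file and compare it with the value listed above. The sketch below does this with Python's standard `hashlib` module. The helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the usage note assumes the file was saved under its original name in the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True when the file's SHA256 equals the published digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("polars_bio-0.30.0.tar.gz", "977c954f1c1cdbf3f5a68f46921b8a29c110533f539e192142554b867a855dc7")` should return `True` for an intact copy of the source distribution.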

File details

Details for the file polars_bio-0.30.0-cp39-abi3-win_amd64.whl.

File hashes

Hashes for polars_bio-0.30.0-cp39-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 b5dc21ba63a3de23fb3c32806879962d32c7f477da61ce7f97abb40e656a9ed5
MD5 0de51fdbb05a50b5ee9e3fa141e9e838
BLAKE2b-256 813508103c907bc7b65db818d0c0c4ed2621a1a1702d3ea562d08693d98c0993

File details

Details for the file polars_bio-0.30.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for polars_bio-0.30.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 ce750798a47637fd64d419f22b35a727dab30d5ec543f7ff44b5c175a4df6d19
MD5 c3a7c8303258b95f8415cfb00502070e
BLAKE2b-256 4506ea143441b449c170fb154c9917041e90a8733c8324ef1bfff01477e3bae9

File details

Details for the file polars_bio-0.30.0-cp39-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for polars_bio-0.30.0-cp39-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 713cffd262dbff5ea305e40b672be423dc3713d67ed5c02fffa110fbe085e745
MD5 351fd4489a2c3b722b069c637d198edf
BLAKE2b-256 8a5817ca14fd986f250b501892d0d9c4ae26750541d2aadd6f1af2502c2ec0eb

File details

Details for the file polars_bio-0.30.0-cp39-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for polars_bio-0.30.0-cp39-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 2408fc90e75412ef7aaa9d53e22fa54ca1ec9fd39c8ec62745bbc0715805ee5d
MD5 939e4b329b2b9ddc622bc05260f89e6d
BLAKE2b-256 d9e455575921a9fa3f98e5f6eb1ed2963e65834dea4e227c99bd9c979a33b31e
