Blazing fast genomic operations on large Python dataframes

Project description

polars-bio - Next-gen Python DataFrame operations for genomics!

polars-bio is a Python library for genomics built on top of polars, Apache Arrow and Apache DataFusion. It provides a DataFrame API for genomics data and is designed to be blazing fast, memory efficient and easy to use.
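
To make the kind of interval operation concrete, the following is a minimal pure-Python sketch of an overlap join and an overlap count. It is not the polars-bio API or its implementation (which, per the description above, is built on Polars, Apache Arrow and Apache DataFusion); the function names `overlap` and `count_overlaps`, the tuple layout, and the closed-interval convention are assumptions chosen for illustration only.

```python
from typing import List, Tuple

# Hypothetical record layout: (chromosome, start, end), both ends inclusive.
Interval = Tuple[str, int, int]


def intersects(a: Interval, b: Interval) -> bool:
    """Two closed intervals on the same chromosome intersect when each one
    starts no later than the other ends."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def overlap(left: List[Interval], right: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Naive O(n*m) overlap join: every (left, right) pair that intersects."""
    return [(a, b) for a in left for b in right if intersects(a, b)]


def count_overlaps(left: List[Interval], right: List[Interval]) -> List[int]:
    """For each left interval, the number of right intervals it intersects."""
    return [sum(1 for b in right if intersects(a, b)) for a in left]


reads = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
genes = [("chr1", 150, 550), ("chr2", 300, 400)]

pairs = overlap(reads, genes)
counts = count_overlaps(reads, genes)
print(pairs)
print(counts)
```

The nested loops keep the sketch short; a production library would typically use an indexed or sorted structure rather than comparing every pair.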

Key Features

Performance benchmarks

[Figure: summary of benchmark results (summary-results.png)]

For developers: See benchmarks/README_BENCHMARKS.md for information about running performance benchmarks via GitHub Actions.

Citing

If you use polars-bio in your work, please cite:

@article{10.1093/bioinformatics/btaf640,
    author = {Wiewiórka, Marek and Khamutou, Pavel and Zbysiński, Marek and Gambin, Tomasz},
    title = {polars-bio—fast, scalable and out-of-core operations on large genomic interval datasets},
    journal = {Bioinformatics},
    pages = {btaf640},
    year = {2025},
    month = {12},
    abstract = {Genomic studies very often rely on computationally intensive analyses of relationships between features, which are typically represented as intervals along a one-dimensional coordinate system (such as positions on a chromosome). In this context, the Python programming language is extensively used for manipulating and analyzing data stored in a tabular form of rows and columns, called a DataFrame. Pandas is the most widely used Python DataFrame package and has been criticized for inefficiencies and scalability issues, which its modern alternative—Polars—aims to address with a native backend written in the Rust programming language. polars-bio is a Python library that enables fast, parallel and out-of-core operations on large genomic interval datasets. Its main components are implemented in Rust, using the Apache DataFusion query engine and Apache Arrow for efficient data representation. It is compatible with Polars and Pandas DataFrame formats. In a real-world comparison ($10^7$ vs. $1.2\times10^6$ intervals), our library runs overlap queries 6.5x, nearest queries 15.5x, count\_overlaps queries 38x, and coverage queries 15x faster than Bioframe. On equally-sized synthetic sets ($10^7$ vs. $10^7$), the corresponding speedups are 1.6x, 5.5x, 6x, and 6x. In streaming mode, on real and synthetic interval pairs, our implementation uses 90x and 15x less memory for overlap, 4.5x and 6.5x less for nearest, 60x and 12x less for count\_overlaps, and 34x and 7x less for coverage than Bioframe. Multi-threaded benchmarks show good scalability characteristics. To the best of our knowledge, polars-bio is the most efficient single-node library for genomic interval DataFrames in Python. polars-bio is an open-source Python package distributed under the Apache License available for major platforms, including Linux, macOS, and Windows in the PyPI registry.
The online documentation is https://biodatageeks.org/polars-bio/ and the source code is available on GitHub: https://github.com/biodatageeks/polars-bio and Zenodo: https://doi.org/10.5281/zenodo.16374290. Supplementary Materials are available at Bioinformatics online.},
    issn = {1367-4811},
    doi = {10.1093/bioinformatics/btaf640},
    url = {https://doi.org/10.1093/bioinformatics/btaf640},
    eprint = {https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btaf640/65667510/btaf640.pdf},
}

References

VCF Zarr support in polars-bio builds on:

Read the documentation

Download files

Download the file for your platform.

Source Distribution

polars_bio-0.31.0.tar.gz (42.4 MB)

Uploaded: Source

Built Distributions

polars_bio-0.31.0-cp39-abi3-win_amd64.whl (67.2 MB)

Uploaded: CPython 3.9+, Windows x86-64

polars_bio-0.31.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (75.3 MB)

Uploaded: CPython 3.9+, manylinux: glibc 2.17+ x86-64

polars_bio-0.31.0-cp39-abi3-macosx_11_0_arm64.whl (67.7 MB)

Uploaded: CPython 3.9+, macOS 11.0+ ARM64

polars_bio-0.31.0-cp39-abi3-macosx_10_12_x86_64.whl (71.1 MB)

Uploaded: CPython 3.9+, macOS 10.12+ x86-64
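
The wheel file names above follow the standard layout `distribution-version[-build]-python_tag-abi_tag-platform_tag.whl`. As a rough sketch of how to read them, the helper below splits a name into those fields; `parse_wheel_name` and `WheelTags` are hypothetical names introduced here, and for real tooling the `packaging` library's own parser is likely a better choice.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename: str) -> WheelTags:
    """Split a wheel file name into its dash-separated fields.

    Hypothetical helper for illustration; it does no validation beyond
    counting fields, and it discards the optional build tag.
    """
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(suffix)].split("-")
    if len(parts) == 6:
        del parts[2]  # drop the optional build tag
    elif len(parts) != 5:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelTags(*parts)


tags = parse_wheel_name(
    "polars_bio-0.31.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
print(tags.python_tag, tags.abi_tag)   # cp39 abi3
print(tags.platform_tag.split("."))    # compressed tag set: one entry per platform
```

Here `cp39` together with `abi3` marks a wheel built against CPython's stable ABI, which is why one file per platform is listed for CPython 3.9 and later.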

File details

Details for the file polars_bio-0.31.0.tar.gz.

File metadata

  • Download URL: polars_bio-0.31.0.tar.gz
  • Upload date:
  • Size: 42.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.13.3

File hashes

Hashes for polars_bio-0.31.0.tar.gz
Algorithm Hash digest
SHA256 ae3a5c8a48a92d7d1711dc44e07ad1cb1b04c48942f454ad1799910745bab588
MD5 f1fab96f6c1bc79e36b4dae24363448d
BLAKE2b-256 0185bc8b40015a750a3b4eccc427c5ca4d30860ab5ac01dc6df5296d3100601a
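
One way to use the digests listed above is to recompute the SHA256 of a downloaded file and compare it with the published value. The sketch below uses only the Python standard library; the helper names `sha256_of` and `matches_expected` are made up for this example, and the local file path in the commented usage line is an assumption about where the download was saved.

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 digest of a file, read in chunks so that large archives
    do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str) -> bool:
    """True when the file's SHA256 equals the expected hex digest."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


# SHA256 listed above for polars_bio-0.31.0.tar.gz.
EXPECTED_SDIST_SHA256 = "ae3a5c8a48a92d7d1711dc44e07ad1cb1b04c48942f454ad1799910745bab588"

# Usage, assuming the archive was saved in the current directory:
# print(matches_expected("polars_bio-0.31.0.tar.gz", EXPECTED_SDIST_SHA256))
```

pip can perform the same check itself when a requirements file pins hashes with `--hash=sha256:...`.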

File details

Details for the file polars_bio-0.31.0-cp39-abi3-win_amd64.whl.

File hashes

Hashes for polars_bio-0.31.0-cp39-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 8ec9fafca5ffc9fabf58d6c842f055a0b5248206ed0ec885ee77163c25c532a9
MD5 7a463ea17dc3c57c758548a218fd6c7d
BLAKE2b-256 aa3d742bdd2432108bf887ea69c1950c9380618d40d773330e6a20fa33484cd4

File details

Details for the file polars_bio-0.31.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for polars_bio-0.31.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 260b341544d272bd007f8fac0c3a0fd270e669801375e05bc7e8d429d4c5094c
MD5 f27b5a0dc592b4a7f1894f24b9408f7d
BLAKE2b-256 60ecca6bff8878beae865055926c6e0f7e1d8a758c413e68479ac907a6a104fc

File details

Details for the file polars_bio-0.31.0-cp39-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for polars_bio-0.31.0-cp39-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 aee5a83e2f5cf9a83e3160ba822e54ccdf7dede38d791903ad66cc3845a3d2b5
MD5 e3f95a7cd420c98886becaf79a932b0a
BLAKE2b-256 62f39827ec08c7605e07d0bc8569b3d2422e2213062d2b7f720dd631688b528d

File details

Details for the file polars_bio-0.31.0-cp39-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for polars_bio-0.31.0-cp39-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 1840dcdd368931740a73df89bf089ccec832416cc89e583efbe263cc73f2e5d4
MD5 4b38f6d8b65bfa7f65faac5c5847566b
BLAKE2b-256 5b997745753548257ee5d08c9e37fdcd84421e5dbc72a658cfcc2670ab5f751f
