Operations and utilities for Genomic Interval Dataframes.

Bioframe: Operations on Genomic Interval Dataframes

Bioframe enables flexible and scalable operations on genomic interval dataframes in Python.

Bioframe is built directly on top of Pandas. Bioframe provides:

  • A variety of genomic interval operations that work directly on dataframes.
  • Operations for special classes of genomic intervals, including chromosome arms and fixed-size bins.
  • Conveniences for diverse tabular genomic data formats and loading genome assembly summary information.

Read the documentation, including the guide, as well as the publication for more information.

Bioframe is an Affiliated Project of NumFOCUS.

Installation

Bioframe is available on PyPI and bioconda:

pip install bioframe

Contributing

Interested in contributing to bioframe? That's great! To get started, check out the contributing guide. Discussions about the project roadmap take place on the Open2C Slack and in the regular developer meetings scheduled there. Anyone can join and participate!

Interval operations

Key genomic interval operations in bioframe include:

  • overlap: Find pairs of overlapping genomic intervals between two dataframes.
  • closest: For every interval in a dataframe, find the closest intervals in a second dataframe.
  • cluster: Group overlapping intervals in a dataframe into clusters.
  • complement: Find genomic intervals that are not covered by any interval from a dataframe.

Bioframe additionally has functions that are frequently used for genomic interval operations and can be expressed as combinations of these core operations and dataframe operations, including: coverage, expand, merge, select, and subtract.
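
To make one of the core operations concrete, here is a minimal, library-free sketch of what a complement computes. It is not bioframe's implementation and does not call bioframe. The function name complement_sketch, the chrom_sizes mapping, and the example intervals are all hypothetical, and the sketch assumes half-open [start, end) coordinates.

```python
def complement_sketch(intervals, chrom_sizes):
    """Return the stretches of each chromosome not covered by any interval.

    intervals   : list of (chrom, start, end) tuples, half-open [start, end)
    chrom_sizes : dict mapping chrom -> length (hypothetical helper input)
    """
    gaps = []
    for chrom, size in chrom_sizes.items():
        cursor = 0  # first position not yet known to be covered
        for _, start, end in sorted(i for i in intervals if i[0] == chrom):
            if start > cursor:
                # uncovered stretch between the cursor and this interval
                gaps.append((chrom, cursor, start))
            cursor = max(cursor, end)
        if cursor < size:
            # uncovered tail up to the end of the chromosome
            gaps.append((chrom, cursor, size))
    return gaps


# Made-up example data: three intervals on a 20-base "chromosome".
intervals = [("chr1", 2, 5), ("chr1", 4, 8), ("chr1", 12, 14)]
gaps = complement_sketch(intervals, {"chr1": 20})
print(gaps)
```

The sweep keeps a single cursor per chromosome, so overlapping input intervals need no separate merging step before the gaps are emitted.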

To overlap two dataframes, call:

import bioframe as bf

bf.overlap(df1, df2)

For these two input dataframes, with intervals all on the same chromosome:

overlap will return the following interval pairs as overlaps:
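
As a library-free illustration of which pairs count as overlaps, the sketch below uses hypothetical intervals (not the ones from the original figure) and a brute-force comparison. It assumes half-open [start, end) coordinates and is not how bioframe computes overlaps; the helper name overlapping_pairs is made up for this example.

```python
from itertools import product

# Hypothetical intervals as (chrom, start, end), half-open [start, end).
intervals1 = [("chr1", 1, 5), ("chr1", 4, 8), ("chr1", 8, 10)]
intervals2 = [("chr1", 4, 7), ("chr1", 20, 25)]


def overlapping_pairs(a, b):
    """Return every (x, y) pair, x from a and y from b, sharing at least one base."""
    return [
        (x, y)
        for x, y in product(a, b)
        # same chromosome, and each interval starts before the other ends
        if x[0] == y[0] and x[1] < y[2] and y[1] < x[2]
    ]


pairs = overlapping_pairs(intervals1, intervals2)
for x, y in pairs:
    print(x, "overlaps", y)
```

Here the interval (8, 10) is not paired with (4, 7), and nothing reaches (20, 25), so only two pairs are reported. The quadratic comparison is only meant to show the pairing rule, not to scale to real datasets.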

To merge all overlapping intervals in a dataframe, call:

import bioframe as bf

bf.merge(df1)

For this input dataframe, with intervals all on the same chromosome:

merge will return a new dataframe with these merged intervals:
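
The idea behind merging can be sketched without any library as a sort followed by a single sweep. The example below uses hypothetical intervals (not the ones from the original figure) and a made-up helper name, merge_sketch. It assumes half-open [start, end) coordinates and, as a choice made for this sketch, also joins intervals that merely touch. It is not bioframe's implementation.

```python
def merge_sketch(intervals):
    """Collapse overlapping or touching (chrom, start, end) intervals."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # extends the most recent merged interval on the same chromosome
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            # starts a new merged interval
            merged.append((chrom, start, end))
    return merged


# Made-up example data, all on one chromosome.
intervals = [("chr1", 1, 5), ("chr1", 3, 8), ("chr1", 8, 10), ("chr1", 12, 14)]
merged = merge_sketch(intervals)
print(merged)
```

The first three intervals collapse into one because each begins at or before the running end of the previous merged interval, while (12, 14) stays separate.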

See the guide for visualizations of other interval operations in bioframe.

File I/O

Bioframe includes utilities for reading genomic file formats into dataframes and vice versa. One handy function is read_table, which mirrors pandas's read_csv/read_table but adds a schema argument that populates column names for common tabular file formats.

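import bioframe as bf
jaspar_url = 'http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2022/hg38/MA0139.1.tsv.gz'
# schema='jaspar' supplies the column names; skiprows=1 skips the file's first line
ctcf_motif_calls = bf.read_table(jaspar_url, schema='jaspar', skiprows=1)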
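
To illustrate what a schema argument buys you, the following standard-library sketch maps a format name to column names and applies them to a headerless, tab-separated table. The SCHEMAS_SKETCH dictionary, the read_with_schema helper, and the sample rows are hypothetical and only mimic the idea; they are not bioframe's code or its full list of schemas.

```python
import csv
import io

# Hypothetical lookup from a format name to its column names.
SCHEMAS_SKETCH = {
    "bed3": ["chrom", "start", "end"],
}


def read_with_schema(handle, schema):
    """Read headerless tab-separated rows and label fields by schema name."""
    names = SCHEMAS_SKETCH[schema]
    return [dict(zip(names, row)) for row in csv.reader(handle, delimiter="\t")]


# Made-up BED3-like content standing in for a file on disk.
text = "chr1\t100\t200\nchr2\t50\t75\n"
records = read_with_schema(io.StringIO(text), "bed3")
print(records)
```

Unlike a dataframe reader, this sketch leaves every field as a string; it only shows how a named schema removes the need to spell out column names on each call.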

Tutorials

See this jupyter notebook for an example of how to assign TF motifs to ChIP-seq peaks using bioframe.

Citing

If you use bioframe in your work, please cite:

@article{bioframe_2024,
author = {Open2C and Abdennur, Nezar and Fudenberg, Geoffrey and Flyamer, Ilya M and Galitsyna, Aleksandra A and Goloborodko, Anton and Imakaev, Maxim and Venev, Sergey},
doi = {10.1093/bioinformatics/btae088},
journal = {Bioinformatics},
title = {{Bioframe: Operations on Genomic Intervals in Pandas Dataframes}},
year = {2024}
}
