Pandas utilities for tab-delimited and other genomic files
Bioframe: Operations on Genomic Interval Dataframes
Bioframe is a library to enable flexible and scalable operations on genomic interval dataframes in Python. Building bioframe directly on top of pandas enables immediate access to a rich set of dataframe operations. Working in Python enables rapid visualization (e.g. matplotlib, seaborn) and iteration of genomic analyses.
The philosophy underlying bioframe is to enable flexible operations: instead of creating a function for every possible use-case, we encourage users to compose functions to achieve their goals. As a rough rule of thumb, if a function requires three steps and is crucial for genomic interval arithmetic, we have included it; conversely, if it can be performed in a single line by composing two of the core functions, we have not.
Bioframe implements a variety of genomic interval operations directly on dataframes. Bioframe also has functions for loading diverse genomic data formats, and performing operations on special classes of genomic intervals, including chromosome arms and fixed size bins.
The following are required before installing bioframe:
- Python 3.7+
To install bioframe with pip, run:
pip install bioframe
Key genomic interval operations in bioframe include:
- closest: For every interval in a dataframe, find the closest intervals in a second dataframe.
- cluster: Group overlapping intervals in a dataframe into clusters.
- complement: Find genomic intervals that are not covered by any interval from a dataframe.
- overlap: Find pairs of overlapping genomic intervals between two dataframes.
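To make two of these definitions concrete, here is a minimal plain-Python sketch of what "complement" and "closest" mean for intervals on a single chromosome. It is an illustration of the semantics only, not bioframe's implementation: the helper names `complement_1d` and `closest_1d`, the half-open `[start, end)` convention, and the example coordinates are all assumptions made for this sketch, and bioframe itself operates on dataframes rather than lists of tuples.

```python
def complement_1d(intervals, chrom_length):
    """Return the gaps in [0, chrom_length) not covered by any (start, end) interval."""
    gaps = []
    cursor = 0  # right edge of the region covered so far
    for start, end in sorted(intervals):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < chrom_length:
        gaps.append((cursor, chrom_length))
    return gaps


def closest_1d(query, candidates):
    """Return the candidate (start, end) interval nearest to query.

    Distance is the size of the gap between the two intervals,
    and 0 when they overlap or touch.
    """
    def distance(a, b):
        return max(0, a[0] - b[1], b[0] - a[1])

    return min(candidates, key=lambda c: distance(query, c))


# Hypothetical intervals on one chromosome of length 10.
covered = [(2, 4), (3, 6), (8, 9)]
gaps = complement_1d(covered, chrom_length=10)
print(gaps)

nearest = closest_1d((4, 6), [(0, 2), (9, 12)])
print(nearest)
```

In this sketch the uncovered regions are reported in sorted order, and the nearest interval is chosen by gap size alone; ties and strand are not considered.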
Bioframe additionally has functions that are frequently used for genomic interval operations and can be expressed as combinations of these core operations and dataframe operations, including:
To overlap two dataframes, call:
import bioframe as bf
bf.overlap(df1, df2)
For these two input dataframes, with intervals all on the same chromosome:
overlap will return the following interval pairs as overlaps:
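The figures for this example are not reproduced here. As a stand-in, the following plain-Python sketch shows which interval pairs count as overlapping. It is not bioframe's implementation: the helper `overlapping_pairs`, the example coordinates, and the half-open `[start, end)` convention (as in BED files) are assumptions made for illustration.

```python
from itertools import product

# Hypothetical example intervals as (chrom, start, end), half-open [start, end).
intervals1 = [("chr1", 1, 5), ("chr1", 3, 8), ("chr1", 8, 10)]
intervals2 = [("chr1", 4, 8), ("chr1", 10, 11)]


def overlapping_pairs(a, b):
    """Return every (x, y) with x from a and y from b that share at least one base."""
    return [
        (x, y)
        for x, y in product(a, b)
        # Same chromosome, and each interval starts before the other ends.
        if x[0] == y[0] and x[1] < y[2] and y[1] < x[2]
    ]


pairs = overlapping_pairs(intervals1, intervals2)
for x, y in pairs:
    print(x, "overlaps", y)
```

Under the half-open assumption, intervals that merely touch, such as `(8, 10)` and `(10, 11)`, are not reported as a pair. This brute-force comparison is quadratic and only meant to show the semantics.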
To merge all overlapping intervals in a dataframe, call:
import bioframe as bf
bf.merge(df1)
For this input dataframe, with intervals all on the same chromosome:
merge will return a new dataframe with these merged intervals:
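The figures for this example are not reproduced here either. The sketch below illustrates the idea of merging with plain Python tuples. It is not bioframe's implementation: `merge_intervals` and the example coordinates are hypothetical, and the sketch only merges intervals that strictly overlap, so bioframe's handling of adjacent intervals may differ.

```python
def merge_intervals(intervals):
    """Merge overlapping (chrom, start, end) intervals into a sorted list."""
    merged = []
    for chrom, start, end in sorted(intervals):
        # Extend the previous interval if this one starts inside it.
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical input, all on the same chromosome.
intervals = [("chr1", 1, 5), ("chr1", 3, 8), ("chr1", 9, 12), ("chr1", 10, 11)]
merged = merge_intervals(intervals)
print(merged)
```

Sorting first means each interval only has to be compared with the most recently merged one, which keeps the pass linear after the sort.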
See the guide for visualizations of other interval operations in bioframe.
Bioframe includes utilities for reading genomic file formats into dataframes and vice versa. One handy function is read_table, which mirrors pandas’s read_csv/read_table but provides a schema argument to populate column names for common tabular file formats.
import bioframe
jaspar_url = 'http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2018/hg38/tsv/MA0139.1.tsv.gz'
ctcf_motif_calls = bioframe.read_table(jaspar_url, schema='jaspar', skiprows=1)
See this jupyter notebook for an example of how to assign TF motifs to ChIP-seq peaks using bioframe.
Projects currently using bioframe: