Container class to represent genomic locations and support genomic analysis.
GenomicRanges
GenomicRanges is a Python container class designed to represent genomic locations and support genomic analysis. It is similar to Bioconductor's GenomicRanges.
Install
The package is published to PyPI:
pip install genomicranges
Usage
The package provides several ways to represent genomic annotations and intervals.
Initialize a GenomicRanges object
From UCSC or GTF file
You can easily access UCSC genomes or load a genome annotation from a GTF file using the following methods:
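import genomicranges

# Load annotations from a local GTF file (replace the placeholder with your own path)
gr = genomicranges.from_gtf("<PATH TO GTF>")
# OR fetch annotations for a UCSC genome build
gr = genomicranges.from_ucsc(genome="hg19")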
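For readers unfamiliar with the GTF format, the sketch below shows how a single GTF record maps onto genomic coordinates. It is an illustration only and not how genomicranges parses files: the helper `parse_gtf_line` and the example record are hypothetical. It relies on the standard GTF layout of nine tab-separated columns with 1-based, inclusive start and end positions.

```python
# Illustrative only: a hand-rolled parser for one GTF record,
# not the parser used by genomicranges.
GTF_COLUMNS = [
    "seqname", "source", "feature", "start", "end",
    "score", "strand", "frame", "attribute",
]


def parse_gtf_line(line):
    """Split one GTF record into a dict keyed by the nine standard columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GTF_COLUMNS):
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    record = dict(zip(GTF_COLUMNS, fields))
    # GTF coordinates are 1-based and inclusive on both ends.
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    return record


# Made-up record for demonstration purposes.
example = 'chr1\texample_source\tgene\t1000\t2000\t.\t+\t.\tgene_id "example_gene";'
rec = parse_gtf_line(example)
print(rec["seqname"], rec["start"], rec["end"], rec["strand"])
```

The seqname, start, end, and strand fields correspond to the kind of information a GenomicRanges object represents.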
Pandas DataFrame
A pandas DataFrame is a common representation for tabular datasets in Python. You can convert a DataFrame into a GenomicRanges object. Note that intervals are inclusive on both ends, and the DataFrame must contain the columns seqnames, starts, and ends to represent genomic coordinates.
Here's an example:
import genomicranges
import pandas as pd
from random import random  # needed for the random GC values below

df = pd.DataFrame(
    {
        "seqnames": ["chr1", "chr2", "chr1", "chr3", "chr2"],
        "starts": [101, 102, 103, 104, 109],
        "ends": [112, 103, 128, 134, 111],
        "strand": ["*", "-", "*", "+", "-"],
        "score": range(0, 5),
        "GC": [random() for _ in range(5)],
    }
)

# Convert the DataFrame into a GenomicRanges object
gr = genomicranges.from_pandas(df)
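Because intervals are inclusive on both ends, the width of an interval is `end - start + 1`. The following minimal sketch checks for the three required columns and computes widths for the example coordinates above. It uses plain Python dictionaries rather than the genomicranges API, and the helper `interval_widths` is a hypothetical name introduced for illustration.

```python
# Illustrative only: plain-Python check of the required columns and
# of inclusive interval widths; this is not part of genomicranges.
REQUIRED_COLUMNS = ("seqnames", "starts", "ends")


def interval_widths(columns):
    """Return the width of each interval, treating both ends as inclusive."""
    missing = [name for name in REQUIRED_COLUMNS if name not in columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    return [
        end - start + 1
        for start, end in zip(columns["starts"], columns["ends"])
    ]


columns = {
    "seqnames": ["chr1", "chr2", "chr1", "chr3", "chr2"],
    "starts": [101, 102, 103, 104, 109],
    "ends": [112, 103, 128, 134, 111],
}
widths = interval_widths(columns)
print(widths)  # [12, 2, 26, 31, 3]
```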
Interval Operations
GenomicRanges currently supports most of the commonly used interval-based operations.
subject = genomicranges.from_ucsc(genome="hg38")
query = genomicranges.from_pandas(
    pd.DataFrame(
        {
            "seqnames": ["chr1", "chr2", "chr3"],
            "starts": [100, 115, 119],
            "ends": [103, 116, 120],
        }
    )
)

hits = subject.nearest(query)
print(hits)
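To give an intuition for what a nearest-interval search does, here is a brute-force sketch in plain Python. It is not the algorithm or the return format used by genomicranges. The function names, the tuple layout, and the made-up subject intervals are all assumptions for illustration. It assumes that, for each query interval, we want the index of the closest subject interval on the same sequence, with distance measured as the number of bases strictly between two inclusive intervals (0 when they overlap or touch).

```python
# Illustrative only: brute-force nearest-interval lookup,
# not the genomicranges implementation.
def gap(a_start, a_end, b_start, b_end):
    """Bases strictly between two inclusive intervals; 0 if they overlap or touch."""
    if a_end < b_start:
        return b_start - a_end - 1
    if b_end < a_start:
        return a_start - b_end - 1
    return 0


def nearest_indices(subject, query):
    """For each query interval, return the index of the closest subject
    interval on the same sequence, or None if that sequence has none."""
    result = []
    for q_seq, q_start, q_end in query:
        best_index, best_gap = None, None
        for i, (s_seq, s_start, s_end) in enumerate(subject):
            if s_seq != q_seq:
                continue
            d = gap(q_start, q_end, s_start, s_end)
            # Strict comparison keeps the first subject on ties.
            if best_gap is None or d < best_gap:
                best_index, best_gap = i, d
        result.append(best_index)
    return result


# Made-up subject intervals: (seqname, start, end)
subject = [("chr1", 50, 90), ("chr1", 200, 300), ("chr2", 100, 110)]
query = [("chr1", 100, 103), ("chr2", 115, 116), ("chr3", 119, 120)]
nearest = nearest_indices(subject, query)
print(nearest)  # [0, 2, None]
```

This scan costs time proportional to the number of subject intervals for every query, so it is only suitable for small inputs.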
For more usage examples, check out the documentation.
Note
This project has been set up using PyScaffold 4.1.1. For details and usage information on PyScaffold see https://pyscaffold.org/.
Built Distribution
Hashes for GenomicRanges-0.3.6-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 4be9d571a700a41379c3c53ee4d83a2d26f28ef4f17efc3eda1537091700974e |
MD5 | 72e74231ec7f7243a2f8ec07f5c92e7d |
BLAKE2b-256 | 194a94ac6c23b39296d0d9e3f6f4954352c10e438e485c7334c66df0aa0f7671 |