Container class to represent and operate over genomic regions and annotations.

Owner: BiocPy

GenomicRanges

GenomicRanges provides container classes designed to represent genomic locations and support genomic analysis. It is similar to Bioconductor's GenomicRanges.

To get started, install the package from PyPI:

pip install genomicranges

Some methods, such as read_ucsc, require optional packages (e.g. joblib). These can be installed with:

pip install genomicranges[optional]
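
Since these extras are optional, code that relies on methods such as read_ucsc may want to check for them before calling. The following is a minimal sketch using only the standard library; the helper name `has_optional_dependency` is hypothetical and not part of genomicranges, and joblib is checked only because it is the example named above.

```python
from importlib.util import find_spec


def has_optional_dependency(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return find_spec(module_name) is not None


# Assumption: joblib is one of the optional packages, as mentioned above.
if not has_optional_dependency("joblib"):
    print("joblib is missing; try: pip install genomicranges[optional]")
```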

`GenomicRanges`

GenomicRanges is the base class to represent and operate over genomic regions and annotations.

From Bioinformatic file formats

[!NOTE] When reading genomic formats, ends are expected to be inclusive, to be consistent with Bioconductor representations (and GFF). If they are not, we recommend subtracting 1 from the ends.
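
As a small illustration of the note above, the sketch below converts exclusive (half-open) ends to inclusive ends by subtracting 1. It uses plain Python lists; the function name `to_inclusive_ends` and the coordinates are made up for this example.

```python
def to_inclusive_ends(ends):
    """Convert exclusive (half-open) end coordinates to inclusive ones."""
    return [end - 1 for end in ends]


# Two intervals whose ends are exclusive: [100, 200) and [150, 151)
starts = [100, 150]
exclusive_ends = [200, 151]

inclusive_ends = to_inclusive_ends(exclusive_ends)
print(list(zip(starts, inclusive_ends)))
# → [(100, 199), (150, 150)]
```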

From `biobear`

Although the parsing capabilities in this package are limited, the biobear library is designed for reading and searching various bioinformatics file formats, including FASTA, FASTQ, VCF, BAM, and GFF, either from local files or from an object store like S3. Users can easily convert these representations to GenomicRanges:

from genomicranges import GenomicRanges
import biobear as bb

session = bb.new_session()

df = session.read_gtf_file("path/to/test.gtf").to_polars()
df = df.rename({"seqname": "seqnames", "start": "starts", "end": "ends"})

gg = GenomicRanges.from_polars(df)

# work with the GenomicRanges object
print(len(gg), len(df))

## output
## 77 77

[!NOTE] Ends are expected to be inclusive to be consistent with Bioconductor representations. If they are not, we recommend subtracting 1 from the ends.

UCSC or GTF file

You can download and parse genome annotations from UCSC, or load a genome annotation from a GTF file:

import genomicranges

gr = genomicranges.read_gtf("<PATH TO GTF>")  # replace with the path to your GTF file
# OR
gr = genomicranges.read_ucsc(genome="hg19")

print(gr)

## output
## GenomicRanges with 1760959 intervals & 10 metadata columns.
## ... truncating the console print ...
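
For readers unfamiliar with GTF, the sketch below shows what a single record contains: nine tab-separated fields, with 1-based start and inclusive end. This is not the parser used by read_gtf; the function `parse_gtf_line` and the sample record are invented for illustration.

```python
GTF_COLUMNS = [
    "seqname", "source", "feature", "start", "end",
    "score", "strand", "frame", "attribute",
]


def parse_gtf_line(line: str) -> dict:
    """Split one tab-separated GTF record into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GTF_COLUMNS):
        raise ValueError(f"expected 9 fields, got {len(fields)}")
    record = dict(zip(GTF_COLUMNS, fields))
    # GTF coordinates are 1-based with inclusive ends.
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    return record


# Made-up record for illustration only.
line = 'chr1\texample\texon\t101\t112\t.\t+\t.\tgene_id "g1"; transcript_id "t1";'
rec = parse_gtf_line(line)
print(rec["seqname"], rec["start"], rec["end"], rec["strand"])
# → chr1 101 112 +
```

The seqname, start, and end fields correspond to the seqnames, starts, and ends columns used elsewhere in this document.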

From `IRanges` (Preferred way)

If you have all the relevant information, you can construct a GenomicRanges object directly:

from genomicranges import GenomicRanges
from iranges import IRanges
from biocframe import BiocFrame
from random import random

gr = GenomicRanges(
    seqnames=[
        "chr1",
        "chr2",
        "chr3",
        "chr2",
        "chr3",
    ],
    ranges=IRanges(start=[x for x in range(101, 106)], width=[11, 21, 25, 30, 5]),
    strand=["*", "-", "*", "+", "-"],
    mcols=BiocFrame(
        {
            "score": range(0, 5),
            "GC": [random() for _ in range(5)],
        }
    ),
)

print(gr)

## output
GenomicRanges with 5 ranges and 5 metadata columns
    seqnames    ranges           strand     score                  GC
       <str> <IRanges> <ndarray[int64]>   <range>              <list>
[0]     chr1 101 - 111                * |       0  0.2593301003406461
[1]     chr2 102 - 122                - |       1  0.7207993213776644
[2]     chr3 103 - 127                * |       2 0.23391468067222065
[3]     chr2 104 - 133                + |       3  0.7671026589720187
[4]     chr3 105 - 109                - |       4 0.03355777784472458
------
seqinfo(3 sequences): chr1 chr2 chr3
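
The ranges column printed above is consistent with an inclusive end computed as start + width - 1. The following plain-Python check reproduces those values from the starts and widths passed to IRanges; it does not use the iranges package itself.

```python
starts = list(range(101, 106))
widths = [11, 21, 25, 30, 5]

# Inclusive end of a range that begins at `start` and spans `width` positions.
ends = [start + width - 1 for start, width in zip(starts, widths)]

for start, end in zip(starts, ends):
    print(f"{start} - {end}")
# → 101 - 111
# → 102 - 122
# → 103 - 127
# → 104 - 133
# → 105 - 109
```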

Pandas `DataFrame`

A common representation for tabular datasets in Python is the pandas DataFrame. The DataFrame must contain the columns "seqnames", "starts", and "ends" to represent genomic intervals. Here's an example:

from genomicranges import GenomicRanges
import pandas as pd
from random import random

df = pd.DataFrame(
    {
        "seqnames": ["chr1", "chr2", "chr1", "chr3", "chr2"],
        "starts": [101, 102, 103, 104, 109],
        "ends": [112, 103, 128, 134, 111],
        "strand": ["*", "-", "*", "+", "-"],
        "score": range(0, 5),
        "GC": [random() for _ in range(5)],
    }
)

gr = GenomicRanges.from_pandas(df)
print(gr)

## output
GenomicRanges with 5 ranges and 5 metadata columns
  seqnames    ranges           strand    score                  GC
     <str> <IRanges> <ndarray[int64]>   <list>              <list>
0     chr1 101 - 111                * |      0  0.4862658925128007
1     chr2 102 - 102                - |      1 0.27948386889389953
2     chr1 103 - 127                * |      2  0.5162697718607901
3     chr3 104 - 133                + |      3  0.5979843806415466
4     chr2 109 - 110                - |      4 0.04740781186083798
------
seqinfo(3 sequences): chr1 chr2 chr3

Polars `DataFrame`

Similarly, to initialize from a polars DataFrame:

from genomicranges import GenomicRanges
import polars as pl
from random import random

df = pl.DataFrame(
    {
        "seqnames": ["chr1", "chr2", "chr1", "chr3", "chr2"],
        "starts": [101, 102, 103, 104, 109],
        "ends": [112, 103, 128, 134, 111],
        "strand": ["*", "-", "*", "+", "-"],
        "score": range(0, 5),
        "GC": [random() for _ in range(5)],
    }
)

gr = GenomicRanges.from_polars(df)
print(gr)

## output
GenomicRanges with 5 ranges and 5 metadata columns
  seqnames    ranges           strand    score                  GC
     <str> <IRanges> <ndarray[int64]>   <list>              <list>
0     chr1 101 - 112                * |      0  0.4862658925128007
1     chr2 102 - 103                - |      1 0.27948386889389953
2     chr1 103 - 128                * |      2  0.5162697718607901
3     chr3 104 - 134                + |      3  0.5979843806415466
4     chr2 109 - 111                - |      4 0.04740781186083798
------
seqinfo(3 sequences): chr1 chr2 chr3

Interval Operations

GenomicRanges supports most interval-based operations. For example, to find the nearest ranges:

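import genomicranges
import pandas as pd

subject = genomicranges.read_ucsc(genome="hg38")

# build the query with the GenomicRanges.from_pandas constructor shown earlier
query = genomicranges.GenomicRanges.from_pandas(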
    pd.DataFrame(
        {
            "seqnames": ["chr1", "chr2", "chr3"],
            "starts": [100, 115, 119],
            "ends": [103, 116, 120],
        }
    )
)

hits = subject.nearest(query, ignore_strand=True, select="all")
print(hits)

## output
BiocFrame with 3 rows and 2 columns
        query_hits        self_hits
    <ndarray[int32]> <ndarray[int32]>
[0]                0                0
[1]                1          1677082
[2]                2          1003411
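
To make the idea of a nearest-range search concrete, here is a naive pure-Python sketch. It is not the library's implementation. It assumes inclusive coordinates, ignores strand, only compares intervals on the same sequence, and defines distance as the number of positions strictly between two intervals (0 when they overlap or touch). The names `gap` and `nearest_all` and the sample data are hypothetical.

```python
def gap(a_start, a_end, b_start, b_end):
    """Positions strictly between two inclusive intervals; 0 if they overlap or touch."""
    if a_end < b_start:
        return b_start - a_end - 1
    if b_end < a_start:
        return a_start - b_end - 1
    return 0


def nearest_all(query, subject):
    """Return (query_idx, subject_idx) pairs for every subject interval on the
    same sequence that lies at the minimum distance from each query interval."""
    hits = []
    for qi, (q_seq, q_start, q_end) in enumerate(query):
        candidates = [
            (gap(q_start, q_end, s_start, s_end), si)
            for si, (s_seq, s_start, s_end) in enumerate(subject)
            if s_seq == q_seq
        ]
        if not candidates:
            continue  # no subject interval on this sequence
        best = min(d for d, _ in candidates)
        hits.extend((qi, si) for d, si in candidates if d == best)
    return hits


subject = [("chr1", 1, 50), ("chr1", 200, 300), ("chr2", 10, 20)]
query = [("chr1", 100, 103), ("chr2", 115, 116), ("chr3", 119, 120)]
print(nearest_all(query, subject))
# → [(0, 0), (1, 2)]
```

The quadratic scan is only meant to show what a hit pair represents; it would not scale to annotation-sized inputs.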

`CompressedGenomicRangesList`

Just as it sounds, a CompressedGenomicRangesList is a named list-like object. If you are wondering why you need this class: a GenomicRanges object lets us specify multiple genomic elements, usually where genes start and end. Genes are themselves made of many sub-regions, e.g. exons. CompressedGenomicRangesList allows us to represent this nested structure.

Currently, this class is limited in functionality.

To construct a CompressedGenomicRangesList:

from genomicranges import GenomicRanges, CompressedGenomicRangesList
from iranges import IRanges
from biocframe import BiocFrame

gr1 = GenomicRanges(
    seqnames=["chr1", "chr2", "chr1", "chr3"],
    ranges=IRanges([1, 3, 2, 4], [10, 30, 50, 60]),
    strand=["-", "+", "*", "+"],
    mcols=BiocFrame({"score": [1, 2, 3, 4]}),
)

gr2 = GenomicRanges(
    seqnames=["chr2", "chr4", "chr5"],
    ranges=IRanges([3, 6, 4], [30, 50, 60]),
    strand=["-", "+", "*"],
    mcols=BiocFrame({"score": [2, 3, 4]}),
)
grl = CompressedGenomicRangesList.from_list(lst=[gr1, gr2], names=["gene1", "gene2"])
print(grl)

## output
CompressedGenomicRangesList with 2 ranges and 2 metadata columns

Name: gene1
GenomicRanges with 4 ranges and 4 metadata columns
    seqnames    ranges           strand    score
       <str> <IRanges> <ndarray[int64]>   <list>
[0]     chr1    1 - 10                - |      1
[1]     chr2    3 - 32                + |      2
[2]     chr1    2 - 51                * |      3
[3]     chr3    4 - 63                + |      4
------
seqinfo(3 sequences): chr1 chr2 chr3

Name: gene2
GenomicRanges with 3 ranges and 3 metadata columns
    seqnames    ranges           strand    score
       <str> <IRanges> <ndarray[int64]>   <list>
[0]     chr2    3 - 32                - |      2
[1]     chr4    6 - 55                + |      3
[2]     chr5    4 - 63                * |      4
------
seqinfo(3 sequences): chr2 chr4 chr5
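
The word "compressed" suggests, by analogy with Bioconductor's CompressedList, a layout in which all elements are concatenated into one flat container alongside the group boundaries. Whether this package stores its data exactly this way is an assumption here; the sketch below only illustrates the idea with plain tuples, reusing the coordinates printed above.

```python
from itertools import accumulate

# Intervals as (seqname, start, end), grouped by gene as in the example above.
groups = {
    "gene1": [("chr1", 1, 10), ("chr2", 3, 32), ("chr1", 2, 51), ("chr3", 4, 63)],
    "gene2": [("chr2", 3, 32), ("chr4", 6, 55), ("chr5", 4, 63)],
}

# "Compress": one flat list plus the cumulative end offset of each group.
names = list(groups)
flat = [interval for name in names for interval in groups[name]]
ends = list(accumulate(len(groups[name]) for name in names))


def get_group(name):
    """Recover one group's intervals by slicing the flat list."""
    i = names.index(name)
    start = ends[i - 1] if i > 0 else 0
    return flat[start:ends[i]]


print(ends)
# → [4, 7]
print(get_group("gene2"))
# → [('chr2', 3, 32), ('chr4', 6, 55), ('chr5', 4, 63)]
```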

Performance

The table below compares the performance of the Python and R GenomicRanges implementations. The query dataset contains approximately 564,000 intervals, while the subject dataset contains approximately 71 million intervals.

Operation	Python/GenomicRanges	Python/GenomicRanges (5 threads)	R/GenomicRanges
Overlap	2.80s	2.06s	4.40s
Overlap (single chromosome)	6.73s	5.19s	10.06s
Nearest	2.27s	1.5s	42.16s
Nearest (single chromosome)	4.7s	4.67s	11.01s

[!NOTE] The single chromosome benchmark ignores chromosome/sequence information and performs overlap operations solely on intervals.
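
To clarify what "overlap operations solely on intervals" means, the following naive sketch finds overlapping pairs using only start and end coordinates, with no sequence names involved. It assumes inclusive coordinates and is quadratic in the input size, so it is not representative of the benchmarked implementations; `find_overlaps` and the data are illustrative.

```python
def find_overlaps(query, subject):
    """Return (query_idx, subject_idx) pairs whose inclusive intervals overlap."""
    hits = []
    for qi, (q_start, q_end) in enumerate(query):
        for si, (s_start, s_end) in enumerate(subject):
            # Two inclusive intervals overlap when each starts before the other ends.
            if q_start <= s_end and s_start <= q_end:
                hits.append((qi, si))
    return hits


query = [(100, 103), (115, 116), (119, 120)]
subject = [(90, 101), (104, 118), (120, 130)]
print(find_overlaps(query, subject))
# → [(0, 0), (1, 1), (2, 2)]
```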

For details, see the scripts in the benchmark directory.

Further information

Note

This project has been set up using PyScaffold 4.1.1. For details and usage information on PyScaffold see https://pyscaffold.org/.

Release history

Version	Release date
0.8.4 (this version)	Jan 17, 2026
0.8.3	Jan 4, 2026
0.8.2	Jan 3, 2026
0.8.1 (yanked)	Dec 31, 2025
0.8.0 (yanked)	Dec 31, 2025
0.7.3	Sep 17, 2025
0.7.2	Aug 22, 2025
0.7.1	Jul 3, 2025
0.7.0	Jun 21, 2025
0.6.3	Mar 22, 2025
0.6.2	Feb 20, 2025
0.6.1	Jan 28, 2025
0.6.0	Jan 12, 2025
0.5.2	Jan 8, 2025
0.5.1	Jan 6, 2025
0.5.0	Dec 20, 2024
0.4.34	Oct 23, 2024
0.4.33	Oct 15, 2024
0.4.32	Oct 15, 2024
0.4.31	Oct 1, 2024
0.4.30	Sep 23, 2024
0.4.29	Aug 14, 2024
0.4.28	Jul 16, 2024
0.4.27	Jul 15, 2024
0.4.26	Jul 14, 2024
0.4.25	Jul 12, 2024
0.4.24	Jul 3, 2024
0.4.23	Jun 28, 2024
0.4.22	Jun 24, 2024
0.4.21	Jun 21, 2024
0.4.20	Jun 19, 2024
0.4.19	Jun 19, 2024
0.4.18	Jun 11, 2024
0.4.17	Jun 11, 2024
0.4.16	Jun 10, 2024
0.4.15	Apr 18, 2024
0.4.14	Apr 12, 2024
0.4.13	Apr 10, 2024
0.4.12	Jan 24, 2024
0.4.11	Jan 24, 2024
0.4.10	Jan 23, 2024
0.4.9	Jan 22, 2024
0.4.8	Jan 5, 2024
0.4.7	Jan 4, 2024
0.4.6	Jan 3, 2024
0.4.5	Dec 30, 2023
0.4.4	Dec 22, 2023
0.4.3	Dec 22, 2023
0.4.2	Dec 12, 2023
0.4.1	Dec 1, 2023
0.4.0	Nov 30, 2023
0.3.9	Nov 14, 2023
0.3.8	Oct 17, 2023
0.3.7	Oct 17, 2023
0.3.6	Sep 21, 2023
0.3.5	Sep 20, 2023
0.3.4	Sep 20, 2023
0.3.3	Sep 13, 2023
0.3.2	Sep 1, 2023
0.3.1	Aug 30, 2023
0.3.0	Aug 21, 2023
0.2.11	May 20, 2023
0.2.10	Mar 24, 2023
0.2.9	Mar 23, 2023
0.2.8	Mar 21, 2023
0.2.7	Dec 31, 2022
0.2.6	Dec 27, 2022
0.2	Dec 13, 2022
0.1.1	Jun 15, 2022
0.1	Jun 14, 2022

Download files

Source Distribution

genomicranges-0.8.4.tar.gz (77.4 kB view details)

Uploaded Jan 17, 2026 Source

Built Distribution

genomicranges-0.8.4-py3-none-any.whl (39.1 kB view details)

Uploaded Jan 17, 2026 Python 3

File details

Details for the file genomicranges-0.8.4.tar.gz.

File metadata

Download URL: genomicranges-0.8.4.tar.gz
Upload date: Jan 17, 2026
Size: 77.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for genomicranges-0.8.4.tar.gz
Algorithm	Hash digest
SHA256	`0143ef28ff4dc801a93becc9ec9fc63a4327e98ef30419245662b9067b244578`
MD5	`c535a79d72f02f25e10438207c513d2b`
BLAKE2b-256	`123e5575d92439b27072e6d5e9443c9e3efa0ed34678cb39a4ccda36c87e4542`

Provenance

The following attestation bundles were made for genomicranges-0.8.4.tar.gz:

Publisher: publish-pypi.yml on BiocPy/GenomicRanges

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: genomicranges-0.8.4.tar.gz
- Subject digest: 0143ef28ff4dc801a93becc9ec9fc63a4327e98ef30419245662b9067b244578
- Sigstore transparency entry: 832631086
- Sigstore integration time: Jan 17, 2026
Source repository:
- Permalink: BiocPy/GenomicRanges@db826b21cd885929a5de998d9883fdf3ac9a3f41
- Branch / Tag: refs/tags/0.8.4
- Owner: https://github.com/BiocPy
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@db826b21cd885929a5de998d9883fdf3ac9a3f41
- Trigger Event: push

File details

Details for the file genomicranges-0.8.4-py3-none-any.whl.

File metadata

Download URL: genomicranges-0.8.4-py3-none-any.whl
Upload date: Jan 17, 2026
Size: 39.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for genomicranges-0.8.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4136c237ac01a653c9ed48797f3c1d70c9b84ac33fe409d4bff9226fe5ddbf64`
MD5	`73760dc019fc5fcad8e8050b4c0aeead`
BLAKE2b-256	`afc8bb39278bec00559bef2b5611c3f25c368005110f0513c92f6c9634a95364`

Provenance

The following attestation bundles were made for genomicranges-0.8.4-py3-none-any.whl:

Publisher: publish-pypi.yml on BiocPy/GenomicRanges

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: genomicranges-0.8.4-py3-none-any.whl
- Subject digest: 4136c237ac01a653c9ed48797f3c1d70c9b84ac33fe409d4bff9226fe5ddbf64
- Sigstore transparency entry: 832631089
- Sigstore integration time: Jan 17, 2026
Source repository:
- Permalink: BiocPy/GenomicRanges@db826b21cd885929a5de998d9883fdf3ac9a3f41
- Branch / Tag: refs/tags/0.8.4
- Owner: https://github.com/BiocPy
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@db826b21cd885929a5de998d9883fdf3ac9a3f41
- Trigger Event: push
