genomictools - Tools for processing genomic ranges
The genomictools package provides a simple, consistent way to work with any data type that carries a genomic range.
Installation
pip install genomictools
Basic usage
The basic class for a genomic range is GenomicPos. Both start and stop in GenomicPos are 1-based coordinates.
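from genomictools import GenomicPos
# Construct from separate name, start and stop values
r = GenomicPos("chr1", 1, 100)
print(str(r), r.name, r.start, r.stop) # chr1:1-100 chr1 1 100
# Construct from a region string
r = GenomicPos("chr1:1-100")
print(str(r), r.name, r.start, r.stop) # chr1:1-100 chr1 1 100
# A single position expands to a one-base range
r = GenomicPos("chr1:1")
print(str(r), r.name, r.start, r.stop) # chr1:1-1 chr1 1 1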
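The region strings accepted above follow a name:start-stop pattern, with name:pos as shorthand for a single base. As a rough illustration of that convention only, here is a minimal standalone sketch; parse_region is a hypothetical helper written for this example and is not part of genomictools, whose own parsing may differ.

```python
from typing import Tuple


def parse_region(text: str) -> Tuple[str, int, int]:
    """Parse 'name:start-stop' or 'name:pos' into (name, start, stop).

    Hypothetical helper for illustration; coordinates are treated as
    1-based and inclusive, matching the examples above.
    """
    # Split on the last colon so the name keeps any earlier colons.
    name, sep, span = text.rpartition(":")
    if not sep or not name:
        raise ValueError(f"not a region string: {text!r}")
    start_text, dash, stop_text = span.partition("-")
    start = int(start_text)
    # A bare position means start and stop are the same base.
    stop = int(stop_text) if dash else start
    if start > stop:
        raise ValueError(f"start is greater than stop in {text!r}")
    return name, start, stop


print(parse_region("chr1:1-100"))  # ('chr1', 1, 100)
print(parse_region("chr1:1"))      # ('chr1', 1, 1)
```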
To avoid confusion, GenomicPos also exposes explicit 0-based and 1-based coordinates: zstart, ostart, zstop and ostop. For example, BED files use a 0-based start coordinate and a 1-based stop coordinate.
r = GenomicPos("chr1", 1, 100)
print(r.name, r.zstart, r.ostop) # "chr1" 0 100
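The arithmetic behind the BED convention is small enough to write out. The sketch below is independent of genomictools; the two function names are hypothetical and only illustrate converting between a 1-based inclusive range and BED's 0-based start with 1-based stop.

```python
from typing import Tuple


def one_based_to_bed(start: int, stop: int) -> Tuple[int, int]:
    """Convert a 1-based inclusive range to BED coordinates.

    Only the start shifts down by one; the stop value is unchanged.
    """
    return start - 1, stop


def bed_to_one_based(bed_start: int, bed_end: int) -> Tuple[int, int]:
    """Convert BED coordinates back to a 1-based inclusive range."""
    return bed_start + 1, bed_end


print(one_based_to_bed(1, 100))      # (0, 100)
print(bed_to_one_based(1999, 2000))  # (2000, 2000)
```

Under either convention the range covers the same number of bases: stop - start + 1 in 1-based terms equals bed_end - bed_start in BED terms.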
To store a list of genomic ranges, we can use GenomicCollection.
from genomictools import GenomicPos, GenomicCollection
r1 = GenomicPos("chr1:1-100")
r2 = GenomicPos("chr3:1000-2000")
r3 = GenomicPos("chr1:51-200")
regions = GenomicCollection([r1, r2, r3])
print(len(regions)) # 3
# When iterating through the regions, they will be sorted by name, start and stop.
for r in regions:
    print(str(r))
# chr1:1-100
# chr1:51-200
# chr3:1000-2000
# One can check if a region overlaps any entry within the genomic collection
print(regions.overlaps(GenomicPos("chr1:201-300"))) # False
print(regions.overlaps(GenomicPos("chr1:2-3"))) # True
print(regions.overlaps(GenomicPos("chr2:2-3"))) # False
# One can extract all entries from the genomic collection that overlap with the target region
for r in regions.find_overlaps(GenomicPos("chr1:1-3")):
    print(str(r))
# chr1:1-100
for r in regions.find_overlaps(GenomicPos("chr1:26-75")):
    print(str(r))
# chr1:1-100
# chr1:51-200
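The overlap results above are consistent with the usual rule for 1-based inclusive ranges: two ranges overlap when they share a name and each one starts no later than the other stops. The sketch below spells that rule out with plain tuples and a linear scan. It is an assumption-laden illustration, not the implementation used by GenomicCollection, which may rely on an index; overlaps and find_overlaps here are standalone hypothetical functions.

```python
from typing import List, Tuple

# (name, start, stop) with 1-based inclusive coordinates
Region = Tuple[str, int, int]


def overlaps(a: Region, b: Region) -> bool:
    """Return True when a and b share a name and at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def find_overlaps(regions: List[Region], target: Region) -> List[Region]:
    """Return the regions overlapping target, sorted by name, start and stop."""
    # Tuple ordering compares name first, then start, then stop.
    return [r for r in sorted(regions) if overlaps(r, target)]


regions = [("chr1", 1, 100), ("chr3", 1000, 2000), ("chr1", 51, 200)]
print(any(overlaps(r, ("chr1", 201, 300)) for r in regions))  # False
print(any(overlaps(r, ("chr1", 2, 3)) for r in regions))      # True
print(find_overlaps(regions, ("chr1", 26, 75)))
# [('chr1', 1, 100), ('chr1', 51, 200)]
```

Note that sorting tuples orders names as plain strings, so "chr10" sorts before "chr2" in this sketch.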
Any data entry with an associated genomic range implements GenomicAnnotation, and every GenomicAnnotation instance has a genomic_pos property.
from biodata.bed import BED
bed = BED("chr1", 0, 100, name="R1")
print(str(bed.genomic_pos)) # chr1:1-100
One can use GenomicAnnotation instances as entries in a GenomicCollection.
from biodata.bed import BED
from genomictools import GenomicPos, GenomicCollection
beds = GenomicCollection([BED("chr1", 0, 100, name="R1"), BED("chr3", 1999, 2000, name="R2"), BED("chr1", 50, 200, name="R3")])
for bed in beds:
    r = bed.genomic_pos
    print(bed.name, str(r))
# R1 chr1:1-100
# R3 chr1:51-200
# R2 chr3:2000-2000
for bed in beds.find_overlaps(GenomicPos("chr1:26-75")):
    r = bed.genomic_pos
    print(bed.name, str(r))
# R1 chr1:1-100
# R3 chr1:51-200
The base class for a genomic range with a strand is StrandedGenomicPos, which extends GenomicPos.
Any data entry with an associated stranded genomic range implements StrandedGenomicAnnotation, and every StrandedGenomicAnnotation instance has the properties stranded_genomic_pos and genomic_pos. The strand should be + for the positive strand, - for the negative strand and . for an unspecified strand.
from biodata.bed import BED
bed = BED("chr1", 0, 100, name="R1", strand="-")
print(str(bed.stranded_genomic_pos)) # chr1:1-100:-
print(str(bed.genomic_pos)) # chr1:1-100
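The stranded string shown above appends the strand after a second colon. As a small standalone sketch of that text format only (format_stranded and drop_strand are hypothetical helpers, not genomictools functions), the strand symbol can be validated and then added or removed like this:

```python
VALID_STRANDS = {"+", "-", "."}  # positive, negative, unspecified


def format_stranded(name: str, start: int, stop: int, strand: str = ".") -> str:
    """Build a 'name:start-stop:strand' string after checking the strand."""
    if strand not in VALID_STRANDS:
        raise ValueError(f"unknown strand: {strand!r}")
    return f"{name}:{start}-{stop}:{strand}"


def drop_strand(text: str) -> str:
    """Strip the trailing ':strand' part, leaving 'name:start-stop'."""
    region, sep, strand = text.rpartition(":")
    if not sep or strand not in VALID_STRANDS:
        raise ValueError(f"not a stranded region string: {text!r}")
    return region


s = format_stranded("chr1", 1, 100, "-")
print(s)               # chr1:1-100:-
print(drop_strand(s))  # chr1:1-100
```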
One can also load genomic range data from a file directly into a GenomicCollection:
from biodata.bed import BEDReader
from genomictools import GenomicCollection
beds = BEDReader.read_all(GenomicCollection, filename)