genomictools - Tools for processing genomic ranges
The genomictools package provides a common interface for working with any data type that has an associated genomic range.
Installation
pip install genomictools
Basic usage
The basic class for a genomic range is GenomicPos. Its start and stop are both 1-based, inclusive coordinates.
from genomictools import GenomicPos
# Construct from name, start and stop
r = GenomicPos("chr1", 1, 100)
print(str(r), r.name, r.start, r.stop) # chr1:1-100 chr1 1 100
# Construct from a region string
r = GenomicPos("chr1:1-100")
print(str(r), r.name, r.start, r.stop) # chr1:1-100 chr1 1 100
# A single position is a one-base range
r = GenomicPos("chr1:1")
print(str(r), r.name, r.start, r.stop) # chr1:1-1 chr1 1 1
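The region strings above follow a `name:start-stop` pattern, with `name:pos` as shorthand for a single base. As a rough illustration of that convention only, the sketch below splits such a string with a regular expression. The `parse_region` helper is hypothetical and not part of genomictools, and it assumes the sequence name contains no colon; how the library parses strings internally is not described here.

```python
import re

# Hypothetical helper for illustration; not part of the genomictools API.
# Assumes the sequence name itself contains no ':' character.
_REGION_RE = re.compile(r"^(?P<name>[^:]+):(?P<start>\d+)(?:-(?P<stop>\d+))?$")

def parse_region(text):
    """Split 'name:start-stop' or 'name:pos' into (name, start, stop), 1-based inclusive."""
    m = _REGION_RE.match(text)
    if m is None:
        raise ValueError(f"not a region string: {text!r}")
    start = int(m.group("start"))
    # A single position such as 'chr1:1' is treated as a one-base range.
    stop = int(m.group("stop")) if m.group("stop") is not None else start
    if stop < start:
        raise ValueError(f"stop is before start in {text!r}")
    return m.group("name"), start, stop

print(parse_region("chr1:1-100"))  # ('chr1', 1, 100)
print(parse_region("chr1:1"))      # ('chr1', 1, 1)
```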
To avoid confusion between coordinate conventions, GenomicPos also exposes explicit 0-based and 1-based coordinates through the properties zstart, ostart, zstop and ostop. For example, the BED format uses a 0-based start coordinate and a 1-based stop coordinate.
r = GenomicPos("chr1", 1, 100)
print(r.name, r.zstart, r.ostop) # chr1 0 100
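The arithmetic behind the two conventions is small enough to write out. The sketch below uses two hypothetical helper functions (they are not genomictools functions) to convert between a 1-based inclusive range and the BED-style pair of 0-based start and 1-based stop. Only the start shifts by one; the stop keeps the same value.

```python
# Hypothetical helpers for illustration; not part of the genomictools API.

def one_based_to_bed(start, stop):
    """1-based inclusive (start, stop) -> BED-style (0-based start, 1-based stop)."""
    return start - 1, stop

def bed_to_one_based(zstart, ostop):
    """BED-style (0-based start, 1-based stop) -> 1-based inclusive (start, stop)."""
    return zstart + 1, ostop

print(one_based_to_bed(1, 100))  # (0, 100)
print(bed_to_one_based(0, 100))  # (1, 100)

# The range length is the same under both conventions.
start, stop = 1, 100
zstart, ostop = one_based_to_bed(start, stop)
print(stop - start + 1, ostop - zstart)  # 100 100
```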
To store a list of genomic ranges, we can use GenomicCollection.
from genomictools import GenomicPos, GenomicCollection
r1 = GenomicPos("chr1:1-100")
r2 = GenomicPos("chr3:1000-2000")
r3 = GenomicPos("chr1:51-200")
regions = GenomicCollection([r1, r2, r3])
print(len(regions)) # 3
# When iterating through the regions, they will be sorted by name, start and stop.
for r in regions:
    print(str(r))
# chr1:1-100
# chr1:51-200
# chr3:1000-2000
# One can check if a region overlaps any entry within the genomic collection
print(regions.overlaps(GenomicPos("chr1:201-300"))) # False
print(regions.overlaps(GenomicPos("chr1:2-3"))) # True
print(regions.overlaps(GenomicPos("chr2:2-3"))) # False
# One can extract all entries from the genomic collection that overlap with the target region
for r in regions.find_overlaps(GenomicPos("chr1:1-3")):
    print(str(r))
# chr1:1-100
for r in regions.find_overlaps(GenomicPos("chr1:26-75")):
    print(str(r))
# chr1:1-100
# chr1:51-200
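The overlap results above follow the usual rule for closed intervals: two ranges overlap when they share a name and each one starts no later than the other stops. The sketch below reproduces the example with plain tuples and a linear scan. The names `ranges_overlap` and `find_overlaps_naive` are hypothetical, and nothing here is meant to describe how GenomicCollection is implemented or indexed. Plain tuple sorting is also only a stand-in for the collection's ordering, since it compares names as strings.

```python
# Hypothetical helpers for illustration; not the genomictools implementation.

def ranges_overlap(a, b):
    """a and b are (name, start, stop) tuples with 1-based inclusive coordinates."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]

def find_overlaps_naive(regions, query):
    """Linear scan returning every region that overlaps the query."""
    return [r for r in regions if ranges_overlap(r, query)]

# Sorting tuples orders by name, then start, then stop.
regions = sorted([("chr1", 1, 100), ("chr3", 1000, 2000), ("chr1", 51, 200)])
print(regions)
# [('chr1', 1, 100), ('chr1', 51, 200), ('chr3', 1000, 2000)]

print(find_overlaps_naive(regions, ("chr1", 26, 75)))
# [('chr1', 1, 100), ('chr1', 51, 200)]
print(find_overlaps_naive(regions, ("chr1", 201, 300)))
# []
```

A linear scan is fine for a handful of regions; for large collections an interval index would normally be preferred.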
Any data entry with an associated genomic range implements GenomicAnnotation, and every GenomicAnnotation instance has a genomic_pos property.
from biodata.bed import BED
bed = BED("chr1", 0, 100, name="R1")
print(str(bed.genomic_pos)) # chr1:1-100
One can use GenomicAnnotation as entries in GenomicCollection.
from biodata.bed import BED
from genomictools import GenomicPos, GenomicCollection
beds = GenomicCollection([
    BED("chr1", 0, 100, name="R1"),
    BED("chr3", 1999, 2000, name="R2"),
    BED("chr1", 50, 200, name="R3"),
])
for bed in beds:
    r = bed.genomic_pos
    print(bed.name, str(r))
# R1 chr1:1-100
# R3 chr1:51-200
# R2 chr3:2000-2000
for bed in beds.find_overlaps(GenomicPos("chr1:26-75")):
    r = bed.genomic_pos
    print(bed.name, str(r))
# R1 chr1:1-100
# R3 chr1:51-200
The base class for a genomic range with strand information is StrandedGenomicPos, which extends GenomicPos.
Any data entry with an associated stranded genomic range implements StrandedGenomicAnnotation, and every StrandedGenomicAnnotation instance has the properties stranded_genomic_pos and genomic_pos. The strand is + for the positive strand, - for the negative strand, and . for an unspecified strand.
from biodata.bed import BED
bed = BED("chr1", 0, 100, name="R1", strand="-")
print(str(bed.stranded_genomic_pos)) # chr1:1-100:-
print(str(bed.genomic_pos)) # chr1:1-100
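The stranded string shown above appends the strand symbol to the plain region string. As a small sketch of that convention, the hypothetical `format_stranded` function below (not a genomictools function) validates the strand against the three allowed symbols and builds the same `name:start-stop:strand` text. Defaulting to `.` when no strand is given is an assumption of this sketch.

```python
# Hypothetical helper for illustration; not part of the genomictools API.
VALID_STRANDS = {"+", "-", "."}

def format_stranded(name, start, stop, strand="."):
    """Build 'name:start-stop:strand' from 1-based inclusive coordinates."""
    if strand not in VALID_STRANDS:
        raise ValueError(f"strand must be one of '+', '-' or '.', got {strand!r}")
    return f"{name}:{start}-{stop}:{strand}"

print(format_stranded("chr1", 1, 100, "-"))  # chr1:1-100:-
print(format_stranded("chr1", 1, 100))       # chr1:1-100:.
```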
A reader can also load all entries of a file directly into a GenomicCollection:
from biodata.bed import BEDReader
from genomictools import GenomicCollection
# filename is the path to the input file to read
beds = BEDReader.read_all(GenomicCollection, filename)
Download files
File details
Details for the file genomictools-0.0.8.tar.gz.
File metadata
- Download URL: genomictools-0.0.8.tar.gz
- Upload date:
- Size: 144.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 42e6214bfd210a6d6340d364d39ccf5e4f379df8c222a379396a80d4174a8ac9 |
| MD5 | 13419aebe5d8581045b897b4d93bf1ae |
| BLAKE2b-256 | 5de4fe097d8f16f14f215688c388a2410d1d856f41dabfc271417475d6d7ce91 |
File details
Details for the file genomictools-0.0.8-cp312-cp312-manylinux_2_34_x86_64.manylinux_2_39_x86_64.whl.
File metadata
- Download URL: genomictools-0.0.8-cp312-cp312-manylinux_2_34_x86_64.manylinux_2_39_x86_64.whl
- Upload date:
- Size: 128.4 kB
- Tags: CPython 3.12, manylinux: glibc 2.34+ x86-64, manylinux: glibc 2.39+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e30546fd4e645536d16c0bf74b7a43fa30f74629e357cf02149840b35c84c90d |
| MD5 | 754fb186050014c1008b379b9caa66ac |
| BLAKE2b-256 | 267e19db82dbfae1f05c5a1d0e0ae97265a1f30618a7c4885e52aac92480ae28 |