genomictools - Tools for processing genomic ranges

The genomictools package provides a simple, uniform way to work with any data type that has an associated genomic range.

Installation

pip install genomictools

Basic usage

The basic class for a genomic range is GenomicPos. Its start and stop are both 1-based coordinates.

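from genomictools import GenomicPos

# Construct from separate name, start and stop values
r = GenomicPos("chr1", 1, 100)
print(str(r), r.name, r.start, r.stop) # chr1:1-100 chr1 1 100
# Construct from a region string
r = GenomicPos("chr1:1-100")
print(str(r), r.name, r.start, r.stop) # chr1:1-100 chr1 1 100
# A single position becomes a one-base range
r = GenomicPos("chr1:1")
print(str(r), r.name, r.start, r.stop) # chr1:1-1 chr1 1 1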
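
The region strings above follow the pattern name:start-stop, with a lone position expanding to a single-base range. As an illustration only, the standalone sketch below parses such strings using the standard library. parse_region is a hypothetical helper, not part of genomictools, and the library's own parser may accept more forms or behave differently.

```python
import re

# Hypothetical helper for illustration; not part of the genomictools API.
_REGION = re.compile(r"^(?P<name>[^:]+):(?P<start>\d+)(?:-(?P<stop>\d+))?$")

def parse_region(text):
    """Parse "name:start-stop" or "name:pos" into (name, start, stop), 1-based inclusive."""
    m = _REGION.match(text)
    if m is None:
        raise ValueError(f"not a region string: {text!r}")
    start = int(m.group("start"))
    # A lone position is treated as a single-base range.
    stop = int(m.group("stop")) if m.group("stop") is not None else start
    if stop < start:
        raise ValueError(f"stop precedes start in {text!r}")
    return m.group("name"), start, stop

print(parse_region("chr1:1-100"))  # ('chr1', 1, 100)
print(parse_region("chr1:1"))      # ('chr1', 1, 1)
```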

To avoid confusion between coordinate systems, GenomicPos also exposes explicit 0-based and 1-based coordinates through the properties zstart, ostart, zstop and ostop. For example, a BED record stores a 0-based start, and its end value is numerically equal to the 1-based stop.

r = GenomicPos("chr1", 1, 100)
print(r.name, r.zstart, r.ostop) # chr1 0 100
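
The arithmetic behind this conversion is small enough to show directly. The sketch below is a standalone illustration of converting between 1-based inclusive ranges and BED-style 0-based half-open ranges. The function names are hypothetical and are not part of genomictools.

```python
# Hypothetical helpers for illustration; not part of the genomictools API.

def one_based_to_bed(start, stop):
    """Convert a 1-based inclusive range to a 0-based half-open (start, end) pair."""
    # Only the start shifts; the half-open end equals the 1-based inclusive stop.
    return start - 1, stop

def bed_to_one_based(bed_start, bed_end):
    """Convert a 0-based half-open (start, end) pair to a 1-based inclusive range."""
    return bed_start + 1, bed_end

print(one_based_to_bed(1, 100))      # (0, 100)
print(bed_to_one_based(1999, 2000))  # (2000, 2000)
```

In both systems the length of the range is the same: stop - start + 1 in 1-based inclusive terms, and end - start in BED terms.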

To store a list of genomic ranges, we can use GenomicCollection.

from genomictools import GenomicPos, GenomicCollection

r1 = GenomicPos("chr1:1-100")
r2 = GenomicPos("chr3:1000-2000")
r3 = GenomicPos("chr1:51-200")

regions = GenomicCollection([r1, r2, r3])
print(len(regions)) # 3

# When iterating through the regions, they will be sorted by name, start and stop. 
for r in regions: 
	print(str(r))
# chr1:1-100
# chr1:51-200
# chr3:1000-2000

# One can check if a region overlaps any entry within the genomic collection
print(regions.overlaps(GenomicPos("chr1:201-300"))) # False
print(regions.overlaps(GenomicPos("chr1:2-3"))) # True
print(regions.overlaps(GenomicPos("chr2:2-3"))) # False

# One can extract all entries from the genomic collection that overlap with the target region
for r in regions.find_overlaps(GenomicPos("chr1:1-3")):
	print(str(r))
# chr1:1-100
for r in regions.find_overlaps(GenomicPos("chr1:26-75")):
	print(str(r))
# chr1:1-100
# chr1:51-200
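
To make the overlap semantics concrete, here is a minimal standalone sketch of the test for 1-based inclusive ranges: two ranges overlap when they share a name and each one starts no later than the other stops. Region, overlaps and find_overlaps below are hypothetical stand-ins written for this illustration. The linear scan is an assumption made for brevity, and GenomicCollection may well use a more efficient lookup internally.

```python
from typing import NamedTuple

class Region(NamedTuple):
    """Hypothetical stand-in for a genomic range; 1-based inclusive coordinates."""
    name: str
    start: int
    stop: int

def overlaps(a, b):
    """Return True if two inclusive ranges on the same sequence share a base."""
    return a.name == b.name and a.start <= b.stop and b.start <= a.stop

def find_overlaps(regions, target):
    """Return the overlapping regions, sorted by name, start and stop."""
    # NamedTuple instances compare field by field, which gives that ordering.
    return [r for r in sorted(regions) if overlaps(r, target)]

regions = [Region("chr1", 1, 100), Region("chr3", 1000, 2000), Region("chr1", 51, 200)]

print(any(overlaps(r, Region("chr1", 201, 300)) for r in regions))  # False
print(any(overlaps(r, Region("chr1", 2, 3)) for r in regions))      # True
for r in find_overlaps(regions, Region("chr1", 26, 75)):
    print(f"{r.name}:{r.start}-{r.stop}")
# chr1:1-100
# chr1:51-200
```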

Any data entry with an associated genomic range implements GenomicAnnotation, and every GenomicAnnotation instance has a genomic_pos property.

from biodata.bed import BED
bed = BED("chr1", 0, 100, name="R1")
print(str(bed.genomic_pos)) # chr1:1-100

GenomicAnnotation instances can be used directly as entries in a GenomicCollection.

from biodata.bed import BED
from genomictools import GenomicPos, GenomicCollection

beds = GenomicCollection([
	BED("chr1", 0, 100, name="R1"),
	BED("chr3", 1999, 2000, name="R2"),
	BED("chr1", 50, 200, name="R3"),
])
for bed in beds:
	r = bed.genomic_pos
	print(bed.name, str(r))
# R1 chr1:1-100
# R3 chr1:51-200
# R2 chr3:2000-2000

for bed in beds.find_overlaps(GenomicPos("chr1:26-75")):
	r = bed.genomic_pos
	print(bed.name, str(r))
# R1 chr1:1-100
# R3 chr1:51-200

The base class for a genomic range with a strand is StrandedGenomicPos, which extends GenomicPos.

Any data entry with an associated stranded genomic range implements StrandedGenomicAnnotation, and every StrandedGenomicAnnotation instance has the properties stranded_genomic_pos and genomic_pos. The strand is + for the positive strand, - for the negative strand and . for an unspecified strand.

from biodata.bed import BED
bed = BED("chr1", 0, 100, name="R1", strand="-")
print(str(bed.stranded_genomic_pos)) # chr1:1-100:-
print(str(bed.genomic_pos)) # chr1:1-100
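
The stranded string form shown above simply appends the strand to the unstranded form. As a rough illustration of that convention, and of restricting the strand to the three allowed symbols, here is a standalone sketch. StrandedRegion is a hypothetical class written for this example. It is not StrandedGenomicPos, and whether the library validates the strand this way is an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StrandedRegion:
    """Hypothetical stand-in for a stranded range; 1-based inclusive coordinates."""
    name: str
    start: int
    stop: int
    strand: str = "."

    def __post_init__(self):
        # Assumed validation: only +, - and . are accepted.
        if self.strand not in ("+", "-", "."):
            raise ValueError(f"invalid strand: {self.strand!r}")

    def unstranded(self):
        """Return the range without its strand, as name:start-stop."""
        return f"{self.name}:{self.start}-{self.stop}"

    def __str__(self):
        return f"{self.unstranded()}:{self.strand}"

r = StrandedRegion("chr1", 1, 100, "-")
print(str(r))          # chr1:1-100:-
print(r.unstranded())  # chr1:1-100
```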

A GenomicCollection can also be populated directly from a file:

from biodata.bed import BEDReader
from genomictools import GenomicCollection
# filename is the path to a BED file
beds = BEDReader.read_all(GenomicCollection, filename)
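
For readers unfamiliar with the format, the sketch below shows roughly what reading BED records into 1-based ranges involves. It is a standalone illustration using only the standard library. read_bed_regions is a hypothetical function, not the BEDReader implementation, and it handles only the first four tab-separated columns.

```python
import io

def read_bed_regions(handle):
    """Yield (label, chrom, start, stop) tuples with 1-based inclusive coordinates.

    Hypothetical helper for illustration; not part of biodata or genomictools.
    """
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines, comments and browser/track header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, bed_start, bed_end = fields[0], int(fields[1]), int(fields[2])
        label = fields[3] if len(fields) > 3 else None
        # BED starts are 0-based; the end already equals the 1-based stop.
        yield label, chrom, bed_start + 1, bed_end

# An in-memory stand-in for a BED file, so the example runs without any input file.
example = io.StringIO("chr1\t0\t100\tR1\nchr3\t1999\t2000\tR2\n")
records = list(read_bed_regions(example))
print(records)  # [('R1', 'chr1', 1, 100), ('R2', 'chr3', 2000, 2000)]
```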
