Python interface for tabix
Project description
This module allows fast random access to files compressed with bgzip and indexed by tabix. It includes a C extension with code from klib. The bgzip and tabix programs themselves are distributed separately (current versions ship with HTSlib).
Installation
pip install --user pytabix
Synopsis
Genomics data is often in a table where each row corresponds to a genomic region (start, end) or a position:
chrom  pos      snp
1      1000760  rs75316104
1      1000894  rs114006445
1      1000910  rs79750022
1      1001177  rs4970401
1      1001256  rs78650406
With tabix, you can quickly retrieve all rows in a genomic region by specifying a query with a sequence name, start, and end:
import tabix
# Open a remote or local file.
url = "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/"
url += "ALL.2of4intersection.20100804.genotypes.vcf.gz"
tb = tabix.open(url)
# These queries are identical. A query returns an iterator over the results.
records = tb.query("1", 1000000, 1250000)
records = tb.queryi(0, 1000000, 1250000)
records = tb.querys("1:1000000-1250000")
# Each record is a list of strings.
for record in records:
    print(record[:3])
['1', '1000760', 'rs75316104']
['1', '1000894', 'rs114006445']
['1', '1000910', 'rs79750022']
['1', '1001177', 'rs4970401']
['1', '1001256', 'rs78650406']
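To make the relationship between the string form and the three-argument form of a query concrete, here is a minimal pure-Python sketch. The helpers `parse_region` and `rows_in_region` are hypothetical and are not part of pytabix; the linear scan stands in for the indexed lookup that tabix performs, and treating `start` and `end` as inclusive bounds is an assumption made only for this illustration.

```python
def parse_region(region):
    """Split a 'name:start-end' string into (name, start, end).

    Hypothetical helper for illustration; not a pytabix function.
    """
    name, _, span = region.rpartition(":")
    start, _, end = span.partition("-")
    return name, int(start), int(end)


def rows_in_region(rows, name, start, end):
    """Return rows on sequence `name` whose position lies in [start, end].

    A plain linear scan over an in-memory table. Inclusive bounds are an
    assumption of this sketch, not a statement about tabix coordinates.
    """
    return [row for row in rows if row[0] == name and start <= int(row[1]) <= end]


# The small table from the synopsis, as lists of strings.
TABLE = [
    ["1", "1000760", "rs75316104"],
    ["1", "1000894", "rs114006445"],
    ["1", "1000910", "rs79750022"],
    ["1", "1001177", "rs4970401"],
    ["1", "1001256", "rs78650406"],
]

name, start, end = parse_region("1:1000000-1250000")
for row in rows_in_region(TABLE, name, start, end):
    print(row)
```

The point of an index is to avoid exactly this kind of full scan, which is why the real module can answer region queries quickly even on large or remote files.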
Example
Let’s say you have a table of gene coordinates:
$ zcat example.bed.gz | shuf | head -n5 | column -t
chr19 53611131 53636172 55786 ZNF415
chr10 72149121 72150375 221017 CEP57L1P1
chr4 185009858 185139113 133121 ENPP6
chrX 132669772 133119672 2719 GPC3
chr6 134924279 134925376 114182 FAM8A6P
Sort it by chromosome, then by start and end positions. Then, use bgzip to deflate the file into compressed blocks:
$ zcat example.bed.gz | sort -k1V -k2n -k3n | bgzip > example.bed.bgz
The compressed size is usually slightly larger than that obtained with gzip.
Index the file with tabix:
$ tabix -s 1 -b 2 -e 3 example.bed.bgz
$ ls
example.bed.gz example.bed.bgz example.bed.bgz.tbi
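Since tabix expects position-sorted input, it can be useful to reproduce the ordering from the sort step in Python, for example to sort or check rows before writing them out. The sketch below approximates `sort -k1V -k2n -k3n` with a natural-order key; `natural_key` and `sort_bed_rows` are hypothetical helper names, and the key is only an approximation of GNU version sort, not a reimplementation of it.

```python
import re


def natural_key(name):
    """Split a name into text and integer chunks so 'chr4' sorts before 'chr10'."""
    return tuple(
        int(tok) if tok.isdigit() else tok
        for tok in re.split(r"(\d+)", name)
    )


def sort_bed_rows(rows):
    """Sort rows by chromosome (natural order), then numeric start and end."""
    return sorted(
        rows,
        key=lambda r: (natural_key(r[0]), int(r[1]), int(r[2])),
    )


# The five example rows shown above, as lists of strings.
rows = [
    ["chr19", "53611131", "53636172", "55786", "ZNF415"],
    ["chr10", "72149121", "72150375", "221017", "CEP57L1P1"],
    ["chr4", "185009858", "185139113", "133121", "ENPP6"],
    ["chrX", "132669772", "133119672", "2719", "GPC3"],
    ["chr6", "134924279", "134925376", "114182", "FAM8A6P"],
]

sorted_rows = sort_bed_rows(rows)
for row in sorted_rows:
    print("\t".join(row))
```

A plain string sort would place chr10 and chr19 before chr4, which is why the chromosome column is compared chunk by chunk rather than character by character.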
Source Distribution
File details
Details for the file pytabix-0.1.tar.gz.
File metadata
- Download URL: pytabix-0.1.tar.gz
- Upload date:
- Size: 45.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0774f1687ebd41811fb07a0e50951b6be72d7cc7e22ed2b18972eaf7482eb7d1
MD5 | bf9c069c3787c0c240255b917ef34405
BLAKE2b-256 | 846a520ecf75c2ada77492cb4ed21fb22aed178e791df434ca083b59fffadddd