S(tatistical)tabix
Stabix
Stabix enables efficient queries of GWAS (Genome-Wide Association Study) files, or of any other TSV file with genomic positions. It lets users compress files with multiple codecs, add threshold-based indices for specific columns (such as p-value), and query the data using genomic regions defined in BED files.
Installation
Install Stabix easily via pip:
pip install stabix
Quick Start
Get up and running with Stabix in just a few lines of code:
from stabix import Stabix
# Initialize the index with your GWAS file
idx = Stabix("myIndex", "gwas.tsv")
# Compress the GWAS file
idx.compress(block_size=2000)
# Query the data using a BED file
idx.query("regions.bed", "output.tsv")
This example:
- Creates an index for your GWAS file.
- Compresses the file using the default "bz2" codec.
- Queries the compressed data for variants within the genomic regions specified in your BED file.
- The results are saved to a file, output.tsv.
For more advanced features, like filtering by column values and specifying multiple codecs, see the Usage section below.
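The Quick Start expects a regions.bed file to exist already. If you do not have one yet, the sketch below writes one with the Python standard library. The chromosome name and coordinates are placeholders: whether your GWAS file names chromosomes "2" or "chr2" is an assumption you should check against your data. By convention, BED coordinates are 0-based and half-open.

```python
import csv

# Placeholder regions as (chromosome, start, end); replace with your own.
# BED convention: 0-based start, half-open end.
regions = [
    ("2", 100000, 200000),
    ("2", 500000, 750000),
]


def write_bed(path, regions):
    """Write regions as a tab-separated BED file with the three required columns."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        for chrom, start, end in regions:
            if start >= end:
                raise ValueError(f"start must be < end: {chrom}:{start}-{end}")
            writer.writerow([chrom, start, end])


write_bed("regions.bed", regions)
```

Only the three required columns are written; extra BED columns such as a region name are not needed by the query call shown above.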
Usage
Stabix Index
Stabix(index_dir, gwas_file)
- index_dir: Path for the Stabix index directory. This directory is created by compression and accessed using query.
- gwas_file: Path or URL to your GWAS file (e.g., a tab-separated .tsv file).
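Several methods below (add_threshold_index and query) take a zero-based col_idx. A small helper like the hypothetical column_index below can look that index up by name, assuming a local tab-separated GWAS file with a header row. The column name "p_value" in the usage line is only an example; header names vary between GWAS files.

```python
import csv


def column_index(tsv_path, column_name):
    """Return the zero-based index of column_name in the header row of a TSV file."""
    with open(tsv_path, newline="") as fh:
        header = next(csv.reader(fh, delimiter="\t"))
    try:
        return header.index(column_name)
    except ValueError:
        raise KeyError(f"{column_name!r} not found in header: {header}") from None
```

For example, column_index("gwas.tsv", "p_value") returns 8 when the p-value is the 9th column, which is the value you would then pass as col_idx.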
Methods
compress
Compresses the GWAS file.
- One of block_size or map_file:
  - block_size: Integer specifying the block size for compression and indexing.
  - map_file: Path to a genetic map file. This allows for a variable block size.
- codecs: Optional. Either:
  - A string (e.g., "bz2") to use the same codec for all data types.
  - A dictionary mapping data types to codecs, e.g., {"int": "bz2", "float": "bz2", "string": "bz2"}.
  - Defaults to "bz2" if not specified.
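The argument rules for compress can be summarized as plain Python. normalize_compress_args is a hypothetical helper, not part of Stabix; it only encodes the rules stated above: exactly one of block_size or map_file, and codecs given either as a single string or as a per-type dictionary, with "bz2" as the default. Requiring all three keys in a dictionary and rejecting non-positive block sizes are assumptions of this sketch.

```python
DATA_TYPES = ("int", "float", "string")
DEFAULT_CODEC = "bz2"


def normalize_compress_args(block_size=None, map_file=None, codecs=None):
    """Hypothetical check of compress() arguments; returns a codec per data type."""
    # Exactly one of block_size / map_file must be given.
    if (block_size is None) == (map_file is None):
        raise ValueError("pass exactly one of block_size or map_file")
    if block_size is not None and (not isinstance(block_size, int) or block_size <= 0):
        raise ValueError("block_size must be a positive integer")

    if codecs is None:
        codecs = DEFAULT_CODEC
    if isinstance(codecs, str):
        # One string applies the same codec to every data type.
        return {t: codecs for t in DATA_TYPES}
    if isinstance(codecs, dict):
        # Assumption: a dictionary must name a codec for every data type.
        missing = set(DATA_TYPES) - set(codecs)
        if missing:
            raise ValueError(f"no codec given for: {sorted(missing)}")
        return dict(codecs)
    raise TypeError("codecs must be a str or a dict")
```

Calling normalize_compress_args(block_size=2000) reproduces the Quick Start defaults: every data type maps to "bz2".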
add_threshold_index
Adds a threshold-based index for a specific column.
- col_idx: Zero-based index of the column to index (e.g., 8 for the 9th column).
- bins: List of floats defining bin boundaries (e.g., [0.1] creates bins for < 0.1 and ≥ 0.1).
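How bin boundaries split values can be illustrated with the standard bisect module. This is a sketch of the documented behavior for bins=[0.1] (bins < 0.1 and ≥ 0.1), not Stabix code. The treatment of several boundaries as consecutive half-open intervals is an assumption, and the boundaries must be sorted in ascending order.

```python
import bisect


def bin_index(value, bins):
    """Return the bin a value falls into, given ascending bin boundaries.

    With bins=[0.1], values < 0.1 go to bin 0 and values >= 0.1 go to bin 1.
    """
    return bisect.bisect_right(bins, value)


def bin_labels(bins):
    """Readable labels for each bin, e.g. ['< 0.1', '>= 0.1'] for bins=[0.1]."""
    labels = [f"< {bins[0]}"]
    for lo, hi in zip(bins, bins[1:]):
        labels.append(f"[{lo}, {hi})")
    labels.append(f">= {bins[-1]}")
    return labels
```

bisect_right places a value equal to a boundary in the upper bin, which matches the "≥ 0.1" side of the documented example.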
query
Queries the compressed data using a BED file.
- bed_file: Path to a BED file with genomic regions (at least three columns: chromosome, start, end).
- out_path: Path for an output TSV file.
- col_idx: Optional. Zero-based column index for filtering (must be paired with threshold).
- threshold: Optional. String specifying a threshold condition (e.g., "<= 0.1"; must be paired with col_idx).
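The threshold argument is a string such as "<= 0.1" that must be paired with col_idx. The sketch below shows one way such a string could be turned into a predicate, together with the pairing check. parse_threshold and check_filter_args are hypothetical names; the README only shows "<= 0.1" and "< 0.1", so the other operators in the table are assumptions rather than documented Stabix syntax.

```python
import operator
import re

# Operators a threshold string might use. Only "<=" and "<" appear in the
# README; ">=", ">" and "==" are assumptions of this sketch.
_OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "<": operator.lt,
    ">": operator.gt,
    "==": operator.eq,
}
# Two-character operators come first so "<=" is not read as "<".
_PATTERN = re.compile(
    r"^\s*(<=|>=|==|<|>)\s*([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)\s*$"
)


def parse_threshold(threshold):
    """Turn a string like '<= 0.1' into (operator_symbol, predicate)."""
    match = _PATTERN.match(threshold)
    if match is None:
        raise ValueError(f"unrecognized threshold: {threshold!r}")
    symbol, number = match.groups()
    limit = float(number)
    compare = _OPS[symbol]
    return symbol, lambda value: compare(value, limit)


def check_filter_args(col_idx=None, threshold=None):
    """col_idx and threshold must be given together or not at all."""
    if (col_idx is None) != (threshold is None):
        raise ValueError("col_idx and threshold must be given together")
```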
Note: If filtering by a column value, you must first call add_threshold_index for that column.
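One plausible reason a threshold index has to exist before filtering is that it lets a query skip whole compressed blocks. The sketch below is a hypothetical illustration of that general idea, not Stabix's actual index format. For each block it records which bins contain at least one value, and a "less than" query then only decompresses blocks that touch a bin whose lower bound is below the limit. Only "<" queries are handled here, to keep the sketch short.

```python
import bisect


def build_block_bin_index(blocks, bins):
    """For each block (a list of column values), record the set of bins it touches."""
    return [{bisect.bisect_right(bins, v) for v in block} for block in blocks]


def blocks_to_scan_less_than(block_bins, bins, limit):
    """Return indices of blocks that may hold values < limit.

    Bin i covers [bins[i-1], bins[i]) with -inf and +inf at the ends, so it
    can only contain values < limit if its lower bound is < limit.
    """
    lower_bounds = [float("-inf")] + list(bins)
    candidate_bins = {i for i, lo in enumerate(lower_bounds) if lo < limit}
    return [k for k, touched in enumerate(block_bins) if touched & candidate_bins]
```

With bins=[0.1] and a "< 0.1" query, only blocks holding at least one value below 0.1 are candidates. A limit that does not line up with a boundary, such as 0.5, still forces a scan of every block that touches the ≥ 0.1 bin, which is why choosing boundaries that match your typical thresholds matters.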
Advanced example
Here is a complete workflow that compresses a remote GWAS file using per-type codecs and a map file, adds a threshold index, and queries it with a threshold:
from stabix import Stabix
# We can pull in the gwas file on-demand using curl
gwas_url = "https://.../gwas.tsv"
idx = Stabix("testIndex", gwas_url)
idx.compress(
    # We can use different codecs for each data type.
    codecs={
        "int": "xz",
        "float": "zlib",
        "string": "bz2",
    },
    # And use variable block sizes with a map file.
    map_file="plink.chr2.GRCh36.map",
)
# We can specify a bin boundary (0.1)
# to make queries for low values efficient.
idx.add_threshold_index(8, [0.1])
# This query returns rows WHERE col_8 < 0.1
idx.query("regions.bed", "output.tsv", 8, "< 0.1")
This:
- Compresses gwas.tsv after it is downloaded from the URL, with different codecs for each data type and variable block sizes.
- Indexes column 8 with a bin boundary at 0.1 (creating bins for < 0.1 and ≥ 0.1).
- Queries for variants in the regions.bed regions where column 8 values are < 0.1.
- The results are saved to a file, output.tsv.
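The codecs dictionary assigns one codec per data type, but the README does not say how Stabix decides whether a column is "int", "float", or "string". The hypothetical infer_column_type below uses one simple rule (every value parses as an int gives "int", otherwise every value parses as a float gives "float", otherwise "string") to show how such a classification would map each column to a codec. It is an illustration under that assumption, not Stabix's logic.

```python
def infer_column_type(values):
    """Classify a column's string values as 'int', 'float', or 'string'."""

    def parses(cast, text):
        try:
            cast(text)
            return True
        except ValueError:
            return False

    if all(parses(int, v) for v in values):
        return "int"
    if all(parses(float, v) for v in values):
        return "float"
    return "string"


def codec_per_column(columns, codecs):
    """Map each column name to the codec chosen for its inferred type."""
    return {name: codecs[infer_column_type(vals)] for name, vals in columns.items()}
```

Under this rule a single placeholder such as "NA" in a p-value column would make the whole column a string, so missing-value handling is one of the details to confirm against the real library.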
Download files
File details
Details for the file stabix-2.0.0.tar.gz.
File metadata
- Download URL: stabix-2.0.0.tar.gz
- Upload date:
- Size: 178.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d9e5e9566538df38133469917d069fb314af7b0f768ba574d77d67847290d3c1 |
| MD5 | 7872f9606311bc61c32bc9ab35cc9dae |
| BLAKE2b-256 | ead5abee640ed77538eb86af9039dc27e711005e31fba5b3e7992b661bd845f4 |
File details
Details for the file stabix-2.0.0-py3-none-any.whl.
File metadata
- Download URL: stabix-2.0.0-py3-none-any.whl
- Upload date:
- Size: 176.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d92ed00c8e5cd17979b91567a1235b7e579bbdec5c89d5fbe863f0872c9ddd04 |
| MD5 | fe74718939c57b6ef35e5426fd8a2466 |
| BLAKE2b-256 | a5c1e039ec6d627ddd24b109c5dd53353e006fb4b47d9399fd62c1cbb66c6ff7 |