S(tatistical)tabix
Stabix
Stabix enables efficient indexing and querying of GWAS (Genome-Wide Association Study) data. It lets users compress GWAS files, add threshold-based indices for specific columns (such as p-value), and query the data using genomic regions defined in BED files. Stabix also supports column compression, each with a different codec, for fine-tuned indices.
Installation
Install Stabix easily via pip:
pip install stabix
Quick Start
Get up and running with Stabix in just a few lines of code:
from stabix import Stabix
# Initialize the index with your GWAS file
idx = Stabix("path/to/your/gwas_file.tsv", block_size=2000, name="my_index")
# Compress the GWAS file
idx.compress()
# Query the data using a BED file
idx.query("path/to/your/regions.bed")
This example:
- Creates an index for your GWAS file.
- Compresses the file using the default "bz2" codec.
- Queries the compressed data for variants within the genomic regions specified in your BED file.
The results are saved to a file named after the name parameter (e.g., my_index.query).
For more advanced features, like filtering by column values, see the Usage section below.
Usage
Stabix Index
Stabix(gwas_file, block_size, name)
- gwas_file: Path to your GWAS file (e.g., a tab-separated .tsv file).
- block_size: Integer specifying the block size for compression and indexing.
- name: String to name the index, used for output files.
Methods
compress(codecs=None)
Compresses the GWAS file.
codecs: Optional. Either:
- A string (e.g., "bz2") to use the same codec for all data types.
- A dictionary mapping data types to codecs, e.g., {"int": "bz2", "float": "bz2", "string": "bz2"}.
Defaults to "bz2" if not specified.
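To illustrate the two accepted forms of the codecs argument, here is a minimal sketch that normalizes either form into a per-type mapping and compresses one column with the selected codec. It uses only Python's standard library. The helper names normalize_codecs and compress_column are hypothetical, and the "zlib" and "xz" codec names are stand-ins chosen for illustration (only "bz2" is named above). This is not Stabix's API and does not describe its internals.

```python
import bz2
import lzma
import zlib

# Hypothetical codec table: only "bz2" is named in the Stabix description;
# "zlib" and "xz" are standard-library stand-ins used for illustration.
CODECS = {"bz2": bz2.compress, "zlib": zlib.compress, "xz": lzma.compress}
DATA_TYPES = ("int", "float", "string")


def normalize_codecs(codecs=None):
    """Turn None, a codec name, or a partial dict into {data_type: codec_name}."""
    if codecs is None:
        codecs = "bz2"  # documented default
    if isinstance(codecs, str):
        return {dtype: codecs for dtype in DATA_TYPES}
    # Assumption: data types missing from the dict fall back to "bz2".
    return {dtype: codecs.get(dtype, "bz2") for dtype in DATA_TYPES}


def compress_column(values, dtype, codecs=None):
    """Serialize one column as newline-separated text and compress it."""
    codec_name = normalize_codecs(codecs)[dtype]
    payload = "\n".join(str(v) for v in values).encode("utf-8")
    return CODECS[codec_name](payload)


# Compress a float column with zlib, then decompress it to check the round trip.
blob = compress_column([0.03, 0.5, 0.08], "float", {"float": "zlib"})
print(zlib.decompress(blob).decode("utf-8").split("\n"))
```

Keeping the string form and the dictionary form behind one normalization step means the rest of the code only ever deals with a complete mapping.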
add_threshold_index(col_idx, bins)
Adds a threshold-based index for a specific column.
- col_idx: Zero-based index of the column to index (e.g., 8 for the 9th column).
- bins: List of floats defining bin boundaries (e.g., [0.1] creates bins for < 0.1 and ≥ 0.1).
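The way a list of boundaries splits values into bins can be sketched with Python's bisect module. The function name bin_index is hypothetical, and how Stabix stores its bins internally is not described here. The sketch only demonstrates the stated convention that a boundary of 0.1 separates values < 0.1 from values ≥ 0.1.

```python
from bisect import bisect_right


def bin_index(value, bins):
    """Return the bin number for value; bins must be sorted in ascending order.

    With n boundaries there are n + 1 bins. A value equal to a boundary
    falls into the upper bin, matching the "< 0.1" / ">= 0.1" split.
    """
    return bisect_right(bins, value)


bins = [0.1]
for p_value in (0.03, 0.1, 0.5):
    print(p_value, "-> bin", bin_index(p_value, bins))
```

Under this convention 0.03 lands in bin 0 (< 0.1), while 0.1 and 0.5 both land in bin 1 (≥ 0.1).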
query(bed_file, col_idx=None, threshold=None)
Queries the compressed data using a BED file.
- bed_file: Path to a BED file with genomic regions (at least three columns: chromosome, start, end).
- col_idx: Optional. Zero-based column index for filtering (must be paired with threshold).
- threshold: Optional. String specifying a threshold condition (e.g., "<= 0.1"; must be paired with col_idx).
Note: If filtering by a column value, you must first call add_threshold_index for that column.
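To make the two query inputs concrete, the sketch below parses a threshold string such as "<= 0.1" into a comparison function and a cutoff, and reads the first three columns of a BED line. Both helpers (parse_threshold and parse_bed_line) are hypothetical and are not part of Stabix. The set of operators is also an assumption, since only "<" and "<=" appear in this description.

```python
import operator
import re

# Assumed operator set; the description above only shows "<" and "<=".
_OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}


def parse_threshold(expr):
    """Split a condition like "<= 0.1" into (comparison function, cutoff)."""
    match = re.fullmatch(r"\s*(<=|>=|==|<|>)\s*(\S+)\s*", expr)
    if match is None:
        raise ValueError(f"unrecognized threshold: {expr!r}")
    return _OPERATORS[match.group(1)], float(match.group(2))


def parse_bed_line(line):
    """Return (chromosome, start, end) from a tab-separated BED line."""
    chrom, start, end = line.rstrip("\n").split("\t")[:3]
    return chrom, int(start), int(end)


compare, cutoff = parse_threshold("<= 0.1")
print(compare(0.05, cutoff))   # 0.05 <= 0.1
print(parse_bed_line("chr1\t1000\t2000\tregion_a"))
```

Extra BED columns beyond the first three are ignored, which is consistent with the requirement of at least three columns.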
Example
Here’s a complete workflow to compress, index, and query a GWAS file with a threshold:
from stabix import Stabix
# Initialize the index
idx = Stabix("test.tsv", block_size=2000, name="exp1")
# Compress the file
idx.compress("bz2")
# Add a threshold index for column 8 (e.g., p-values)
idx.add_threshold_index(8, [0.1])
# Query with a BED file, filtering for p-values < 0.1
idx.query("test.bed", 8, "< 0.1")
This:
- Compresses test.tsv.
- Indexes column 8 with bins at 0.1 (creating < 0.1 and ≥ 0.1).
- Queries for variants in test.bed regions where column 8 values are < 0.1.
- Saves the results to a file (exp1.query).