S(tatistical)tabix

Stabix

Stabix enables efficient indexing and querying of GWAS (Genome-Wide Association Study) data. It lets users compress GWAS files, add threshold-based indices for specific columns (such as p-value), and query the data using genomic regions defined in BED files. Stabix also supports column-wise compression, with a separate codec configurable for each data type.

Installation

Install Stabix easily via pip:

pip install stabix

Quick Start

Get up and running with Stabix in just a few lines of code:

from stabix import Stabix

# Initialize the index with your GWAS file
idx = Stabix("path/to/your/gwas_file.tsv", block_size=2000, name="my_index")

# Compress the GWAS file
idx.compress()

# Query the data using a BED file
idx.query("path/to/your/regions.bed")

This example:

  1. Creates an index for your GWAS file.
  2. Compresses the file using the default "bz2" codec.
  3. Queries the compressed data for variants within the genomic regions specified in your BED file.

The results are saved to a file named after the name parameter (e.g., my_index.query).

For more advanced features, like filtering by column values, see the Usage section below.

Usage

Stabix Index

Stabix(gwas_file, block_size, name)
  • gwas_file: Path to your GWAS file (e.g., a tab-separated .tsv file).
  • block_size: Integer specifying the block size for compression and indexing.
  • name: String to name the index, used for output files.
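
To make the block_size parameter concrete, the sketch below splits rows into fixed-size blocks, the unit that would then be compressed and indexed. It is an illustration only: the helper split_into_blocks is hypothetical and not part of the Stabix API, and Stabix's actual blocking logic may differ.

```python
from typing import Iterator, List


def split_into_blocks(rows: List[str], block_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most block_size rows (hypothetical helper)."""
    if block_size <= 0:
        raise ValueError("block_size must be a positive integer")
    for start in range(0, len(rows), block_size):
        yield rows[start:start + block_size]


# Five toy rows split into blocks of two; the last block is shorter.
rows = [f"variant_{i}" for i in range(5)]
blocks = list(split_into_blocks(rows, block_size=2))
```

Smaller blocks generally mean less data to decompress per query but more blocks to track, so the best value likely depends on the file and query pattern.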

Methods

compress(codecs=None)

Compresses the GWAS file.

  • codecs: Optional. Either:
    • A string (e.g., "bz2") to use the same codec for all data types.
    • A dictionary mapping data types to codecs, e.g., {"int": "bz2", "float": "bz2", "string": "bz2"}.
    • Defaults to "bz2" if not specified.
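
The two accepted forms of codecs can be pictured with a small standard-library sketch. It is not Stabix's implementation: resolve_codecs, compress_column, and decompress_column are hypothetical helpers, and the "zlib" and "lzma" codec names are assumptions used only to show different data types taking different codecs (this document confirms only "bz2").

```python
import bz2
import lzma
import zlib

# Stand-in codec table built from standard-library compressors.
# Assumption: only "bz2" is confirmed by the Stabix docs; the rest are illustrative.
CODECS = {
    "bz2": (bz2.compress, bz2.decompress),
    "zlib": (zlib.compress, zlib.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def resolve_codecs(codecs=None):
    """Normalize a string, dict, or None into a {data_type: codec_name} mapping."""
    if codecs is None:
        codecs = "bz2"  # documented default
    if isinstance(codecs, str):
        return {"int": codecs, "float": codecs, "string": codecs}
    return dict(codecs)


def compress_column(values, data_type, codecs=None):
    """Compress one column's values with the codec chosen for its data type."""
    compress, _ = CODECS[resolve_codecs(codecs)[data_type]]
    return compress("\n".join(str(v) for v in values).encode("utf-8"))


def decompress_column(blob, data_type, codecs=None):
    """Invert compress_column, returning the values as strings."""
    _, decompress = CODECS[resolve_codecs(codecs)[data_type]]
    return decompress(blob).decode("utf-8").split("\n")


mixed = {"int": "zlib", "float": "bz2", "string": "lzma"}
blob = compress_column([100, 200, 300], "int", mixed)
restored = decompress_column(blob, "int", mixed)
```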

add_threshold_index(col_idx, bins)

Adds a threshold-based index for a specific column.

  • col_idx: Zero-based index of the column to index (e.g., 8 for the 9th column).
  • bins: List of floats defining bin boundaries (e.g., [0.1] creates bins for < 0.1 and ≥ 0.1).
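
As a rough illustration of how bin boundaries partition values, the sketch below assigns a value to a bin with the standard-library bisect module. The function bin_index is hypothetical and only mirrors the "< 0.1 and ≥ 0.1" behaviour described above; how Stabix stores or uses bins internally is not covered here.

```python
from bisect import bisect_right
from typing import List


def bin_index(value: float, bins: List[float]) -> int:
    """Return the bin number for value, given sorted boundaries.

    With bins=[0.1], values < 0.1 fall in bin 0 and values >= 0.1 in bin 1.
    """
    return bisect_right(bins, value)


single = [bin_index(v, [0.1]) for v in (0.05, 0.1, 0.5)]
double = [bin_index(v, [0.01, 0.1]) for v in (0.005, 0.01, 0.05, 0.1)]
```

A list of n boundaries yields n + 1 bins, so adding more boundaries gives a finer index at the cost of more bookkeeping.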

query(bed_file, col_idx=None, threshold=None)

Queries the compressed data using a BED file.

  • bed_file: Path to a BED file with genomic regions (at least three columns: chromosome, start, end).
  • col_idx: Optional. Zero-based column index for filtering (must be paired with threshold).
  • threshold: Optional. String specifying a threshold condition (e.g., "<= 0.1", must be paired with col_idx).

Note: If filtering by a column value, you must first call add_threshold_index for that column.
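
The query semantics can be sketched in plain Python, independent of Stabix. Everything here is an assumption made for illustration: the helpers parse_threshold, read_bed, and query_rows are hypothetical, the set of comparison operators goes beyond the "<" and "<=" shown in this document, and variant positions are assumed to be 1-based while BED intervals are 0-based and half-open.

```python
import operator
import re

# Assumption: only "<" and "<=" appear in the Stabix docs; the others are illustrative.
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq}


def parse_threshold(threshold):
    """Split a string such as '<= 0.1' into (comparison function, float cutoff)."""
    match = re.fullmatch(r"\s*(<=|>=|==|<|>)\s*([-+0-9.eE]+)\s*", threshold)
    if match is None:
        raise ValueError(f"unrecognized threshold: {threshold!r}")
    op, value = match.groups()
    return OPS[op], float(value)


def read_bed(lines):
    """Parse the first three BED columns into (chrom, start, end) tuples."""
    regions = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def query_rows(rows, regions, col_idx=None, threshold=None, chrom_col=0, pos_col=1):
    """Keep rows inside any region and, optionally, passing the threshold."""
    if (col_idx is None) != (threshold is None):
        raise ValueError("col_idx and threshold must be given together")
    compare = cutoff = None
    if threshold is not None:
        compare, cutoff = parse_threshold(threshold)
    hits = []
    for row in rows:
        chrom, pos = row[chrom_col], int(row[pos_col])
        # A 1-based position pos lies in the 0-based half-open [start, end)
        # exactly when start < pos <= end.
        in_region = any(c == chrom and s < pos <= e for c, s, e in regions)
        if in_region and (compare is None or compare(float(row[col_idx]), cutoff)):
            hits.append(row)
    return hits


regions = read_bed(["chr1\t100\t200", "chr2\t0\t50"])
rows = [["chr1", "150", "0.05"], ["chr1", "180", "0.5"],
        ["chr1", "250", "0.01"], ["chr2", "50", "0.1"]]
significant = query_rows(rows, regions, col_idx=2, threshold="< 0.1")
```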

Example

Here’s a complete workflow to compress, index, and query a GWAS file with a threshold:

from stabix import Stabix

# Initialize the index
idx = Stabix("test.tsv", block_size=2000, name="exp1")

# Compress the file
idx.compress("bz2")

# Add a threshold index for column 8 (e.g., p-values)
idx.add_threshold_index(8, [0.1])

# Query with a BED file, filtering for p-values < 0.1
idx.query("test.bed", 8, "< 0.1")

This:

  1. Compresses test.tsv.
  2. Indexes column 8 with bins at 0.1 (creating < 0.1 and ≥ 0.1).
  3. Queries for variants in test.bed regions where column 8 values are < 0.1.
  • The results are saved to the file exp1.query.
