Open-Access Computational Biology Datasets

bedrockbio

Description

Efficiently access a curated library of open-access computational biology datasets. Datasets support predicate and projection pushdown to the cloud storage backend, so row filters and column selections are applied at the source. This enables quick, iterative access to datasets that would otherwise be too large to work with conveniently.

bedrock_bio exposes two user-facing functions:

  • list_datasets(): returns a list of available datasets.
  • load_dataset('<name>'): takes a dataset name and returns a lazily evaluated data frame.

Polars verbs such as filter and select can be applied to the lazy data frame returned by load_dataset. Row filters and column selections are pushed down to the storage backend, so only the requested rows and columns are downloaded and read into memory.

Installation

To install the latest release from PyPI:

pip install bedrock-bio

To install the current development version from GitHub:

pip install git+https://github.com/bedrock-bio/bedrock-bio-client.git@main#subdirectory=python

Examples

Load the package (and polars for downstream data frame manipulation):

import bedrock_bio as bb
import polars as pl

List available datasets:

bb.list_datasets()

Lazily load a dataset:

lf = bb.load_dataset('ukb_ppp/pqtls')

Inspect the schema of a dataset (column names and data types) before downloading any rows or collecting into memory:

print(lf.collect_schema())

Filter rows, select columns, and collect the relevant subset into an in-memory data frame:

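df = (
    lf
    # Row filters: multiple predicates are combined with AND.
    .filter(
        pl.col('ancestry') == 'EUR',
        pl.col('protein') == 'A0FGR8',
    )
    # Column selection: only these columns are read.
    .select(
        'chromosome',
        'position',
        'effect_allele',
        'other_allele',
        'beta',
        'neg_log_10_p_value',
    )
    # Execute the query and materialize the result in memory.
    .collect()
)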
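When working with the collected result, it can help to convert between p-values and the negative log scale. The sketch below assumes that neg_log_10_p_value holds -log10(p), which is inferred from the column name and not confirmed here. The helper names are hypothetical, and 5e-8 is used only as a commonly cited genome-wide significance threshold.

```python
import math


def p_to_neg_log10(p: float) -> float:
    """Convert a p-value to the -log10 scale."""
    return -math.log10(p)


def neg_log10_to_p(neg_log10_p: float) -> float:
    """Convert a -log10 value back to a p-value."""
    return 10.0 ** (-neg_log10_p)


# Illustrative threshold (an assumption, not taken from the dataset).
threshold = p_to_neg_log10(5e-8)
print(round(threshold, 3))
```

Under the same assumption, a threshold on this scale could be added to the lazy query as a further predicate, for example pl.col('neg_log_10_p_value') > threshold, so that the filter is also pushed down.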

Dataset Requests

To request the addition of a new dataset to the library, open an issue on GitHub.

Download files

Source Distribution

bedrock_bio-1.1.0.tar.gz (2.4 kB)

Built Distribution

bedrock_bio-1.1.0-py3-none-any.whl (4.1 kB)

File details

Details for the file bedrock_bio-1.1.0.tar.gz.

File metadata

  • Download URL: bedrock_bio-1.1.0.tar.gz
  • Upload date:
  • Size: 2.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.10.0 (uv publish, Ubuntu 24.04 "noble", CI)

File hashes

Hashes for bedrock_bio-1.1.0.tar.gz
Algorithm    Hash digest
SHA256       0b78c5d42f6d708d220c2d6acaaa04e4a96657c76baf3021b2fb967f92e5837c
MD5          8c096c5ac3e4a7f52064ff0dce54874d
BLAKE2b-256  845146a251c013ad41d129b747d1967b8fba631dee56101d3365ea703e405318

File details

Details for the file bedrock_bio-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: bedrock_bio-1.1.0-py3-none-any.whl
  • Upload date:
  • Size: 4.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.10.0 (uv publish, Ubuntu 24.04 "noble", CI)

File hashes

Hashes for bedrock_bio-1.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       05b4fbdd85d7c4627d0c2d30d5b293a83358db5375a21acf154e5ea7d7c71402
MD5          eb35c1bad781ec95feec5dc6c9b67c52
BLAKE2b-256  31c0203b009a9ff42278adeedbe72ced7ec7cb87d0fc5e90d911d7f8986dc148
