Efficiently access the 'Bedrock Bio' library of open-access computational biology datasets.

Project description

bedrock-bio

Open-Access Computational Biology Datasets

Description

Efficiently access a curated library of open-access computational biology datasets. Tables support predicate pushdown and projection pushdown to the cloud storage backend, so only the data needed for a query is read from storage rather than the whole table. This enables quick, iterative access to tables that would otherwise be massive and unwieldy.

bedrock_bio consists of five user-facing functions:

  • list_namespaces(): returns a list of available namespace (data source) identifiers
  • describe_namespace('<name>'): returns metadata, citation, license, instructions, and the tables for a namespace
  • list_tables(): returns a list of available table identifiers
  • describe_table('<name>'): returns metadata, citation, partition and sort keys, and column definitions for a table
  • load_table('<name>'): returns a lazy DuckDB relation for a table

DuckDB relation methods such as filter, select, and limit can be chained on the relation returned by load_table; row filters and column selections are pushed down to the storage backend. Filtering on the partition columns listed by describe_table gives the fastest reads, because partitions that do not match the filter can be skipped without being read.
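Why partition filters are cheapest can be illustrated with a toy, pure-Python model of a partitioned table. This is a conceptual sketch only, not how bedrock_bio or DuckDB actually implement scans: the `scan` function, the partition layout (ancestry, panel), and all row values are made up for illustration. A partition whose key fails the partition predicate is skipped without touching any of its rows, while a predicate on an ordinary column has to be checked row by row in every partition.

```python
# Conceptual sketch only: a toy, in-memory model of a partitioned table.
# The partition columns, values, and rows below are made up for illustration
# and are NOT the real layout or contents of any bedrock_bio table.

# Hypothetical partition columns; each dict key below is one partition.
PARTITION_COLUMNS = ("ancestry", "panel")

TOY_PARTITIONS = {
    ("EUR", "Inflammation"): [
        {"chromosome": "1", "position": 1000, "beta": 0.12},
        {"chromosome": "2", "position": 2000, "beta": -0.05},
    ],
    ("EUR", "Oncology"): [
        {"chromosome": "1", "position": 3000, "beta": 0.30},
    ],
    ("AFR", "Inflammation"): [
        {"chromosome": "1", "position": 1500, "beta": 0.08},
    ],
}


def scan(partitions, partition_columns, partition_filter, row_filter, columns):
    """Return (matching rows projected to `columns`, number of partitions read)."""
    results = []
    partitions_read = 0
    for key, rows in partitions.items():
        part_values = dict(zip(partition_columns, key))
        # Partition pruning: decided from the partition key alone, so the
        # rows of a non-matching partition are never touched.
        if not partition_filter(part_values):
            continue
        partitions_read += 1
        for row in rows:
            full_row = {**part_values, **row}
            # Row-level predicate: has to be evaluated row by row.
            if row_filter(full_row):
                # Projection: keep only the requested columns.
                results.append({c: full_row[c] for c in columns})
    return results, partitions_read


wanted = ["ancestry", "position", "beta"]

# Filter on partition columns: only 1 of the 3 toy partitions is read.
pruned_rows, pruned_reads = scan(
    TOY_PARTITIONS,
    PARTITION_COLUMNS,
    partition_filter=lambda p: p["ancestry"] == "EUR" and p["panel"] == "Inflammation",
    row_filter=lambda r: r["chromosome"] == "1",
    columns=wanted,
)

# Filter on an ordinary column only: every partition has to be read.
full_rows, full_reads = scan(
    TOY_PARTITIONS,
    PARTITION_COLUMNS,
    partition_filter=lambda p: True,
    row_filter=lambda r: r["chromosome"] == "1",
    columns=wanted,
)

print(pruned_reads, pruned_rows)  # 1 [{'ancestry': 'EUR', 'position': 1000, 'beta': 0.12}]
print(full_reads, len(full_rows))  # 3 3
```

In a real table each partition is typically a set of files, so skipping a partition saves network reads rather than just loop iterations.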

Installation

To install the latest release from PyPI:

pip install bedrock-bio

Or install the current development version from GitHub:

pip install git+https://github.com/bedrock-bio/bedrock-bio.git@main#subdirectory=python

Examples

import bedrock_bio as bb

List available tables:

bb.list_tables()

Describe a table to see its metadata, citation, and columns:

bb.describe_table('ukb_ppp.pqtls')

Lazily load a table, filter on partition columns (for the fastest reads), select columns, and collect the result into an in-memory pandas DataFrame:

df = (
    bb.load_table('ukb_ppp.pqtls')
      .filter("ancestry = 'EUR' AND protein_id = 'A0FGR8' AND panel = 'Inflammation'")
      .select('chromosome, position, effect_allele, other_allele, beta, neg_log_10_p_value')
      .fetchdf()
)
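Because the filter is a plain SQL string, iterating over many proteins or ancestries means assembling that string repeatedly. A small helper can build it from keyword arguments and escape quotes. This is a hypothetical convenience function (the name `partition_filter` is an assumption, not part of bedrock_bio), and it assumes every partition value should be written as a single-quoted SQL string literal, as in the example above.

```python
def partition_filter(**values):
    """Build a filter string such as "ancestry = 'EUR' AND panel = 'Inflammation'".

    Hypothetical helper, not part of bedrock_bio. Column names are used verbatim
    (take them from describe_table); every value is rendered as a SQL string
    literal, with embedded single quotes doubled.
    """
    if not values:
        raise ValueError("at least one column=value pair is required")
    clauses = []
    for column, value in values.items():
        literal = str(value).replace("'", "''")  # SQL escaping: ' becomes ''
        clauses.append(f"{column} = '{literal}'")
    # Keyword-argument order is preserved, so clauses appear in call order.
    return " AND ".join(clauses)


# Reproduces the predicate used in the example above.
predicate = partition_filter(ancestry="EUR", protein_id="A0FGR8", panel="Inflammation")
print(predicate)
# ancestry = 'EUR' AND protein_id = 'A0FGR8' AND panel = 'Inflammation'
```

Inside a loop over protein IDs, the result could be passed as `bb.load_table('ukb_ppp.pqtls').filter(partition_filter(ancestry='EUR', protein_id=pid, panel='Inflammation'))`. The sketch quotes every value, which suits string partition columns like those in the example.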

Dataset Requests

To request the addition of a new table to the library, open an issue on the bedrock-bio GitHub repository.

Download files

Source Distribution

bedrock_bio-1.4.0.tar.gz (4.0 kB)

Uploaded Source

Built Distribution

bedrock_bio-1.4.0-py3-none-any.whl (6.8 kB)

Uploaded Python 3

File details

Details for the file bedrock_bio-1.4.0.tar.gz.

File metadata

  • Download URL: bedrock_bio-1.4.0.tar.gz
  • Upload date:
  • Size: 4.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.16 {"installer":{"name":"uv","version":"0.11.16","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for bedrock_bio-1.4.0.tar.gz
Algorithm Hash digest
SHA256 cf2c74baee6da654089bf6a2bd9b7df9860407b23ee12824bf2867e1d69bc1cd
MD5 d9a626d614276cde11f063ea0c78a2c6
BLAKE2b-256 27a9c8c15bb68449bb33af950874409311764b1e8ae28a9ef8ab14b8f4287e11

File details

Details for the file bedrock_bio-1.4.0-py3-none-any.whl.

File metadata

  • Download URL: bedrock_bio-1.4.0-py3-none-any.whl
  • Upload date:
  • Size: 6.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.16 {"installer":{"name":"uv","version":"0.11.16","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for bedrock_bio-1.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 cd15339a102cd10b8f76a74d2b683269b882eb465dfe2dbda1833aea5520dd36
MD5 705d1ca5de718c3f1474193875a51af9
BLAKE2b-256 ad8466ba3d632bd59cc05355412597e43249d521d8d28dccca0e822c90098c3e
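The SHA256 digests listed for this release can be used to confirm that a manually downloaded file is intact. A minimal sketch using only the standard library, assuming the file sits at a local path; `sha256_of`, `verify`, and `EXPECTED_SHA256` are hypothetical names, and the expected values are the SHA256 digests copied from the hash tables on this page.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables for release 1.4.0.
EXPECTED_SHA256 = {
    "bedrock_bio-1.4.0.tar.gz": "cf2c74baee6da654089bf6a2bd9b7df9860407b23ee12824bf2867e1d69bc1cd",
    "bedrock_bio-1.4.0-py3-none-any.whl": "cd15339a102cd10b8f76a74d2b683269b882eb465dfe2dbda1833aea5520dd36",
}


def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=None):
    """Compare a file's digest with `expected`, or with the table entry for its file name."""
    path = Path(path)
    if expected is None:
        expected = EXPECTED_SHA256[path.name]  # raises KeyError for unknown file names
    return sha256_of(path) == expected.lower()


# Example (hypothetical local path):
# verify("downloads/bedrock_bio-1.4.0-py3-none-any.whl")  # True if the file is intact
```

pip can also enforce digests itself: in hash-checking mode, each requirement line carries a `--hash=sha256:<digest>` option, and pip refuses to install a file whose digest does not match.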
