python-chebi-utils

Common processing functionality for the ChEBI ontology — download data files, extract classes and relations, extract molecules, and generate stratified train/val/test splits.

Installation

pip install chebi-utils

For development, install in editable mode from a local clone of the repository (the dev extra includes pytest and ruff):

pip install -e ".[dev]"

Features

Download ChEBI data files

from chebi_utils import download_chebi_obo, download_chebi_sdf

obo_path = download_chebi_obo(dest_dir="data/")   # downloads chebi.obo
sdf_path = download_chebi_sdf(dest_dir="data/")   # downloads chebi.sdf.gz

Files are fetched from the EBI FTP server.
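
For readers curious what such a download step involves, here is a minimal standard-library sketch. It is not the package's implementation: fetch_file is a hypothetical helper, and the URL is left to the caller because this README does not state the exact server paths.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_file(url: str, dest_dir: str) -> Path:
    """Download `url` into `dest_dir` and return the local path.

    Hypothetical helper for illustration; not part of chebi_utils.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Name the local file after the last path component of the URL.
    target = dest / url.rsplit("/", 1)[-1]
    # Stream the response to disk so large files are not held in memory.
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Streaming with shutil.copyfileobj matters here because ontology and structure dumps can be large; reading the whole response into memory first would work for small files only.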

Extract ontology classes and relations

from chebi_utils import extract_classes, extract_relations

classes = extract_classes("chebi.obo")
# DataFrame: id, name, definition, is_obsolete

relations = extract_relations("chebi.obo")
# DataFrame: source_id, target_id, relation_type  (is_a, has_role, …)
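
To make the shape of the relations table concrete, the sketch below shows how is_a and relationship lines in an OBO [Term] stanza map to (source_id, target_id, relation_type) triples. It is an illustration under assumptions, not the code behind extract_relations: parse_obo_relations is a hypothetical name, the sample stanza and its IDs are made up, and real OBO files contain escapes and qualifiers this sketch ignores.

```python
def parse_obo_relations(text: str) -> list[tuple[str, str, str]]:
    """Return (source_id, target_id, relation_type) triples from OBO text."""
    triples = []
    current_id = None
    in_term = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas describe classes.
            in_term = line == "[Term]"
            current_id = None
        elif not in_term:
            continue
        elif line.startswith("id: "):
            current_id = line[len("id: "):]
        elif line.startswith("is_a: ") and current_id:
            # "is_a: CHEBI:2 ! name" -> drop the trailing "!" comment.
            target = line[len("is_a: "):].split("!")[0].strip()
            triples.append((current_id, target, "is_a"))
        elif line.startswith("relationship: ") and current_id:
            # "relationship: has_role CHEBI:3 ! name" -> type, then target.
            body = line[len("relationship: "):].split("!")[0]
            rel_type, target = body.split()[:2]
            triples.append((current_id, target, rel_type))
    return triples


# Made-up sample data, not real ChEBI entries.
SAMPLE = """\
[Term]
id: CHEBI:1
name: example acid
is_a: CHEBI:2 ! example parent
relationship: has_role CHEBI:3 ! example role

[Typedef]
id: has_role
name: has role
"""

triples = parse_obo_relations(SAMPLE)
for source_id, target_id, relation_type in triples:
    print(source_id, relation_type, target_id)
```

The [Typedef] stanza is skipped on purpose: it defines the relation type itself, not a class, so it contributes no rows.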

Extract molecules

from chebi_utils import extract_molecules

molecules = extract_molecules("chebi.sdf.gz")
# DataFrame: chebi_id, name, smiles, inchi, inchikey, formula, charge, mass, …

Both plain .sdf and gzip-compressed .sdf.gz files are supported.
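
One common way to support both forms is to choose the opener from the file suffix and then split on the `$$$$` record delimiter that SDF uses. The sketch below assumes that approach; open_sdf and iter_sdf_records are hypothetical names, and the package may handle this differently.

```python
import gzip


def open_sdf(path: str):
    """Open an SDF file as text, decompressing when the name ends in .gz."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")


def iter_sdf_records(path: str):
    """Yield each molecule record as a string; a '$$$$' line ends a record."""
    lines = []
    with open_sdf(path) as handle:
        for line in handle:
            if line.strip() == "$$$$":
                yield "".join(lines)
                lines = []
            else:
                lines.append(line)
```

Because the reader is a generator, records are produced one at a time, so a compressed file never has to be fully expanded in memory.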

Generate train/val/test splits

from chebi_utils import create_splits

splits = create_splits(molecules, train_ratio=0.8, val_ratio=0.1, test_ratio=0.1)
train_df = splits["train"]
val_df   = splits["val"]
test_df  = splits["test"]

Pass stratify_col to preserve class proportions across splits:

splits = create_splits(classes, stratify_col="is_obsolete", seed=42)
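
The idea behind stratification can be sketched in plain Python: group row indices by label, shuffle each group with a seeded generator, and slice every group by the same ratios. This is a sketch of the general technique, not the code inside create_splits; stratified_split is a hypothetical name and the label list is synthetic.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_ratio=0.8, val_ratio=0.1, test_ratio=0.1, seed=42):
    """Split row indices into train/val/test, keeping label proportions similar."""
    if abs(train_ratio + val_ratio + test_ratio - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    rng = random.Random(seed)  # seeded so the split is reproducible
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    splits = {"train": [], "val": [], "test": []}
    for indices in by_label.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * train_ratio)
        n_val = round(len(indices) * val_ratio)
        # Slice each label group separately so every split sees every label.
        splits["train"] += indices[:n_train]
        splits["val"] += indices[n_train:n_train + n_val]
        splits["test"] += indices[n_train + n_val:]
    return splits


# Synthetic labels: 90 rows with False, 10 rows with True.
labels = [False] * 90 + [True] * 10
splits = stratified_split(labels)
print({name: len(rows) for name, rows in splits.items()})
```

With very small label groups the rounding can leave a split without any row of that label, which is worth checking before stratifying on a rare value.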

Running Tests

pytest tests/ -v

Linting

ruff check .
ruff format --check .

CI/CD

A GitHub Actions workflow (.github/workflows/ci.yml) automatically runs ruff linting and the full test suite on every push and pull request across Python 3.10, 3.11, and 3.12.

