
Ersilia utilities for working with tabular output data


Work in Progress

Manipulating Ersilia's dataframes

eosframes is a library for manipulating inputs and outputs from the Ersilia Model Hub. It splits, assembles, converts, scales, and summarises tabular model output files.

Installation

Python ≥ 3.8 is required.

pip install eosframes

Or from source:

git clone https://github.com/ersilia-os/eosframes.git
cd eosframes
pip install -e .

Quick start

Model output files that the library reads or writes encode a model ID and version in their filenames, e.g. eos4e40_v1.csv (model eos4e40, version v1).
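
As an illustration of the naming idea, the sketch below pulls the model ID and version out of a filename with a regular expression. The pattern is an assumption inferred only from the example names in this README (eos plus four alphanumeric characters, then _v and digits, with an optional numeric batch suffix). The function name parse_output_name is hypothetical and is not part of eosframes. The patterns the library actually recognises are described in docs/nomenclature.md.

```python
import re
from pathlib import Path

# Assumed pattern, inferred from the README examples only:
# "eos" + 4 alphanumerics, "_v" + digits, optional "_NNN" batch suffix.
_NAME = re.compile(
    r"^(?P<model>eos[0-9a-z]{4})_(?P<version>v\d+)(?:_(?P<batch>\d+))?$"
)


def parse_output_name(path):
    """Return (model_id, version, batch) from a file name, or None if it does not match.

    Hypothetical helper for illustration; not an eosframes function.
    """
    match = _NAME.match(Path(path).stem)
    if match is None:
        return None
    return match.group("model"), match.group("version"), match.group("batch")


print(parse_output_name("eos4e40_v1.csv"))
print(parse_output_name("chunks/eos4e40_v1_001.csv"))
print(parse_output_name("compounds.csv"))
```

A name that does not fit the assumed pattern, such as compounds.csv, yields None rather than raising, so a caller can decide how strict to be.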

# Slice a big input CSV into chunks for parallel model runs
eosframes split compounds.csv -o chunks/ --chunksize 10000

# Stitch the per-batch outputs back into one file
eosframes append eos4e40_v1_000.csv eos4e40_v1_001.csv -o eos4e40_v1.csv

# Combine outputs from multiple models, side by side
eosframes stack eos4e40_v1.csv eos7m30_v1.csv -o project_eosmix.csv
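
To make the three operations above concrete, here is a rough pandas sketch of what chunking, vertical concatenation, and side-by-side combination do to a table. It is not how eosframes implements these commands. The column names key, input, feat_a, and feat_b, the helper split_frame, and the join on key are all assumptions made for the example.

```python
import pandas as pd


def split_frame(df, chunksize):
    """Yield consecutive row chunks of at most `chunksize` rows (hypothetical helper)."""
    for start in range(0, len(df), chunksize):
        yield df.iloc[start:start + chunksize]


# Toy input table; the column names are assumptions for this sketch.
compounds = pd.DataFrame(
    {"key": ["a", "b", "c", "d", "e"], "input": ["C", "CC", "CCC", "CCO", "CCN"]}
)

# "split": slice the input into chunks (here 2, 2, and 1 rows).
chunks = list(split_frame(compounds, chunksize=2))

# Pretend each chunk went through a model and gained one feature column.
batches = [chunk.assign(feat_a=chunk["input"].str.len()) for chunk in chunks]

# "append": vertically concatenate batches that come from the same model.
model_a = pd.concat(batches, ignore_index=True)

# "stack": combine outputs from different models side by side, aligned on the key.
model_b = pd.DataFrame(
    {"key": ["a", "b", "c", "d", "e"], "feat_b": [0.1, 0.2, 0.3, 0.4, 0.5]}
)
stacked = model_a.merge(model_b, on="key", how="inner")

print(stacked)
```

The merge on a shared key is one plausible way to align rows across models. Whether eosframes aligns by key, by position, or by another rule is described in its own documentation.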

Everything the CLI does is also importable:

from eosframes import read_csv, hstack, fit, transform

df = read_csv("eos4e40_v1.csv")
params = fit(df)
scaled = transform(df, params)

Run eosframes --help (or eosframes <command> --help) for inline help.

Commands

Command    Purpose
split      Slice any CSV into chunk files for parallel model runs.
convert    CSV ↔ H5, or assemble a chunks folder.
append     Vertically concatenate batches from the same model.
dedupe     Drop duplicate rows by key.
stack      Horizontally combine outputs from different models.
unstack    Split a stacked file back into per-model files.
summary    Per-feature stats from a local file.
info       Model metadata fetched from GitHub.
columns    Feature definitions fetched from GitHub.
fit        Fit a type-aware robust scaler and save its parameters.
transform  Apply a saved scaler to a file.

See docs/cli.md for every flag, example, and refusal condition.
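
For readers unfamiliar with robust scaling, the fit and transform steps can be illustrated with a generic median and interquartile-range scaler in pandas. This is a minimal sketch of the general technique, not the type-aware scaler that eosframes ships. The function names fit_robust and transform_robust and the toy column names are hypothetical, and the library's handling of column kinds, quantization, and imputation is not reproduced here.

```python
import pandas as pd


def fit_robust(df):
    """Record the median and interquartile range of each numeric column."""
    numeric = df.select_dtypes("number")
    q1, q3 = numeric.quantile(0.25), numeric.quantile(0.75)
    iqr = (q3 - q1).replace(0, 1.0)  # avoid dividing by zero on constant columns
    return {"center": numeric.median(), "scale": iqr}


def transform_robust(df, params):
    """Return a copy with the fitted columns centered on the median and scaled by the IQR."""
    out = df.copy()
    cols = params["center"].index
    out[cols] = (df[cols] - params["center"]) / params["scale"]
    return out


# Toy data with one outlier; column names are assumptions for this sketch.
train = pd.DataFrame(
    {"key": list("abcde"), "feat": [1.0, 2.0, 3.0, 4.0, 100.0]}
)
robust_params = fit_robust(train)
scaled = transform_robust(train, robust_params)

print(scaled)
```

Because the median and interquartile range barely move when a single extreme value is present, the outlier stays extreme after scaling while the remaining values land in a small range around zero. Keeping the fitted parameters separate from the data is what allows the same scaling to be applied later to a different file.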

Documentation

  • docs/cli.md — every CLI command, all flags, examples, and error patterns.
  • docs/nomenclature.md — every recognised filename / directory pattern, the strict/lenient contract, and the two stack modes.
  • docs/scaling.md — the type-aware robust scaler: column kinds, how each is picked, and quantization / imputation.

About the Ersilia Open Source Initiative

The Ersilia Open Source Initiative is a tech-nonprofit fueling sustainable research in the Global South. Ersilia's main asset is the Ersilia Model Hub, an open-source repository of AI/ML models for drug discovery.

