cloudfit-core
Cloud-agnostic machine type scoring engine for computational workloads.
cloudfit-core is the foundation of the cloudfit ecosystem — a pure Python library that, given a workload profile, scores and ranks available cloud instances across providers. No cloud credentials required. No API calls. Just a workload spec in, ranked recommendations out.
The problem
Teams hardcode instance types (c2-standard-60, c7i.16xlarge) in infrastructure-as-code. When providers deprecate them or release better generations, nothing updates — costs drift and performance degrades silently. There is no open-source tool that takes a workload description and returns the best available instance across AWS, GCP, and Azure with explainable scoring.
cloudfit-core is that scoring engine.
Installation
pip install cloudfit-core
Requires Python 3.9+.
Quick start
from cloudfit import WorkloadProfile, MachineType, rank
# Define your workload
profile = WorkloadProfile(
vcpu=60,
ram_gb=224,
workload="io-intensive",
archetype="io", # io | cpu | mem | gpu | burst
optimize_for="balanced", # cost | performance | availability | balanced
)
# Provide candidate instances (from a cloudfit-provider-* package or your own list)
candidates = [
MachineType(id="c2-standard-60", provider="gcp", vcpu=60, ram_gb=240, price_hr=3.13),
MachineType(id="c3d-standard-60-lssd", provider="gcp", vcpu=60, ram_gb=240, price_hr=3.39),
MachineType(id="t2d-standard-60", provider="gcp", vcpu=60, ram_gb=240, price_hr=2.31),
MachineType(id="c7i.24xlarge", provider="aws", vcpu=96, ram_gb=192, price_hr=4.28),
]
# Score and rank
results = rank(profile, candidates)
for r in results:
    print(f"{r.instance.id:30s} score={r.score:.2f} ${r.instance.price_hr:.2f}/hr")
Output:
t2d-standard-60                score=0.81 $2.31/hr
c2-standard-60                 score=0.81 $3.13/hr
c3d-standard-60-lssd           score=0.80 $3.39/hr
c7i.24xlarge                   score=0.00 $4.28/hr
c7i.24xlarge scores 0.00 and ranks last because its 192 GB RAM is below the
requested 224 GB — it's eliminated by the hard floor filter, not just ranked low
(see How scoring works).
How scoring works
Every recommendation runs through the same weighted scoring function:
score = w_cost × cost_score + w_perf × perf_score + w_avail × avail_score
The optimize_for mode sets the weights:
| Mode | w_cost | w_perf | w_avail | Best for |
|---|---|---|---|---|
| cost | 0.70 | 0.20 | 0.10 | Batch jobs, dev environments |
| balanced | 0.33 | 0.34 | 0.33 | Default: production workloads |
| performance | 0.10 | 0.80 | 0.10 | Latency-sensitive, GPU inference |
| availability | 0.10 | 0.20 | 0.70 | Long-running jobs, deprecation risk |
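The formula and weight table can be sketched in a few lines of plain Python. This is an illustration only, not the library's `scorer.py`: the `WEIGHTS` dictionary and the `weighted_score` helper are hypothetical names, and the sketch assumes each sub-score has already been normalized to the range 0 to 1.

```python
# Weight presets copied from the table above (one row per optimize_for mode).
WEIGHTS = {
    "cost":         {"cost": 0.70, "perf": 0.20, "avail": 0.10},
    "balanced":     {"cost": 0.33, "perf": 0.34, "avail": 0.33},
    "performance":  {"cost": 0.10, "perf": 0.80, "avail": 0.10},
    "availability": {"cost": 0.10, "perf": 0.20, "avail": 0.70},
}


def weighted_score(cost_score, perf_score, avail_score, optimize_for="balanced"):
    """Combine three sub-scores (assumed to lie in [0, 1]) into one score."""
    w = WEIGHTS[optimize_for]
    return (
        w["cost"] * cost_score
        + w["perf"] * perf_score
        + w["avail"] * avail_score
    )


# Hypothetical sub-scores for one instance: cheap-ish, mid performance, very available.
print(round(weighted_score(0.8, 0.6, 0.9, "cost"), 2))         # 0.77
print(round(weighted_score(0.8, 0.6, 0.9, "performance"), 2))  # 0.65
```

Because every row of weights sums to 1.0, the combined score stays in the same 0 to 1 range as the sub-scores, which keeps scores comparable across modes.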
Hard floor filters run before scoring — instances that don't meet minimum RAM, vCPU, or GPU requirements are eliminated entirely, not just ranked low.
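A minimal sketch of a hard floor check, assuming a simple "meets or exceeds every minimum" rule. The `Candidate` dataclass, its `gpus` field, and the `meets_floor` helper are hypothetical stand-ins, not the library's `MachineType` or `hard_floor_check()`.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical stand-in for an instance record; not cloudfit's MachineType."""
    id: str
    vcpu: int
    ram_gb: float
    gpus: int = 0  # assumed field name, for illustration only


def meets_floor(c, min_vcpu, min_ram_gb, min_gpus=0):
    """Return True only if the candidate satisfies every minimum requirement."""
    return c.vcpu >= min_vcpu and c.ram_gb >= min_ram_gb and c.gpus >= min_gpus


candidates = [
    Candidate("c2-standard-60", vcpu=60, ram_gb=240),
    Candidate("c7i.24xlarge", vcpu=96, ram_gb=192),
]

# Partition before scoring: only eligible candidates go on to the weighted score.
eligible = [c.id for c in candidates if meets_floor(c, min_vcpu=60, min_ram_gb=224)]
rejected = [c.id for c in candidates if not meets_floor(c, min_vcpu=60, min_ram_gb=224)]

print(eligible)  # ['c2-standard-60']
print(rejected)  # ['c7i.24xlarge']
```

In the Quick start output the eliminated instance still appears in the results with a score of 0.00, so the real implementation seems to report rejected candidates rather than silently dropping them; the sketch keeps a `rejected` list for the same reason.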
Advanced users can override weights directly:
profile = WorkloadProfile(
vcpu=60,
ram_gb=224,
# Both short and long key spellings are accepted:
# short: {"cost": 0.5, "perf": 0.4, "avail": 0.1}
# long: {"cost": 0.5, "performance": 0.4, "availability": 0.1}
weights={"cost": 0.5, "performance": 0.4, "availability": 0.1}
)
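One way to accept both key spellings is to map the long names onto the short ones before use. The sketch below is an assumption about how such aliasing could work, not the library's code; `normalize_weights` and `_ALIASES` are hypothetical names.

```python
# Long spellings mapped to the short ones used in the scoring formula.
_ALIASES = {
    "cost": "cost",
    "perf": "perf",
    "performance": "perf",
    "avail": "avail",
    "availability": "avail",
}


def normalize_weights(weights):
    """Return a dict keyed by 'cost', 'perf', 'avail', whichever spelling was given."""
    out = {}
    for key, value in weights.items():
        if key not in _ALIASES:
            raise ValueError(f"unknown weight key: {key!r}")
        short = _ALIASES[key]
        if short in out:
            # e.g. both "perf" and "performance" were supplied
            raise ValueError(f"weight given twice for {short!r}")
        out[short] = float(value)
    return out


print(normalize_weights({"cost": 0.5, "performance": 0.4, "availability": 0.1}))
# {'cost': 0.5, 'perf': 0.4, 'avail': 0.1}
```

Whether cloudfit-core also requires custom weights to sum to 1.0 is not stated here, so the sketch deliberately leaves that check out.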
Workload archetypes
cloudfit-core understands five resource archetypes, each reflecting a different dominant constraint:
| Archetype | Dominant constraint | Typical workloads |
|---|---|---|
io |
Disk throughput | Sequencing demultiplexing, short-read alignment |
cpu |
Thread parallelism | Variant calling, de novo assembly, quantification |
mem |
RAM capacity | Metagenomics classification, single-cell RNA-seq, Hi-C |
gpu |
GPU VRAM | Protein structure prediction, GPU variant calling, basecalling |
burst |
Fleet × small instances | Nextflow pipelines, Snakemake DAGs, WDL scatter-gather |
In this release the archetype is recorded on the workload profile for classification and downstream tooling; scoring weights are driven by optimize_for. Archetype-aware weighting and fleet-vs-single-instance recommendations (e.g. many small spot instances for burst) are planned for a future release.
Dynamic disk sizing
For sequencing workloads, disk requirements scale with experiment parameters rather than being fixed. cloudfit-core computes disk from first principles:
from cloudfit import compute_disk_tb, WorkloadProfile, DiskSpec
disk_tb = compute_disk_tb(
sequencer="novaseq_6000",
flowcell="s4",
lanes=4,
retain_input=False, # if True, raw input files are kept post-run
keep_undetermined=False, # if True, unmatched reads written to disk (+8%)
safety_margin=0.20,
)
# → 15.84 TB
# Use the result when building your workload profile
profile = WorkloadProfile(
vcpu=60,
ram_gb=224,
workload="io-intensive",
archetype="io",
disk=DiskSpec(sizing="static", scratch_tb=disk_tb),
)
compute_disk_tb is a standalone helper — call it before constructing your WorkloadProfile and pass the result into DiskSpec.scratch_tb.
Workload YAML schema
workload:
  type: io-intensive
  archetype: io
  parallelism: lane # lane | sample | interval | process | rule
resources:
  vcpu: 60
  ram_gb: 224
  disk:
    sizing: dynamic # "dynamic" computes from experiment params; "static" uses scratch_tb
    preferred: local_ssd_first
  gpu:
    required: false
scheduling:
  spot: false
  restart_tolerant: false
optimize_for: balanced # cost | performance | availability | balanced
providers:
  - gcp
  - aws
Load from file:
from cloudfit import from_yaml, rank
profile = from_yaml("my-workload.yaml")
# candidates: a list of MachineType objects, as in Quick start
results = rank(profile, candidates)
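The schema comments list a fixed set of values for several fields. A small pre-flight check like the one below can catch typos before a file is loaded. It is a sketch using only the values named in this README; `ALLOWED` and `check_enums` are hypothetical helpers, and the validation that `from_yaml` itself performs may differ.

```python
# Allowed values taken from the inline comments in this README.
ALLOWED = {
    "archetype": {"io", "cpu", "mem", "gpu", "burst"},
    "parallelism": {"lane", "sample", "interval", "process", "rule"},
    "optimize_for": {"cost", "performance", "availability", "balanced"},
    "sizing": {"dynamic", "static"},
}


def check_enums(**fields):
    """Return a list of human-readable problems; an empty list means all values are valid."""
    errors = []
    for name, value in fields.items():
        allowed = ALLOWED.get(name)
        if allowed is None:
            errors.append(f"unknown field: {name}")
        elif value not in allowed:
            errors.append(f"{name}={value!r} is not one of {sorted(allowed)}")
    return errors


print(check_enums(archetype="io", parallelism="lane", optimize_for="balanced"))
# []
print(check_enums(archetype="disk"))
# ["archetype='disk' is not one of ['burst', 'cpu', 'gpu', 'io', 'mem']"]
```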
Provider plugins
cloudfit-core is the scoring engine only — it scores whatever instances you give it. Provider plugins fetch live instance data from cloud APIs on a schedule and feed the registry:
pip install cloudfit-provider-gcp # fetches GCP Compute Engine machine types
pip install cloudfit-provider-aws # fetches AWS EC2 instance specs and pricing
Each provider implements a simple interface:
from cloudfit import MachineType
from cloudfit.providers.base import Provider

class MyProvider(Provider):
    def fetch_instances(self, region: str) -> list[MachineType]: ...
    def get_pricing(self, instance_id: str, region: str) -> float: ...
    def get_availability(self, instance_id: str, region: str) -> float: ...
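To make the contract concrete, here is a sketch of a provider backed by an in-memory catalog, which can be handy in tests. To keep it self-contained, `MachineType` and `Provider` below are local stand-ins that mirror the fields and method signatures shown in this README; in a real plugin you would import them from cloudfit instead. `StaticProvider` is a hypothetical name, and treating availability as a single fraction between 0 and 1 is an assumption.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class MachineType:
    """Stand-in with the fields used in Quick start; not the real pydantic model."""
    id: str
    provider: str
    vcpu: int
    ram_gb: float
    price_hr: float


class Provider(ABC):
    """Stand-in mirroring the three-method interface shown above."""

    @abstractmethod
    def fetch_instances(self, region: str) -> list: ...

    @abstractmethod
    def get_pricing(self, instance_id: str, region: str) -> float: ...

    @abstractmethod
    def get_availability(self, instance_id: str, region: str) -> float: ...


class StaticProvider(Provider):
    """Serves instances from a fixed {region: [MachineType, ...]} catalog."""

    def __init__(self, catalog, availability=1.0):
        self._catalog = catalog
        self._availability = availability  # assumed to be a 0..1 fraction

    def fetch_instances(self, region):
        return list(self._catalog.get(region, []))

    def get_pricing(self, instance_id, region):
        for machine in self._catalog.get(region, []):
            if machine.id == instance_id:
                return machine.price_hr
        raise KeyError(f"{instance_id} not offered in {region}")

    def get_availability(self, instance_id, region):
        self.get_pricing(instance_id, region)  # raises KeyError if unknown
        return self._availability


provider = StaticProvider(
    {"us-central1": [MachineType("t2d-standard-60", "gcp", 60, 240, 2.31)]}
)
print(provider.get_pricing("t2d-standard-60", "us-central1"))  # 2.31
```

A static provider like this never touches a cloud API, so it fits the engine's "no credentials required" model and gives deterministic inputs for unit tests.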
Want to add a provider? See CONTRIBUTING.md.
Terraform / OpenTofu integration
Once cloudfit-api is running, use the Terraform provider to resolve instance types at plan time:
data "cloudfit_recommendation" "demux_worker" {
vcpu = 60
ram_gb = 224
workload = "sequencing-demux"
optimize_for = "balanced"
}
resource "google_compute_instance" "worker" {
machine_type = data.cloudfit_recommendation.demux_worker.machine_type
}
Citing cloudfit-core
If you use cloudfit-core in your research, please cite it:
@software{kasaraneni2026cloudfit,
author = {Kasaraneni, Chaitanya Krishna},
title = {cloudfit-core: Cloud-agnostic machine type scoring engine
for computational workloads},
year = {2026},
publisher = {GitHub},
url = {https://github.com/cloudfit-io/cloudfit-core},
orcid = {0000-0001-5792-1095}
}
GitHub also shows a Cite this repository button in the sidebar (powered by CITATION.cff).
Related publications
- Kasaraneni, C.K. et al. (2025). AI-Driven Drug Repurposing: A Graph Neural Network and Self-Supervised Learning Approach. IEEE CIACON. doi:10.1109/CIACON65473.2025.11189545
- Kasaraneni, C.K. et al. (2025). Multi-modality Medical Image Fusion Using Machine Learning/Deep Learning. Springer. doi:10.1007/978-3-031-98728-1_16
Related projects
- samplesheet-parser: Format-agnostic Illumina SampleSheet parser (BCLConvert V2 + IEM V1)
- clinops: Clinical ML data quality library
Repository structure
cloudfit-core/
├── README.md # first thing every visitor reads
├── CITATION.cff # GitHub "Cite this repository" button — ORCID linked
├── pyproject.toml # packaging, dependencies, PyPI metadata
├── CONTRIBUTING.md # provider plugin interface guide
├── LICENSE # Apache 2.0
├── .gitignore
│
├── cloudfit/
│ ├── __init__.py # exports rank, recommend, key models
│ ├── models.py # WorkloadProfile, MachineType, ScoredInstance (pydantic v2)
│ ├── scorer.py # rank(), score_instance(), weight matrix
│ ├── filter.py # hard_floor_check() — RAM, vCPU, GPU hard filters
│ ├── disk.py # compute_disk_tb() — dynamic disk sizing formula
│ ├── yaml_loader.py # from_yaml() — loads workload YAML schema
│ └── providers/
│ ├── __init__.py
│ └── base.py # abstract Provider class — plugin contract
│
└── tests/
├── test_scorer.py # rank, scores, weight modes, hard floors
├── test_disk.py # disk formula, CBCL vs BCL factor, sequencer profiles
└── test_yaml.py # from_yaml() loads profiles correctly
Contributing
See CONTRIBUTING.md. Issues and pull requests are welcome — especially provider plugins for new cloud platforms (Azure, Hetzner, Oracle Cloud).
License
Apache 2.0 — see LICENSE.
Author: Chaitanya Krishna Kasaraneni · Google Scholar · ORCID 0000-0001-5792-1095