Cloud-agnostic machine type scoring engine for computational workloads

cloudfit-core

cloudfit-core is the foundation of the cloudfit ecosystem — a pure Python library that, given a workload profile, scores and ranks available cloud instances across providers. No cloud credentials required. No API calls. Just a workload spec in, ranked recommendations out.


The problem

Teams hardcode instance types (c2-standard-60, c7i.16xlarge) in infrastructure-as-code. When providers deprecate them or release better generations, nothing updates — costs drift and performance degrades silently. There is no open-source tool that takes a workload description and returns the best available instance across AWS, GCP, and Azure with explainable scoring.

cloudfit-core is that scoring engine.


Installation

pip install cloudfit-core

Requires Python 3.9+.


Quick start

from cloudfit import WorkloadProfile, MachineType, rank

# Define your workload
profile = WorkloadProfile(
    vcpu=60,
    ram_gb=224,
    workload="io-intensive",
    archetype="io",            # io | cpu | mem | gpu | burst
    optimize_for="balanced",   # cost | performance | availability | balanced
)

# Provide candidate instances (from a cloudfit-provider-* package or your own list)
candidates = [
    MachineType(id="c2-standard-60",       provider="gcp", vcpu=60, ram_gb=240, price_hr=3.13),
    MachineType(id="c3d-standard-60-lssd", provider="gcp", vcpu=60, ram_gb=240, price_hr=3.39),
    MachineType(id="t2d-standard-60",      provider="gcp", vcpu=60, ram_gb=240, price_hr=2.31),
    MachineType(id="c7i.24xlarge",         provider="aws", vcpu=96, ram_gb=192, price_hr=4.28),
]

# Score and rank
results = rank(profile, candidates)
for r in results:
    print(f"{r.instance.id:30s}  score={r.score:.2f}  ${r.instance.price_hr:.2f}/hr")

Output:

t2d-standard-60                 score=0.81  $2.31/hr
c2-standard-60                  score=0.81  $3.13/hr
c3d-standard-60-lssd            score=0.80  $3.39/hr
c7i.24xlarge                    score=0.00  $4.28/hr

c7i.24xlarge scores 0.00 and ranks last because its 192 GB of RAM is below the requested 224 GB. It fails the hard floor filter, so it is assigned 0.00 outright rather than a low weighted score (see How scoring works).


How scoring works

Every recommendation runs through the same weighted scoring function:

score = w_cost × cost_score + w_perf × perf_score + w_avail × avail_score

The optimize_for mode sets the weights:

| Mode | w_cost | w_perf | w_avail | Best for |
|---|---|---|---|---|
| cost | 0.70 | 0.20 | 0.10 | Batch jobs, dev environments |
| balanced | 0.33 | 0.34 | 0.33 | Default (production workloads) |
| performance | 0.10 | 0.80 | 0.10 | Latency-sensitive, GPU inference |
| availability | 0.10 | 0.20 | 0.70 | Long-running jobs, deprecation risk |
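
As a worked illustration of the formula and weight table above, here is a minimal sketch. The weights are copied from the table; the function name `weighted_score` and the three sub-scores (0.9, 0.5, 0.8) are hypothetical inputs chosen for the example, not values produced by cloudfit-core.

```python
# Weights per optimize_for mode, copied from the table above.
MODE_WEIGHTS = {
    "cost":         {"cost": 0.70, "perf": 0.20, "avail": 0.10},
    "balanced":     {"cost": 0.33, "perf": 0.34, "avail": 0.33},
    "performance":  {"cost": 0.10, "perf": 0.80, "avail": 0.10},
    "availability": {"cost": 0.10, "perf": 0.20, "avail": 0.70},
}


def weighted_score(mode, cost_score, perf_score, avail_score):
    """Hypothetical helper: combine three sub-scores in [0, 1] using the mode's weights."""
    w = MODE_WEIGHTS[mode]
    return (
        w["cost"] * cost_score
        + w["perf"] * perf_score
        + w["avail"] * avail_score
    )


# Illustrative sub-scores for one imaginary instance (not real cloudfit output).
for mode in MODE_WEIGHTS:
    print(f"{mode:13s} {weighted_score(mode, 0.9, 0.5, 0.8):.3f}")
```

With these inputs the same instance scores highest under cost (0.70 × 0.9 + 0.20 × 0.5 + 0.10 × 0.8 = 0.81) and lowest under performance (0.57), which shows how the mode alone can reorder a ranking.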

Hard floor filters run before scoring — instances that don't meet minimum RAM, vCPU, or GPU requirements are eliminated entirely, not just ranked low.
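
The hard floor step can be pictured as a plain predicate applied before any weighting. The sketch below is an assumption about the shape of that check, not the code in filter.py: `Candidate`, `meets_floor`, and the `gpu_count` field are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical stand-in for cloudfit's MachineType (fields chosen for this sketch)."""
    id: str
    vcpu: int
    ram_gb: float
    gpu_count: int = 0


def meets_floor(c, min_vcpu, min_ram_gb, min_gpus=0):
    """Return True only if the candidate satisfies every minimum requirement."""
    return (
        c.vcpu >= min_vcpu
        and c.ram_gb >= min_ram_gb
        and c.gpu_count >= min_gpus
    )


candidates = [
    Candidate("t2d-standard-60", vcpu=60, ram_gb=240),
    Candidate("c7i.24xlarge", vcpu=96, ram_gb=192),
]

# Same request as the Quick start: 60 vCPU, 224 GB RAM, no GPU.
survivors = [c.id for c in candidates if meets_floor(c, min_vcpu=60, min_ram_gb=224)]
print(survivors)  # ['t2d-standard-60']
```

c7i.24xlarge has more vCPUs than requested but still fails, because every floor must hold at once.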

Advanced users can override weights directly:

from cloudfit import WorkloadProfile

profile = WorkloadProfile(
    vcpu=60,
    ram_gb=224,
    # Both short and long key spellings are accepted:
    # short: {"cost": 0.5, "perf": 0.4, "avail": 0.1}
    # long:  {"cost": 0.5, "performance": 0.4, "availability": 0.1}
    weights={"cost": 0.5, "performance": 0.4, "availability": 0.1},
)
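
One way to accept both key spellings is to map the short keys onto the long ones before use. The sketch below is illustrative only: `normalize_weights` is a hypothetical helper, and the requirements that all three weights be present and sum to 1 are assumptions, since the README does not say how cloudfit-core validates custom weights.

```python
import math

# Short spellings mapped to long spellings (both appear in the README comment).
ALIASES = {"perf": "performance", "avail": "availability"}
LONG_KEYS = {"cost", "performance", "availability"}


def normalize_weights(weights):
    """Hypothetical helper: return weights keyed by the long spellings."""
    out = {}
    for key, value in weights.items():
        long_key = ALIASES.get(key, key)
        if long_key not in LONG_KEYS:
            raise ValueError(f"unknown weight key: {key!r}")
        if long_key in out:
            raise ValueError(f"weight given twice for {long_key!r}")
        out[long_key] = float(value)
    missing = LONG_KEYS - out.keys()
    if missing:
        raise ValueError(f"missing weights: {sorted(missing)}")
    # Assumption: custom weights should sum to 1, like the built-in modes.
    if not math.isclose(sum(out.values()), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1.0")
    return out


print(normalize_weights({"cost": 0.5, "perf": 0.4, "avail": 0.1}))
```

Rejecting a dictionary that mixes `perf` and `performance` avoids silently picking one of two conflicting values.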

Workload archetypes

cloudfit-core understands five resource archetypes, each reflecting a different dominant constraint:

| Archetype | Dominant constraint | Typical workloads |
|---|---|---|
| io | Disk throughput | Sequencing demultiplexing, short-read alignment |
| cpu | Thread parallelism | Variant calling, de novo assembly, quantification |
| mem | RAM capacity | Metagenomics classification, single-cell RNA-seq, Hi-C |
| gpu | GPU VRAM | Protein structure prediction, GPU variant calling, basecalling |
| burst | Fleet × small instances | Nextflow pipelines, Snakemake DAGs, WDL scatter-gather |

In this release the archetype is recorded on the workload profile for classification and downstream tooling; scoring weights are driven by optimize_for. Archetype-aware weighting and fleet-vs-single-instance recommendations (e.g. many small spot instances for burst) are planned for a future release.


Dynamic disk sizing

For sequencing workloads, disk requirements scale with experiment parameters rather than being fixed. cloudfit-core computes disk from first principles:

from cloudfit import compute_disk_tb, WorkloadProfile, DiskSpec

disk_tb = compute_disk_tb(
    sequencer="novaseq_6000",
    flowcell="s4",
    lanes=4,
    retain_input=False,        # if True, raw input files are kept post-run
    keep_undetermined=False,   # if True, unmatched reads written to disk (+8%)
    safety_margin=0.20,
)
# → 15.84 TB

# Use the result when building your workload profile
profile = WorkloadProfile(
    vcpu=60,
    ram_gb=224,
    workload="io-intensive",
    archetype="io",
    disk=DiskSpec(sizing="static", scratch_tb=disk_tb),
)

compute_disk_tb is a standalone helper — call it before constructing your WorkloadProfile and pass the result into DiskSpec.scratch_tb.
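
The README does not spell out the sizing formula, but the two percentage parameters can be illustrated with simple arithmetic. The sketch below assumes both `safety_margin` and the +8% for undetermined reads are applied multiplicatively. The 13.2 TB base is back-calculated from the example result (15.84 / 1.20), not taken from the library, and `apply_margins` is a hypothetical name.

```python
def apply_margins(base_tb, keep_undetermined=False, safety_margin=0.20):
    """Hypothetical helper: scale a base disk estimate by the optional overheads."""
    size_tb = base_tb
    if keep_undetermined:
        size_tb *= 1.08  # assumed multiplicative +8% for unmatched reads
    size_tb *= 1.0 + safety_margin  # assumed multiplicative safety margin
    return round(size_tb, 2)


# 13.2 TB is an inferred base size, used only to reproduce the example figure.
print(apply_margins(13.2))                          # 15.84
print(apply_margins(13.2, keep_undetermined=True))  # 17.11
```

Under these assumptions, keeping undetermined reads would raise the example requirement from 15.84 TB to about 17.11 TB.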


Workload YAML schema

workload:
  type: io-intensive
  archetype: io
  parallelism: lane        # lane | sample | interval | process | rule

  resources:
    vcpu: 60
    ram_gb: 224
    disk:
      sizing: dynamic      # "dynamic" computes from experiment params; "static" uses scratch_tb
      preferred: local_ssd_first
    gpu:
      required: false

  scheduling:
    spot: false
    restart_tolerant: false

  optimize_for: balanced   # cost | performance | availability | balanced
  providers:
    - gcp
    - aws

Load from file:

from cloudfit import from_yaml, rank

profile = from_yaml("my-workload.yaml")
results = rank(profile, candidates)  # candidates: a list of MachineType, as in Quick start
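
Before handing a file to from_yaml, it can help to check the enumerated fields against the values listed in the schema comments. The following is a minimal sketch that works on an already-parsed mapping (for example, the output of a YAML parser). `check_workload` is a hypothetical helper and is not part of cloudfit-core, whose own validation may differ.

```python
# Allowed values taken from the comments in the README's YAML schema and Quick start.
ALLOWED = {
    "archetype": {"io", "cpu", "mem", "gpu", "burst"},
    "parallelism": {"lane", "sample", "interval", "process", "rule"},
    "optimize_for": {"cost", "performance", "availability", "balanced"},
}
ALLOWED_SIZING = {"dynamic", "static"}


def check_workload(doc):
    """Hypothetical helper: return a list of problems found in a parsed workload mapping."""
    problems = []
    workload = doc.get("workload", {})
    for field, allowed in ALLOWED.items():
        value = workload.get(field)
        if value is not None and value not in allowed:
            problems.append(f"{field}: {value!r} is not one of {sorted(allowed)}")
    sizing = workload.get("resources", {}).get("disk", {}).get("sizing")
    if sizing is not None and sizing not in ALLOWED_SIZING:
        problems.append(f"resources.disk.sizing: {sizing!r} is not one of {sorted(ALLOWED_SIZING)}")
    return problems


doc = {
    "workload": {
        "archetype": "io",
        "parallelism": "lane",
        "optimize_for": "balanced",
        "resources": {"vcpu": 60, "ram_gb": 224, "disk": {"sizing": "dynamic"}},
    }
}
print(check_workload(doc))  # []
```

Absent fields are skipped rather than reported, since the README does not state which keys are mandatory.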

Provider plugins

cloudfit-core is the scoring engine only — it scores whatever instances you give it. Provider plugins fetch live instance data from cloud APIs on a schedule and feed the registry:

pip install cloudfit-provider-gcp   # fetches GCP Compute Engine machine types
pip install cloudfit-provider-aws   # fetches AWS EC2 instance specs and pricing

Each provider implements a simple interface:

from cloudfit import MachineType
from cloudfit.providers.base import Provider

class MyProvider(Provider):
    def fetch_instances(self, region: str) -> list[MachineType]: ...
    def get_pricing(self, instance_id: str, region: str) -> float: ...
    def get_availability(self, instance_id: str, region: str) -> float: ...
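
To make the interface concrete, here is a sketch of a provider backed by an in-memory catalog. So that it runs without cloudfit installed, it defines local stand-ins for `MachineType` and `Provider` that mirror the three methods above; the real base class may carry extra requirements. `StaticProvider` and the reading of availability as 1.0 (listed) or 0.0 (not listed) are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineType:
    """Local stand-in mirroring the fields used in the Quick start."""
    id: str
    provider: str
    vcpu: int
    ram_gb: float
    price_hr: float


class Provider(ABC):
    """Local stand-in for cloudfit.providers.base.Provider."""

    @abstractmethod
    def fetch_instances(self, region: str) -> list[MachineType]: ...

    @abstractmethod
    def get_pricing(self, instance_id: str, region: str) -> float: ...

    @abstractmethod
    def get_availability(self, instance_id: str, region: str) -> float: ...


class StaticProvider(Provider):
    """Hypothetical provider that serves a fixed catalog keyed by region."""

    def __init__(self, catalog: dict[str, list[MachineType]]):
        self._catalog = catalog

    def fetch_instances(self, region: str) -> list[MachineType]:
        return list(self._catalog.get(region, []))

    def get_pricing(self, instance_id: str, region: str) -> float:
        for machine in self._catalog.get(region, []):
            if machine.id == instance_id:
                return machine.price_hr
        raise KeyError(f"{instance_id} not offered in {region}")

    def get_availability(self, instance_id: str, region: str) -> float:
        # Assumption: 1.0 means listed in the region, 0.0 means not listed.
        listed = any(m.id == instance_id for m in self._catalog.get(region, []))
        return 1.0 if listed else 0.0


provider = StaticProvider({
    "us-central1": [MachineType("t2d-standard-60", "gcp", 60, 240, 2.31)],
})
print(provider.get_pricing("t2d-standard-60", "us-central1"))  # 2.31
```

A static provider like this is also handy in tests, because it exercises ranking code without any network access.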

Want to add a provider? See CONTRIBUTING.md.


Terraform / OpenTofu integration

Once cloudfit-api is running, use the Terraform provider to resolve instance types at plan time:

data "cloudfit_recommendation" "demux_worker" {
  vcpu         = 60
  ram_gb       = 224
  workload     = "sequencing-demux"
  optimize_for = "balanced"
}

resource "google_compute_instance" "worker" {
  # Other required arguments (name, boot_disk, network_interface) omitted for brevity
  machine_type = data.cloudfit_recommendation.demux_worker.machine_type
}

Citing cloudfit-core

If you use cloudfit-core in your research, please cite it:

@software{kasaraneni2026cloudfit,
  author    = {Kasaraneni, Chaitanya Krishna},
  title     = {cloudfit-core: Cloud-agnostic machine type scoring engine
               for computational workloads},
  year      = {2026},
  publisher = {GitHub},
  url       = {https://github.com/cloudfit-io/cloudfit-core},
  orcid     = {0000-0001-5792-1095}
}

GitHub also shows a Cite this repository button in the sidebar (powered by CITATION.cff).



Related projects

  • samplesheet-parser — Format-agnostic Illumina SampleSheet parser (BCLConvert V2 + IEM V1)
  • clinops — Clinical ML data quality library

Repository structure

cloudfit-core/
├── README.md               # first thing every visitor reads
├── CITATION.cff            # GitHub "Cite this repository" button — ORCID linked
├── pyproject.toml          # packaging, dependencies, PyPI metadata
├── CONTRIBUTING.md         # provider plugin interface guide
├── LICENSE                 # Apache 2.0
├── .gitignore
│
├── cloudfit/
│   ├── __init__.py         # exports rank, recommend, key models
│   ├── models.py           # WorkloadProfile, MachineType, ScoredInstance (pydantic v2)
│   ├── scorer.py           # rank(), score_instance(), weight matrix
│   ├── filter.py           # hard_floor_check() — RAM, vCPU, GPU hard filters
│   ├── disk.py             # compute_disk_tb() — dynamic disk sizing formula
│   ├── yaml_loader.py      # from_yaml() — loads workload YAML schema
│   └── providers/
│       ├── __init__.py
│       └── base.py         # abstract Provider class — plugin contract
│
└── tests/
    ├── test_scorer.py      # rank, scores, weight modes, hard floors
    ├── test_disk.py        # disk formula, CBCL vs BCL factor, sequencer profiles
    └── test_yaml.py        # from_yaml() loads profiles correctly

Contributing

See CONTRIBUTING.md. Issues and pull requests are welcome — especially provider plugins for new cloud platforms (Azure, Hetzner, Oracle Cloud).

License

Apache 2.0 — see LICENSE.


Author: Chaitanya Krishna Kasaraneni  ·  Google Scholar  ·  ORCID 0000-0001-5792-1095
