This is a repository for managing embarrassingly parallel experiments with Hatchet.

Project description

Scythe

Scythe is a lightweight tool that helps you seed and reap (scatter and gather) embarrassingly parallel experiments via the asynchronous distributed queue Hatchet.

The project is still in the VERY EARLY stages, and will likely evolve quite a bit in the second half of 2025.

GitHub repository: https://github.com/szvsw/scythe/
Documentation: https://szvsw.github.io/scythe/

Motivation

In my experience helping colleagues with their research projects, academic researchers and engineers can often define their experiments' input and output specs fairly well and would love to run them at large scale, but are often limited by a lack of experience with distributed computing techniques, e.g. artifact infiltration and exfiltration, error handling, interacting with supercomputer schedulers, and managing cloud infrastructure.

The goal of Scythe is to abstract away some of these details so that researchers can focus on what they are familiar with (i.e. writing consistent input and output schemas and the computation logic that transforms inputs into outputs) while automating the boring but necessary work of running millions of simulations (e.g. serializing data to and from cloud buckets and configuring queues).

There are, of course, many data engineering orchestration tools out there already, but Scythe is more lightweight and hopefully a little simpler to use, at the cost of fewer bells and whistles (for now), such as robust dataset lineage.

Hatchet is already very easy (and fun!) to use for newcomers to distributed computing, so I recommend checking out its docs; you might be better off simply using Hatchet directly! Scythe is just a lightweight, modular layer on top of it, tailored to the use case of generating large datasets of consistently structured experiment inputs and outputs. Another option worth checking out is something like Coiled + Dask.

Installation

With uv:

uv add scythe-engine

With poetry:

poetry add scythe-engine

With pip:

pip install scythe-engine

Documentation

Coming soon...

However, in the meantime, check out the example project to get an idea of what using Scythe with Hatchet looks like.

Example

Scythe is useful for running many parallel simulations with a common I/O interface. It abstracts away the logic of issuing simulations and combining results into well-structured dataframes and parquet files.
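To make "issuing simulations and combining results" concrete, here is a minimal local sketch of the scatter/gather pattern that Scythe automates. It uses only the standard library's concurrent.futures on one machine instead of Hatchet, and the `simulate` function, its toy formulas, and the dict-based inputs are hypothetical stand-ins; this is not Scythe's API or its distributed execution model.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(spec: dict) -> dict:
    """Hypothetical stand-in for an expensive simulation (toy formulas)."""
    return {"heating": spec["setpoint"] * 0.5, "cooling": 30.0 - spec["setpoint"]}


def scatter_gather(specs: list[dict], max_workers: int = 4) -> list[dict]:
    """Run `simulate` on every spec in parallel and gather one flat row per spec."""
    # Scatter: each spec becomes an independent task in the pool.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in submission order, so they line up with specs.
        results = list(pool.map(simulate, specs))
    # Gather: pair each input with its outputs, tagged with its position.
    return [
        {"sort_index": i, **spec, **result}
        for i, (spec, result) in enumerate(zip(specs, results))
    ]


rows = scatter_gather([{"setpoint": 20.0}, {"setpoint": 24.0}])
```

Each row keeps the inputs next to the outputs, which is the same idea as the input MultiIndex plus output columns described in the example below.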

In this example, we will demonstrate setting up a building energy simulation so we can create a dataset of energy modeling results for use in training a surrogate model.

To begin, we define the schemas of the inputs and outputs. The inputs will ultimately be converted into dataframes (where the defined input fields are columns). Similarly, the output schema fields will be used as the columns of the results dataframes (and the input dataframe will actually be used as a MultiIndex). Note that FileReference inputs whose values are local Paths will automatically be uploaded to S3 and re-referenced to the uploaded copy.

from pydantic import Field
from scythe.base import ExperimentInputSpec, ExperimentOutputSpec, FileReference

class BuildingSimulationInput(ExperimentInputSpec):
    """Simulation inputs for a building energy model."""

    r_value: float = Field(default=..., description="The R-Value of the building [m2K/W]", ge=0, le=15)
    lpd: float = Field(default=..., description="Lighting power density [W/m2]", ge=0, le=20)
    setpoint: float = Field(default=..., description="Thermostat setpoint [deg.C]", ge=12, le=30)
    weather_file: FileReference = Field(default=..., description="Weather file [.epw]")


class BuildingSimulationOutput(ExperimentOutputSpec):
    """Simulation outputs for a building energy model."""

    heating: float = Field(default=..., description="Annual heating energy usage, kWh/m2", ge=0)
    cooling: float = Field(default=..., description="Annual cooling energy usage, kWh/m2", ge=0)
    lighting: float = Field(default=..., description="Annual lighting energy usage, kWh/m2", ge=0)
    equipment: float = Field(default=..., description="Annual equipment energy usage, kWh/m2", ge=0)
    fans: float = Field(default=..., description="Annual fans energy usage, kWh/m2", ge=0)
    pumps: float = Field(default=..., description="Annual pumps energy usage, kWh/m2", ge=0)

The schemas above will be exported into your results bucket as experiment_io_spec.yaml including any docstrings and descriptions.

NB: You can also add your own dataframes to the outputs, e.g. for non-scalar values such as timeseries. Documentation coming soon.
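The FileReference behaviour mentioned earlier (local Paths are uploaded to S3 and re-referenced) can be illustrated with a small standard-library sketch: every Path value in an input dict is handed to an uploader and replaced by the URI it returns. The helper names `rereference_paths` and `fake_upload` and the `s3://my-bucket/artifacts/...` key layout are hypothetical; Scythe's real upload logic and bucket layout may differ.

```python
from pathlib import Path
from typing import Callable


def rereference_paths(inputs: dict, upload: Callable[[Path], str]) -> dict:
    """Return a copy of `inputs` with every local Path replaced by a remote URI.

    `upload` is a hypothetical callback that copies a file somewhere remote
    (e.g. an S3 bucket) and returns the URI of the uploaded copy.
    """
    out = {}
    for key, value in inputs.items():
        if isinstance(value, Path):
            out[key] = upload(value)  # swap the local path for its remote reference
        else:
            out[key] = value  # scalars and existing URIs pass through unchanged
    return out


def fake_upload(path: Path) -> str:
    """Stand-in uploader: builds a URI without actually uploading anything."""
    return f"s3://my-bucket/artifacts/{path.name}"


spec = {"r_value": 5.2, "weather_file": Path("data/boston.epw")}
remote_spec = rereference_paths(spec, fake_upload)
```

Returning a new dict rather than mutating the input keeps the original local spec available, e.g. for logging which file was uploaded.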

Next, we define the actual simulation logic. We decorate the simulation function to register it with our ExperimentRegistry, which configures all of the fancy scatter/gather logic. Note that the function must take exactly one argument (an instance of the input schema defined above) and must return exactly one instance of the output schema defined above (though additional dataframes can be stored in the dataframes field inherited from the base ExperimentOutputSpec).

from scythe.registry import ExperimentRegistry

@ExperimentRegistry.Register()
def simulate(input_spec: BuildingSimulationInput) -> BuildingSimulationOutput:
    """Initialize and execute an energy model of a building."""

    # do some work!
    ...

    return BuildingSimulationOutput(
        heating=...,
        cooling=...,
        lighting=...,
        equipment=...,
        fans=...,
        pumps=...,
        dataframes=...,
    )
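The one-argument, one-return contract described above can be checked mechanically with the standard library's inspect module. The hypothetical `check_experiment_signature` helper below only illustrates that rule; it is not part of Scythe, and the registry's own validation (if any) may work differently. Plain classes stand in for the Scythe base specs so the sketch runs on its own.

```python
import inspect
from typing import Callable


class InputSpec:
    """Stand-in for scythe.base.ExperimentInputSpec."""


class OutputSpec:
    """Stand-in for scythe.base.ExperimentOutputSpec."""


def check_experiment_signature(fn: Callable) -> None:
    """Raise TypeError unless `fn` takes one InputSpec and returns an OutputSpec."""
    sig = inspect.signature(fn)
    params = list(sig.parameters.values())
    if len(params) != 1:
        raise TypeError(f"{fn.__name__} must take exactly one argument, got {len(params)}")
    arg_type = params[0].annotation  # inspect.Parameter.empty if unannotated
    if not (isinstance(arg_type, type) and issubclass(arg_type, InputSpec)):
        raise TypeError(f"{fn.__name__} must annotate its argument with an InputSpec subclass")
    ret_type = sig.return_annotation
    if not (isinstance(ret_type, type) and issubclass(ret_type, OutputSpec)):
        raise TypeError(f"{fn.__name__} must annotate its return with an OutputSpec subclass")


class SimInput(InputSpec):
    pass


class SimOutput(OutputSpec):
    pass


def good(spec: SimInput) -> SimOutput:
    return SimOutput()


def bad(spec: SimInput, extra: int) -> SimOutput:
    return SimOutput()


check_experiment_signature(good)  # accepted: no exception raised

try:
    check_experiment_signature(bad)
    bad_error = None
except TypeError as exc:
    bad_error = str(exc)
```

Failing fast at registration time like this surfaces signature mistakes before any tasks are scattered to workers.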

Because BuildingSimulationInput inherits from ExperimentInputSpec, it automatically gets some methods, e.g. log for writing messages to the worker logs, and methods for fetching common artifact files from remote resources such as S3 or a web request into a cacheable filesystem.
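The artifact-fetching methods are not documented here yet, so the following is only a generic standard-library sketch of the caching idea, not Scythe's API: the URL is hashed into a stable filename, and the download is skipped when that file already exists. `fetch_cached` is a hypothetical name, and urllib.request.urlopen stands in for whatever transport Scythe actually uses; it also accepts file:// URLs, which lets the sketch run offline.

```python
import hashlib
import shutil
import tempfile
import urllib.parse
import urllib.request
from pathlib import Path


def fetch_cached(url: str, cache_dir: Path) -> Path:
    """Download `url` into `cache_dir` once and return the local file path.

    The cache key is a SHA-256 hash of the URL, so repeated calls for the same
    artifact (e.g. many experiments sharing a weather file on one worker)
    reuse the file already on disk.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    suffix = Path(urllib.parse.urlparse(url).path).suffix  # keep e.g. ".epw"
    target = cache_dir / (hashlib.sha256(url.encode("utf-8")).hexdigest() + suffix)
    if target.exists():
        return target  # cache hit: no transfer needed
    partial = target.parent / (target.name + ".part")
    with urllib.request.urlopen(url) as response, open(partial, "wb") as fh:
        shutil.copyfileobj(response, fh)
    partial.replace(target)  # move into place only once the download is complete
    return target


# Try it offline: urlopen also understands file:// URLs.
with tempfile.TemporaryDirectory() as tmp_dir:
    source = Path(tmp_dir) / "boston.epw"
    source.write_text("LOCATION,Boston")
    cache = Path(tmp_dir) / "cache"
    first = fetch_cached(source.as_uri(), cache)
    second = fetch_cached(source.as_uri(), cache)
    same_file = first == second
    cached_text = first.read_text()
    cached_suffix = first.suffix
```

Writing to a `.part` file and renaming it afterwards means an interrupted download is never mistaken for a cache hit.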

TODO: document artifact fetching, writing artifacts per experiment

TODO: document allocating experiments, infra

After the experiment has finished running all tasks, it will automatically produce an output file, scalars.pq, containing all of the results defined in your output schema for each individual simulation that was executed.

The index of the resulting dataframe is a MultiIndex built from the input specs plus some additional metadata, e.g.:

MultiIndex (experiment metadata and input fields):

experiment_id  sort_index  root_workflow_run_id  r_value  lpd  setpoint
bem/v0         0           abcd-efgh             5.2      2.7  23.5
bem/v0         1           abcd-efgh             2.9      1.3  19.7
bem/v0         2           abcd-efgh             4.2      5.4  21.4

Data (output fields, kWh/m2):

heating  cooling  lighting  equipment  fans  pumps
17.2     15.3     10.1      13.8       14.2  1.4
21.7     5.4      9.2       5.8        10.3  2.0
19.5     8.9      12.5      13.7       8.9   0.9
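Because the results use this MultiIndex layout, a typical next step is working with them in pandas. The sketch below rebuilds the example table in memory (so it runs without a results bucket) and shows two common operations: flattening the index into feature columns for a surrogate model, and filtering on an index level. It assumes pandas is installed; on real results you would instead load the file with `pd.read_parquet("scalars.pq")` (which needs pyarrow or fastparquet), assuming the MultiIndex is preserved in the file. The variable names are illustrative.

```python
import pandas as pd

# Index levels: experiment metadata followed by the input-spec fields.
index = pd.MultiIndex.from_frame(
    pd.DataFrame(
        {
            "experiment_id": ["bem/v0"] * 3,
            "sort_index": [0, 1, 2],
            "root_workflow_run_id": ["abcd-efgh"] * 3,
            "r_value": [5.2, 2.9, 4.2],
            "lpd": [2.7, 1.3, 5.4],
            "setpoint": [23.5, 19.7, 21.4],
        }
    )
)

# Data columns: one per output-spec field (kWh/m2).
results = pd.DataFrame(
    {
        "heating": [17.2, 21.7, 19.5],
        "cooling": [15.3, 5.4, 8.9],
        "lighting": [10.1, 9.2, 12.5],
        "equipment": [13.8, 5.8, 13.7],
        "fans": [14.2, 10.3, 8.9],
        "pumps": [1.4, 2.0, 0.9],
    },
    index=index,
)

# Flatten the index into ordinary columns, e.g. as surrogate-model features.
flat = results.reset_index()
X = flat[["r_value", "lpd", "setpoint"]]
y = flat["heating"]

# Select runs with a setpoint above 20 deg.C using the index level directly.
warm = results[results.index.get_level_values("setpoint") > 20]
```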

TODO: document how additional dataframes of results are handled.

To-dos (help wanted!)

Start documenting
ExperimentRun class
Write experiment_io_spec.yaml to bucket
Add a method for writing dataframes unique to each task run to the bucket
Results downloaders
Automatic local artifact conversion to cloud artifacts

Project details

Release history

1.3.0 (Apr 10, 2026)
1.2.0 (Mar 28, 2026)
1.1.0 (Mar 28, 2026)
1.0.0 (Mar 2, 2026)
0.1.2 (Feb 12, 2026)
0.1.1 (Feb 11, 2026)
0.1.0 (Feb 11, 2026)
0.0.43 (Feb 11, 2026)
0.0.41 (Nov 19, 2025)
0.0.40 (Oct 30, 2025)
0.0.39 (Oct 30, 2025)
0.0.38 (Aug 11, 2025)
0.0.38b0 (pre-release, Aug 12, 2025)
0.0.38a0 (pre-release, Aug 12, 2025)
0.0.37 (Aug 11, 2025)
0.0.36 (Aug 11, 2025)
0.0.35 (Aug 10, 2025)
0.0.35a0 (pre-release, Aug 10, 2025)
0.0.34 (Aug 7, 2025)
0.0.33 (Aug 7, 2025)
0.0.32 (Aug 7, 2025)
0.0.31 (Aug 6, 2025)
0.0.30 (Aug 6, 2025)
0.0.30b0 (pre-release, Aug 6, 2025)
0.0.30a0 (pre-release, Aug 6, 2025)
0.0.29 (Aug 6, 2025)
0.0.28 (Jul 23, 2025)
0.0.28a0 (pre-release, Jul 23, 2025)
0.0.27 (Jul 23, 2025)
0.0.26 (Jul 23, 2025)
0.0.25 (Jul 23, 2025)
0.0.24 (Jul 23, 2025)
0.0.23 (Jul 23, 2025)
0.0.22 (Jul 23, 2025)
0.0.21 (Jul 23, 2025)
0.0.20 (Jul 23, 2025), this version
0.0.18 (Jul 22, 2025)
0.0.17 (Jul 22, 2025)
0.0.16 (Jul 22, 2025)
0.0.15 (Jul 22, 2025)
0.0.13 (Jul 21, 2025)
0.0.12 (Jul 21, 2025)
0.0.11 (Jul 21, 2025)
0.0.10 (Jul 21, 2025)
0.0.9 (Jul 21, 2025)
0.0.8 (Jul 21, 2025)
0.0.7 (Jul 21, 2025)
0.0.6 (Jul 21, 2025)
0.0.5 (Jul 21, 2025)
0.0.4 (Jul 21, 2025)
0.0.3 (Jul 21, 2025)
0.0.2 (Jul 21, 2025)
0.0.1 (Jul 21, 2025)

Download files


Source Distribution

scythe_engine-0.0.20.tar.gz (188.8 kB)

Uploaded Jul 23, 2025 Source

Built Distribution


scythe_engine-0.0.20-py3-none-any.whl (21.6 kB)

Uploaded Jul 23, 2025 Python 3

File details

Details for the file scythe_engine-0.0.20.tar.gz.

File metadata

Download URL: scythe_engine-0.0.20.tar.gz
Upload date: Jul 23, 2025
Size: 188.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.6.14

File hashes

Hashes for scythe_engine-0.0.20.tar.gz
Algorithm	Hash digest
SHA256	`1e6a7f70543c907b60ca4bfec8270584f25c450a1bac2299b62caf3c3b0ef07c`
MD5	`742e9bb5b9bba1bf58c84c2a28b52bdc`
BLAKE2b-256	`80e250c112f54eebc40d1a7e3c6dbf4a6170adb6907fe4f9f83752a6bf4fdc79`


File details

Details for the file scythe_engine-0.0.20-py3-none-any.whl.

File metadata

Download URL: scythe_engine-0.0.20-py3-none-any.whl
Upload date: Jul 23, 2025
Size: 21.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.6.14

File hashes

Hashes for scythe_engine-0.0.20-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ddd26f3f3d5447f36ac98d84b8d48a9d0cb7345a9a4113b80bdcb50370c7df49`
MD5	`111d100d51cb7ada5e0be2b0b08166ed`
BLAKE2b-256	`a4642d8541890d0a83d20913b615216adf355f0adade26cacd428d58646647a4`


scythe-engine 0.0.20
