This is a repository for managing embarrassingly parallel experiments with Hatchet.

Project description

Scythe

Scythe is a lightweight tool that helps you seed and reap (scatter and gather) embarrassingly parallel experiments via the asynchronous distributed queue Hatchet.

The project is still in the VERY EARLY stages, and will likely evolve quite a bit in the second half of 2025.

Motivation

In my experience helping colleagues with their research projects, academic researchers and engineers can usually define their experiments fairly well via input and output specs and would love to run them at large scale, but they are often limited by a lack of experience with distributed computing techniques, e.g. artifact infiltration and exfiltration, handling errors, interacting with supercomputing schedulers, dealing with cloud infrastructure, etc.

The goal of Scythe is to abstract away some of these details so that researchers can focus on what they are familiar with (i.e. writing consistent input and output schemas and the computation logic that transforms inputs into outputs) while automating the boring but necessary work needed to run millions of simulations (e.g. serializing data to and from cloud buckets, configuring queues, etc.).
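To make the "input schema, output schema, computation logic" split concrete, here is a minimal sketch of the scatter and gather pattern using only the Python standard library. It is not Scythe's or Hatchet's API: the names `SimInput`, `SimOutput`, `simulate`, and `scatter_gather` are hypothetical, and a local thread pool stands in for the distributed queue.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class SimInput:
    """Hypothetical input schema: one instance per experiment."""
    experiment_id: int
    area_m2: float
    load_w_per_m2: float


@dataclass(frozen=True)
class SimOutput:
    """Hypothetical output schema produced by every run."""
    experiment_id: int
    total_load_w: float


def simulate(spec: SimInput) -> SimOutput:
    """The computation logic: transforms one input into one output."""
    return SimOutput(spec.experiment_id, spec.area_m2 * spec.load_w_per_m2)


def scatter_gather(specs, max_workers=4):
    """Scatter the specs across workers, then gather results in input order."""
    # A local thread pool is an assumption standing in for a distributed queue.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(simulate, specs))


specs = [SimInput(i, area_m2=10.0 * (i + 1), load_w_per_m2=5.0) for i in range(4)]
results = scatter_gather(specs)
```

Because every run shares the same input and output schema, the gathered results can be stacked into a single table. That consistently structured dataset is the use case described above.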

There are of course lots of data engineering orchestration tools out there already, but this is a bit more lightweight and hopefully a little simpler to use, at the expense of some features: for now, for example, it does not robustly track data lineage.

Hatchet is already very easy (and fun!) to use for newcomers to distributed computing, so I recommend checking out their docs. You might be better off simply running Hatchet directly! Scythe is just a lightweight modular layer on top of it, tailored to the use case of generating large datasets of consistently structured experiment inputs and outputs. Another option you might check out would be something like Coiled + Dask.

Documentation

Coming soon...

However, in the meantime, check out the example project to get an idea of what using Scythe with Hatchet looks like.

To-dos (help wanted!)

  • Start documenting
  • ExperimentRun class
  • Results downloaders
  • More extensible results patterns
  • Automatic local artifact conversion to cloud artifacts

Download files

Source Distribution

scythe_engine-0.0.17.tar.gz (186.0 kB)

Uploaded Source

Built Distribution

scythe_engine-0.0.17-py3-none-any.whl (18.6 kB)

Uploaded Python 3

File details

Details for the file scythe_engine-0.0.17.tar.gz.

File metadata

  • Download URL: scythe_engine-0.0.17.tar.gz
  • Upload date:
  • Size: 186.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.14

File hashes

Hashes for scythe_engine-0.0.17.tar.gz
Algorithm    Hash digest
SHA256       0e423acb15091c4f4b40c23f5cf148dd05a88d47b282a90f633e2a510d0bd71d
MD5          f101af444f697e4bc0e7fc159c8d1637
BLAKE2b-256  522c5783dc196a20323bfc2103f54c043434978c508f08ba652ab5827468d40c

See more details on using hashes here.
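As a rough illustration of how a published digest can be used, the sketch below hashes a local file with the standard-library `hashlib` module and compares the result with the SHA256 value listed above. The helper names `sha256_of` and `matches_published_hash` are hypothetical, and the file path is assumed to point at a copy of the archive you have already downloaded.

```python
import hashlib

# SHA256 digest listed above for scythe_engine-0.0.17.tar.gz.
EXPECTED_SHA256 = "0e423acb15091c4f4b40c23f5cf148dd05a88d47b282a90f633e2a510d0bd71d"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_published_hash("scythe_engine-0.0.17.tar.gz")` should return `True` for an intact download placed in the current directory. A `False` result suggests the file was corrupted or altered.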

File details

Details for the file scythe_engine-0.0.17-py3-none-any.whl.

File hashes

Hashes for scythe_engine-0.0.17-py3-none-any.whl
Algorithm    Hash digest
SHA256       66d75165f35db164d651d674868206c06db58e421cec74bc976ac9dc62cf22a5
MD5          f4a877e3157bf0089055fbad742b8f8e
BLAKE2b-256  2a8e09ab73b8537850dc41be9eac8325c2c20e5ef468cc3ad4731c29055d84b5
