Execution and caching tool for Python

Exca - ⚔

Execute and cache seamlessly in Python.

Quick install

pip install exca

Full documentation

Documentation is available at https://facebookresearch.github.io/exca/

Basic overview

exca provides simple decorators to:

  • execute a (hierarchy of) computation(s) either locally or on remote nodes,
  • cache the result.

The problem:

In ML pipelines, using even a simple Python function such as my_task:

import numpy as np

def my_task(param: int = 12) -> float:
    return param * np.random.rand()

often requires cumbersome overhead: (1) configuring the parameters, (2) submitting the job to a cluster, and (3) caching the results, e.g.

import pickle
from pathlib import Path
import submitit

# Configure
param = 12
tmp_path = Path("cache")  # some cache directory

# Check whether the task has already been executed
filepath = tmp_path / f"result-{param}.pkl"
if filepath.exists():
    # Load the cached result
    with filepath.open("rb") as f:
        result = pickle.load(f)
else:
    # Submit the job on a cluster
    executor = submitit.AutoExecutor(cluster=None, folder=tmp_path)
    job = executor.submit(my_task, param)
    result = job.result()

    # Cache the result
    with filepath.open("wb") as f:
        pickle.dump(result, f)

This overhead makes debugging harder, complicates hierarchical execution, and tends to end in poorly organized results (the classic 'result-parm12-v2_final_FIX.npy').

The solution:

exca can be used to decorate a method of a pydantic model so as to seamlessly configure its execution and caching:

import numpy as np
import pydantic
import exca as xk

class MyTask(pydantic.BaseModel):
    param: int = 12
    infra: xk.TaskInfra = xk.TaskInfra()

    @infra.apply
    def process(self) -> float:
        return self.param * np.random.rand()


tmp_path = "cache"  # any cache directory
task = MyTask(param=1, infra={"folder": tmp_path, "cluster": "auto"})
out = task.process()  # runs on slurm if available
# calling process again will load the cache and not a new random number
assert out == task.process()

See the API reference for all the details.

Quick comparison

| feature \ tool                | lru_cache | hydra | submitit | exca |
|-------------------------------|-----------|-------|----------|------|
| RAM cache                     | ✔         |       |          | ✔    |
| file cache                    |           |       |          | ✔    |
| remote compute                |           | ✔     | ✔        | ✔    |
| pure python (vs command line) | ✔         |       | ✔        | ✔    |
| hierarchical config           |           | ✔     |          | ✔    |
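As a point of reference for the first row, functools.lru_cache from the standard library memoizes in RAM only: results survive within a process but there is no file cache and no remote execution. A minimal sketch, using a deterministic variant of my_task for illustration:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def my_task(param: int = 12) -> int:
    # computed once per distinct param, then served from RAM
    print(f"computing for param={param}")
    return param * 2


my_task(3)  # computes (prints)
my_task(3)  # cache hit: no recomputation, same result
```

Restarting the process empties this cache, which is exactly the gap a file cache (and exca's folder-based caching) fills.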

Contributing

See the CONTRIBUTING file for how to help out.

Citing

@misc{exca,
    author = {J. Rapin and J.-R. King},
    title = {{Exca - Execution and caching}},
    year = {2024},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/facebookresearch/exca}},
}

License

exca is MIT licensed, as found in the LICENSE file. Also check out Meta Open Source Terms of Use and Privacy Policy.
