
Caching Workflow Engine

Project description

CacheFlow is a caching workflow engine: it executes dataflows while reusing previous results where possible, for efficiency. It is designed to be extensible and to be embedded into other projects.


  • ☑ Python 3 workflow system
  • ☑ Executes dataflows from JSON files
  • ☐ Can also load from SQL database
  • ☐ Parallel execution
  • ☐ Streaming
  • ☑ Extensible: can add new modules, new storage formats, new caching mechanisms, new executors
  • ☐ Pluggable: extensions can be installed from PyPI without forking
  • ☑ Reusable: can execute workflows on its own, but can also be embedded into applications. Some I plan to develop myself:
    • Literate programming app: snippets or modules embedded into a markdown file, which are executed on render (similar to R Markdown). Results would be cached, making subsequent renders fast
    • Integrate into some of my NYU research projects (VisTrails, Vizier, D3M)
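The caching idea above — hash each step's operation, parameters, and input hashes, so unchanged steps are reused instead of re-executed — can be sketched in plain Python. The JSON layout, operation names, and `execute` helper below are illustrative assumptions for this sketch, not CacheFlow's actual file format or API.

```python
import hashlib
import json

# Hypothetical JSON dataflow: each step names an operation, optional
# parameters, and the steps whose outputs it consumes.
WORKFLOW = json.loads("""
{
    "steps": [
        {"id": "load", "op": "constant", "params": {"value": [1, 2, 3, 4]}},
        {"id": "double", "op": "map_double", "inputs": ["load"]},
        {"id": "total", "op": "sum", "inputs": ["double"]}
    ]
}
""")

# Toy operation registry standing in for pluggable modules.
OPS = {
    "constant": lambda params, inputs: params["value"],
    "map_double": lambda params, inputs: [x * 2 for x in inputs[0]],
    "sum": lambda params, inputs: sum(inputs[0]),
}

def execute(workflow, cache):
    """Run steps in order, reusing cached results when a step's
    operation, parameters, and inputs are unchanged."""
    keys = {}     # step id -> cache key
    results = {}  # step id -> computed value
    for step in workflow["steps"]:
        input_keys = [keys[i] for i in step.get("inputs", [])]
        # The key covers the op, its params, and its inputs' keys, so
        # any upstream change invalidates every downstream step.
        key = hashlib.sha256(json.dumps(
            [step["op"], step.get("params", {}), input_keys]
        ).encode()).hexdigest()
        keys[step["id"]] = key
        if key in cache:
            results[step["id"]] = cache[key]  # cache hit: skip execution
        else:
            inputs = [results[i] for i in step.get("inputs", [])]
            results[step["id"]] = OPS[step["op"]](step.get("params", {}), inputs)
            cache[key] = results[step["id"]]
    return results

cache = {}
first = execute(WORKFLOW, cache)   # computes every step
second = execute(WORKFLOW, cache)  # every step is a cache hit
```

Keying each step on its inputs' keys (rather than their values) is what makes later re-renders fast: an unchanged workflow touches only the cache.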

Other ideas:

  • ☐ Use Jupyter kernels as backends to execute code (giving me quick access to all the languages they support)
  • ☐ Isolate script execution (to run untrusted Python/… code, for example with Docker)
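As a rough illustration of the isolation idea, a snippet can be run in a separate interpreter process. The `run_untrusted` helper below is a hypothetical sketch; a fresh subprocess is only a weak stand-in for a real sandbox such as a Docker container, but the engine-facing interface (code in, return code and output out) would stay the same.

```python
import subprocess
import sys

def run_untrusted(code, timeout=5):
    """Run a code snippet in a separate interpreter process.

    Hypothetical helper: in a real deployment the command line here
    would be swapped for a container invocation (e.g. Docker) without
    changing this function's interface.
    """
    proc = subprocess.run(
        # -I: isolated mode, ignoring the caller's environment and site dirs
        [sys.executable, "-I", "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    return proc.returncode, proc.stdout

rc, out = run_untrusted("print(6 * 7)")
```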


Non-goals:

  • Making a super-scalable, fast workflow execution engine: I'd rather build executors on top of Spark, Dask, or Ray than re-implement them


Status: the basic structures are in place, extracted from D3M, and execution works.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

| Filename | Size | File type | Python version |
| --- | --- | --- | --- |
| cacheflow-0.1-py3-none-any.whl | 11.9 kB | Wheel | py3 |
| cacheflow-0.1.tar.gz | 8.3 kB | Source | None |
