Pipe Dreams: API for publication of scientific data

🔬 Pipe Dreams

Do you want to:

  • Organize your huge pile of loose scripts?
  • Create neat and reusable Python pipelines to process your data or run jobs?
  • Have graph-based (DAG) parallelization without too much fuss?

Well, you are in the right place. Pipe Dreams is a super-duper lightweight application programming interface (API) that supports the construction and processing of data pipes for scientific data. It was built primarily for the Laboratory Catalog and Archive System, but is now open to other systems.

How do we do it:

  • We use Python dictionaries to encapsulate all the intermediate results and data flowing through the pipeline, so you can not only declare and run a sequence of functions but also wire individual output variables to specific input parameters. What's more, you can rename, merge, and exercise other fine-grained control over your intermediate results.
  • We provide a Plugin class that can be subclassed to organize your Python functions, which you can then call by their relative string paths in our framework.
  • We use Celery, Redis, and NetworkX to parallelize your workflows with minimal setup on the user's part.
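
The dictionary-based wiring described above can be pictured with a small, standard-library-only sketch. It is not the Pipe Dreams API: `run_pipe`, `read_text`, `count_words`, and the wiring format are hypothetical names made up for illustration. Each step returns a dictionary, the results are merged into one shared dictionary, and an optional mapping wires a stored result to a differently named input parameter.

```python
import inspect
from typing import Any, Callable, Dict, List, Optional, Tuple

# A step is a function plus an optional {parameter_name: result_key} wiring map.
# (Hypothetical structure, not the Pipe Dreams data model.)
Step = Tuple[Callable[..., Dict[str, Any]], Optional[Dict[str, str]]]


def read_text(path: str) -> Dict[str, Any]:
    # Stand-in reader: fabricates content instead of touching the disk.
    return {"text": f"contents of {path}"}


def count_words(body: str) -> Dict[str, Any]:
    return {"word_count": len(body.split())}


def run_pipe(steps: List[Step], initial: Dict[str, Any]) -> Dict[str, Any]:
    """Run steps in order, threading one dictionary of results through them."""
    results = dict(initial)
    for func, wiring in steps:
        wiring = wiring or {}
        kwargs = {}
        for param in inspect.signature(func).parameters:
            # Use the wired result key if given, else the parameter's own name.
            kwargs[param] = results[wiring.get(param, param)]
        results.update(func(**kwargs))  # merge this step's outputs
    return results


steps: List[Step] = [
    (read_text, None),
    (count_words, {"body": "text"}),  # wire result "text" into parameter "body"
]
final = run_pipe(steps, {"path": "mydata0.txt"})
print(final["word_count"])  # 3
```

Keeping every intermediate value in one dictionary is what makes renaming and merging cheap: both are just key operations on that dictionary.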

🚗 Starting Redis

The Pipe Dreams API requires Redis to run. To start Redis (assuming Docker is installed), run:

$ docker container run \
    --name labcas-redis \
    --publish 6379:6379 \
    --detach \
    redis:6.2.4-alpine
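
Before going further, it can help to confirm that the container is actually answering. The following is a minimal sketch using only the Python standard library; `redis_is_up` is a hypothetical helper name, and the host and port are assumed from the `--publish 6379:6379` option above. It sends Redis's inline `PING` command and checks for the `+PONG` reply.

```python
import socket


def redis_is_up(host: str = "localhost", port: int = 6379, timeout: float = 2.0) -> bool:
    """Return True if the server at host:port answers an inline PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")
            reply = conn.recv(64)
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
    return reply.startswith(b"+PONG")


if __name__ == "__main__":
    print("Redis reachable:", redis_is_up())
```

A server that requires authentication replies with an error instead of `+PONG`, so this check would report it as unreachable.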

💿 Installing Pipe Dreams

Pipe Dreams is an open source, installable Python package. It requires Python 3.7 or later. Typically, you'd install it into a Python virtual environment, but you can also put it into a Conda environment or, if you must, your system's Python.

To use a virtual environment, run:

$ python3 -m venv venv
$ venv/bin/pip install --upgrade setuptools pip wheel
$ venv/bin/pip install jpl.pipedreams
$ source venv/bin/activate  # or use activate.csh or activate.fish as needed

Once this is done, you can run venv/bin/python as your Python interpreter and it will have the Pipe Dreams API (and all its dependencies) ready for use. Note that the activate step, although deprecated, is still necessary in order to have the celery program on your execution path.
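
To check whether the `celery` program is visible on your execution path, a quick sketch like the one below may help. It uses only the standard library, and `find_celery` is a hypothetical helper name rather than part of Pipe Dreams.

```python
import shutil
import sys
from typing import Optional


def find_celery() -> Optional[str]:
    """Return the full path of the celery executable, or None if it is not on PATH."""
    return shutil.which("celery")


path = find_celery()
if path is None:
    print("celery not found on PATH; activate the virtual environment first", file=sys.stderr)
else:
    print("celery found at", path)
```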

👩‍💻 Customizing the Workflow

The next step is to create a workflow that defines the processing steps needed to publish the data. As an example, see demo/demo.py, which is available from the GitHub release of this package.

In summary, you need to:

  1. Create an Operation instance.
  2. Add pipes (a sequence of named functions) to the instance.
  3. Run the operation in either a single process or multiple processes.
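
The shape of these three steps can be sketched with a toy stand-in. `SketchOperation`, `add_pipe`, and `run` below are hypothetical and are not the Pipe Dreams `Operation` class, whose actual signatures are shown in demo/demo.py. The sketch also uses threads from the standard library in place of Celery workers, purely to stay self-contained.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

PipeFunc = Callable[[Dict[str, Any]], Dict[str, Any]]


class SketchOperation:
    """Hypothetical stand-in; NOT the jpl.pipedreams Operation class."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.pipes: Dict[str, List[PipeFunc]] = {}

    def add_pipe(self, resource_id: str, functions: List[PipeFunc]) -> None:
        # Step 2: register a sequence of functions for one resource.
        self.pipes[resource_id] = list(functions)

    def _run_one(self, resource_id: str) -> Dict[str, Any]:
        data: Dict[str, Any] = {"resource_id": resource_id}
        for func in self.pipes[resource_id]:
            data.update(func(data))
        return data

    def run(self, workers: int = 1) -> Dict[str, Dict[str, Any]]:
        # Step 3: run the pipes serially, or concurrently across resources.
        ids = list(self.pipes)
        if workers <= 1:
            return {rid: self._run_one(rid) for rid in ids}
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return dict(zip(ids, pool.map(self._run_one, ids)))


def read(data: Dict[str, Any]) -> Dict[str, Any]:
    return {"text": f"hello from {data['resource_id']}"}


def shout(data: Dict[str, Any]) -> Dict[str, Any]:
    return {"text": data["text"].upper()}


op = SketchOperation("demo")  # Step 1: create the operation
for rid in ("mydata0.txt", "mydata1.txt"):
    op.add_pipe(rid, [read, shout])

serial = op.run(workers=1)
parallel = op.run(workers=2)
print(serial == parallel)  # True
```

Because each pipe only touches its own dictionary, the serial and concurrent runs produce the same results; that independence is what makes this kind of workflow straightforward to parallelize.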

📗 Process Your Data Pipes

Finally, with Redis running and a custom workflow defined, you can then execute your pipeline.

As an example, we provide a demonstration workflow and associated test data. You can run it (assuming you've got the virtual Python environment from above) as follows:

$ curl -L https://github.com/EDRN/jpl.pipedreams/releases/download/v1.0.2/demo.tar.gz | tar xzf -
$ cd demo
$ ../venv/bin/pip install --requirement requirements.txt
$ ../venv/bin/python demo.py
Adding Node: hello_world_read|+|mydata0.txt

num nodes in task graph: 7
num task completed: 7
time taken: 0:00:00.NNNNN

That's it 🥳
