pipewelder

Scheduled task execution on top of AWS Data Pipeline

Development Status
- 4 - Beta
Environment
- Console
Intended Audience
- Developers
License
- OSI Approved :: Apache Software License
Natural Language
- English
Operating System
- OS Independent
Programming Language
Topic
- Software Development :: Libraries :: Python Modules

Project description

Pipewelder is a framework that provides a command-line tool and Python API to manage AWS Data Pipeline jobs from flat files. Simple uses it as a cron-like job scheduler.

Source: https://github.com/SimpleFinance/pipewelder
Documentation: http://pipewelder.readthedocs.org
PyPI: https://pypi.python.org/pypi/pipewelder

Overview

Pipewelder aims to ease the task of scheduling jobs by defining very simple pipelines which are little more than an execution schedule, offloading most of the execution logic to files in S3. Pipewelder uses Data Pipeline’s concept of data staging to pull input files from S3 at the beginning of execution and to upload output files back to S3 at the end of execution.

If you follow Pipewelder’s directory structure, all of your pipeline logic can live in version-controlled flat files. The included command-line interface gives you simple commands to validate your pipeline definitions, upload task definitions to S3, and activate your pipelines.

Installation

Pipewelder is available from PyPI via pip and is compatible with Python 2.6, 2.7, 3.3, and 3.4:

pip install pipewelder

The easiest way to get started is to clone the project from GitHub, copy the example project from Pipewelder’s tests, and then modify it to suit your needs:

git clone https://github.com/SimpleFinance/pipewelder.git
cp -r pipewelder/tests/test_data my-pipewelder-project

If you’re setting up Pipewelder and need help, feel free to email the author.

Development

To do development on Pipewelder, clone the repository and run make to install dependencies and run tests.

Directory Structure

To use Pipewelder, you provide a template pipeline definition along with one or more directories that correspond to particular pipeline instances. The directory structure looks like this (see test_data for a working example):

pipeline_definition.json
pipewelder.json <- optional configuration file
my_first_pipeline/
    run
    values.json
    tasks/
        task1.sh
        task2.sh
my_second_pipeline/
...

The values.json file in each pipeline directory specifies parameter values that are used to modify the template definition, including the S3 paths for inputs, outputs, and logs. Some of these values are also used directly by Pipewelder.
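As a rough illustration of how a template and its values relate, the sketch below lists the parameters a template references that a given set of values does not supply. It relies on Data Pipeline's `#{myParameter}` reference syntax; the function name `missing_values`, the flat id-to-value mapping, and the parameter ids in the usage note are assumptions made for this example, not Pipewelder's own code or file format.

```python
import re

# Data Pipeline parameter references look like #{myParameterId};
# parameter ids start with "my".
PLACEHOLDER = re.compile(r"#\{(my\w+)\}")


def missing_values(template_text, values):
    """Return the sorted parameter ids referenced in the template text
    that have no entry in `values`.

    `values` is assumed here to be a flat mapping of parameter id to
    value (hypothetical layout, chosen for illustration).
    """
    referenced = set(PLACEHOLDER.findall(template_text))
    return sorted(referenced - set(values))
```

For example, given a template that references the hypothetical ids `myS3InputDir` and `myS3LogDir`, calling `missing_values(template_text, {"myS3InputDir": "s3://bucket/in"})` reports `myS3LogDir` as unset. The `pipewelder validate` command remains the authoritative check, since it asks AWS directly.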

A ShellCommandActivity (http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-shellcommandactivity.html) in the template definition simply looks for an executable file named run and executes it. The run file is the entry point for whatever work you want your pipeline to do.

Often, your run executable will be a wrapper script that executes a variety of similar tasks. When that’s the case, use the tasks subdirectory to hold these definitions. These tasks could be text files, shell scripts, SQL code, or whatever else your run file expects. Pipewelder gives the tasks folder special treatment: when uploading files, the CLI makes sure to remove existing task definitions.
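A wrapper of the kind described above might look roughly like the following. This is a minimal sketch, not the run file shipped in Pipewelder's test data: the function name `run_tasks` is hypothetical, and it assumes every task is a shell script that `sh` can execute and that the caller knows where the tasks directory ends up at execution time.

```python
import os
import subprocess


def run_tasks(tasks_dir):
    """Run each file in tasks_dir with sh, in sorted filename order.

    Stops at the first task that exits non-zero and returns its exit
    code; returns 0 if every task succeeds.
    """
    for name in sorted(os.listdir(tasks_dir)):
        path = os.path.join(tasks_dir, name)
        if not os.path.isfile(path):
            continue  # ignore nested directories
        code = subprocess.call(["sh", path])
        if code != 0:
            return code
    return 0
```

A run file built on this would end with something like `sys.exit(run_tasks("tasks"))`, so that a failing task is reported as a failed exit status. Running tasks in sorted order keeps execution predictable when tasks are named task1.sh, task2.sh, and so on.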

Using the Command-Line Interface

The Pipewelder CLI should always be invoked from the top-level directory of your definitions (the directory where pipeline_definition.json lives). If your directory structure matches Pipewelder’s expectations, it should work without further configuration.
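Before invoking the CLI, it can help to confirm that a directory matches the layout described earlier. The sketch below is an independent check, not part of Pipewelder: the function name `find_pipeline_dirs` is hypothetical, and it assumes a pipeline instance is any subdirectory containing both a values.json file and a run file.

```python
import json
from pathlib import Path


def find_pipeline_dirs(root):
    """Return the sorted names of subdirectories of `root` that look
    like pipeline instances (they contain values.json and run).

    Raises FileNotFoundError if `root` has no pipeline_definition.json,
    and ValueError if a values.json file is not valid JSON.
    """
    root = Path(root)
    if not (root / "pipeline_definition.json").is_file():
        raise FileNotFoundError(
            "pipeline_definition.json not found in %s" % root
        )
    found = []
    for child in sorted(root.iterdir()):
        values = child / "values.json"
        if child.is_dir() and values.is_file() and (child / "run").is_file():
            json.loads(values.read_text())  # fail early on malformed JSON
            found.append(child.name)
    return found
```

Calling `find_pipeline_dirs(".")` from the top-level directory of your definitions lists the pipeline directories this check recognizes. It only inspects local files, so it complements `pipewelder validate` rather than replacing it.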

As you make changes to your template definition or values.json files, it can be useful to check whether AWS considers your definitions valid:

$ pipewelder validate

Once you’ve defined your pipelines, you’ll need to upload the files to S3:

$ pipewelder upload

Finally, activate your pipelines:

$ pipewelder activate

Any time you change the values.json or pipeline_definition.json, you’ll need to run the activate subcommand again. Because active pipelines can’t be modified, the activate command will delete the existing pipeline and create a new one in its place. The run history for the previous pipeline will be discarded.

Acknowledgments

Pipewelder’s package structure is based on python-project-template.

Release history

- 0.1.4 (this version): Mar 9, 2015
- 0.1.2: Feb 26, 2015
- 0.1.1: Feb 26, 2015
- 0.1: Feb 26, 2015

Download files

Source Distribution

pipewelder-0.1.4.tar.gz (357.6 kB)

Uploaded Mar 9, 2015 Source

Built Distribution

pipewelder-0.1.4-py2.py3-none-any.whl (17.8 kB)

Uploaded Mar 9, 2015 Python 2, Python 3

File details

Details for the file pipewelder-0.1.4.tar.gz.

File metadata

Download URL: pipewelder-0.1.4.tar.gz
Upload date: Mar 9, 2015
Size: 357.6 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for pipewelder-0.1.4.tar.gz
Algorithm	Hash digest
SHA256	`480936b6e08b4cd1a628b2bb9a129c5a9c60f558ea74a0f71cde661b1f120541`
MD5	`ebdacf6ccca534a91cd3855cf26aa76f`
BLAKE2b-256	`cf07700280dd6a5880c4f3e52aaf855f3268d5318f7d6bd0ef992c3dffd4a68c`

File details

Details for the file pipewelder-0.1.4-py2.py3-none-any.whl.

File metadata

Download URL: pipewelder-0.1.4-py2.py3-none-any.whl
Upload date: Mar 9, 2015
Size: 17.8 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No

File hashes

Hashes for pipewelder-0.1.4-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`27143e50afb0b1487fd552eeb7e2076724211a38854b5b2cbde41f28555c603a`
MD5	`659f84aacd38692b28222b142bfe2a26`
BLAKE2b-256	`16ce0bf8ef9549372a741aa9c469b6fd57048fdb8e86a04a3ec297efe5bb0c9e`
