Spend your time discovering insights from data, not writing plumbing code. Declare your pipeline in a short YAML file and Ploomber will take care of the rest.

Ploomber

List your pipeline tasks in a pipeline.yaml file, then declare which upstream tasks each one uses as inputs and where to save its outputs. Ploomber propagates inputs to downstream consumers and orchestrates end-to-end pipeline execution.
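To make the idea concrete, here is a minimal sketch of how declared upstream dependencies can be turned into an execution order, with each task receiving the outputs of its upstream tasks. It is not Ploomber's implementation or its API: the `pipeline` dictionary, the task names, the file paths and the `execution_plan` helper are all hypothetical, and the sketch relies on the standard-library `graphlib` module, which needs Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each task lists its upstream tasks and its output path.
pipeline = {
    "load": {"upstream": [], "product": "output/raw.csv"},
    "clean": {"upstream": ["load"], "product": "output/clean.csv"},
    "plot": {"upstream": ["clean"], "product": "output/plot.png"},
}


def execution_plan(pipeline):
    """Return (task, inputs, output) tuples in a valid execution order."""
    # Map every task to the set of tasks that must run before it.
    graph = {name: set(spec["upstream"]) for name, spec in pipeline.items()}
    plan = []
    for name in TopologicalSorter(graph).static_order():
        spec = pipeline[name]
        # Each task receives the products of its upstream tasks as inputs.
        inputs = {up: pipeline[up]["product"] for up in spec["upstream"]}
        plan.append((name, inputs, spec["product"]))
    return plan


plan = execution_plan(pipeline)
for name, inputs, product in plan:
    print(name, inputs, "->", product)
```

The point of the sketch is that tasks only name their dependencies; the order of execution and the wiring of inputs are derived from those declarations.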

You can open your Python scripts as Jupyter notebooks for development. During execution, they are converted to notebooks to embed tables and charts:

doc/_static/diagrams/python/diag.png

1 minute video:

https://asciinema.org/a/346484.svg

SQL pipelines are also supported:

doc/_static/diagrams/sql/diag.png

(you can even mix SQL and Python in the same pipeline)

Ploomber also keeps track of source code changes to speed up builds by skipping up-to-date tasks. This is a great way to interactively develop your projects, sync work with your team and quickly recover from crashes (just fix the bug and build again).
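One way to picture incremental builds is to store a fingerprint of each task's source code after a successful run and compare it on the next build. The sketch below assumes a SHA-256 hash as the fingerprint and an in-memory `metadata` dictionary; both are illustrative choices, and the `source_hash` and `build` helpers are hypothetical rather than Ploomber's actual mechanism.

```python
import hashlib


def source_hash(source):
    """Fingerprint a task's source code."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def build(tasks, metadata):
    """Run outdated tasks in order; return the names of the tasks executed.

    tasks: list of (name, source, upstream_names), already in execution order.
    metadata: dict mapping name -> source hash stored by the last successful run.
    """
    executed = []
    for name, source, upstream in tasks:
        current = source_hash(source)
        changed = metadata.get(name) != current
        upstream_rebuilt = any(up in executed for up in upstream)
        if changed or upstream_rebuilt:
            # A real runner would execute the task here.
            executed.append(name)
            metadata[name] = current
    return executed


tasks = [
    ("load", "SELECT * FROM raw", []),
    ("clean", "df = df.dropna()", ["load"]),
    ("plot", "df.plot()", ["clean"]),
]
metadata = {}
first = build(tasks, metadata)   # nothing stored yet, so every task runs
second = build(tasks, metadata)  # no changes, so nothing runs

# Editing "clean" triggers "clean" and its downstream task "plot" only.
tasks[1] = ("clean", "df = df.fillna(0)", ["load"])
third = build(tasks, metadata)
print(first, second, third)
```

Because the fingerprint is only recorded after a task runs, a build that crashes midway can be resumed: tasks that already finished are skipped and execution picks up at the first outdated task.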

Try out the live demo (no installation required).

Click here for documentation.

Our blog.

Works with Python 3.5 and higher.

Installation

pip install ploomber

To install Ploomber along with all optional dependencies:

pip install "ploomber[all]"

graphviz is required for plotting pipelines:

# if you use conda (recommended)
conda install graphviz
# if you use homebrew
brew install graphviz
# for more options, see: https://www.graphviz.org/download/

Create a new project

ploomber new

Python API

There is also a Python API for advanced use cases. This API allows you to build flexible abstractions such as dynamic pipelines, where the exact number of tasks is determined by the pipeline's parameters.
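As a rough illustration of what "dynamic" means here, the sketch below builds a task graph whose size depends on a parameter. It deliberately uses plain dictionaries instead of Ploomber's Python API, so `make_pipeline`, `n_splits` and the task names are hypothetical.

```python
def make_pipeline(n_splits):
    """Build a task -> upstream mapping whose size depends on n_splits."""
    tasks = {"load": []}
    for i in range(n_splits):
        # One fitting task per split, each depending on the loaded data.
        tasks["fit-{}".format(i)] = ["load"]
    # The final task depends on every generated task.
    tasks["report"] = ["fit-{}".format(i) for i in range(n_splits)]
    return tasks


small = make_pipeline(2)
large = make_pipeline(5)
print(len(small), len(large))
```

The same factory function yields a four-task pipeline or a seven-task pipeline depending on its argument, which is hard to express in a static configuration file.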

CHANGELOG

0.6.2 (2020-07-22)

  • Support for env.yaml in pipeline.yaml

  • Improved CLI. Adds plot, report and task commands

0.6.1 (2020-07-20)

  • Changes pipeline.yaml default (extract_product: True)

  • Documentation re-design

  • Simplified “ploomber new” generated files

  • Ability to define “product” in SQL scripts

  • Products are resolved to absolute paths to avoid ambiguity

  • Bug fixes

0.6 (2020-07-08)

  • Adds Jupyter notebook extension to inject parameters when opening a task

  • Improved CLI: ploomber new, ploomber add and ploomber entry

  • Spec API documentation additions

  • Support for on_finish, on_failure and on_render hooks in spec API

  • Improved validation for DAG specs

  • Several bug fixes

0.5.1 (2020-06-30)

  • Reduces the number of required dependencies

  • A new option in DBAPIClient to split source with a custom separator

0.5 (2020-06-27)

  • Adds CLI

  • New spec API to instantiate DAGs using YAML files

  • NotebookRunner.debug() for debugging and .develop() for interactive development

  • Bug fixes

0.4.1 (2020-05-19)

  • PythonCallable.debug() now works in Jupyter notebooks

0.4.0 (2020-05-18)

  • PythonCallable.debug() now uses IPython debugger by default

  • Improvements to Task.build() public API

  • Moves hook triggering logic to Task to simplify executors implementation

  • Adds DAGBuildEarlyStop exception to signal DAG execution stop

  • New option in Serial executor to turn warnings and exceptions capture off

  • Adds Product.prepare_metadata hook

  • Implements hot reload for notebooks and python callables

  • General clean ups for old __str__ and __repr__ in several modules

  • Refactored ploomber.sources module and ploomber.placeholders (previously ploomber.templates)

  • Adds NotebookRunner.debug() and NotebookRunner.develop()

  • NotebookRunner: now has an option to run static analysis on render

  • Adds documentation for DAG-level hooks

  • Bug fixes

0.3.5 (2020-05-03)

  • Bug fixes #88, #89, #90, #84, #91

  • Modifies Env API: Env() is now Env.load(), Env.start() is now Env()

  • New advanced Env guide added to docs

  • Env can now be used with a context manager

  • Improved DAGConfigurator API

  • Deletes logger configuration in executors constructors, logging is available via DAGConfigurator

0.3.4 (2020-04-25)

  • Dependencies cleanup

  • Removed numpydoc as a required dependency; it is now optional

  • A few bug fixes: #79, #71

  • All warnings are captured and shown at the end (Serial executor)

  • Moves differ parameter from DAG constructor to DAGConfigurator

0.3.3 (2020-04-23)

  • Cleaned up some modules, deprecated some rarely used functionality

  • Improves documentation aimed at developers looking to extend ploomber

  • Introduces DAGConfigurator for advanced DAG configuration [Experimental API]

  • Adds task to upload files to S3 (ploomber.tasks.UploadToS3), requires boto3

  • Adds DAG-level on_finish and on_failure hooks

  • Support for enabling logging in entry points (via --logging)

  • Support for starting an interactive session using entry points (via python -i -m)

  • Improved support for database drivers that can only send one query at a time

  • Improved repr for SQLAlchemyClient, shows URI (but hides password)

  • PythonCallable now validates signature against params at render time

  • Bug fixes

0.3.2 (2020-04-07)

  • Faster Product status checking, now performed at rendering time

  • New products: GenericProduct and GenericSQLRelation for Products that do not have a specific implementation (e.g. you can use Hive with the DBAPI client + GenericSQLRelation)

  • Improved DAG build reports, subselect columns, transform to pandas.DataFrame and dict

  • Parallel executor now returns build reports, just like the Serial executor

0.3.1 (2020-04-01)

  • DAG parallel executor

  • Interact with pipelines from the command line (entry module)

  • Bug fixes

  • Refactored access to Product.metadata

0.3 (2020-03-20)

  • New Quickstart and User Guide section in documentation

  • DAG rendering and build now continue until no more tasks can render/build (instead of failing at the first exception)

  • New @with_env and @load_env decorators for managing environments

  • Env expansion ({{user}} expands to the current user; {{git}} and {{version}} are also available)

  • Task.name is now optional when Task is initialized with a source that has __name__ attribute (Python functions) or a name attribute (like Placeholders returned from SourceLoader)

  • New Task.on_render hook

  • Bug fixes

  • A lot of new tests

  • Now compatible with Python 3.5 and higher

0.2.1 (2020-02-20)

  • Adds integration with pdb via PythonCallable.debug

  • Env.start now accepts a filename to look for

  • Improvements to data_frame_validator

0.2 (2020-02-13)

  • Simplifies installation

  • Deletes BashCommand, use ShellScript

  • More examples added

  • Refactored env module

  • Renames SQLStore to SourceLoader

  • Improvements to SQLStore

  • Improved documentation

  • Renamed PostgresCopy to PostgresCopyFrom

  • SQLUpload and PostgresCopy now have the same API

  • A few fixes to PostgresCopy (#1, #2)

0.1

  • First release
