A Python library for developing great data pipelines

ploomber

ploomber is an expressive workflow management library that provides incremental builds, testing and debugging tools to accelerate DS/ML pipeline development.


If you want to try out everything ploomber has to offer:

pip install ploomber[all]

Note that installing everything will attempt to install pygraphviz, which depends on graphviz, so you have to install graphviz first:

# if you are using conda (recommended)
conda install graphviz
# if you are using homebrew
brew install graphviz
# for other systems, see:
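
Before installing the full set of dependencies, it can help to confirm that graphviz is already present. The following is a minimal sketch, not part of ploomber: the helper name graphviz_available is hypothetical, and it only checks for the `dot` executable on PATH, which is a heuristic (pygraphviz also needs the graphviz libraries and headers at build time).

```python
import shutil


def graphviz_available():
    """Return True if the graphviz `dot` executable is found on PATH."""
    return shutil.which('dot') is not None


if not graphviz_available():
    # heuristic only: a missing `dot` strongly suggests graphviz is absent
    print('graphviz not found: install it before "pip install ploomber[all]"')
```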

If you want to start with the minimal amount of dependencies:

pip install ploomber


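from pathlib import Path
import tempfile
import pandas as pd

from ploomber import DAG
from ploomber.products import File
from ploomber.tasks import PythonCallable, SQLDump
from ploomber.clients import SQLAlchemyClient

# assumed setup: a scratch directory and a SQLite db with an "example" table
tmp_dir = Path(tempfile.mkdtemp())
uri = 'sqlite:///' + str(tmp_dir / 'example.db')

dag = DAG()

# the first task dumps data from the db to the local filesystem
task_dump = SQLDump('SELECT * FROM example',
                    File(tmp_dir / 'example.csv'),
                    dag, name='dump',
                    client=SQLAlchemyClient(uri))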

def _add_one(upstream, product):
    """Add one to column a"""
    df = pd.read_csv(str(upstream['dump']))
    df['a'] = df['a'] + 1
    df.to_csv(str(product), index=False)

def on_finish(task):
    df = pd.read_csv(str(task.product))
    assert not df['a'].isna().sum()

# we convert the Python function to a Task
task_add_one = PythonCallable(_add_one,
                              File(tmp_dir / 'add_one.csv'),
                              dag, name='add_one')
# verify there are no NAs in column a
task_add_one.on_finish = on_finish

# declare how tasks relate to each other
task_dump >> task_add_one

# run the pipeline - incremental builds: ploomber will keep track of each
# task's source code and will only execute outdated tasks in the next run
dag.build()

# a DAG also serves as a tool to interact with your pipeline, for example,
# status will return a summary table
dag.status()

# start a debugging session (only works if task is a PythonCallable)
task_add_one.debug()
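
To make the incremental build idea concrete, here is a toy sketch of skipping a task whose code has not changed since its last run. It is not ploomber's implementation: IncrementalRunner and fingerprint are hypothetical names, and the sketch hashes compiled bytecode as a stand-in for the source code that ploomber tracks.

```python
import hashlib


def fingerprint(func):
    """Hash a function's bytecode and constants (stand-in for its source)."""
    code = func.__code__
    payload = code.co_code + repr(code.co_consts).encode()
    return hashlib.sha256(payload).hexdigest()


class IncrementalRunner:
    """Run a task only if its code changed since the last successful run."""

    def __init__(self):
        # task name -> fingerprint recorded after the last run
        self._seen = {}

    def run(self, name, func):
        current = fingerprint(func)
        if self._seen.get(name) == current:
            return False  # up to date, skipped
        func()
        self._seen[name] = current
        return True  # executed
```

Calling run twice with the same function executes it only once; redefining the function with a different body makes the next call execute it again.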

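The `>>` syntax used above to relate tasks can be understood through Python's `__rshift__` hook. The sketch below uses a toy Task class (hypothetical, not ploomber's own) to show one way such an operator could record dependencies.

```python
class Task:
    """Toy task that records its upstream dependencies."""

    def __init__(self, name):
        self.name = name
        self.upstream = []

    def __rshift__(self, other):
        # `self >> other` declares that `other` depends on `self`
        other.upstream.append(self)
        # returning `other` lets declarations chain: a >> b >> c
        return other


dump = Task('dump')
add_one = Task('add_one')
report = Task('report')

# dump runs first, then add_one, then report
dump >> add_one >> report
```

Returning the right-hand task from `__rshift__` is a design choice that keeps chained declarations reading left to right in execution order.
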

0.2.1 (2020-02-20)

  • Adds integration with pdb via PythonCallable.debug
  • Env.start now accepts a filename to look for
  • Improvements to data_frame_validator

0.2 (2020-02-13)

  • Simplifies installation
  • Deletes BashCommand, use ShellScript
  • More examples added
  • Refactored env module
  • Renames SQLStore to SourceLoader
  • Improvements to SQLStore
  • Improved documentation
  • Renamed PostgresCopy to PostgresCopyFrom
  • SQLUpload and PostgresCopy now have the same API
  • A few fixes to PostgresCopy (#1, #2)


  • First release

