Playbooks for data. Open, process and save table-based data.

Maintainers

kellerza

Project description

Data Playbook


Automate repetitive tasks on table-based data. Includes various input and output tasks.

Install: pip install dataplaybook

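Use the @task and @playbook decorators:

from dataplaybook import playbook, task
from dataplaybook.tasks.io_xlsx import read_excel, write_excel

@task
def print_rows(*, table: list[dict]) -> None:
    """Hypothetical example task: print each row of a table."""
    for row in table:
        print(row)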

Tasks

Tasks are implemented as simple Python functions and the modules can be found in the dataplaybook/tasks folder.
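Because tasks are plain functions, the idea behind a registering decorator can be sketched in a few lines of standard Python. This is not dataplaybook's `@task` implementation: the names `TASKS`, `register_task` and `unique_rows` are hypothetical, and the "keep the first row per key" rule is an assumption. The keyword-only, table-in/rows-out shape follows the task signatures listed by the CLI further down.

```python
from collections.abc import Callable
from typing import Any

# Hypothetical registry: task name -> function (not dataplaybook's internals)
TASKS: dict[str, Callable[..., Any]] = {}


def register_task(func: Callable[..., Any]) -> Callable[..., Any]:
    """Register a plain function under its own name and return it unchanged."""
    TASKS[func.__name__] = func
    return func


@register_task
def unique_rows(*, table: list[dict[str, Any]], key: str) -> list[dict[str, Any]]:
    """Keep the first row seen for each value of `key` (assumed semantics)."""
    seen: set[Any] = set()
    result = []
    for row in table:
        value = row.get(key)
        if value not in seen:
            seen.add(value)
            result.append(row)
    return result


rows = [{"id": 1, "x": "a"}, {"id": 1, "x": "b"}, {"id": 2, "x": "c"}]
print(TASKS["unique_rows"](table=rows, key="id"))
# [{'id': 1, 'x': 'a'}, {'id': 2, 'x': 'c'}]
```

Keeping tasks as ordinary functions means they can also be called directly (for example in unit tests) without going through any registry.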

Description	Module	Functions
Generic functions to work on tables	`dataplaybook.tasks`	build_lookup, build_lookup_dict, combine, ensure_lists, filter_rows, print_table, remove_null, replace, unique, vlookup
Fuzzy string matching (requires pip install fuzzywuzzy)	`dataplaybook.tasks.fuzzy`	fuzzy_match
Read/write Excel files	`dataplaybook.tasks.io_xlsx`	read_excel, write_excel
Misc IO tasks	`dataplaybook.tasks.io_misc`	read_csv, read_tab_delim, read_text_regex, wget, write_csv
MongoDB functions	`dataplaybook.tasks.io_mongo`	read_mongo, write_mongo, columns_to_list, list_to_columns
PDF functions (requires pdftotext on your PATH)	`dataplaybook.tasks.io_pdf`	read_pdf_pages, read_pdf_files
Read XML	`dataplaybook.tasks.io_xml`	read_xml
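Tables here are lists of row dictionaries. To illustrate how include/exclude filtering over such rows can work, here is a minimal pure-Python sketch shaped like the `filter_rows` signature in the CLI listing. The name `filter_rows_sketch` and the matching rules (string equality, list membership, regex search) are assumptions for illustration, not dataplaybook's code.

```python
import re
from collections.abc import Iterator
from typing import Any, Optional, Union

# Hypothetical matcher type, mirroring the `exclude` annotation in the listing
Matcher = Union[str, list[str], re.Pattern]


def _matches(value: Any, matcher: Matcher) -> bool:
    """Assumed rule: str -> equality, list -> membership, pattern -> regex search."""
    if isinstance(matcher, re.Pattern):
        return matcher.search("" if value is None else str(value)) is not None
    if isinstance(matcher, list):
        return value in matcher
    return value == matcher


def filter_rows_sketch(
    *,
    table: list[dict[str, Any]],
    include: Optional[dict[str, str]] = None,
    exclude: Optional[dict[str, Matcher]] = None,
) -> Iterator[dict[str, Any]]:
    """Yield rows that satisfy every include rule and no exclude rule."""
    for row in table:
        if include and not all(_matches(row.get(c), m) for c, m in include.items()):
            continue
        if exclude and any(_matches(row.get(c), m) for c, m in exclude.items()):
            continue
        yield row


rows = [
    {"name": "r1", "site": "JHB"},
    {"name": "r2", "site": "CPT"},
    {"name": "test-r3", "site": "JHB"},
]
kept = list(
    filter_rows_sketch(
        table=rows, include={"site": "JHB"}, exclude={"name": re.compile(r"^test-")}
    )
)
print(kept)  # [{'name': 'r1', 'site': 'JHB'}]
```

Returning a generator, as several of the listed tasks do, lets rows stream through without building intermediate lists.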

$ dataplaybook --all -vvv
dataplaybook.tasks
- build_lookup "(*, table: list[RowData], key: str, columns: list[str]) -> Generator[RowData]"
- build_lookup_dict "(*, table: list[RowData], key: str | list[str], columns: list[str] | None = None) -> dict[str | tuple, Any]"
- combine "(*, tables: list[list[RowData]], key: str, columns: list[str], value: Union[Literal[True], str] = True) -> list[RowData]"
- ensure_lists "(*, tables: Sequence[list[RowData]], columns: Sequence[str]) -> None"
- filter_rows "(*, table: list[RowData], include: dict[str, str] | None = None, exclude: dict[str, str | list[str] | re.Pattern] | None = None) -> Generator[RowData]"
- print_table "(*, table: list[RowData] | None = None, tables: dict[str, list[RowData]] | DataEnvironment | None = None) -> None"
- remove_null "(*, tables: Sequence[list[RowData]]) -> None"
- replace "(*, table: list[RowData], replace_dict: dict[str, str], columns: list[str]) -> None"
- unique "(*, table: list[RowData], key: str) -> Generator[RowData]"
- vlookup "(*, table0: list[RowData], acro: list[RowData], columns: list[str]) -> None"
dataplaybook.tasks.fuzzy
- fuzzy_match "(*, table1: list[RowData], table2: list[RowData], t1_column: str, t2_column: str, t1_target_column: str) -> None"
dataplaybook.tasks.ietf
- add_standards_column "(*, table: list[RowData], columns: list[str], rfc_col: str) -> None"
- extract_standards_from_table "(*, table: list[RowData], extract_columns: list[str], include_columns: list[str] | None = None, name: str = '', line_offset: int = 1) -> Generator[RowData]"
dataplaybook.tasks.gis
- linestring "(*, table: list[RowData], lat_a: str = 'latA', lat_b: str = 'latB', lon_a: str = 'lonA', lon_b: str = 'lonB', linestring_column: str = 'linestring', error: str = '22 -22') -> list[RowData]"
dataplaybook.tasks.io_mail
- mail "(*, to_addrs: list[str] | str, from_addr: str, subject: str, server: str, files: list[str] | None = None, priority: int = 4, body: str | None = '', html: str | None = '', cc_addrs: list[str] | None = None, bcc_addrs: list[str] | None = None) -> None"
dataplaybook.tasks.io_misc
- file_rotate "(*, file: os.PathLike | str, count: int = 3) -> None"
- glob "(*, patterns: list[str]) -> Generator[RowData]"
- read_csv "(*, file: os.PathLike | str, columns: dict[str, str] | None = None) -> Generator[RowData]"
- read_json "(*, file: os.PathLike | str) -> list[RowData]"
- read_tab_delim "(*, file: os.PathLike | str, headers: list[str]) -> Generator[RowData]"
- read_text_regex "(*, file: os.PathLike | str, newline: re.Pattern, fields: re.Pattern | None) -> Generator[RowData]"
- wget "(*, url: str, file: os.PathLike | str, age: int = 172800, headers: dict[str, str] | None = None) -> None"
- write_csv "(*, table: list[RowData], file: os.PathLike | str, header: list[str] | None = None) -> None"
- write_json "(*, data: dict[str, list[RowData]] | DataEnvironment | list[RowData], file: os.PathLike | str, only_var: bool = False) -> None"
dataplaybook.tasks.io_mongo
- columns_to_list "(*, table: 'list[RowData]', list_column: 'str', columns: 'list[str]') -> 'None'"
- list_to_columns "(*, table: 'list[RowData]', list_column: 'str', columns: 'list[str]') -> 'None'"
- mongo_delete_sids "(*, mdb: 'MongoURI', sids: 'list[str]') -> 'None'"
- mongo_list_sids "(*, mdb: 'MongoURI') -> 'list[str]'"
- mongo_sync_sids "(*, mdb_local: 'MongoURI', mdb_remote: 'MongoURI', ignore_remote: 'abc.Sequence[str] | None' = None, only_sync_sids: 'abc.Sequence[str] | None' = None) -> 'None'"
- read_mongo "(*, mdb: 'MongoURI', set_id: 'str | None' = None) -> 'Generator[RowData]'"
- write_mongo "(*, table: 'list[RowData]', mdb: 'MongoURI', set_id: 'str | None' = None, force: 'bool' = False) -> 'None'"
dataplaybook.tasks.io_pdf
- read_pdf_files "(*, folder: str, pattern: str = '*.pdf', layout: bool = True, args: list[str] | None = None) -> Generator[RowData]"
- read_pdf_pages "(*, file: os.PathLike | str, layout: bool = True, args: list[str] | None = None) -> Generator[RowData]"
dataplaybook.tasks.io_xlsx
- read_excel "(*, tables: dict[str, list[RowData]] | DataEnvironment, file: os.PathLike | str, sheets: list[dataplaybook.tasks.io_xlsx.Sheet] | None = None) -> list[str]"
- write_excel "(*, tables: dict[str, list[RowData]] | DataEnvironment, file: os.PathLike | str, include: list[str] | None = None, sheets: list[dataplaybook.tasks.io_xlsx.Sheet] | None = None, ensure_string: bool = False) -> None"
dataplaybook.tasks.io_xml
- read_lxml "(*, tables: dict[str, list[RowData]] | DataEnvironment, file: str, targets: list[str]) -> None"
- read_xml "(*, tables: dict[str, list[RowData]] | DataEnvironment, file: str, targets: list[str]) -> None"
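The `wget` task above defaults to `age: int = 172800`. Assuming that value is in seconds, it equals 2 × 86,400 s, i.e. 48 hours. The hypothetical helpers `is_stale` and `needs_download` below sketch a cache rule of that kind (download only when the local file is missing or older than `age`); this is an assumption about the parameter's intent, not dataplaybook's implementation.

```python
import os
import time
from typing import Optional

SECONDS_PER_DAY = 86_400
# 172800 s == 2 * 86400 s == 48 hours: the default `age` shown for wget above
DEFAULT_AGE = 2 * SECONDS_PER_DAY


def is_stale(mtime: float, now: float, age: int = DEFAULT_AGE) -> bool:
    """True when a file last modified at `mtime` is more than `age` seconds old at `now`."""
    return now - mtime > age


def needs_download(path: str, age: int = DEFAULT_AGE, now: Optional[float] = None) -> bool:
    """Assumed caching rule: download if the file is missing or stale."""
    if not os.path.exists(path):
        return True
    current = time.time() if now is None else now
    return is_stale(os.path.getmtime(path), current, age)
```

Separating the pure `is_stale` check from the filesystem lookup keeps the arithmetic easy to test with fixed timestamps.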

Local development

uv is used for dependency management. To install the dependencies, run:

uv sync --all-extras

pre-commit hooks are used for code formatting and linting; they are run with prek, a pre-commit-compatible hook runner. Install prek and run prek install to install the git hooks:

uv tool install prek
prek install

Test locally by running the hooks (ruff, codespell, mypy):

git add . && prek

Data Playbook v0 - origins

Data Playbook was created to replace various snippets of code I had lying around. Each was written to make some menial task repeatable, and they generally followed the same structure: load something, process it and save it. (Examples: processing network data into GIS tools, network audits and reporting on router and NMS output, extracting IETF standards to complete SOCs, and reading my bank statements into my Excel budgeting tool.)

For many of these tasks I have specific processing code (tasks_x.py, loaded with modules: [tasks_x] in the playbook), but in almost all cases the input and output tasks (and the configuration of their names, etc.) are common. The idea of modular tasks originally came from Home Assistant, where I started learning Python, and its concept of "custom components" for adding your own integrations, although one could argue this also resembles Ansible Playbooks.

In many cases I have a 'loose' coupling to actual file names, using Everything search (!es search_pattern in the playbook) to resolve a search pattern to the correct file used for input.
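Everything search is a Windows desktop tool. As a portable stand-in (a swapped-in technique, not what the playbooks use), this sketch resolves a glob pattern to the most recently modified matching file with `pathlib`. The helper name `resolve_pattern` and the "newest file wins" rule are assumptions.

```python
import os
import tempfile
import time
from pathlib import Path


def resolve_pattern(folder: str, pattern: str) -> Path:
    """Return the newest file in `folder` whose name matches the glob `pattern`."""
    matches = [p for p in Path(folder).glob(pattern) if p.is_file()]
    if not matches:
        raise FileNotFoundError(f"no file matches {pattern!r} in {folder}")
    # "Newest wins" is an assumed tie-breaking rule for several matches
    return max(matches, key=lambda p: p.stat().st_mtime)


# Demo: three files with increasing modification times in a throwaway folder
with tempfile.TemporaryDirectory() as tmp:
    for i, name in enumerate(["statement_1.csv", "statement_2.csv", "notes.txt"]):
        path = os.path.join(tmp, name)
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("date,amount\n")
        stamp = time.time() - 1000 + i * 100  # later names get newer mtimes
        os.utime(path, (stamp, stamp))
    newest = resolve_pattern(tmp, "statement_*.csv").name

print(newest)  # statement_2.csv
```

Picking the newest match suits inputs such as monthly statements, where the latest download is usually the one you want.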

It has some parts in common with Ansible Playbooks; in particular, the name was chosen after I was introduced to Ansible Playbooks. In 2019 the task structure was updated to match the Ansible Playbooks 2.0/2.5+ format and to allow named tasks. This format should also make it easier to introduce loop mechanisms, etc.

Comparison to Ansible Playbooks

Data Playbook is intended to create and modify variables in the environment (similar to an Ansible inventory). A playbook starts with an empty environment, although you can read the environment from various sources inside the play. Whereas new variables are created with register: in Ansible, Data Playbook functions require their output to be captured through target:.

Data Playbook tasks are different from Ansible's actions:

- They are mostly not idempotent, since the intention is to modify tables as we go along.
- They can return lists containing rows or be Python iterators (that yield rows of a table).
- If they don't return any tabular data (a list), the return value will be added to the var table in the environment.
- Each has a strict voluptuous schema, evaluated when loading and during runtime (e.g. to expand templates) to allow quick troubleshooting.
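The return-value rules above can be sketched with a hypothetical runner. `run_task`, `Env` and the dictionary layout of the `var` table are illustrative assumptions, not dataplaybook's actual environment class: lists are stored as tables under the target name, iterators are first materialised into lists, and any other non-None value goes into `var`.

```python
from collections.abc import Callable, Iterator
from typing import Any

# Hypothetical environment: table name -> list of rows, plus a "var" mapping
Env = dict[str, Any]


def run_task(env: Env, func: Callable[..., Any], target: str, **kwargs: Any) -> None:
    """Call `func` and capture its output in `env` under `target` (assumed rules)."""
    result = func(**kwargs)
    if isinstance(result, Iterator):
        result = list(result)  # materialise yielded rows into a table
    if isinstance(result, list):
        env[target] = result  # tabular output becomes a table
    elif result is not None:
        env.setdefault("var", {})[target] = result  # non-tabular output goes to "var"


def numbers(*, n: int) -> Iterator[dict[str, int]]:
    """Example task that yields one row per integer."""
    for i in range(n):
        yield {"i": i}


def count_rows(*, table: list[dict[str, Any]]) -> int:
    """Example task that returns a scalar instead of a table."""
    return len(table)


env: Env = {}
run_task(env, numbers, "nums", n=3)
run_task(env, count_rows, "nums_count", table=env["nums"])
print(env)  # {'nums': [{'i': 0}, {'i': 1}, {'i': 2}], 'var': {'nums_count': 3}}
```

Because tasks mutate and extend the environment as they run, re-running a step is generally not idempotent, which matches the first point in the list above.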

You could argue I can do this with Ansible, but it won't be as elegant, with single-item hosts files, gather_facts: no and delegate_to: localhost throughout the playbooks. It would also likely be only half as much fun trying to force it into my way of thinking.

Release

Semantic versioning is used for releases.

To create a new release, include a commit whose message is prefixed with a release emoji: :dolphin: for a patch release or :rocket: for a minor release. Such a commit on the master branch triggers a release.

# Patch
git commit -m ":dolphin: Release 0.0.x"

# Minor
git commit -m ":rocket: Release 0.x.0"
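Assuming the emoji prefix alone selects the semantic-version component (as the two examples above suggest), the mapping can be sketched as follows. `release_part` and `bump` are hypothetical helpers, not the project's CI configuration.

```python
from typing import Optional

# Assumed mapping, read from the commit examples above; the real CI rule may differ
PREFIX_TO_PART = {":dolphin:": "patch", ":rocket:": "minor"}


def release_part(message: str) -> Optional[str]:
    """Return 'patch' or 'minor' if the commit message starts with a release emoji."""
    for prefix, part in PREFIX_TO_PART.items():
        if message.startswith(prefix):
            return part
    return None


def bump(version: str, part: str) -> str:
    """Bump a MAJOR.MINOR.PATCH version string following semantic versioning."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "minor":
        return f"{major}.{minor + 1}.0"  # a minor bump resets the patch number
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unsupported part: {part}")


print(bump("1.2.14", release_part(":dolphin: Release 0.0.x")))  # 1.2.15
```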

Project details

Release history

Version	Release date
1.2.14 (this version)	May 12, 2026
1.2.13	Apr 14, 2026
1.2.12	Apr 14, 2026
1.2.11	Apr 14, 2026
1.2.10	Feb 20, 2026
1.2.9	Feb 20, 2026
1.2.8	Feb 20, 2026
1.2.7	Feb 19, 2026
1.2.6	Feb 19, 2026
1.2.5	Feb 19, 2026
1.2.4	Feb 19, 2026
1.2.3	Feb 19, 2026
1.2.2	Feb 18, 2026
1.2.1	Feb 17, 2026
1.2.0	Feb 17, 2026
1.1.16	Jan 30, 2026
1.1.15	Jan 30, 2026
1.1.14	Jan 29, 2026
1.1.13	Jan 22, 2026
1.1.12	Oct 14, 2025
1.1.11	Oct 13, 2025
1.1.10	Oct 13, 2025
1.1.9	Aug 28, 2025
1.1.8	Aug 21, 2025
1.1.7	Aug 21, 2025
1.1.6	Jul 25, 2025
1.1.5	Jul 14, 2025
1.1.4	May 22, 2025
1.1.3	Feb 26, 2025
1.1.2	Feb 26, 2025
1.1.1	Feb 26, 2025
1.1.0	Feb 25, 2025
1.0.21	Feb 10, 2025
1.0.20	Oct 2, 2024
1.0.19	Oct 1, 2024
1.0.18	Oct 1, 2024
1.0.16	Jun 1, 2023
1.0.15	May 6, 2023
1.0.14	May 3, 2023
1.0.13	Apr 18, 2023
1.0.12	Apr 17, 2023
1.0.11	Apr 12, 2023
1.0.10	May 17, 2022
1.0.9	Mar 4, 2022
1.0.8	Sep 30, 2021
1.0.7	Jun 30, 2021
1.0.6	Jun 8, 2021
1.0.5	Jun 8, 2021
1.0.4	Jun 2, 2021
1.0.3	Jun 1, 2021
1.0.2	Jun 1, 2021
0.6.8	Mar 31, 2020
0.6.7	Jan 14, 2020
0.6.5	Oct 30, 2019
0.6.2	May 29, 2019
0.6.1	May 9, 2019
0.6	Apr 24, 2019
0.3.4	Jan 24, 2019
0.3.3	Jan 14, 2019
0.3.2	Jan 11, 2019
0.3.0	Nov 6, 2018
0.2.4	Jul 12, 2018
0.2.3	Jul 11, 2018
0.2.2	Jul 6, 2018
0.2.1	Jul 6, 2018
0.2	Jul 5, 2018
0.1	Jun 14, 2018
0.0.0	Feb 10, 2025

Download files

Source Distribution

dataplaybook-1.2.14.tar.gz (43.5 kB)

Uploaded May 12, 2026 Source

Built Distribution

dataplaybook-1.2.14-py3-none-any.whl (54.0 kB)

Uploaded May 12, 2026 Python 3

File details

Details for the file dataplaybook-1.2.14.tar.gz.

File metadata

Download URL: dataplaybook-1.2.14.tar.gz
Upload date: May 12, 2026
Size: 43.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.11.6 {"installer":{"name":"uv","version":"0.11.6","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for dataplaybook-1.2.14.tar.gz
Algorithm	Hash digest
SHA256	`7f05ee4a13d7c4095ce6f8f1755058e1eb4c2725e070bca281a18a3a39ab117f`
MD5	`ee6726ce85dfce6b9a91365d98fbfdce`
BLAKE2b-256	`151b7fd98db627daaf0c460ea8514e429c1511e67d8bcbd38922f3867a3e322e`


File details

Details for the file dataplaybook-1.2.14-py3-none-any.whl.

File metadata

Download URL: dataplaybook-1.2.14-py3-none-any.whl
Upload date: May 12, 2026
Size: 54.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.11.6 {"installer":{"name":"uv","version":"0.11.6","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for dataplaybook-1.2.14-py3-none-any.whl
Algorithm	Hash digest
SHA256	`465a237bc1b1eabad099e2f25a78b9febcb12a8d6af40cc39a86e1b7380f6d29`
MD5	`d77b1ef4cb365e932b558950c9dee066`
BLAKE2b-256	`eeae854b42bcbea83249960617be6c2f160da94ad146117c81ff481419e55bf1`
