
Python framework for transforming tabulated data with visual relationships into tidy data

Project description

Tidychef


🧠 A Different Way to Work with Tabular Data

Tidychef is a Python tool that helps you extract data from human-oriented spreadsheets—the kind published by governments, NGOs, and analysts.

Rather than relying on rigid cell references, Tidychef lets you define spatial relationships like “this value is below this header,” or “shift right from here,” making your scripts repeatable even when layouts evolve.
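To make the contrast with rigid cell references concrete, here is a minimal plain-Python sketch. It does not use tidychef at all: the helper names `find_cell` and `values_below` and the toy grids are invented for illustration. It only shows why "the values below this header" keeps working when a layout shifts, while a fixed coordinate does not.

```python
# Plain-Python illustration only; none of this is tidychef API.

def find_cell(grid, text):
    """Return (row, col) of the first cell whose content equals `text`."""
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value == text:
                return r, c
    raise ValueError(f"no cell contains {text!r}")


def values_below(grid, text):
    """Collect the non-blank cells in the column beneath the header `text`."""
    r, c = find_cell(grid, text)
    return [row[c] for row in grid[r + 1:] if c < len(row) and row[c] != ""]


# Hypothetical toy table (values are made up).
original = [
    ["", "Cars", "Planes"],
    ["John", "2", "1"],
    ["Paul", "4", "0"],
]

# The same data after a title row and a spacer column were added.
shifted = [
    ["Assets by person", "", "", ""],
    ["", "", "Cars", "Planes"],
    ["John", "", "2", "1"],
    ["Paul", "", "4", "0"],
]

# A fixed coordinate points at different content once the layout moves...
fixed_original = original[1][1]
fixed_shifted = shifted[1][1]

# ...while "the values below the Cars header" selects the same cells.
relative_original = values_below(original, "Cars")
relative_shifted = values_below(shifted, "Cars")
```

In the shifted grid the fixed coordinate lands on a blank spacer cell, whereas the header-relative lookup still returns the same two values.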

📊 Built for real-world publication tables: ONS, NHS, DfE, local authority reports, and more.

👥 Who Is Tidychef For?

👤 You are... | 🧩 Your problem... | ✅ Tidychef helps by...
A policy analyst | A quarterly Excel export with merged headers and wide layout | Extracting data using visual relationships, not cell indices
A finance/data consultant | Repetitive report formatting with shifting structures | Writing reusable “recipes” that adapt to visual changes
A data engineer | Need to automate legacy spreadsheets | Building robust, declarative extractors in Python

AI-powered overview: See how tidychef compares to other tools.

📊 Example

Consider this Excel-like structure — built for readers, not for code.

A simple script

from tidychef import acquire, filters, preview
from tidychef.direction import down, right, below
from tidychef.output import Column, TidyData

# Load a CSV table from a URL
table = acquire.csv.http(
    "https://raw.githubusercontent.com/mikeAdamss/tidychef/main/tests/fixtures/csv/bands-wide.csv"
)

# Select numeric observations and label them
observations = table.is_numeric().label_as("Value")

# Select headers and label them
bands = table.row_containing_strings(["Beatles"]).is_not_blank().label_as("Band")
assets = table.row_containing_strings(["Cars"]).is_not_blank().label_as("Asset")
names = table.cell_containing_string("Beatles").shift(down).expand_to_box().is_not_numeric().label_as("Name")

# We'll request a preview to see our selections
preview(observations, bands, assets, names)

# Build tidy data by attaching observations and headers
tidy_data = TidyData(
    observations,
    Column(bands.attach_closest(right)),
    Column(assets.attach_directly(below)),
    Column(names.attach_directly(right)),
)

# Export the tidy data to CSV
tidy_data.to_csv("bands_tidy.csv")

Running the script gives you an inline preview (because the snippet calls preview()):

[Image: preview]

It also outputs a CSV (bands_tidy.csv, as per the snippet) that looks like this:

Note: image cropped for reasons of practicality.
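The two attachment styles used in the script, attach_directly and attach_closest, can be pictured with a small plain-Python sketch. This is not how tidychef is implemented, and it reflects only one reading of the method names: "directly" is taken to mean the header sits in the observation's own column, and "closest" to mean the nearest header at or to the left of it. The helper functions and the column layout are hypothetical.

```python
# Plain-Python illustration only; this is not tidychef's implementation.

def header_directly_above(headers, obs_col):
    """`headers` maps column -> label; the match must share the observation's column."""
    return headers[obs_col]


def closest_header_to_left(headers, obs_col):
    """Pick the nearest header positioned at or to the left of the observation's column."""
    candidates = [col for col in headers if col <= obs_col]
    return headers[max(candidates)]


# Hypothetical header rows: each band labels a span of columns,
# while each asset labels exactly one column.
bands = {1: "Beatles", 3: "Rolling Stones"}
assets = {1: "Cars", 2: "Planes", 3: "Cars", 4: "Planes"}

# Resolve both headers for an observation in each data column.
observation_columns = [1, 2, 3, 4]
resolved = [
    (closest_header_to_left(bands, col), header_directly_above(assets, col))
    for col in observation_columns
]
```

The design point is that a header written once over several columns (a common merged-header layout) needs a "closest" rule, while one-header-per-column layouts can use the stricter "directly" rule.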

💡 💡 KEY INSIGHT 💡💡

This is the point to understand above all. Here’s another preview I made by running the exact same script against a radically altered version of the data source. This is what we mean by robust and repeatable transformations, and it is why tidychef focuses on modeling spatial relationships: how cells relate visually.

💡 Same script, radically different input—same output structure.

[Image: preview]

📌 You’re modeling visual structure, not fixed coordinates!

🔍 Why Use Tidychef?

🧠 Visual logic — Work like a human, not like a parser.

🔁 Repeatable recipes — Robust to changes in layout, column order, or row spacing.

📦 Tidy output — Standard pandas.DataFrame or CSV.

🤝 Beginner-friendly — Analysts can learn fast with real-world examples.

🛠️ Advanced extensibility — Developers can subclass, extend, and customize as needed.
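For readers new to the term, the "tidy output" point above means one observation per row, with each header value in its own column. The following is a minimal sketch of that shape using only plain Python and an invented toy table; it is not tidychef code, and pandas users would typically get a similar reshape from `DataFrame.melt`.

```python
# Plain-Python illustration of the tidy (long) shape; toy values are made up.

wide = [
    ["Name", "Cars", "Planes"],
    ["John", "2", "1"],
    ["Paul", "4", "0"],
]

header, *rows = wide

# One output record per observation cell: the row label, the column header,
# and the value found where they meet.
tidy = [
    {"Name": row[0], "Asset": asset, "Value": value}
    for row in rows
    for asset, value in zip(header[1:], row[1:])
]
```

Each of the four numeric cells in the wide table becomes its own record, which is the structure downstream tools generally expect.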

📘 Full Documentation

Extensive documentation, including tutorials, real UK government datasets, advanced recipes, and developer guidance, is available at:

👉 mikeadamss.github.io/tidychef

Installation

pip install tidychef

Acknowledgements

Tidychef is directly inspired by the Python package databaker, created by The Sensible Code Company in partnership with the United Kingdom's Office for National Statistics.

While I liked databaker and worked with it successfully on multiple ETL projects over almost a decade, I consider this software the culmination of that work and of the lessons learned along the way.

Get Involved

Please raise issues (or ideas as issues) freely on this repo.

If you'd like to get involved more directly, please see the contributing guidance.
