
Hamilton, the micro-framework for creating dataframes.

Project description

Hamilton: portable & expressive data transformation DAGs



Hamilton is a lightweight Python library for directed acyclic graphs (DAGs) of data transformations. Your DAG is portable; it runs anywhere Python runs, whether it's a script, notebook, Airflow pipeline, FastAPI server, etc. Your DAG is expressive; Hamilton has extensive features to define and modify the execution of a DAG (e.g., data validation, experiment tracking, remote execution).

To create a DAG, write regular Python functions that specify their dependencies with their parameters. As shown below, it results in readable code that can always be visualized. Hamilton loads that definition and automatically builds the DAG for you!

Functions B() and C() refer to function A via their parameters
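
A minimal sketch of what such a definition can look like. The node names A, B, and C mirror the caption above; the function bodies, the `external_input` parameter, the module name `my_dag`, and the helper `run_dag` are assumptions made for illustration. The helper relies on `ad_hoc_utils.create_temporary_module` to keep everything in one file, whereas a real project would more likely keep the functions in their own importable module.

```python
"""Toy DAG: B and C depend on A through their parameter names."""


def A(external_input: int) -> int:
    """Root node; `external_input` is supplied at execution time."""
    return external_input % 7


def B(A: int) -> float:
    """Depends on node `A` because it has a parameter named `A`."""
    return A / 3


def C(A: int, B: float) -> float:
    """Depends on both `A` and `B`."""
    return A * 2 + B


def run_dag(external_input: int):
    """Build and execute the DAG with Hamilton (requires `sf-hamilton`)."""
    # Imported lazily so the node functions above stay plain Python.
    from hamilton import ad_hoc_utils, driver

    # Wrap the functions in a temporary module (hypothetical name `my_dag`)
    # so the sketch fits in a single file.
    module = ad_hoc_utils.create_temporary_module(A, B, C, module_name="my_dag")
    dr = driver.Builder().with_modules(module).build()

    # Request only the node we care about; Hamilton resolves A and B for us.
    return dr.execute(["C"], inputs={"external_input": external_input})
```

Because the nodes are ordinary functions, they can be unit tested directly (for example `B(3)`) without building a driver at all.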

Hamilton brings modularity and structure to any Python application that moves data: ETL pipelines, ML workflows, LLM applications, RAG systems, and BI dashboards. The Hamilton UI lets you automatically visualize, catalog, and monitor execution.

Hamilton is great for DAGs, but if you need loops or conditional logic to create an LLM agent or a simulation, take a look at our sister library Burr 🤖.

Installation

Hamilton supports Python 3.8+. The command below includes the optional visualization dependency, which is used to display the Hamilton DAG. For visualizations, Graphviz must also be installed on your system separately.

pip install "sf-hamilton[visualization]"

To use the Hamilton UI, install the ui and sdk dependencies.

pip install "sf-hamilton[ui,sdk]"

To try Hamilton in the browser, visit www.tryhamilton.dev

Why use Hamilton?

Data teams write code to deliver business value, but few have the resources to standardize practices and provide quality assurance. Moving from proof-of-concept to production and cross-function collaboration (e.g., data science, engineering, ops) remain challenging for teams, big or small. Hamilton is designed to help throughout a project's lifecycle:

  • Separation of concerns. Hamilton separates the DAG "definition" and "execution" which lets data scientists focus on solving problems and engineers manage production pipelines.

  • Effective collaboration. The Hamilton UI provides a shared interface for teams to inspect results and debug failures throughout the development cycle.

  • Low-friction dev to prod. Use @config.when() to modify your DAG between execution environments instead of error-prone if/else feature flags. The notebook extension prevents the pain of migrating code from a notebook to a Python module.

  • Portable transformations. Your DAG is independent of infrastructure or orchestration, meaning you can develop and debug locally and reuse code across contexts (local, Airflow, FastAPI, etc.).

  • Maintainable DAG definition. Hamilton automatically builds the DAG from a single line of code whether it has 10 or 1000 nodes. It can also assemble multiple Python modules into a pipeline, encouraging modularity.

  • Expressive DAGs. Function modifiers are a unique feature to keep your code DRY and reduce the complexity of maintaining large DAGs. Other frameworks inevitably lead to code redundancy or bloated functions.

  • Built-in coding style. The Hamilton DAG is defined using Python functions, encouraging modular, easy-to-read, self-documenting, and unit testable code.

  • Data and schema validation. Decorate functions with @check_output to validate output properties, and raise warnings or exceptions. Add the SchemaValidator() adapter to automatically inspect dataframe-like objects (pandas, polars, Ibis, etc.) to track and validate their schema.

  • Built for plugins. Hamilton is designed to play nice with all tools and provides the right abstractions to create custom integrations with your stack. Our lively community will help you build what you need!

Hamilton UI

You can track the execution of your Hamilton DAG in the Hamilton UI. It automatically populates a data catalog with lineage / tracing and provides execution observability to inspect results and debug errors. You can run it as a local server or a self-hosted application using Docker.


DAG catalog, automatic dataset profiling, and execution tracking

Get started with the Hamilton UI

  1. To use the Hamilton UI, install the dependencies (see Installation section) and start the server with

    hamilton ui
    
  2. On the first connection, create a username and a new project (the project_id should be 1).


  3. Track your Hamilton DAG by creating a HamiltonTracker object with your username and project_id and adding it to your Builder. Now, your DAG will appear in the UI's catalog and all executions will be tracked!

    from hamilton import driver
    from hamilton_sdk.adapters import HamiltonTracker
    import my_dag
    
    # use your `username` and `project_id`
    tracker = HamiltonTracker(
       username="my_username",
       project_id=1,
       dag_name="hello_world",
    )
    
    # adding the tracker to the `Builder` will add the DAG to the catalog
    dr = (
       driver.Builder()
       .with_modules(my_dag)
       .with_adapters(tracker)  # add your tracker here
       .build()
    )
    
    # executing the `Driver` will track results
    dr.execute(["C"])
    

Documentation & learning resources

How does Hamilton compare to X?

Hamilton is not an orchestrator (you might not need one), nor a feature store (but you can use it to build one!). Its purpose is to help you structure and manage data transformations. If you know dbt, Hamilton does for Python what dbt does for SQL.

Another way to frame it is to think about the different layers of a data stack. Hamilton is at the asset layer. It helps you organize data transformations code (the expression layer), manage changes, and validate & test data.

Layer         | Purpose                                                                     | Example Tools
--------------|-----------------------------------------------------------------------------|-------------------------------------
Orchestration | Operational system for the creation of assets                               | Airflow, Metaflow, Prefect, Dagster
Asset         | Organize expressions into meaningful units (e.g., dataset, ML model, table) | Hamilton, dbt, dlt, SQLMesh, Burr
Expression    | Language to write data transformations                                      | pandas, SQL, polars, Ibis, LangChain
Execution     | Perform data transformations                                                | Spark, Snowflake, DuckDB, RAPIDS
Data          | Physical representation of data, inputs and outputs                         | S3, Postgres, file system, Snowflake

See our page on Why use Hamilton? and framework code comparisons for more information.

📑 License

Hamilton is released under the BSD 3-Clause Clear License. See LICENSE for details.

🌎 Community

👨‍💻 Contributing

We're very supportive of changes by new contributors, big or small! Make sure to discuss potential changes by creating an issue or commenting on an existing one before opening a pull request. Good first contributions include creating an example or an integration with your favorite Python library!

To contribute, check out our contributing guidelines, our developer setup guide, and our Code of Conduct.

😎 Used by

Hamilton was started at Stitch Fix before the original creators founded DAGWorks Inc! The library is battle-tested and has been supporting production use cases since 2019.

Read more about the origin story.

🤝 Code Contributors


🙌 Special Mentions & 🦟 Bug Hunters

Thanks to our awesome community and their active involvement in the Hamilton library.

Nils Olsson, Michał Siedlaczek, Alaa Abedrabbo, Shreya Datar, Baldo Faieta, Anwar Brini, Gourav Kumar, Amos Aikman, Ankush Kundaliya, David Weselowski, Peter Robinson, Seth Stokes, Louis Maddox, Stephen Bias, Anup Joseph, Jan Hurst, Flavia Santos, Nicolas Huray, Manabu Niseki, Kyle Pounder, Alex Bustos, Andy Day, Alexander Cai, Nils Müller-Wendt, Paul Larsen, Kemal Eren, Jernej Frank, Noah Ridge

🎓 Citations

We'd appreciate it if you cited Hamilton by referencing one of the following:

@inproceedings{DBLP:conf/vldb/KrawczykI22,
  title     = {Hamilton: a modular open source declarative paradigm for high level
               modeling of dataflows},
  author    = {Stefan Krawczyk and Elijah ben Izzy},
  editor    = {Satyanarayana R. Valluri and Mohamed Za{\"{\i}}t},
  booktitle = {1st International Workshop on Composable Data Management Systems,
               CDMS@VLDB 2022, Sydney, Australia, September 9, 2022},
  year      = {2022},
  url       = {https://cdmsworkshop.github.io/2022/Proceedings/ShortPapers/Paper6\_StefanKrawczyk.pdf},
  timestamp = {Wed, 19 Oct 2022 16:20:48 +0200},
  biburl    = {https://dblp.org/rec/conf/vldb/KrawczykI22.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{CEURWS:conf/vldb/KrawczykIQ22,
  title     = {Hamilton: enabling software engineering best practices for data transformations via generalized dataflow graphs},
  author    = {Stefan Krawczyk and Elijah ben Izzy and Danielle Quinn},
  editor    = {Cinzia Cappiello and Sandra Geisler and Maria-Esther Vidal},
  booktitle = {1st International Workshop on Data Ecosystems co-located with 48th International Conference on Very Large Databases (VLDB 2022)},
  pages     = {41--50},
  url       = {https://ceur-ws.org/Vol-3306/paper5.pdf},
  year      = {2022}
}

📚 Libraries built on / for Hamilton
