dftly (pronounced deftly) is a simple library for a safe, expressive, config-file friendly, and readable DSL for encoding simple dataframe operations.

Maintainers

mmd_pypi

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

DataFrame Transformation Language from YAML (dftly)

Dftly (pronounced "deftly") is a simple, expressive, human-readable DSL for encoding simple tabular transformations over dataframes, designed for expression in YAML files. With dftly, you can transform your data, deftly!

Installation

pip install dftly

You can also install it locally with uv by running:

uv sync

from the root of the repository.

Usage

Dftly is designed to make it easy to specify simple dataframe transformations in a YAML file (or a mapping-like format). In particular, with dftly, you can specify a mapping of output column names to expressions over input columns, then easily execute that over an input table.

Suppose we have an input dataframe that looks like this:

>>> import polars as pl
>>> from datetime import date
>>> df = pl.DataFrame({
...     "col1": [1, 2],
...     "col2": [3, 4],
...     "foo": ["5", "6"],
...     "col3": ["2020-01-01", "2021-06-15"],
...     "bp": ["120/80", "NULL"],
... })
>>> df
shape: (2, 5)
┌──────┬──────┬─────┬────────────┬────────┐
│ col1 ┆ col2 ┆ foo ┆ col3       ┆ bp     │
│ ---  ┆ ---  ┆ --- ┆ ---        ┆ ---    │
│ i64  ┆ i64  ┆ str ┆ str        ┆ str    │
╞══════╪══════╪═════╪════════════╪════════╡
│ 1    ┆ 3    ┆ 5   ┆ 2020-01-01 ┆ 120/80 │
│ 2    ┆ 4    ┆ 6   ┆ 2021-06-15 ┆ NULL   │
└──────┴──────┴─────┴────────────┴────────┘

With dftly, we can write a YAML file like this:

>>> ops = r"""
... sum: "$col1 + $col2"
... diff: "$foo::int - $col1"
... compare: "$col1 > ($col2 - 3) * 3"
... str_interp: 'f"value: {$foo} {$col1}"'
... max: "max($col1, $col2)"
... conditional: '"big" if $col1 > 1 else "small"'
... sys_bp: "extract group 1 of /(\\d+)\\/(\\d+)/ from $bp if /(\\d+)\\/(\\d+)/ in $bp"
... dia_bp: "(extract group 2 of /(\\d+)\\/(\\d+)/ from $bp if /(\\d+)\\/(\\d+)/ in $bp) as float"
... """

Then use it to transform the dataframe like so:

>>> from dftly import Parser
>>> df.select(**Parser.to_polars(ops))
shape: (2, 8)
┌─────┬──────┬─────────┬────────────┬─────┬─────────────┬────────┬────────┐
│ sum ┆ diff ┆ compare ┆ str_interp ┆ max ┆ conditional ┆ sys_bp ┆ dia_bp │
│ --- ┆ ---  ┆ ---     ┆ ---        ┆ --- ┆ ---         ┆ ---    ┆ ---    │
│ i64 ┆ i64  ┆ bool    ┆ str        ┆ i64 ┆ str         ┆ str    ┆ f32    │
╞═════╪══════╪═════════╪════════════╪═════╪═════════════╪════════╪════════╡
│ 4   ┆ 4    ┆ true    ┆ value: 5 1 ┆ 3   ┆ small       ┆ 120    ┆ 80.0   │
│ 6   ┆ 4    ┆ false   ┆ value: 6 2 ┆ 4   ┆ big         ┆ null   ┆ null   │
└─────┴──────┴─────────┴────────────┴─────┴─────────────┴────────┴────────┘
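To make the semantics of the sys_bp and dia_bp expressions concrete, here is a minimal plain-Python sketch of what they compute for a single value, using only the standard re module. The helper names extract_group and parse_bp are hypothetical and are not part of dftly; this illustrates the behaviour shown in the table above, not how dftly implements it.

```python
import re
from typing import Optional, Tuple

# Same pattern as in the sys_bp / dia_bp expressions above.
BP_PATTERN = re.compile(r"(\d+)\/(\d+)")


def extract_group(value: str, group: int) -> Optional[str]:
    """Return the requested capture group, or None when the pattern is absent."""
    match = BP_PATTERN.search(value)
    return match.group(group) if match else None


def parse_bp(value: str) -> Tuple[Optional[str], Optional[float]]:
    """Mirror sys_bp (kept as a string) and dia_bp (cast to float)."""
    sys_bp = extract_group(value, 1)
    dia_raw = extract_group(value, 2)
    dia_bp = float(dia_raw) if dia_raw is not None else None
    return sys_bp, dia_bp


print([parse_bp(v) for v in ["120/80", "NULL"]])
# [('120', 80.0), (None, None)]
```

The "NULL" row has no match, so both values come back as None, which corresponds to the null entries in the dftly output.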

Other supported operations include string-to-time parsing, conversion to duration, datetime arithmetic, and more:

>>> ops = r"""
... as_date: '$col3::"%Y-%m-%d"'
... days_later: '($col3 as "%Y-%m-%d") + $col1::days'
... at_time: '$col3::"%Y-%m-%d" @ 11:30 a.m.'
... """
>>> df.select(**Parser.to_polars(ops))
shape: (2, 3)
┌────────────┬────────────┬─────────────────────┐
│ as_date    ┆ days_later ┆ at_time             │
│ ---        ┆ ---        ┆ ---                 │
│ date       ┆ date       ┆ datetime[μs]        │
╞════════════╪════════════╪═════════════════════╡
│ 2020-01-01 ┆ 2020-01-02 ┆ 2020-01-01 11:30:00 │
│ 2021-06-15 ┆ 2021-06-17 ┆ 2021-06-15 11:30:00 │
└────────────┴────────────┴─────────────────────┘
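For readers more familiar with the standard library than with the dftly syntax, the three expressions above correspond roughly to the following per-row operations. This is a sketch using only datetime; the function names are hypothetical and simply reuse the output column names.

```python
from datetime import date, datetime, time, timedelta


def as_date(text: str, fmt: str = "%Y-%m-%d") -> date:
    """Parse a string with the given format, like $col3::"%Y-%m-%d"."""
    return datetime.strptime(text, fmt).date()


def days_later(text: str, n_days: int) -> date:
    """Add a duration in days, like ($col3 as "%Y-%m-%d") + $col1::days."""
    return as_date(text) + timedelta(days=n_days)


def at_time(text: str, clock: time) -> datetime:
    """Attach a time of day to a date, like ... @ 11:30 a.m."""
    return datetime.combine(as_date(text), clock)


for col3, col1 in [("2020-01-01", 1), ("2021-06-15", 2)]:
    print(as_date(col3), days_later(col3, col1), at_time(col3, time(11, 30)))
# 2020-01-01 2020-01-02 2020-01-01 11:30:00
# 2021-06-15 2021-06-17 2021-06-15 11:30:00
```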

You can also add literal columns:

>>> ops = r"""
... str: '"hello"'
... int: '42'
... float: '3.14'
... bool: 'true'
... time: '11:30 a.m.'
... date: '2024-01-01'
... datetime: '2024-01-01 11:30 a.m.'
... """
>>> df.select(**Parser.to_polars(ops))
shape: (1, 7)
┌───────┬─────┬───────┬──────┬──────────┬────────────┬─────────────────────┐
│ str   ┆ int ┆ float ┆ bool ┆ time     ┆ date       ┆ datetime            │
│ ---   ┆ --- ┆ ---   ┆ ---  ┆ ---      ┆ ---        ┆ ---                 │
│ str   ┆ i32 ┆ f64   ┆ bool ┆ time     ┆ date       ┆ datetime[μs]        │
╞═══════╪═════╪═══════╪══════╪══════════╪════════════╪═════════════════════╡
│ hello ┆ 42  ┆ 3.14  ┆ true ┆ 11:30:00 ┆ 2024-01-01 ┆ 2024-01-01 11:30:00 │
└───────┴─────┴───────┴──────┴──────────┴────────────┴─────────────────────┘

Bare words as string literals

When dftly expressions are embedded in YAML config files, string literals normally require awkward double quoting because YAML strips its own quotes before dftly sees the value. To avoid this, dftly treats bare words (identifiers without a $ prefix, quotes, or parentheses) as string literals when they appear as a standalone expression:

>>> ops = r"""
... code: MEDS_BIRTH
... col_ref: $col1 + $col2
... quoted_str: '"hello"'
... number: 42
... bool_val: true
... """
>>> df.select(**Parser.to_polars(ops))
shape: (2, 5)
┌────────────┬─────────┬────────────┬────────┬──────────┐
│ code       ┆ col_ref ┆ quoted_str ┆ number ┆ bool_val │
│ ---        ┆ ---     ┆ ---        ┆ ---    ┆ ---      │
│ str        ┆ i64     ┆ str        ┆ i32    ┆ bool     │
╞════════════╪═════════╪════════════╪════════╪══════════╡
│ MEDS_BIRTH ┆ 4       ┆ hello      ┆ 42     ┆ true     │
│ MEDS_BIRTH ┆ 6       ┆ hello      ┆ 42     ┆ true     │
└────────────┴─────────┴────────────┴────────┴──────────┘

Only bare words are affected: column references ($col1 + $col2), quoted strings ("hello"), numbers, booleans, and all other expression types behave as they otherwise would. The rule is unambiguous because column references always require the $ prefix, so a bare word cannot be confused with a column, function call, or any other expression. Note that number: 42 and bool_val: true are parsed by YAML itself as int/bool and passed directly to dftly as POD literals; they never go through the expression grammar.
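The rule can be approximated in a few lines of Python. The sketch below is not dftly's grammar: classify is a hypothetical helper, the identifier regex is an assumption about what counts as a bare word, and excluding true/false is an assumption based on the literal-column example above, where the string 'true' yields a bool.

```python
import re

# Assumption: a bare word is approximated as a plain identifier.
_BARE_WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Assumption: these words are boolean literals rather than bare words.
_BOOLEAN_WORDS = {"true", "false"}


def classify(value: object) -> str:
    """Roughly classify a YAML mapping value the way the prose describes."""
    if not isinstance(value, str):
        # YAML already produced an int/float/bool, so no expression parsing.
        return "pod_literal"
    text = value.strip()
    if _BARE_WORD.fullmatch(text) and text not in _BOOLEAN_WORDS:
        return "bare_word_string"
    # Anything with $, quotes, operators, or parentheses goes to the grammar.
    return "expression"


for value in ["MEDS_BIRTH", "$col1 + $col2", '"hello"', 42, True]:
    print(repr(value), "->", classify(value))
```

Because every column reference starts with $, the identifier check never has to consult the dataframe's schema to decide what a value means.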

Warning: If a bare word appears as part of a larger expression (e.g., $col + TYPO), dftly will still interpret it as a string literal but will emit a warning, since this usually indicates a missing $ prefix rather than an intentional literal:

>>> import warnings
>>> with warnings.catch_warnings(record=True) as w:
...     warnings.simplefilter("always")
...     expr = Parser.expr_to_polars("$col1 + TYPO")
...     assert len(w) == 1
...     print(w[0].message)
Bare word 'TYPO' interpreted as string literal in a subexpression. Did you mean the column '$TYPO'? Use $TYPO for a column reference or "TYPO" for an explicit string literal.
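If you would rather have such typos fail loudly when validating a config, one option is to escalate warnings to exceptions with the standard warnings module. The sketch below uses a hypothetical stand-in, parse_expression_stub, instead of a real dftly call, and it assumes the warning is a UserWarning; the category dftly actually emits is not stated here, so check it before relying on this pattern.

```python
import warnings


def parse_expression_stub(expression: str) -> str:
    """Hypothetical stand-in for a parser call (not dftly's API).

    Warns when a bare identifier appears inside a larger expression.
    """
    tokens = expression.split()
    if len(tokens) > 1:
        for token in tokens:
            if token.isidentifier():
                warnings.warn(f"Bare word {token!r} interpreted as string literal.")
    return expression


def parse_strict(expression: str) -> str:
    """Run the parser with every warning promoted to an exception."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        return parse_expression_stub(expression)


try:
    parse_strict("$col1 + TYPO")
except UserWarning as exc:
    print(f"rejected: {exc}")
```

The filter change is scoped to the with block, so warnings elsewhere in the program keep their normal behaviour.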

Detailed Documentation

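Internally, dftly parses the YAML file into a mapping from desired output column name to input column expression, then parses each expression via the dftly grammar. The mapping can therefore be built directly in Python. The example below mirrors the first example above, except that diff is computed as $col2 - $col1 and dia_bp omits the as float cast, so those two output columns differ: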

>>> ops = {
...     "sum": "$col1 + $col2",
...     "diff": "$col2 - $col1",
...     "compare": "$col1 > ($col2 - 3) * 3",
...     "str_interp": 'f"value: {$foo} {$col1}"',
...     "max": "max($col1, $col2)",
...     "conditional": '"big" if $col1 > 1 else "small"',
...     "sys_bp": r"extract group 1 of /(\d+)\/(\d+)/ from $bp if /(\d+)\/(\d+)/ in $bp",
...     "dia_bp": r"extract group 2 of /(\d+)\/(\d+)/ from $bp if /(\d+)\/(\d+)/ in $bp",
... }
>>> from dftly import Parser
>>> parser = Parser()
>>> ops = {k: parser(v).polars_expr for k, v in ops.items()}
>>> df.select(**ops)
shape: (2, 8)
┌─────┬──────┬─────────┬────────────┬─────┬─────────────┬────────┬────────┐
│ sum ┆ diff ┆ compare ┆ str_interp ┆ max ┆ conditional ┆ sys_bp ┆ dia_bp │
│ --- ┆ ---  ┆ ---     ┆ ---        ┆ --- ┆ ---         ┆ ---    ┆ ---    │
│ i64 ┆ i64  ┆ bool    ┆ str        ┆ i64 ┆ str         ┆ str    ┆ str    │
╞═════╪══════╪═════════╪════════════╪═════╪═════════════╪════════╪════════╡
│ 4   ┆ 2    ┆ true    ┆ value: 5 1 ┆ 3   ┆ small       ┆ 120    ┆ 80     │
│ 6   ┆ 2    ┆ false   ┆ value: 6 2 ┆ 4   ┆ big         ┆ null   ┆ null   │
└─────┴──────┴─────────┴────────────┴─────┴─────────────┴────────┴────────┘

dftly parses each string into a dictionary form representing the specified operation, then builds an AST over the resulting nodes. This means you can also specify operations explicitly in this dictionary form, which is more verbose but more precise:

>>> ops = r"""
... sum: # "$col1 + $col2"
...   add:
...     - column: col1
...     - column: col2
... diff: # "$col2 - $col1"
...   subtract:
...     - column: col2
...     - column: col1
... compare: # "$col1 > ($col2 - 3) * 3"
...   greater_than:
...     - column: col1
...     - multiply:
...         - subtract:
...             - column: col2
...             - literal: 3
...         - literal: 3
... str_interp: # 'f"value: {$foo} {$col1}"'
...   string_interpolate:
...     - literal: "value: {} {}"
...     - column: foo
...     - column: col1
... max: # "max($col1, $col2)"
...   max:
...     - column: col1
...     - column: col2
... conditional: # '"big" if $col1 > 1 else "small"'
...   conditional:
...     when:
...       greater_than:
...         - column: col1
...         - literal: 1
...     then:
...       literal: "big"
...     otherwise:
...       literal: "small"
... sys_bp: # "extract group 1 of /(\\d+)\\/(\\d+)/ from $bp if /(\\d+)\\/(\\d+)/ in $bp"
...   conditional:
...     when:
...       regex_match:
...         pattern:
...           literal: (\d+)\/(\d+)
...         source:
...           column: bp
...     then:
...       regex_extract:
...         group_index:
...           literal: 1
...         pattern:
...           literal: (\d+)\/(\d+)
...         source:
...           column: bp
... dia_bp: # "extract group 2 of /(\\d+)\\/(\\d+)/ from $bp if /(\\d+)\\/(\\d+)/ in $bp"
...   conditional:
...     when:
...       regex_match:
...         pattern:
...           literal: (\d+)\/(\d+)
...         source:
...           column: bp
...     then:
...       regex_extract:
...         group_index:
...           literal: 2
...         pattern:
...           literal: (\d+)\/(\d+)
...         source:
...           column: bp
... """
>>> df.select(**Parser.to_polars(ops))
shape: (2, 8)
┌─────┬──────┬─────────┬────────────┬─────┬─────────────┬────────┬────────┐
│ sum ┆ diff ┆ compare ┆ str_interp ┆ max ┆ conditional ┆ sys_bp ┆ dia_bp │
│ --- ┆ ---  ┆ ---     ┆ ---        ┆ --- ┆ ---         ┆ ---    ┆ ---    │
│ i64 ┆ i64  ┆ bool    ┆ str        ┆ i64 ┆ str         ┆ str    ┆ str    │
╞═════╪══════╪═════════╪════════════╪═════╪═════════════╪════════╪════════╡
│ 4   ┆ 2    ┆ true    ┆ value: 5 1 ┆ 3   ┆ small       ┆ 120    ┆ 80     │
│ 6   ┆ 2    ┆ false   ┆ value: 6 2 ┆ 4   ┆ big         ┆ null   ┆ null   │
└─────┴──────┴─────────┴────────────┴─────┴─────────────┴────────┴────────┘
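To illustrate how nested dictionary nodes form a tree that can be evaluated, here is a toy evaluator for a single row. It is a sketch only: evaluate is a hypothetical function, it handles just the column, literal, add, subtract, multiply, and greater_than nodes used above, and it operates on plain Python dicts rather than polars expressions.

```python
import operator
from typing import Any, Dict

# Binary operations taken from the node names used in the example above.
_BINARY_OPS = {
    "add": operator.add,
    "subtract": operator.sub,
    "multiply": operator.mul,
    "greater_than": operator.gt,
}


def evaluate(node: Any, row: Dict[str, Any]) -> Any:
    """Recursively evaluate a dictionary-form node against one row."""
    if not isinstance(node, dict):
        return node  # a plain scalar is treated as a literal
    if len(node) != 1:
        raise ValueError(f"expected a single-key node, got {node!r}")
    ((op, arg),) = node.items()
    if op == "column":
        return row[arg]
    if op == "literal":
        return arg
    if op in _BINARY_OPS:
        left, right = (evaluate(child, row) for child in arg)
        return _BINARY_OPS[op](left, right)
    raise ValueError(f"unsupported node: {op}")


# Dictionary form of "$col1 > ($col2 - 3) * 3".
compare = {
    "greater_than": [
        {"column": "col1"},
        {"multiply": [
            {"subtract": [{"column": "col2"}, {"literal": 3}]},
            {"literal": 3},
        ]},
    ]
}
rows = [{"col1": 1, "col2": 3}, {"col1": 2, "col2": 4}]
print([evaluate(compare, row) for row in rows])  # [True, False]
```

The result matches the compare column in the table above: operator precedence is encoded by the nesting itself, which is why the dictionary form needs no parentheses.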

Note that literals are parsed by the string parser into either (a) a literal of the appropriate type (int, float, bool) or (b) literal nodes, which have the syntax literal: [value]. In some cases, what looks like a string in the string syntax is actually parsed directly to a typed literal; for example, the syntax $col3::"%Y-%m-%d" @ 11:30 a.m. features a string literal for the format, but a time literal for the time. In this way, the string syntax is often more concise, as you would need to explicitly construct or cast a string to a time were you to use the dictionary syntax. These cases can be identified by the lack of quotes around the time literal in the string syntax: apart from the bare-word rule described above, string literals are always quoted, and unquoted values are interpreted as non-string literals.

Release history

- 0.2.0: Apr 14, 2026
- 0.1.5: Apr 14, 2026
- 0.1.4: Apr 12, 2026
- 0.1.3: Apr 11, 2026
- 0.1.2: Apr 8, 2026 (this version)
- 0.1.1: Apr 7, 2026
- 0.1.0: Apr 7, 2026
- 0.0.2: Oct 13, 2025
- 0.0.1: Sep 25, 2025

Download files

Source Distribution

dftly-0.1.2.tar.gz (56.5 kB)

Uploaded Apr 8, 2026 Source

Built Distribution

dftly-0.1.2-py3-none-any.whl (32.1 kB)

Uploaded Apr 8, 2026 Python 3

File details

Details for the file dftly-0.1.2.tar.gz.

File metadata

Download URL: dftly-0.1.2.tar.gz
Upload date: Apr 8, 2026
Size: 56.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dftly-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`e1013d3482751333c8a85fe89777b867ab4ec9d75cc9fb80e311e764375464eb`
MD5	`1f96c57ea96f59c532dc6d361f1518bb`
BLAKE2b-256	`102f58c23363968aeb3841bc177e35c134efd3e892f03541bedb802242d492ac`

Provenance

The following attestation bundles were made for dftly-0.1.2.tar.gz:

Publisher: python-build.yaml on mmcdermott/dftly

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dftly-0.1.2.tar.gz
- Subject digest: e1013d3482751333c8a85fe89777b867ab4ec9d75cc9fb80e311e764375464eb
- Sigstore transparency entry: 1257703314
- Sigstore integration time: Apr 8, 2026
Source repository:
- Permalink: mmcdermott/dftly@5e9715688838b583affb4e6704827f4d0d7f50cf
- Branch / Tag: refs/tags/0.1.2
- Owner: https://github.com/mmcdermott
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-build.yaml@5e9715688838b583affb4e6704827f4d0d7f50cf
- Trigger Event: push

File details

Details for the file dftly-0.1.2-py3-none-any.whl.

File metadata

Download URL: dftly-0.1.2-py3-none-any.whl
Upload date: Apr 8, 2026
Size: 32.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dftly-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fd175bd42fb02682cec519e1df29f91de3ac6ce680c3d0ae4133af5b23b4ac0a`
MD5	`23b276764e8125e5029bc68584d032a4`
BLAKE2b-256	`ede0438072fe5ff892e82e90a8b7e39faa4f0de29312b4914e12c68634277ba2`

Provenance

The following attestation bundles were made for dftly-0.1.2-py3-none-any.whl:

Publisher: python-build.yaml on mmcdermott/dftly

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dftly-0.1.2-py3-none-any.whl
- Subject digest: fd175bd42fb02682cec519e1df29f91de3ac6ce680c3d0ae4133af5b23b4ac0a
- Sigstore transparency entry: 1257703414
- Sigstore integration time: Apr 8, 2026
Source repository:
- Permalink: mmcdermott/dftly@5e9715688838b583affb4e6704827f4d0d7f50cf
- Branch / Tag: refs/tags/0.1.2
- Owner: https://github.com/mmcdermott
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-build.yaml@5e9715688838b583affb4e6704827f4d0d7f50cf
- Trigger Event: push
