Fetch values from nested data structures
Datawalk
Fetch values from nested data structures with a friendly syntax based on math operators.
The features provided by this library are inspired by the pathlib.Path API, which uses the "/" operator to represent the folder structure leading to a file.
Datawalk proposes to use similar operators to access a value in a nested data structure.
A path to a value is called a walk.
Design choices and implementation rely on these documentation pages:
- operators and special methods: https://docs.python.org/3/reference/datamodel.html#emulating-numeric-types
- operator precedence: https://docs.python.org/3/reference/expressions.html#operator-precedence
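As an illustration of how these two pages fit together, here is a minimal sketch of operator overloading in plain Python. MiniWalk and ROOT are hypothetical names made up for this example; this is not datawalk's implementation, and it only handles dict keys and sequence indices.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class MiniWalk:
    """Hypothetical stand-in for a walk: an immutable tuple of keys or indices."""

    steps: tuple = ()

    def __truediv__(self, step: Any) -> 'MiniWalk':
        # "/" returns a NEW walk with one more step; the original is unchanged
        return MiniWalk(self.steps + (step,))

    def __add__(self, other: 'MiniWalk') -> 'MiniWalk':
        # "+" concatenates two walks into a deeper one
        return MiniWalk(self.steps + other.steps)

    def __or__(self, data: Any) -> Any:
        # "|" applies the walk to a data structure, one step at a time
        value = data
        for step in self.steps:
            value = value[step]
        return value


ROOT = MiniWalk()
data = {'org': {'address': {'country': 'France'}, 'phones': ['01 23 45 67 89', '02 13 46 58 79']}}

# "/" binds tighter than "+", which binds tighter than "|":
# the whole walk is built before it is applied to the data
country = ROOT / 'org' / 'address' / 'country' | data  # -> 'France'
second_phone = ROOT / 'org' + ROOT / 'phones' / 1 | data  # -> '02 13 46 58 79'
```

In Python's precedence table, "^" sits between "/" and "|", which is why an expression such as Walk / 'lastname' ^ (data, None) also builds the walk before applying it.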
Benefits from using a walk to retrieve a value:
- walks provide an expressive way of navigating data structures, which aims to improve legibility
- a walk is an indirection that decouples the logic retrieving a value from the logic manipulating it. This is helpful when you refactor the structure of your data: change the walk, not your business logic
- a walk is written with the same syntax whether it targets dict keys, object attributes or sequence indices
- walks are immutable and agnostic of data sources: they can be applied to different data structures. They can also be combined to produce deeper walks
- a walk provides expressive error messages telling where it failed to retrieve a value
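The "same syntax" and "agnostic of data sources" points can be sketched in plain Python. The resolve and walk helpers below are hypothetical and written only for this illustration; they are not part of datawalk and make no claim about how the library resolves selectors internally.

```python
from collections.abc import Mapping, Sequence
from typing import Any, NamedTuple


def resolve(value: Any, selector: Any) -> Any:
    """Hypothetical helper: resolve one selector against a mapping, a sequence or an object."""
    if isinstance(value, Mapping):
        return value[selector]  # dict key
    if isinstance(selector, int) and isinstance(value, Sequence):
        return value[selector]  # sequence index
    return getattr(value, selector)  # object attribute


def walk(data: Any, *selectors: Any) -> Any:
    """Apply the selectors one after the other."""
    for selector in selectors:
        data = resolve(data, selector)
    return data


class Pet(NamedTuple):
    name: str
    type: str


# the same path works whether pets are stored as dicts or as objects
first_pet_name = ('pets', 0, 'name')
as_dicts = {'pets': [{'name': 'Socks', 'type': 'cat'}]}
as_objects = {'pets': [Pet('Socks', 'cat')]}

name_from_dicts = walk(as_dicts, *first_pet_name)  # -> 'Socks'
name_from_objects = walk(as_objects, *first_pet_name)  # -> 'Socks'
```

Because the path is stored separately from the code that uses the value, switching the pets from dicts to objects requires no change in the calling code.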
Jump to the Use cases section to see it in action.
Installation
Install datawalk with your favorite Python package manager:
# install from PyPI
pip install datawalk
uv add datawalk
poetry add datawalk
# etc.
# install from GitHub
pip install git+https://github.com/lucsorel/datawalk.git#egg=datawalk
uv add "datawalk @ git+https://github.com/lucsorel/datawalk"
poetry add git+https://github.com/lucsorel/datawalk.git
# etc.
Use cases
Let's create a nested data structure combining lists, dictionaries, classes, dataclasses and namedtuples:
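from dataclasses import dataclass
from typing import NamedTuple

# a plain class
class Pet:
    def __init__(self, name: str, type: str):
        self.name = name
        self.type = type

# a dataclass
@dataclass
class PetDataclass:
    name: str
    type: str

# a namedtuple
class PetNamedTuple(NamedTuple):
    name: str
    type: str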
data = {
    'name': 'Lucie Nation',
    'org': {
        'title': 'Datawalk',
        # city and zipcode are picked by short_address_walk further down
        'address': {'city': 'Rennes', 'zipcode': '35700', 'country': 'France'},
        'phones': ['01 23 45 67 89', '02 13 46 58 79'],
        (666, 'ev/l'): 'hashable key',
    },
    'friends': [
        {'name': 'Frankie Manning'},
        {'name': 'Harry Cover'},
        {'name': 'Suzie Q', 'phone': '06 43 15 27 98'},
        {'name': 'Jean Blasin'},
    ],
    'pets': [
        Pet('Cinnamon', 'cat'),
        PetDataclass('Caramel', 'dog'),
        Pet('Melody', 'bird'),
        PetNamedTuple('Socks', 'cat'),
    ],
}
Some use-cases:
from datawalk import Walk
name_walk = Walk / 'name'
name_walk.walk(data) # -> 'Lucie Nation'
# variations (the | pipe operator calls the .walk() method):
name_walk | data # -> 'Lucie Nation'
Walk / 'name' | data # -> 'Lucie Nation'
# use default value when failing to retrieve a value (with the ^ operator)
(Walk / 'lastname').walk(data, default=None) # -> None
Walk / 'lastname' ^ (data, None) # -> None
# organisation country
Walk / 'org' / 'address' / 'country' | data # -> 'France'
# combine walks
org_walk = Walk / 'org'
country_walk = Walk / 'address' / 'country'
org_walk + country_walk | data # -> 'France'
# get the 2nd org phone number
org_walk / 'phones' / 1 | data # -> '02 13 46 58 79'
# filter lists
# - by slicing
Walk / 'pets' / slice(None, None, 2) | data # -> [cinnamon, melody] Pet instances
# - by targeting the first instance matching a key:value requirement
Walk / 'pets' @ ('type', 'dog') / 'name' | data # -> 'Caramel'
# - by targeting all instances whose key matches a list of values
Walk / 'pets' % ('name', ['Melody', 'Socks']) | data # -> [melody, socks] instances
# use ellipsis to create a walk without the last selector
suzie_name_walk = Walk / 'friends' @ ('name', 'Suzie Q') / 'name'
suzie_phone_walk = suzie_name_walk / ... / 'phone'
suzie_name_walk | data # -> 'Suzie Q'
suzie_phone_walk | data # -> '06 43 15 27 98'
# pick key:value items into a new dict
short_address_walk = Walk / 'org' / 'address' // ('city', 'zipcode')
short_address_walk | data # -> {'city': 'Rennes', 'zipcode': '35700'}
# walk representations are concise and expressive
repr(suzie_phone_walk) # -> '.friends @(name==Suzie Q) .phone'
repr(org_walk / 'phones' / 1) # -> '.org .phones [1]'
repr(short_address_walk) # -> '.org .address {city,zipcode}'
repr(Walk / 'pets' % ('name', ['Melody', 'Socks'])) # -> ".pets %(name in ['Melody', 'Socks'])"
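To make the meaning of the filtering and picking selectors concrete, here is what they compute according to the comments above, written as plain Python without datawalk. This is only a sketch: pets are plain dicts here for brevity, and the variable names are made up for the illustration.

```python
pets = [
    {'name': 'Cinnamon', 'type': 'cat'},
    {'name': 'Caramel', 'type': 'dog'},
    {'name': 'Melody', 'type': 'bird'},
    {'name': 'Socks', 'type': 'cat'},
]
address = {'city': 'Rennes', 'zipcode': '35700', 'country': 'France'}

# "@ ('type', 'dog')": the first item whose 'type' equals 'dog'
first_dog = next(pet for pet in pets if pet['type'] == 'dog')

# "% ('name', ['Melody', 'Socks'])": every item whose 'name' is in the list
melody_and_socks = [pet for pet in pets if pet['name'] in ['Melody', 'Socks']]

# "// ('city', 'zipcode')": a new dict holding only the picked keys
short_address = {key: address[key] for key in ('city', 'zipcode')}

# "/ slice(None, None, 2)": every other item, same as pets[::2]
every_other = pets[slice(None, None, 2)]
```

Note that slice(None, None, 2) is the object form of the [::2] subscript; the ::2 shorthand is only valid inside square brackets.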
Datawalk helps you fix your walks with explicit error messages:
Walk / 'friends' @ ('name', 'Suzie Q') / 'phone_number' | data
# WalkError: datawalk.errors.WalkError: walked [.friends, @(name==Suzie Q)] but could not find .phone_number in {'name': 'Suzie Q', 'phone': '06 43 15 27 98'}
Walk / 'pets' @ ('name', 'Vanilla') / 'name' | data
# WalkError: walked [.pets] but could not find @(name==Vanilla) in (Pet(name=Cinnamon, type=cat), PetDataclass(name='Caramel', type='dog'), Pet(name=Melody, type=bird), PetNamedTuple(name='Socks', type='cat'))
Walk / 'pets' % ('type', 'cat') # should have been % ('type', ['cat'])
# SelectorError: unsupported filter: ('type', 'cat'), value cat must be a sequence
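The idea behind such messages, remembering which steps succeeded before one fails, can be sketched in a few lines of plain Python. MiniWalkError and this walk function are hypothetical names for the illustration; they are not datawalk's error classes or its implementation, and the message format only loosely imitates the ones shown above.

```python
from typing import Any

_MISSING = object()  # sentinel, so that None can be used as a legitimate default


class MiniWalkError(Exception):
    """Hypothetical error type standing in for a walk failure."""


def walk(data: Any, *steps: Any, default: Any = _MISSING) -> Any:
    value = data
    walked = []  # steps resolved so far, reported when a later step fails
    for step in steps:
        try:
            value = value[step]
        except (KeyError, IndexError, TypeError) as error:
            if default is not _MISSING:
                return default
            raise MiniWalkError(f'walked {walked} but could not find {step!r} in {value!r}') from error
        walked.append(step)
    return value


data = {'friends': [{'name': 'Suzie Q', 'phone': '06 43 15 27 98'}]}

try:
    walk(data, 'friends', 0, 'phone_number')
except MiniWalkError as error:
    message = str(error)

# with a default value, the failure is swallowed and the default is returned
fallback = walk(data, 'friends', 0, 'phone_number', default=None)
```

Here message names both the steps that succeeded (friends, then index 0) and the value in which phone_number was missing, which is usually enough to spot a typo in the walk.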
Tests
# in a virtual environment
python3 -m pytest -v
# with uv
uv run pytest -v
Code coverage (with missed branch statements):
uv run pytest -v --cov=datawalk --cov-branch --cov-report term-missing --cov-fail-under 85
Changelog
See CHANGELOG.md.
Licence
Unless stated otherwise, all works are licensed under the MIT license, a copy of which is included here.
Contributions
I'm thankful to all the people who have contributed to this project.
Pull requests
Pull requests are welcome and will be processed on a best-effort basis.
Pull requests must follow the guidelines enforced by the pre-commit hooks:
- commit messages must follow the Angular conventions enforced by the commitlint hook
- code formatting must follow the conventions enforced by the isort and ruff-format hooks
- code linting should not detect code smells in your contributions, this is checked by the ruff hooks
Code conventions
The code conventions are described and enforced by pre-commit hooks to maintain consistency across the code base. The hooks are declared in the .pre-commit-config.yaml file.
Set the git hooks (pre-commit and commit-msg types):
uv run pre-commit install
Before committing, you can check your changes with:
# put all your changes in the git staging area
git add -A
# all hooks
uv run pre-commit run --all-files
# a specific hook
uv run pre-commit run ruff-format --all-files