col2dict
Create nested dictionaries from columns of tabular data.
Installation
pip install col2dict
Overview
DataFrames are a convenient way to collect data, but manipulating and analysing that data often calls for nested dictionaries. associate_columns creates (possibly nested) dictionaries from the columns of a DataFrame or a list of dicts, with control over how duplicate keys are merged at each nesting level.
Usage
from col2dict import associate_columns
Signatures
associate_columns(tab, (col1, col2))
associate_columns(tab, (col1, col2), merge=func)
associate_columns(tab, (col1, col2, col3, ...))
associate_columns(tab, (col1, col2, col3, ...), merge=func)
associate_columns(tab, (col1, col2, col3, ...), merge=[f12, f23, ...])
associate_columns(tab, ([colA, colB], col2)) # multi-column keys
associate_columns(tab, (col1, [colA, colB])) # multi-column values
associate_columns(tab, ([colA, colB], [colC, colD])) # both
Parameters
| Parameter | Description |
|---|---|
| tab | A pandas.DataFrame or a list of dicts. |
| cols | A tuple/list of column specs. Each element is a column name (str) or a list of column names. With 2 elements: first → keys, second → values. With 3+ elements: creates nested dicts. |
| merge | Merging function(s) for duplicate keys. None (default): auto-merge to list with a warning. A single callable: applied at every level. A list of callables: one per nesting transition. |
Options
| Option | Default | Description |
|---|---|---|
| duplicates_warning | True | Warns when duplicate keys are found and no merge function is provided. Set to False to silence the warning and collect values into lists silently. |
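The default duplicate handling described above can be pictured with a small pure-Python stand-in. This is only a sketch, not col2dict's implementation: `collect_pairs` is a hypothetical helper, the warning text is invented, and leaving unique keys unwrapped (rather than in a one-element list) is an assumption.

```python
import warnings

def collect_pairs(rows, key, value, duplicates_warning=True):
    """Hypothetical stand-in: map row[key] -> row[value], listing duplicates."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row[key], []).append(row[value])
    has_duplicates = any(len(values) > 1 for values in grouped.values())
    if has_duplicates and duplicates_warning:
        warnings.warn(f"duplicate keys in column {key!r}; values collected into lists")
    # Assumption: keys seen once keep their bare value, repeated keys keep the list.
    return {k: v[0] if len(v) == 1 else v for k, v in grouped.items()}

rows = [
    {"Year": 2020, "Name": "Alice"},
    {"Year": 2020, "Name": "Bob"},
    {"Year": 2022, "Name": "Eve"},
]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    noisy = collect_pairs(rows, "Year", "Name")                            # warns once
    quiet = collect_pairs(rows, "Year", "Name", duplicates_warning=False)  # silent

print(noisy)        # {2020: ['Alice', 'Bob'], 2022: 'Eve'}
print(len(caught))  # 1
```

Both calls return the same dictionary; the flag only controls whether the warning is emitted.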
Examples
Basic: two columns
import pandas as pd
from col2dict import associate_columns
df = pd.DataFrame({
"Name": ["Alice", "Bob", "Carol"],
"Age": [30, 25, 35],
})
associate_columns(df, ("Name", "Age"))
# {'Alice': 30, 'Bob': 25, 'Carol': 35}
Duplicate keys with merging
df = pd.DataFrame({
"Year": [2020, 2020, 2021, 2021, 2022],
"Name": ["Alice", "Bob", "Carol", "Dave", "Eve"],
"Score": [90, 85, 92, 88, 95],
})
associate_columns(df, ("Year", "Name"), merge=sorted)
# {2020: ['Alice', 'Bob'], 2021: ['Carol', 'Dave'], 2022: ['Eve']}
associate_columns(df, ("Year", "Score"), merge=sum)
# {2020: 175, 2021: 180, 2022: 95}
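One way to read these results is "group the values by key, then reduce each group with the merge function". The sketch below reproduces the two outputs above in plain Python under that assumption. `group_and_merge` is a hypothetical helper written for illustration, not part of col2dict, and the package may work differently internally.

```python
from collections import defaultdict

def group_and_merge(rows, key, value, merge):
    """Hypothetical helper: group `value` by `key`, then reduce each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    # `merge` receives the full list of values collected for one key.
    return {k: merge(v) for k, v in groups.items()}

rows = [
    {"Year": 2020, "Name": "Alice", "Score": 90},
    {"Year": 2020, "Name": "Bob", "Score": 85},
    {"Year": 2021, "Name": "Carol", "Score": 92},
    {"Year": 2021, "Name": "Dave", "Score": 88},
    {"Year": 2022, "Name": "Eve", "Score": 95},
]

names_by_year = group_and_merge(rows, "Year", "Name", sorted)
total_by_year = group_and_merge(rows, "Year", "Score", sum)
best_by_year = group_and_merge(rows, "Year", "Score", max)

print(names_by_year)  # {2020: ['Alice', 'Bob'], 2021: ['Carol', 'Dave'], 2022: ['Eve']}
print(total_by_year)  # {2020: 175, 2021: 180, 2022: 95}
print(best_by_year)   # {2020: 90, 2021: 92, 2022: 95}
```

Under this reading, any callable that accepts a list (sorted, sum, max, len, set, ...) is a plausible merge function.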
Nested association (3+ columns)
associate_columns(df, ("Year", "Name", "Score"))
# {2020: {'Alice': 90, 'Bob': 85},
# 2021: {'Carol': 92, 'Dave': 88},
# 2022: {'Eve': 95}}
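For comparison, the same nesting can be built by hand with dict.setdefault. This is a minimal sketch of the idea only: `nest` is a hypothetical helper, and unlike associate_columns it performs no merging (a repeated key path simply overwrites the earlier value).

```python
def nest(rows, columns):
    """Hypothetical helper: nest rows by columns[:-1], storing columns[-1] at the leaves."""
    *key_cols, value_col = columns
    result = {}
    for row in rows:
        node = result
        # Walk (and create) one dict level per key column except the last.
        for col in key_cols[:-1]:
            node = node.setdefault(row[col], {})
        # Assumption for this sketch: the last row wins on a repeated key path.
        node[row[key_cols[-1]]] = row[value_col]
    return result

rows = [
    {"Year": 2020, "Name": "Alice", "Score": 90},
    {"Year": 2020, "Name": "Bob", "Score": 85},
    {"Year": 2021, "Name": "Carol", "Score": 92},
    {"Year": 2021, "Name": "Dave", "Score": 88},
    {"Year": 2022, "Name": "Eve", "Score": 95},
]

nested = nest(rows, ("Year", "Name", "Score"))
print(nested[2021]["Dave"])  # 88
print(nested[2022])          # {'Eve': 95}
```

Each additional column adds one dictionary level, so a lookup chains one subscript per key column.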
Multi-column keys
df = pd.DataFrame({
"Store": ["A", "A", "B"],
"Dept": ["Elec", "Food", "Elec"],
"Revenue": [500, 100, 450],
})
associate_columns(df, (["Store", "Dept"], "Revenue"))
# {('A', 'Elec'): 500, ('A', 'Food'): 100, ('B', 'Elec'): 450}
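A tuple-keyed result like this is an ordinary Python dict, so it can be queried and regrouped with standard tools. The sketch below starts from a literal copy of the output above rather than calling the package; the variable names are illustrative.

```python
# Literal copy of the result shown above.
revenue = {("A", "Elec"): 500, ("A", "Food"): 100, ("B", "Elec"): 450}

# Look up one (Store, Dept) pair directly.
print(revenue[("A", "Food")])  # 100

# Aggregate over the first key component (total revenue per store).
per_store = {}
for (store, dept), amount in revenue.items():
    per_store[store] = per_store.get(store, 0) + amount
print(per_store)  # {'A': 600, 'B': 450}

# Re-nest the flat tuple keys into Store -> Dept -> Revenue.
by_store = {}
for (store, dept), amount in revenue.items():
    by_store.setdefault(store, {})[dept] = amount
print(by_store)  # {'A': {'Elec': 500, 'Food': 100}, 'B': {'Elec': 450}}
```

Tuple keys keep the structure flat, which suits direct lookups; the nested form suits per-store iteration.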
Multi-column values
df = pd.DataFrame({
"Name": ["Alice", "Bob"],
"Age": [30, 25],
"City": ["NYC", "LA"],
})
associate_columns(df, ("Name", ["Age", "City"]))
# {'Alice': [30, 'NYC'], 'Bob': [25, 'LA']}
Per-level merge functions
df = pd.DataFrame({
"A": [1, 1, 1, 2],
"B": ["x", "x", "y", "z"],
"C": [100, 200, 300, 400],
})
# sorted at level 1 (sort inner dict keys), sum at level 2 (merge dup values)
associate_columns(df, ("A", "B", "C"), merge=[sorted, sum])
# {1: {'x': 300, 'y': 300}, 2: {'z': 400}}
Deep nesting with mixed multi-column specs
df = pd.DataFrame({
"Store": ["A", "A", "B", "B"],
"Dept": ["Elec", "Food", "Elec", "Elec"],
"Item": ["TV", "Milk", "TV", "TV"],
"Brand": ["Sony", "Org", "LG", "Sony"],
"Price": [500, 3, 450, 500],
})
# Store -> {Dept, Item} -> Brand -> Price
associate_columns(df, ("Store", ["Dept", "Item"], "Brand", "Price"))
# {'A': {('Elec', 'TV'): {'Sony': 500}, ('Food', 'Milk'): {'Org': 3}},
# 'B': {('Elec', 'TV'): {'LG': 450, 'Sony': 500}}}
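Deeply nested results are often easier to inspect when flattened back into (path, value) pairs. The recursive generator below is one possible way to do that; `iter_leaves` is a hypothetical helper, not something the package is known to provide, and the input is a literal copy of the output above.

```python
def iter_leaves(tree, path=()):
    """Hypothetical helper: yield (key_path, leaf_value) for every leaf of a nested dict."""
    for key, child in tree.items():
        if isinstance(child, dict):
            yield from iter_leaves(child, path + (key,))
        else:
            yield path + (key,), child

# Literal copy of the nested result shown above.
prices = {
    "A": {("Elec", "TV"): {"Sony": 500}, ("Food", "Milk"): {"Org": 3}},
    "B": {("Elec", "TV"): {"LG": 450, "Sony": 500}},
}

for path, price in iter_leaves(prices):
    print(path, price)
# ('A', ('Elec', 'TV'), 'Sony') 500
# ('A', ('Food', 'Milk'), 'Org') 3
# ('B', ('Elec', 'TV'), 'LG') 450
# ('B', ('Elec', 'TV'), 'Sony') 500
```

Multi-column keys stay as tuples inside the path, so each path has exactly one entry per nesting level.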
For full documentation and more examples, see the original Wolfram Language resource page.
Testing
The test suite lives in tests/test_associate_columns.py and uses pytest. To run it:
pip install "col2dict[dev]"
pytest
Or from a source checkout:
git clone https://github.com/Daniele-Gregori/PyPI-packages.git
cd PyPI-packages/packages/col2dict
pip install -e ".[dev]"
pytest
The suite includes 68 tests organised in 14 groups covering basic associations, duplicate-key merging, nested dicts up to 5 levels deep, multi-column keys and values, per-level merge functions, input-type compatibility, edge cases, and error handling.
License
MIT — see LICENSE.