Data list management library
Project description
Dlist
A list of dictionaries with database-like operations. Think of it as a lightweight, in-memory table where each row is a Python dictionary.
Requires Python ≥ 3.10
Contents
- Quickstart
- Indexing
- Four core operations
- Arithmetic
- Sorting
- Input/Output
- Quick reference
- Project structure
- License
Quickstart
from dlist import Dlist
data = [
{'name': 'alice', 'age': '30', 'city': 'london'},
{'name': 'bob', 'age': '25', 'city': 'london'},
{'name': 'charlie', 'age': '28', 'city': 'paris'},
]
d = Dlist(data, id_='name')
d.filter(city='london') # select rows → new Dlist (2 items)
d.assign(status='active') # set fields → new Dlist (3 items, all with status)
d.partition('city') # group by key → {'london': Dlist, 'paris': Dlist}
d.tree() # explore structure in terminal
Installation
pip install dlist
Or for development:
pip install dlist[dev]
Core concepts
Data model
A Dlist is a list of dictionaries. Scalar values are stored as strings; list values are preserved as-is. An optional id_ field acts as a unique primary key.
d = Dlist([
{'id': '1', 'product': 'laptop', 'price': '1200'},
{'id': '2', 'product': 'mouse', 'price': '25'},
{'id': '3', 'product': 'keyboard', 'price': '80'},
], id_='id')
Nested dictionaries
Values can be nested dicts. Access nested keys using __ (double underscore), similar to Django's ORM syntax:
data = [
{'name': 'john', 'address': {'city': 'NYC', 'zip': '10001'}},
{'name': 'jane', 'address': {'city': 'LA', 'zip': '90001'}},
]
d = Dlist(data, id_='name')
d.vals('address__city') # → ['LA', 'NYC']
d.filter(address__city='NYC') # → Dlist with john
Internally, the nested structure is always preserved. The __ syntax is just for addressing.
List values
Fields can contain lists. List values are preserved (not coerced to strings) and get special treatment in filter, vals, and partition:
data = [
{'id': '1', 'name': 'photo1', 'tags': ['photo', 'landscape']},
{'id': '2', 'name': 'video1', 'tags': ['video', 'interview']},
{'id': '3', 'name': 'mixed', 'tags': ['photo', 'video']},
{'id': '4', 'name': 'doc1', 'tags': 'report'},
]
d = Dlist(data, id_='id')
d.filter(tags='photo') # → items 1, 3 (membership test)
d.filter(tags='-photo') # → items 2, 4 (exclude)
d.filter(tags='*port*') # → item 4 (glob on each element)
d.vals('tags') # → ['interview', 'landscape', 'photo', 'report', 'video']
d.vals('tags', count=True) # → {'photo': 3, 'video': 2, ...}
When displayed in tables, lists are shown as comma-separated values: photo, landscape.
Exploring data
keys() — available keys
d.keys() # all keys (sorted)
d.keys('address') # subkeys under 'address'
d.keys(All=True) # only keys present in every record
d.keys(count=True) # {'name': 3, 'address': 3, ...}
vals() — values for a key
d.vals('name') # ['jane', 'john']
d.vals('address__city', count=True) # {'NYC': 1, 'LA': 1}
tree() — visual overview
d.tree()
Dlist (2 items, id='name')
├── name 2/2 ['john' (1), 'jane' (1)]
└── address
├── city 2/2 ['NYC' (1), 'LA' (1)]
└── zip 2/2 ['10001' (1), '90001' (1)]
Each leaf shows: key unique_values/records_with_key [top values (count), ...]
Use d.tree(root='address') to zoom into a subtree, or d.tree(top=5) to show more values.
Indexing
Access records by integer position, by id, or by list:
d[0] # → dict (first record)
d['john'] # → dict (by id)
d[[0, 1]] # → Dlist (subset)
d[['john', 'jane']] # → Dlist (subset by ids)
Assignment
Assignment merges the new dict into the existing record:
d['john'] = {'phone': '555-1234'}
# john now has name, address AND phone
d[['john', 'jane']] = {'verified': 'True'}
# both records get the new key
Setting or generating an id
Promote an existing field to id, or generate sequential ids:
d2 = d.set_id('name') # existing field → id
d2 = d.sort('name').set_id('id', generate='R') # sort first, then generate R01, R02, …
d2 = d.set_id('id', generate='E', digits=5) # E00001, E00002, …
Auto digits uses one more digit than needed (5 records → 2 digits, 150 → 4). Raises ValueError if the key already exists in records.
Four core operations
All operations are non-mutating — they return a new Dlist, leaving the original untouched.
filter(**query) — select rows
Returns a subset of records matching the query.
d.filter(city='london') # exact match
d.filter(name='a*') # glob pattern
d.filter(city='-paris') # exclude value
d.filter(phone=True) # key must exist
d.filter(phone=False) # key must NOT exist
d.filter(city='london', age='30') # AND (all must match)
d.filter(tags='photo') # list membership (if tags is a list)
Get the complement (matched + rest):
matched, rest = d.filter(complement=True, city='london')
map(f, inputL, outputL) — transform fields
Applies a function to each record, reading from inputL keys and writing to outputL keys.
# Single output — return a bare value
d2 = d.map(lambda name: name.upper(), ['name'], ['upper_name'])
# Multiple outputs — return a tuple or list
d2 = d.map(lambda name, age: (name.upper(), int(age) + 1),
['name', 'age'], ['upper', 'next_age'])
Apply only to matching records (others are kept unchanged):
d2 = d.map(lambda price: str(float(price) * 0.9),
['price'], ['price'],
query={'product': 'laptop'})
If f returns None, the record is kept unchanged. If an input key is missing, the record is skipped.
assign(query={}, **kwargs) — set constant values
Shorthand for map when you just want to set fixed values:
d2 = d.assign(status='active') # all records
d2 = d.assign(query={'city': 'london'}, status='uk') # only matching
partition(key) — group by key
Splits into a dict of Dlists, one per unique value:
groups = d.partition('city')
# {'london': Dlist(2 items), 'paris': Dlist(1 items)}
Records where the key is missing go to None.
Arithmetic
dl1 + dl2 # merge: combine records, dl1 values win on conflict
dl1 - dl2 # subtract: remove dl2 records from dl1
dl1 == dl2 # equality: same records and same id
With id, + merges dicts with matching ids and appends new ones. Without id, it compares by full dict equality.
Sorting
sort() is the only in-place operation. It returns self for chaining:
d.sort() # sort by id field
d.sort('name') # sort by any key
d.sort('price', reverse=True) # descending
d.sort('address__city') # nested key
d.sort(['type', 'name']) # multi-key: primary by type, secondary by name
Input/Output
Reading JSON files
d = Dlist.read('data.json', id_='id') # from file
d = Dlist.read('[{"id": "1", "name": "a"}]') # from JSON string
d = Dlist.read([{'id': '1', 'name': 'a'}]) # from list of dicts
Join multiple JSON files by a shared key (requires duckdb):
d = Dlist.read_pivot('evidence/*.json') # glob pattern
d = Dlist.read_pivot({'files.json': 'file',
'types.json': 'type'}, pivot='id') # explicit mapping
Reading Excel files
Parse .xlsx workbooks with automatic header detection and subcategory recognition:
d = Dlist.read('report.xlsx') # format auto-detected from .xlsx
d = Dlist.read('report.xlsx', header_row=3) # explicit header row
d = Dlist.read('report.xlsx', sheets=[1, 2]) # specific sheets
d = Dlist.read('report.xlsx', header_row={1: 3, 2: 1}) # per-sheet header
Each record gets a row field (original row number). Multi-sheet workbooks add a tab field. Subcategory rows (column A empty, column B has a label) inject a category field into subsequent records.
d = Dlist.read('report.xlsx', header_row=3, sheets=[1], id_='ID')
d.filter(category='Forensic*') # filter by subcategory
d.partition('category') # group by subcategory
The Excel formatter writes subcategory labels to column B, so write → read round-trips preserve categories.
See docs/parsers.md for full details.
Formatted output
Write tables in multiple formats — ASCII, Markdown, LaTeX, Excel, and JSON:
print(d.write(format='ascii', keys=['id', 'name']))
print(d.write(format='md', keys=['id', 'name']))
print(d.write(format='latex', keys=['id', 'name']))
d.write('output.xlsx', keys=['id', 'name']) # format auto-detected from .xlsx
d.write('data.json') # format auto-detected from .json
Categorized output groups records by one or more keys:
print(d.write(format='ascii', ctree=['type'], keys=['id', 'name'],
titles={'type': 'Type: {}', 'id': 'ID', 'name': 'Name'}))
All table formatters default to width='auto' (columns sized to content). Override with width=10 for fixed-width or maxwidth=20 to cap auto-width.
See docs/formatters.md for full details on each format, styling options, multi-column layout, pagination, and Excel subcategory output.
Quick reference
| Method | Description |
|---|---|
| Creation | |
Dlist(data, id_='key') |
Create from list of dicts |
Dlist.read('file.json') |
Read from JSON file, string, or list |
Dlist.read('file.xlsx') |
Read from Excel (format auto-detected) |
Dlist.read_pivot('*.json') |
Join multiple JSON files by shared key |
| Querying | |
d.filter(**query) |
Select rows matching query → new Dlist |
d.filter(complement=True, ...) |
Select + return non-matching rows |
d.vals('key') |
Unique values for a key |
d.vals('key', count=True) |
Value frequencies as dict |
d.keys() |
All available keys |
d.tree() |
Visual structure overview |
| Indexing | |
d[0], d['id_val'] |
Get record by position or id |
d[[0, 1]], d[['a', 'b']] |
Get subset → new Dlist |
d['id_val'] = {...} |
Merge dict into record |
| Transforming | |
d.map(f, inputs, outputs) |
Apply function to each record → new Dlist |
d.assign(**kwargs) |
Set constant values → new Dlist |
d.assign(query={...}, **kw) |
Set values on matching records only |
d.partition('key') |
Group by key → dict of Dlists |
d.set_id('key') |
Promote existing field to id |
d.set_id('id', generate='R') |
Generate sequential ids (R01, R02, …) |
d.sort('key') |
Sort in place (returns self) |
d.sort(['k1', 'k2']) |
Multi-key sort (primary, secondary, …) |
| Arithmetic | |
d1 + d2 |
Merge two Dlists |
d1 - d2 |
Subtract records |
d1 == d2 |
Equality check |
| Output | |
d.write() |
JSON string |
d.write('file.json') |
Save to JSON file |
d.write('file.xlsx', keys=[...]) |
Save to Excel (auto-detected) |
d.write(format='ascii', keys=[...]) |
ASCII table string |
d.write(format='md', keys=[...]) |
Markdown table string |
d.write(format='latex', keys=[...]) |
LaTeX table string |
d.write(..., ctree=['key']) |
Categorized output |
Project structure
dlist/
├── src/dlist/
│ ├── __init__.py # Package entry point
│ ├── dlist.py # Dlist class (core operations)
│ ├── helpers.py # flattenDict, structDict, merge_dicts, getDitem
│ ├── formatters/ # ASCII, Markdown, LaTeX, Excel, JSON output
│ └── parsers/ # JSON, Excel input, pivot joins
├── tests/
│ ├── test_core.py # core operations + list fields
│ ├── test_formatters.py # ASCII, Markdown, LaTeX, Excel
│ └── test_excel_parser.py # Excel parser
├── docs/
│ ├── formatters.md # formatter reference
│ └── parsers.md # parser reference (Excel details)
├── example_input.xlsx # sample Excel file for parser demo
└── pyproject.toml
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file pydlist-2.0.0.tar.gz.
File metadata
- Download URL: pydlist-2.0.0.tar.gz
- Upload date:
- Size: 53.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.4 {"installer":{"name":"uv","version":"0.10.4","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Pop!_OS","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b0e7e5b4ac14e4b84e694acefc947941f7ff73511ba82a8d0857982857ea6321
|
|
| MD5 |
fe80f9310dae5ce64fc49be2d14fb9bf
|
|
| BLAKE2b-256 |
1e115fb8cf245bcac8458bbfc8e126cbd4bff6af2b8d0879e5f760d93aa7432a
|
File details
Details for the file pydlist-2.0.0-py3-none-any.whl.
File metadata
- Download URL: pydlist-2.0.0-py3-none-any.whl
- Upload date:
- Size: 45.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.4 {"installer":{"name":"uv","version":"0.10.4","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Pop!_OS","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
4dc03975859b9832c2608329ed335d270116c2c779cba9224ee0179685ccba9d
|
|
| MD5 |
ed2df24890b89dccf311ed1c70ec1f23
|
|
| BLAKE2b-256 |
0a7bca2d1df80b7ff617af06be26d4ffdb85b664f9e44224ec5035d01568353f
|