Skip to main content

Data list management library

Project description

Dlist

A list of dictionaries with database-like operations. Think of it as a lightweight, in-memory table where each row is a Python dictionary.

Requires Python ≥ 3.10

Contents

Quickstart

from dlist import Dlist

data = [
    {'name': 'alice', 'age': '30', 'city': 'london'},
    {'name': 'bob',   'age': '25', 'city': 'london'},
    {'name': 'charlie', 'age': '28', 'city': 'paris'},
]

d = Dlist(data, id_='name')

d.filter(city='london')          # select rows  → new Dlist (2 items)
d.assign(status='active')        # set fields   → new Dlist (3 items, all with status)
d.partition('city')              # group by key → {'london': Dlist, 'paris': Dlist}
d.tree()                         # explore structure in terminal

Installation

pip install dlist

Or for development:

pip install dlist[dev]

Core concepts

Data model

A Dlist is a list of dictionaries. Scalar values are stored as strings; list values are preserved as-is. An optional id_ field acts as a unique primary key.

d = Dlist([
    {'id': '1', 'product': 'laptop',   'price': '1200'},
    {'id': '2', 'product': 'mouse',    'price': '25'},
    {'id': '3', 'product': 'keyboard', 'price': '80'},
], id_='id')

Nested dictionaries

Values can be nested dicts. Access nested keys using __ (double underscore), similar to Django's ORM syntax:

data = [
    {'name': 'john', 'address': {'city': 'NYC', 'zip': '10001'}},
    {'name': 'jane', 'address': {'city': 'LA',  'zip': '90001'}},
]
d = Dlist(data, id_='name')

d.vals('address__city')              # → ['LA', 'NYC']
d.filter(address__city='NYC')        # → Dlist with john

Internally, the nested structure is always preserved. The __ syntax is just for addressing.

List values

Fields can contain lists. List values are preserved (not coerced to strings) and get special treatment in filter, vals, and partition:

data = [
    {'id': '1', 'name': 'photo1', 'tags': ['photo', 'landscape']},
    {'id': '2', 'name': 'video1', 'tags': ['video', 'interview']},
    {'id': '3', 'name': 'mixed',  'tags': ['photo', 'video']},
    {'id': '4', 'name': 'doc1',   'tags': 'report'},
]
d = Dlist(data, id_='id')

d.filter(tags='photo')       # → items 1, 3 (membership test)
d.filter(tags='-photo')      # → items 2, 4 (exclude)
d.filter(tags='*port*')      # → item 4 (glob on each element)
d.vals('tags')               # → ['interview', 'landscape', 'photo', 'report', 'video']
d.vals('tags', count=True)   # → {'photo': 3, 'video': 2, ...}

When displayed in tables, lists are shown as comma-separated values: photo, landscape.


Exploring data

keys() — available keys

d.keys()                    # all keys (sorted)
d.keys('address')           # subkeys under 'address'
d.keys(All=True)            # only keys present in every record
d.keys(count=True)          # {'name': 3, 'address': 3, ...}

vals() — values for a key

d.vals('name')              # ['jane', 'john']
d.vals('address__city', count=True)   # {'NYC': 1, 'LA': 1}

tree() — visual overview

d.tree()
Dlist (2 items, id='name')
├── name    2/2 ['john' (1), 'jane' (1)]
└── address
    ├── city    2/2 ['NYC' (1), 'LA' (1)]
    └── zip     2/2 ['10001' (1), '90001' (1)]

Each leaf shows: key unique_values/records_with_key [top values (count), ...]

Use d.tree(root='address') to zoom into a subtree, or d.tree(top=5) to show more values.


Indexing

Access records by integer position, by id, or by list:

d[0]                # → dict (first record)
d['john']           # → dict (by id)
d[[0, 1]]           # → Dlist (subset)
d[['john', 'jane']] # → Dlist (subset by ids)

Assignment

Assignment merges the new dict into the existing record:

d['john'] = {'phone': '555-1234'}
# john now has name, address AND phone

d[['john', 'jane']] = {'verified': 'True'}
# both records get the new key

Setting or generating an id

Promote an existing field to id, or generate sequential ids:

d2 = d.set_id('name')                        # existing field → id
d2 = d.sort('name').set_id('id', generate='R')  # sort first, then generate R01, R02, …
d2 = d.set_id('id', generate='E', digits=5)  # E00001, E00002, …

Auto digits uses one more digit than needed (5 records → 2 digits, 150 → 4). Raises ValueError if the key already exists in records.


Four core operations

All operations are non-mutating — they return a new Dlist, leaving the original untouched.

filter(**query) — select rows

Returns a subset of records matching the query.

d.filter(city='london')              # exact match
d.filter(name='a*')                  # glob pattern
d.filter(city='-paris')              # exclude value
d.filter(phone=True)                 # key must exist
d.filter(phone=False)                # key must NOT exist
d.filter(city='london', age='30')    # AND (all must match)
d.filter(tags='photo')               # list membership (if tags is a list)

Get the complement (matched + rest):

matched, rest = d.filter(complement=True, city='london')

map(f, inputL, outputL) — transform fields

Applies a function to each record, reading from inputL keys and writing to outputL keys.

# Single output — return a bare value
d2 = d.map(lambda name: name.upper(), ['name'], ['upper_name'])

# Multiple outputs — return a tuple or list
d2 = d.map(lambda name, age: (name.upper(), int(age) + 1),
           ['name', 'age'], ['upper', 'next_age'])

Apply only to matching records (others are kept unchanged):

d2 = d.map(lambda price: str(float(price) * 0.9),
           ['price'], ['price'],
           query={'product': 'laptop'})

If f returns None, the record is kept unchanged. If an input key is missing, the record is skipped.

assign(query={}, **kwargs) — set constant values

Shorthand for map when you just want to set fixed values:

d2 = d.assign(status='active')                          # all records
d2 = d.assign(query={'city': 'london'}, status='uk')    # only matching

partition(key) — group by key

Splits into a dict of Dlists, one per unique value:

groups = d.partition('city')
# {'london': Dlist(2 items), 'paris': Dlist(1 items)}

Records where the key is missing go to None.


Arithmetic

dl1 + dl2    # merge: combine records, dl1 values win on conflict
dl1 - dl2    # subtract: remove dl2 records from dl1
dl1 == dl2   # equality: same records and same id

With id, + merges dicts with matching ids and appends new ones. Without id, it compares by full dict equality.


Sorting

sort() is the only in-place operation. It returns self for chaining:

d.sort()                          # sort by id field
d.sort('name')                    # sort by any key
d.sort('price', reverse=True)     # descending
d.sort('address__city')           # nested key
d.sort(['type', 'name'])          # multi-key: primary by type, secondary by name

Input/Output

Reading JSON files

d = Dlist.read('data.json', id_='id')          # from file
d = Dlist.read('[{"id": "1", "name": "a"}]')   # from JSON string
d = Dlist.read([{'id': '1', 'name': 'a'}])     # from list of dicts

Join multiple JSON files by a shared key (requires duckdb):

d = Dlist.read_pivot('evidence/*.json')                     # glob pattern
d = Dlist.read_pivot({'files.json': 'file',
                      'types.json': 'type'}, pivot='id')    # explicit mapping

Reading Excel files

Parse .xlsx workbooks with automatic header detection and subcategory recognition:

d = Dlist.read('report.xlsx')                          # format auto-detected from .xlsx
d = Dlist.read('report.xlsx', header_row=3)            # explicit header row
d = Dlist.read('report.xlsx', sheets=[1, 2])           # specific sheets
d = Dlist.read('report.xlsx', header_row={1: 3, 2: 1}) # per-sheet header

Each record gets a row field (original row number). Multi-sheet workbooks add a tab field. Subcategory rows (column A empty, column B has a label) inject a category field into subsequent records.

d = Dlist.read('report.xlsx', header_row=3, sheets=[1], id_='ID')
d.filter(category='Forensic*')    # filter by subcategory
d.partition('category')           # group by subcategory

The Excel formatter writes subcategory labels to column B, so writeread round-trips preserve categories.

See docs/parsers.md for full details.


Formatted output

Write tables in multiple formats — ASCII, Markdown, LaTeX, Excel, and JSON:

print(d.write(format='ascii', keys=['id', 'name']))
print(d.write(format='md', keys=['id', 'name']))
print(d.write(format='latex', keys=['id', 'name']))
d.write('output.xlsx', keys=['id', 'name'])   # format auto-detected from .xlsx
d.write('data.json')                           # format auto-detected from .json

Categorized output groups records by one or more keys:

print(d.write(format='ascii', ctree=['type'], keys=['id', 'name'],
              titles={'type': 'Type: {}', 'id': 'ID', 'name': 'Name'}))

All table formatters default to width='auto' (columns sized to content). Override with width=10 for fixed-width or maxwidth=20 to cap auto-width.

See docs/formatters.md for full details on each format, styling options, multi-column layout, pagination, and Excel subcategory output.


Quick reference

Method Description
Creation
Dlist(data, id_='key') Create from list of dicts
Dlist.read('file.json') Read from JSON file, string, or list
Dlist.read('file.xlsx') Read from Excel (format auto-detected)
Dlist.read_pivot('*.json') Join multiple JSON files by shared key
Querying
d.filter(**query) Select rows matching query → new Dlist
d.filter(complement=True, ...) Select + return non-matching rows
d.vals('key') Unique values for a key
d.vals('key', count=True) Value frequencies as dict
d.keys() All available keys
d.tree() Visual structure overview
Indexing
d[0], d['id_val'] Get record by position or id
d[[0, 1]], d[['a', 'b']] Get subset → new Dlist
d['id_val'] = {...} Merge dict into record
Transforming
d.map(f, inputs, outputs) Apply function to each record → new Dlist
d.assign(**kwargs) Set constant values → new Dlist
d.assign(query={...}, **kw) Set values on matching records only
d.partition('key') Group by key → dict of Dlists
d.set_id('key') Promote existing field to id
d.set_id('id', generate='R') Generate sequential ids (R01, R02, …)
d.sort('key') Sort in place (returns self)
d.sort(['k1', 'k2']) Multi-key sort (primary, secondary, …)
Arithmetic
d1 + d2 Merge two Dlists
d1 - d2 Subtract records
d1 == d2 Equality check
Output
d.write() JSON string
d.write('file.json') Save to JSON file
d.write('file.xlsx', keys=[...]) Save to Excel (auto-detected)
d.write(format='ascii', keys=[...]) ASCII table string
d.write(format='md', keys=[...]) Markdown table string
d.write(format='latex', keys=[...]) LaTeX table string
d.write(..., ctree=['key']) Categorized output

Project structure

dlist/
├── src/dlist/
│   ├── __init__.py      # Package entry point
│   ├── dlist.py         # Dlist class (core operations)
│   ├── helpers.py       # flattenDict, structDict, merge_dicts, getDitem
│   ├── formatters/      # ASCII, Markdown, LaTeX, Excel, JSON output
│   └── parsers/         # JSON, Excel input, pivot joins
├── tests/
│   ├── test_core.py         # core operations + list fields
│   ├── test_formatters.py   # ASCII, Markdown, LaTeX, Excel
│   └── test_excel_parser.py # Excel parser
├── docs/
│   ├── formatters.md    # formatter reference
│   └── parsers.md       # parser reference (Excel details)
├── example_input.xlsx   # sample Excel file for parser demo
└── pyproject.toml

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pydlist-2.0.0.tar.gz (53.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pydlist-2.0.0-py3-none-any.whl (45.7 kB view details)

Uploaded Python 3

File details

Details for the file pydlist-2.0.0.tar.gz.

File metadata

  • Download URL: pydlist-2.0.0.tar.gz
  • Upload date:
  • Size: 53.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.4 {"installer":{"name":"uv","version":"0.10.4","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Pop!_OS","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for pydlist-2.0.0.tar.gz
Algorithm Hash digest
SHA256 b0e7e5b4ac14e4b84e694acefc947941f7ff73511ba82a8d0857982857ea6321
MD5 fe80f9310dae5ce64fc49be2d14fb9bf
BLAKE2b-256 1e115fb8cf245bcac8458bbfc8e126cbd4bff6af2b8d0879e5f760d93aa7432a

See more details on using hashes here.

File details

Details for the file pydlist-2.0.0-py3-none-any.whl.

File metadata

  • Download URL: pydlist-2.0.0-py3-none-any.whl
  • Upload date:
  • Size: 45.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.4 {"installer":{"name":"uv","version":"0.10.4","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Pop!_OS","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for pydlist-2.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4dc03975859b9832c2608329ed335d270116c2c779cba9224ee0179685ccba9d
MD5 ed2df24890b89dccf311ed1c70ec1f23
BLAKE2b-256 0a7bca2d1df80b7ff617af06be26d4ffdb85b664f9e44224ec5035d01568353f

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page