Python 3 implementation of Flatdata
flatdata-py
Python 3 implementation of flatdata.
Running the tests
python3 -m pytest
Basic usage
Once you have created a flatdata schema file, you can generate a Python module to read your existing flatdata archive:
flatdata-generator --gen py --schema locations.flatdata --output-file locations.py
Performance tips
flatdata-py supports two data access patterns with very different performance characteristics on large archives.
Iterating over a vector yields one Python object per element. Each field access unpacks bits from the underlying memory-mapped data. This is fine for accessing individual elements or small ranges, but has significant per-element overhead for bulk operations:
count = sum(1 for x in archive.links if x.speed_limit > 100)
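To see where the per-element overhead comes from, here is a minimal, standard-library-only sketch of reading bit-packed fields one at a time. The 9-bit field width, the LSB-first layout, and the helper names `read_bits` and `pack_fields` are assumptions made for illustration; they are not taken from the flatdata format specification or from flatdata-py's internals.

```python
def read_bits(buf: bytes, bit_offset: int, width: int) -> int:
    """Read an unsigned `width`-bit integer starting at `bit_offset` (LSB-first)."""
    first = bit_offset // 8
    last = (bit_offset + width + 7) // 8
    # Decode only the bytes that overlap the field, then shift and mask.
    chunk = int.from_bytes(buf[first:last], "little")
    return (chunk >> (bit_offset % 8)) & ((1 << width) - 1)


def pack_fields(values, width):
    """Pack unsigned integers back to back, `width` bits each (hypothetical layout)."""
    acc = 0
    for i, value in enumerate(values):
        acc |= (value & ((1 << width) - 1)) << (i * width)
    nbytes = (len(values) * width + 7) // 8
    return acc.to_bytes(nbytes, "little")


speeds = [30, 50, 120, 130, 80]
packed = pack_fields(speeds, 9)  # five 9-bit fields fit into 6 bytes

# Every element costs a slice, an int conversion, a shift and a mask in Python.
decoded = [read_bits(packed, i * 9, 9) for i in range(len(speeds))]
count = sum(1 for s in decoded if s > 100)
```

Each decoded value goes through several interpreter-level operations, which is negligible for a handful of elements but adds up over millions of them.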
For bulk operations, use the vectorized access methods that read fields directly into NumPy arrays:
import numpy as np
# single column access, returns a pandas DataFrame
df = archive.links.speed_limit
count = len(df[df['speed_limit'] > 100])
# full NumPy structured array with all fields
arr = archive.links.to_numpy()
count = int(np.sum(arr['speed_limit'] > 100))
# slices work too
arr = archive.links[1000:2000].to_numpy()
df = archive.links[::10].to_data_frame()
- Use vector.field_name (column access) when you only need one or a few fields.
- Use vector.to_numpy() or vector.to_data_frame() when you need all fields at once.
- Use vector[i].field for random access to individual elements.
- The underlying data is memory-mapped; the OS pages it from disk on demand. Vectorized results are materialized as NumPy arrays in RAM.
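The difference between reading from a memory-mapped file on demand and materializing a result in RAM can be sketched with the standard library alone. The record layout (`<IH`: a 32-bit id followed by a 16-bit speed limit), the file name, and the helpers `read_one` and `speed_column` are hypothetical stand-ins, not flatdata's on-disk format or API.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-width record: uint32 id, uint16 speed_limit (6 bytes).
RECORD = struct.Struct("<IH")


def read_one(mm, index):
    """Random access: decode a single record straight from the mapping."""
    return RECORD.unpack_from(mm, index * RECORD.size)


def speed_column(mm):
    """Bulk access: materialize one column as a list held in RAM."""
    return [speed for _id, speed in RECORD.iter_unpack(mm)]


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "links.bin")
        with open(path, "wb") as f:
            for i in range(1000):
                f.write(RECORD.pack(i, 30 + 10 * (i % 10)))
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                # Only the pages backing record 42 need to be touched here.
                one = read_one(mm, 42)
                # This walks the whole mapping and copies the column out.
                fast = sum(1 for s in speed_column(mm) if s > 100)
    return one, fast


one, fast = demo()
```

The single-record read touches only a small part of the file, while the column read scans all of it and keeps the result in memory, mirroring the trade-off between random access and vectorized access described above.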
Using the inspector
flatdata-py comes with a handy tool called the flatdata-inspector to inspect the contents of an archive:
- from the flatdata-py source directory:
./inspector.py
# or
python3 -m flatdata.lib.inspector
- if you want to install flatdata-py:
pip3 install flatdata-py[inspector] # the inspector feature requires IPython
flatdata-inspector -p /path/to/my/flatdata.archive
Using the writer
flatdata-writer is an addition to flatdata-py that can create flatdata archives from a flatdata schema, with the following limitations:
- does not allow adding additional sub-archives to an existing archive
- supports only bulk-writing (no streaming)
- not optimized for performance

- from the flatdata-py source directory:
./writer.py --schema archive.flatdata --output-dir testdir --json-file data.json --resource-name resourcename
# or
python3 -m flatdata.lib.writer --schema archive.flatdata --output-dir testdir --json-file data.json --resource-name resourcename
Note that the flatdata-writer CLI tool can only write one resource at a time. For archives that have multiple non-optional resources, the tool has to be executed separately for each resource. Only after all resources have been written can the archive be opened.
- if you want to install flatdata-py:
pip3 install flatdata-py[writer]
flatdata-writer --schema archive.flatdata --output-dir testdir --json-file data.json --resource-name resourcename
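Because the CLI writes one resource per invocation, an archive with several resources needs one command per resource. A minimal sketch that builds those commands follows; the helper `writer_commands`, the resource names, and the JSON file names are hypothetical, and only the four flags shown in the commands above are used.

```python
import shlex


def writer_commands(schema, output_dir, resources):
    """Return one flatdata-writer argument list per resource.

    `resources` maps a resource name to the JSON file holding its data
    (a hypothetical convention for this sketch).
    """
    return [
        [
            "flatdata-writer",
            "--schema", schema,
            "--output-dir", output_dir,
            "--json-file", json_file,
            "--resource-name", name,
        ]
        for name, json_file in resources.items()
    ]


commands = writer_commands(
    "archive.flatdata",
    "testdir",
    {"links": "links.json", "nodes": "nodes.json"},  # hypothetical resources
)
for argv in commands:
    # Print the shell form; pass argv to subprocess.run(argv, check=True) to execute.
    print(shlex.join(argv))
```

Running every command against the same output directory before opening the archive matches the requirement that all resources be written first.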
File details
Details for the file flatdata_py-0.4.11.tar.gz.
File metadata
- Download URL: flatdata_py-0.4.11.tar.gz
- Upload date:
- Size: 16.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 847ee3ef9084cb993437b1295760a4fbe8c59f80c5f342888a45226571b79cab |
| MD5 | 5c91ed79b4fadbf933cd532392427059 |
| BLAKE2b-256 | 28a9c0e14088bc6ec99e18b00e891e319ce679f00a086f8a18d6630fd9c8332b |
File details
Details for the file flatdata_py-0.4.11-py2.py3-none-any.whl.
File metadata
- Download URL: flatdata_py-0.4.11-py2.py3-none-any.whl
- Upload date:
- Size: 31.0 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0d2586c53008a423c42c80365a7656e787a5bc937d7536dbfc91f2f7f89e0dcc |
| MD5 | 19fd1f2f9d39d75ab9f370d1b6ec9b92 |
| BLAKE2b-256 | e407f0fed0914fb52977a7bcbb587f6705ab0d0882db46ddef902eb008762e51 |