Python 3 implementation of Flatdata

flatdata-py

Python 3 implementation of flatdata.

Running the tests

python3 -m pytest

Basic usage

Once you have created a flatdata schema file, you can generate a Python module to read your existing flatdata archive:

flatdata-generator --gen py --schema locations.flatdata --output-file locations.py

Performance tips

flatdata-py supports two data access patterns with very different performance characteristics on large archives.

Iterating over a vector yields one Python object per element. Each field access unpacks bits from the underlying memory-mapped data. This is fine for accessing individual elements or small ranges, but has significant per-element overhead for bulk operations:

count = sum(1 for x in archive.links if x.speed_limit > 100)
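To make the per-element cost concrete, here is a minimal, self-contained sketch of what unpacking a bit field on every access involves. The record layout, `read_bits`, and `pack_record` are hypothetical and only for illustration; they are not flatdata-py's actual implementation or on-disk format.

```python
def read_bits(buf: bytes, bit_offset: int, width: int) -> int:
    """Read an unsigned little-endian bit field of `width` bits at `bit_offset`."""
    first_byte = bit_offset // 8
    last_byte = (bit_offset + width + 7) // 8
    chunk = int.from_bytes(buf[first_byte:last_byte], "little")
    return (chunk >> (bit_offset % 8)) & ((1 << width) - 1)


# Hypothetical 2-byte record: speed_limit in bits 0-7, lanes in bits 8-11.
RECORD_SIZE = 2


def pack_record(speed_limit: int, lanes: int) -> bytes:
    return (speed_limit | (lanes << 8)).to_bytes(RECORD_SIZE, "little")


data = b"".join(pack_record(s, l) for s, l in [(50, 2), (120, 3), (130, 4)])

# One Python-level read_bits call per element and per field:
# this is the overhead that adds up on large vectors.
count = sum(
    1
    for i in range(len(data) // RECORD_SIZE)
    if read_bits(data, i * RECORD_SIZE * 8, 8) > 100
)
print(count)
```

Every comparison in the loop pays for slicing, integer conversion, shifting, and masking in interpreted Python, which is why element-wise iteration scales poorly compared with a single bulk read.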

For bulk operations, use the vectorized access methods, which read whole fields at once into NumPy arrays or pandas DataFrames:

import numpy as np

# single column access, returns a pandas DataFrame
df = archive.links.speed_limit
count = len(df[df['speed_limit'] > 100])

# full NumPy structured array with all fields
arr = archive.links.to_numpy()
count = int(np.sum(arr['speed_limit'] > 100))

# slices work too
arr = archive.links[1000:2000].to_numpy()
df = archive.links[::10].to_data_frame()

  • Use vector.field_name (column access) when you only need one or a few fields.
  • Use vector.to_numpy() or vector.to_data_frame() when you need all fields at once.
  • Use vector[i].field for random access to individual elements.
  • The underlying data is memory-mapped; the OS pages it from disk on demand. Vectorized results are materialized as NumPy arrays in RAM.
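The difference between memory-mapped data and a materialized result can be sketched with the standard library alone. This is an illustration of the general mechanism, not of flatdata-py internals: the file name, the 16-bit layout, and the values are all made up for the example.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical resource: 10,000 unsigned 16-bit little-endian values.
values = [i % 200 for i in range(10_000)]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "speed_limits.bin")
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}H", *values))

    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Mapping the file reads nothing yet; the OS loads pages on access.
            # Slicing copies only the requested byte range into RAM.
            window = mapped[2000:4000]  # elements 1000..1999, 2 bytes each

# Decoding the slice materializes it as ordinary Python objects in memory.
sliced = struct.unpack(f"<{len(window) // 2}H", window)
count = sum(1 for v in sliced if v > 100)
print(count)
```

Only the pages backing the requested slice need to be read from disk, while the decoded result lives entirely in RAM. The same trade-off applies when choosing between slicing a vector first and converting the whole vector.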

Using the inspector

flatdata-py ships with flatdata-inspector, a tool for inspecting the contents of an archive:

  • from the flatdata-py source directory:
./inspector.py
# or
python3 -m flatdata.lib.inspector
  • if you want to install flatdata-py:
pip3 install flatdata-py[inspector]  # the inspector feature requires IPython
flatdata-inspector -p /path/to/my/flatdata.archive

Using the writer

flatdata-writer is an addition to flatdata-py that can create flatdata archives from a flatdata schema, with the following limitations:

  • does not allow adding additional sub-archives to an existing archive

  • supports only bulk-writing (no streaming)

  • not optimized for performance

To run the writer:

  • from the flatdata-py source directory:

./writer.py --schema archive.flatdata --output-dir testdir --json-file data.json --resource-name resourcename
# or
python3 -m flatdata.lib.writer --schema archive.flatdata --output-dir testdir --json-file data.json --resource-name resourcename

Note that the flatdata-writer CLI tool can only write one resource at a time. For archives that have multiple non-optional resources, the tool has to be executed separately for each resource. Only after all resources have been written can the archive be opened.

  • if you want to install flatdata-py:
pip3 install flatdata-py[writer]
flatdata-writer --schema archive.flatdata --output-dir testdir --json-file data.json --resource-name resourcename
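Because the CLI writes one resource per invocation, an archive with several non-optional resources needs one call per resource. The sketch below builds those command lines using only the flags shown above; the helper `writer_commands`, the resource names, and the JSON file names are hypothetical and need to be replaced with the ones from your schema.

```python
import shlex


def writer_commands(schema, output_dir, json_by_resource):
    """Build one flatdata-writer invocation per resource."""
    return [
        [
            "flatdata-writer",
            "--schema", schema,
            "--output-dir", output_dir,
            "--json-file", json_file,
            "--resource-name", resource_name,
        ]
        for resource_name, json_file in json_by_resource.items()
    ]


# Hypothetical resource names and JSON files, for illustration only.
commands = writer_commands(
    "archive.flatdata",
    "testdir",
    {"links": "links.json", "nodes": "nodes.json"},
)

for cmd in commands:
    print(shlex.join(cmd))
    # To actually run it (requires flatdata-py[writer] to be installed):
    # subprocess.run(cmd, check=True)
```

Running the commands in sequence with `check=True` would stop at the first failure, which is useful because the archive can only be opened once every resource has been written.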
