VTK Zstandard compression library.

Project description

Seamlessly compress VTK datasets using Zstandard.

Read in VTK datasets 37x faster, write 14x faster, all while using 28% less space versus VTK’s modern XML format.

Read/Write Speedup and Compression Ratios

File Type / Method           Write Speed   Compression Ratio   Notes
---------------------------  ------------  ------------------  --------------------
Legacy VTK (.vtk)            465 MB/s      0.88                Significant overhead
VTK XML, none                256 MB/s      0.70                Significant overhead
VTK XML, zlib                105 MB/s      2.52                VTK default
VTK XML, lz4                 401 MB/s      1.47
VTK XML, lzma                9.93 MB/s     3.10
VTK HDF (.vtkhdf), lvl0      1733 MB/s     0.93                No compression
VTK HDF (.vtkhdf), lvl4      137 MB/s      2.37                Default compression
pyvista-zstd (.pv), lvl3     711 MB/s      3.02                Threads = 0
pyvista-zstd (.pv), lvl3     1845 MB/s     3.02                Threads = 4
pyvista-zstd (.pv), lvl22    15.8 MB/s     3.79                All threads (-1)

Usage

Install with:

pip install pyvista-zstd

Compatible with all VTK dataset types. Uses PyVista under the hood.

import pyvista as pv
import pyvista_zstd

# create and write out
ds = pv.Sphere()
pyvista_zstd.write(ds, "dataset.pv")

# read in and show these are identical
ds_in = pyvista_zstd.read("dataset.pv")
assert ds == ds_in

Alternative VTK example

import vtk
import pyvista_zstd

# create dataset using VTK source
sphere_source = vtk.vtkSphereSource()
sphere_source.SetRadius(1.0)
sphere_source.SetThetaResolution(32)
sphere_source.SetPhiResolution(32)
sphere_source.Update()

vtk_ds = sphere_source.GetOutput()

# write out, then read back
pyvista_zstd.write(vtk_ds, "sphere.pv")
ds_in = pyvista_zstd.read("sphere.pv")

PyVista Integration

When pyvista-zstd is installed, it automatically registers with PyVista’s reader registry. This means pv.read() handles .pv files directly:

import pyvista as pv

mesh = pv.read("dataset.pv")

No additional imports needed. This works via PyVista’s pyvista.readers entry point group, so the registration happens at install time.
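
To check which readers are registered on a given installation, the entry point group can be inspected with the standard library. This is a minimal sketch: the group name pyvista.readers comes from the text above, while the helper name list_entry_points is hypothetical, and the names registered under the group are not stated here, so no output is shown.

```python
from importlib.metadata import entry_points


def list_entry_points(group="pyvista.readers"):
    """Return sorted (name, target) pairs for every entry point in `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes .select() for filtering by group
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9 return a plain dict keyed by group name
        selected = eps.get(group, [])
    return sorted((ep.name, ep.value) for ep in selected)


if __name__ == "__main__":
    # Prints nothing if no package has registered a reader in this group
    for name, target in list_entry_points():
        print(f"{name} -> {target}")
```

An empty result suggests the package is not installed in the active environment, since entry points are read from installed package metadata.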

Rationale

VTK’s XML writer is flexible and supports most datasets, but its compression is limited to a single thread, it offers only a few compression algorithms, and the XML format adds significant overhead.

To demonstrate this, the following example writes out a single file without compression. This example requires pyvista>=0.47.0 for the compression parameter.

>>> import numpy as np
>>> import pyvista as pv
>>> ugrid = pv.ImageData(dimensions=(200, 200, 200)).to_tetrahedra()
>>> ugrid["pdata"] = np.random.random(ugrid.n_points)
>>> ugrid["cdata"] = np.random.random(ugrid.n_cells)
>>> nbytes = (
...     ugrid.points.nbytes
...     + ugrid.cell_connectivity.nbytes
...     + ugrid.offset.nbytes
...     + ugrid.celltypes.nbytes
...     + ugrid["pdata"].nbytes
...     + ugrid["cdata"].nbytes
... )
>>> print(f"Size in memory: {nbytes / 1024**2:.2f} MB")

Size in memory: 1993.89 MB

Save using VTK XML format

>>> from pathlib import Path
>>> import time
>>> tmp_path = Path("/tmp/ds.vtu")
>>> tstart = time.time()
>>> ugrid.save(tmp_path, compression=None)
>>> print(f"Written without compression in {time.time() - tstart:.2f} seconds")
>>> nbytes_disk = tmp_path.stat().st_size
>>> print(f"  File size:            {nbytes_disk / 1024**2:.2f} MB")
>>> print(f"  Compression Ratio:    {nbytes / nbytes_disk}")
>>> print()

Written without compression in 7.93 seconds
File size:            2858.94 MB
Compression Ratio:    0.6974239255525742

This amounts to around a 43% overhead using VTK’s XML writer. Using the default compression we can get the file size down to 791 MB, but it takes 19 seconds to compress.

>>> tstart = time.time()
>>> ugrid.save(tmp_path, compression='zlib')  # default
>>> print(f"Compressed in {time.time() - tstart:.2f} seconds")
>>> nbytes_disk = tmp_path.stat().st_size
>>> print(f"  File size:            {nbytes_disk / 1024**2:.2f} MB")
>>> print(f"  Compression Ratio:    {nbytes / nbytes_disk}")
>>> print()

Compressed in 18.83 seconds
File size:            791.05 MB
Compression Ratio:    2.5205590295735663

Clearly there’s room for improvement here as this amounts to a compression rate of 105.89 MB/s.
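
The overhead and write-rate figures quoted above follow directly from the printed sizes and timings. As a small worked check, using only the numbers reported in this section (the function names are illustrative, not part of any library):

```python
def throughput_mb_s(size_in_memory_mb, seconds):
    """Write rate expressed as in-memory megabytes processed per second."""
    return size_in_memory_mb / seconds


def overhead_percent(size_in_memory_mb, size_on_disk_mb):
    """How much larger the file on disk is than the data in memory."""
    return (size_on_disk_mb / size_in_memory_mb - 1.0) * 100.0


in_memory_mb = 1993.89         # "Size in memory" printed above
xml_uncompressed_mb = 2858.94  # VTK XML file size with compression=None
zlib_seconds = 18.83           # VTK XML write time with compression='zlib'

print(f"XML overhead:    {overhead_percent(in_memory_mb, xml_uncompressed_mb):.1f}%")
print(f"zlib write rate: {throughput_mb_s(in_memory_mb, zlib_seconds):.2f} MB/s")
# XML overhead:    43.4%
# zlib write rate: 105.89 MB/s
```

Note that the rate is computed against the in-memory size rather than the size on disk, which is why it can be compared across formats with different compression ratios.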

VTK Compression with Zstandard: pyvista-zstd

This library, pyvista-zstd, writes out VTK datasets with minimal overhead and uses Zstandard for compression. Moreover, it’s been implemented with multi-threading support for both read and write operations.

Let’s compress that file again but this time using pyvista-zstd:

>>> import pyvista_zstd
>>> tmp_path = Path("/tmp/ds.pv")
>>> tstart = time.time()
>>> pyvista_zstd.write(ugrid, tmp_path)
>>> print(f"Compressed pyvista_zstd in {time.time() - tstart:.2f} seconds")
>>> nbytes_disk = tmp_path.stat().st_size
>>> print(f"  File size:            {nbytes_disk / 1024**2:.2f} MB")
>>> print(f"  Compression Ratio:    {nbytes / nbytes_disk}")

Compressed pyvista_zstd in 0.92 seconds
Threads:              -1
File size:            660.41 MB
Compression Ratio:    3.019175309922273

This gives us a write performance of 2167 MB/s using the default number of threads and compression level, resulting in a 20x speedup in write performance versus VTK’s XML writer. This speedup is most noticeable for larger files:

Speedup versus VTK’s XML

Even when disabling multi-threading we can still achieve excellent performance:

>>> tstart = time.time()
>>> pyvista_zstd.write(ugrid, tmp_path, n_threads=0)
>>> print(f"Compressed pyvista_zstd in {time.time() - tstart:.2f} seconds")
>>> nbytes_disk = tmp_path.stat().st_size
>>> print(f"  File size:            {nbytes_disk / 1024**2:.2f} MB")
>>> print(f"  Compression Ratio:    {nbytes / nbytes_disk}")

Compressed pyvista_zstd in 2.91 seconds
Threads:              0
File size:            660.47 MB
Compression Ratio:    3.0188911592355683

This amounts to a single-core compression rate of 685.18 MB/s, which is in agreement with Zstandard’s benchmarks.

Note that the benefit of threading drops off rapidly past 8 threads, though part of this is due to the performance versus efficiency cores of the CPU used for benchmarking (see below).

Read/Write Speed versus Number of Threads

Reading in the dataset is also fast. Comparing with VTK’s XML reader using defaults:

Read VTK XML

>>> print("Read VTK XML:")
>>> %timeit pv.read("/tmp/ds.vtu")
6.22 s ± 9.21 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Read zstd

>>> print("Read zstd:")
>>> %timeit pyvista_zstd.read("/tmp/ds.pv")
563 ms ± 7.96 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

This is an 11x speedup for this dataset versus VTK’s XML, and it’s still fast even with multi-threading disabled:

>>> %timeit pyvista_zstd.read("/tmp/ds.pv", n_threads=0)
1.11 s ± 4.51 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

This amounts to 1796 MB/s for a single core, which is also in agreement with Zstandard’s benchmarks.

Additionally, you can control Zstandard’s compression level by setting level=. A quick benchmark for this dataset indicates the defaults give a reasonable performance versus size tradeoff:

Read/Write Speed versus Compression Level

Note that both pyvista-zstd and VTK’s XML default compression give relatively constant compression ratios for this dataset across varying file sizes:

Compression Ratio versus VTK’s XML

These benchmarks were performed on an i9-14900KF running the Linux kernel 6.12.41 using zstandard==0.24.0 from PyPI. Storage was a 2TB Samsung 990 Pro without LUKS mounted at /tmp.

Additional Information

The benchmarks/ directory contains additional benchmarks using many datasets, including all applicable datasets in pyvista.examples (see PyVista Dataset Gallery).
