Skip to main content

zCurve maps multidimensional data to one dimension while preserving locality of the data points.

Project description

zCurve

DOI

zCurve is a Python module with methods to efficiently map multidimensional data to a single dimension while preserving locality of the data points.

This mapping is commonly known as Z-order, Lebesgue curve, Morton space filling curve, Morton order or Morton code.

Image by David Eppstein, 2008

The Morton code of a multi-dimensional data point is calculated by bitwise interlacing the binary representations of its coordinate values.

zCurve provides two functions for handling the encoding and decoding of data points with arbitrary dimensionality and arbitrary coordinate size:

interlace(*data_point: int, dims: int = None, bits_per_dim: int = None) -> int
deinterlace(code_point: int, dims: int = 3) -> List[int]

When handling large multi-dimensional dataset (n > 10.000), zCurve offers some simple but convenient means of parallelizing the Morton encoding and decoding:

par_interlace(data_points: List[List[int]], dims: int = None, bits_per_dim: int = None) -> List[int]
par_deinterlace(code_points: List[int], dims: int = 3) -> List[List[int]]

Given the Morton codes of a multi-dimensional dataset, we can perform multi-dimensional range search using only a one-dimensional data structure. For range searching, zCurve offers two functions for calculating the necesaary LITMAX and BIGMIN values:

prev_morton(code_point: int, rmin_code: int, rmax_code: int, dims: int = 3) -> int
next_morton(code_point: int, rmin_code: int, rmax_code: int, dims: int = 3) -> int

This implementation is based on the following paper

Tropf, Herbert, and Helmut Herzog. "Multidimensional Range Search in Dynamically Balanced Trees." ANGEWANDTE INFO. 2 (1981): 71-77.

and it makes heavy use of the excellent gmpy2 module.

Installation

pip install zCurve

Usage

Basics

import zCurve as z 

imports the module.

code = z.interlace(2,16,8)

interlaces the 3D data point (2,16,8) into Morton code point 10248.

When explicitly specify dimensionality and bits per dimension of your data point

code = z.interlace(2,16,8, dims=3, bits_per_dim=5)

performance will benefit substantially.

z.deinterlace(4711)

deinterlaces the Morton code point 4711 into the 3D data point (29,1,3).

Parallel interlacing/deinterlacing

Given a potentially large list of n-dimensional data_points

from random import randrange

bit_size = 16
max_val = 2**bit_size - 1
no_samples = 10**6
data_points = [(randrange(0, max_val), randrange(0, max_val), randrange(0, max_val)) for i in range(no_samples)]

we can speed up things by using par_interlace and par_deinterlace

morton_codes = z.par_interlace(data_points, dims=3, bits_per_dim=16)
data_points == z.par_deinterlaces(morton_codes, dims=3)

Range searching

Image by Tropf and Herzog, 1981

When range searching, we can prune the search space by calculating BIGMIN (aka "GetNextZ-address") and LITMAX (aka "GetPrevZ-address") values.

point = z.interlace(6, 3, dims=2)  # => 30
rmin = z.interlace(5, 3, dims=2)   # => 27
rmax = z.interlace(10, 5, dims=2)  # => 102

BIGMIN = z.next_morton(point, rmin, rmax, dims=2) # => 31
LITMAX = z.prev_morton(point, rmin, rmax, dims=2) # => 27

In addition, we can easily check if a given Morton code point is within a specified range

z.in_range(58,27,102, dims=2) # => False
z.in_range(49,27,102, dims=2) # => True

Citation

@misc{rmrschub_2021_zCurve,
    author       = {René Schubotz},
    title        = {{zCurve: Multi-dimensional indexing using Morton space filling curves.}},
    month        = may,
    year         = 2021,
    doi          = {10.5281/zenodo.4777584},
    version      = {0.0.4},
    publisher    = {Zenodo},
    url          = {https://github.com/rmrschub/zCurve}
    }

License

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

zCurve-0.0.4.tar.gz (11.7 kB view details)

Uploaded Source

Built Distribution

zCurve-0.0.4-py3-none-any.whl (11.5 kB view details)

Uploaded Python 3

File details

Details for the file zCurve-0.0.4.tar.gz.

File metadata

  • Download URL: zCurve-0.0.4.tar.gz
  • Upload date:
  • Size: 11.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.5

File hashes

Hashes for zCurve-0.0.4.tar.gz
Algorithm Hash digest
SHA256 ac88333acab2befbdbfdb38f56b3eda829fbd32f2b2e1942f759393145f37bd4
MD5 ec1a0b70b43a79136bb200e3232c20fb
BLAKE2b-256 496e1f8d5c5e0cc4ccd9c95e1f9301fcc4ebd98fac81d04727600a9176f69abf

See more details on using hashes here.

File details

Details for the file zCurve-0.0.4-py3-none-any.whl.

File metadata

  • Download URL: zCurve-0.0.4-py3-none-any.whl
  • Upload date:
  • Size: 11.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.5

File hashes

Hashes for zCurve-0.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 e628e2c3b16cead495a676092347ef99333faafc9013b1a1e87e8a0f110b25b9
MD5 b28fa34376bd3dbf7fee01befb3b4e2e
BLAKE2b-256 1f5bbc934181fb5da9f55b79a04ae643245387eb42659b28ee5be1866d67c041

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page