NumPy structured array utilities — joining, flattening, field views, enum mapping, position arrays, and a dynamic (std::vector-like) array

NumPy Utils

NumPy structured array utilities — dtype construction, field views, joining, enum mapping, position arrays, byte/string conversion for C++ interop, and growable (std::vector-like) arrays.

Overview

vcti-nputils collects the low-level NumPy helpers shared across the vcti stack. Most of it is stateless functions over structured arrays and dtypes — building and reshaping dtypes, taking zero-copy field views, joining arrays, mapping enum values, and converting byte fields for pybind11/C++ interop. Alongside those it provides two small stateful containers for building a numpy array of unknown final size: GrowableArray (append-only — the common case) and DynamicArray (the same growth model plus soft deletion). Both grow on demand and hand off a plain contiguous array via to_numpy() for the read-heavy phase.

Which to use: pick GrowableArray if you only ever append, and DynamicArray if you also need to remove elements mid-build.
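The growth model described above can be illustrated with a minimal sketch in plain NumPy. TinyGrowable is a hypothetical class written for this example. It is not the library's GrowableArray, and its parameter names only mirror the constructor signature listed under Containers. It shows just the idea of over-allocating, copying into a larger buffer when full, and handing back an exact-size copy.

```python
import numpy as np


class TinyGrowable:
    """Hypothetical append-only buffer; illustrates capacity growth only."""

    def __init__(self, dtype, initial_capacity=4, growth_factor=2.0):
        self._data = np.empty(initial_capacity, dtype=dtype)
        self._size = 0
        self._growth = growth_factor

    def append(self, item):
        if self._size == len(self._data):
            # Out of room: allocate a larger buffer and copy the live prefix.
            new_cap = max(self._size + 1, int(len(self._data) * self._growth))
            grown = np.empty(new_cap, dtype=self._data.dtype)
            grown[: self._size] = self._data[: self._size]
            self._data = grown
        self._data[self._size] = item
        self._size += 1

    def to_numpy(self):
        # Copy the live prefix so the caller owns an exact-size array.
        return self._data[: self._size].copy()


buf = TinyGrowable(np.dtype([("id", "i4"), ("value", "f8")]), initial_capacity=2)
for i in range(5):
    buf.append((i, i * 0.5))
result = buf.to_numpy()
```

Because the capacity grows geometrically, the buffer is reallocated only a logarithmic number of times, which is what makes each append amortised O(1).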

Installation

pip install "vcti-nputils>=1.6.0"

In pyproject.toml dependencies

dependencies = [
    "vcti-nputils>=1.6.0",
]

Quick Start

import numpy as np
from vcti.nputils import (
    as_ndarray,
    check_overflow,
    decode_field,
    drop_fields,
    DynamicArray,
    encode_field,
    fields_view,
    flatten_dtype,
    GrowableArray,
    join_struct_arrays,
    merge_adjacent_fields,
    name_array,
    position_array,
    rename_fields,
    structured_dtype,
    with_encoding,
)

# Join structured arrays horizontally
dt1 = np.dtype([('id', 'i4'), ('value', 'f8')])
dt2 = np.dtype([('name', 'U10')])
arr1 = np.array([(1, 1.5), (2, 2.5)], dtype=dt1)
arr2 = np.array([('Alice',), ('Bob',)], dtype=dt2)
joined = join_struct_arrays([arr1, arr2])
# dtype: [('id', 'i4'), ('value', 'f8'), ('name', 'U10')]

# Create a zero-copy view with selected fields
view = fields_view(joined, ['id', 'name'])

# Drop fields from a structured array (zero-copy)
clean = drop_fields(joined, ['value'])

# Build a structured dtype from a scalar dtype + names
coord_dt = structured_dtype('f8', ['x', 'y', 'z'])
# dtype([('x', '<f8'), ('y', '<f8'), ('z', '<f8')])

# Rename fields in a dtype
new_dt = rename_fields(dt1, {'id': 'node_id', 'value': 'temperature'})

# Flatten array fields into individual columns (default naming)
dt = np.dtype([('id', 'i4'), ('coords', 'f8', (3,))])
_, cols = flatten_dtype(dt)
# cols: ['id', 'coord_0', 'coord_1', 'coord_2']

# Flatten with explicit per-field names
_, cols = flatten_dtype(dt, field_names={'coords': ['x', 'y', 'z']})
# cols: ['id', 'x', 'y', 'z']

# Flatten with a custom format string
_, cols = flatten_dtype(dt, fmt="{name}[{dim}]")
# cols: ['id', 'coord[0]', 'coord[1]', 'coord[2]']

# Merge adjacent 'S' fields into one (pure dtype view). Multiple merges
# can be specified at once; same-field overlap and name collisions are
# validated before anything is returned.
dt = np.dtype([
    ('first', 'S4'), ('last', 'S6'),
    ('city', 'S8'), ('state', 'S2'),
    ('age', 'i4'),
])
merged = merge_adjacent_fields(dt, {
    'name':    ['first', 'last'],
    'address': ['city', 'state'],
})
# dtype([('name', 'S10'), ('address', 'S10'), ('age', '<i4')])

# Map numeric enum values to names
enum_dict = {1: 'ACTIVE', 2: 'INACTIVE', 3: 'PENDING'}
names = name_array(np.array([1, 2, 1, 3]), enum_dict)

# Convert counts to cumulative offsets
offsets = position_array(np.array([3, 2, 4, 1]))
# array([0, 3, 5, 9, 10])

# Safely coerce inputs to ndarray
arr = as_ndarray([1, 2, 3], dtype=np.float64)
empty = as_ndarray(None)  # array([], dtype=float64)

# Byte <-> string conversion for C++/pybind11 interop
dt = np.dtype([('name', 'S10'), ('name_length', 'i4')])
sa = np.zeros(2, dtype=dt)
encode_field(sa, 'name', ['Alice', 'Bob'], length_field='name_length')
decoded = decode_field(sa, 'name')
overflow = check_overflow(sa, 'name', 'name_length')

# Attach encoding to a dtype so decode_field/encode_field use it automatically
name_dt = with_encoding(np.dtype('S32'), 'latin-1')

# Build an array incrementally without knowing the final size (append-only)
ga = GrowableArray(np.dtype([('id', 'i4'), ('value', 'f8')]))
ga.append((1, 1.5))
ga.extend([(2, 2.5), (3, 3.5)])
result = ga.to_numpy()  # independent, exact-size array you own

# Need to remove elements mid-build? DynamicArray adds soft deletion
da = DynamicArray(np.dtype('i8'))
da.extend([10, 20, 30, 40])
da.delete(1)            # elements shift: da[1] is now 30
clean = da.to_numpy()   # array([10, 30, 40])

These are typed, numpy-backed accumulators, not faster lists. For a method-by-method comparison, measured trade-offs, and guidance on when to use them versus a Python list, see docs/design/comparison.md; for usage recipes, docs/patterns.md. Reproduce the numbers with python benchmarks/benchmark_growable_array.py.


Module layout

Each category lives in its own module. All public functions are re-exported from vcti.nputils.

Module          Functions
dtype_utils     structured_dtype, flatten_dtype (+ flatten_record_dtype alias), merge_adjacent_fields, rename_fields
view_utils      fields_view, drop_fields
join_utils      join_struct_arrays
mapping_utils   name_array
offset_utils    position_array
coerce_utils    as_ndarray
growable_array  GrowableArray
dynamic_array   DynamicArray
byte_utils      string_from_bytes, bytes_from_string, decode_column, encode_column, decode_field, encode_field, check_overflow, get_encoding, with_encoding, ZERO_CHAR

Functions

Dtype construction & transformation

Function                                              Purpose
structured_dtype(dtype, names)                        Build a structured dtype from a scalar or subdtype plus field names
flatten_dtype(dt, *, field_names, fmt, strip_plural)  Expand array fields into scalars with flexible naming
flatten_record_dtype(dt, ...)                         Legacy alias for flatten_dtype
merge_adjacent_fields(dt, merges)                     Merge one or more groups of adjacent 'S' fields into a single field each (pure dtype view)
rename_fields(dt, mapping)                            Return a new dtype with fields renamed
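Flattening and merging are both dtype-level reinterpretations of the same bytes. The following is a rough sketch in plain NumPy of how such transformations can work. It does not use or reproduce the library's functions. The `{name}_{i}` naming is an assumption for illustration only and omits the plural stripping shown in Quick Start.

```python
import numpy as np

# Flatten: expand a subarray field into one scalar field per element.
dt = np.dtype([("id", "i4"), ("coords", "f8", (3,))])
flat_fields = []
for name in dt.names:
    field = dt[name]
    if field.subdtype is None:
        flat_fields.append((name, field))          # already scalar
    else:
        base, shape = field.subdtype
        for i in range(int(np.prod(shape))):
            flat_fields.append((f"{name}_{i}", base))
flat_dt = np.dtype(flat_fields)

points = np.array([(7, (1.0, 2.0, 3.0))], dtype=dt)
flat = points.view(flat_dt)        # same bytes, now with scalar columns

# Merge: reinterpret two adjacent byte fields as one wider field.
person_dt = np.dtype([("first", "S4"), ("last", "S6"), ("age", "i4")])
people = np.array(
    [(b"Anna", b"Miller", 41), (b"Ada", b"Byron", 36)], dtype=person_dt
)
merged_dt = np.dtype([("name", "S10"), ("age", "i4")])
merged = people.view(merged_dt)    # works because the itemsize is unchanged
```

Both views succeed only because the old and new dtypes have the same itemsize and layout. Note that a merged value whose first part is shorter than its field keeps the padding nulls in the middle: the second row's name reads back as b"Ada\x00Byron".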

Zero-copy views

Function                  Purpose
fields_view(sa, fields)   View containing only the selected fields
drop_fields(sa, exclude)  View containing all fields except those excluded
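What "zero-copy" means here can be shown with NumPy's own multi-field indexing, which returns a view in NumPy 1.16 and later. This is a sketch of the underlying behaviour, not the library's implementation. The array and field names are made up for the example.

```python
import numpy as np

sa = np.array(
    [(1, 1.5, b"a"), (2, 2.5, b"b")],
    dtype=[("id", "i4"), ("value", "f8"), ("tag", "S1")],
)

# Selecting several fields yields a view onto the original buffer.
view = sa[["id", "tag"]]

# Dropping fields is the same operation with the complement of the names.
exclude = {"value"}
kept = sa[[n for n in sa.dtype.names if n not in exclude]]

# Writes through the view are visible in the original array.
view["id"][0] = 99
```

The view keeps the original field offsets and itemsize, so it still spans the bytes of the omitted fields. Copy it if a packed, independent array is needed.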

Joining

Function                    Purpose
join_struct_arrays(arrays)  Join structured arrays horizontally by combining fields
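A horizontal join can be sketched in plain NumPy as "build the combined dtype, allocate, copy each field". join_columns below is a hypothetical helper written for this example. The library's function may validate and copy differently.

```python
import numpy as np


def join_columns(arrays):
    """Hypothetical helper: combine the fields of equal-length structured arrays."""
    length = len(arrays[0])
    if any(len(a) != length for a in arrays):
        raise ValueError("all arrays must have the same length")
    # Concatenate the field lists; np.dtype raises on duplicate field names.
    combined = np.dtype(
        [(name, a.dtype[name]) for a in arrays for name in a.dtype.names]
    )
    out = np.empty(length, dtype=combined)
    for a in arrays:
        for name in a.dtype.names:
            out[name] = a[name]    # column-wise copy into the new buffer
    return out


arr1 = np.array([(1, 1.5), (2, 2.5)], dtype=[("id", "i4"), ("value", "f8")])
arr2 = np.array([("Alice",), ("Bob",)], dtype=[("name", "U10")])
joined = join_columns([arr1, arr2])
```

Unlike the field views, a join has to copy, because the rows of the inputs live in separate buffers.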

Mapping, offsets, coercion

Function                                 Purpose
name_array(nparray, enum_dict, default)  Map numeric values to string names
position_array(counts, dtype)            Convert count array to cumulative offset array
as_ndarray(value, dtype)                 Coerce None, list, or ndarray to ndarray
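The offset and mapping helpers correspond to short NumPy idioms. The sketch below shows one way to compute them without the library. The "UNKNOWN" fallback is an assumption for illustration, and the library's default handling may differ.

```python
import numpy as np

# Counts -> cumulative offsets, with a leading 0 so that
# offsets[i]:offsets[i + 1] delimits group i in a flat array.
counts = np.array([3, 2, 4, 1])
offsets = np.concatenate(([0], np.cumsum(counts)))

flat = np.arange(10)
groups = [flat[offsets[i]:offsets[i + 1]] for i in range(len(counts))]

# Numeric enum values -> names, with a fallback for unmapped values.
enum_dict = {1: "ACTIVE", 2: "INACTIVE", 3: "PENDING"}
values = np.array([1, 2, 1, 3, 7])
names = np.array([enum_dict.get(int(v), "UNKNOWN") for v in values])
```

The offsets array has one more entry than counts, and its last entry is the total number of elements.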

Containers

Type                                                   Purpose
GrowableArray(dtype, initial_capacity, growth_factor)  Append-only growable numpy array: amortised O(1) append/extend (append_get_index returns the index), reserve/shrink_to_fit/clear, full/resize for sized fills, zero-copy as_array() view, and to_numpy() for an independent copy
DynamicArray(dtype, initial_capacity, growth_factor)   Same growth model plus soft deletion: lazy delete (shifting semantics), compact, active_indices
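Soft deletion with shifting semantics can be pictured as a liveness mask kept next to the data. The sketch below is a plain NumPy illustration under that assumption. The names data, alive, and delete are made up for the example and say nothing about how DynamicArray stores its state.

```python
import numpy as np

data = np.array([10, 20, 30, 40], dtype="i8")
alive = np.ones(len(data), dtype=bool)    # one flag per physical slot


def delete(logical_index):
    # Translate the logical position into a physical slot, then mark it dead.
    physical = np.flatnonzero(alive)[logical_index]
    alive[physical] = False


delete(1)                     # logically removes 20; no element is moved
active = np.flatnonzero(alive)    # physical slots that are still live
compacted = data[alive]       # boolean indexing copies only live elements
```

Deleting is cheap because nothing moves until compaction, while logical indices still behave as if later elements had shifted down.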

Byte / string conversion (pybind11 interop)

Function                                                          Purpose
string_from_bytes(value, encoding)                                Decode a single bytes value, stripping null padding
bytes_from_string(value, length, encoding)                        Encode to fixed-length bytes (pad or truncate)
decode_column(byte_array, encoding)                               Vectorized decode of a byte column to strings
encode_column(strings, length, encoding)                          Vectorized encode to (bytes, lengths)
decode_field(sa, field_name, *, encoding)                         Decode a byte field in a structured array
encode_field(sa, field_name, strings, *, length_field, encoding)  Encode strings into a byte field, optionally populating a paired length field
check_overflow(sa, field_name, length_field)                      Detect rows where the original encoded byte length exceeded the field
get_encoding(dtype, default)                                      Read encoding from dtype.metadata['encoding']
with_encoding(dtype, encoding)                                    Attach encoding to a scalar dtype via metadata
ZERO_CHAR                                                         The null character ("\x00") used to strip/pad byte fields
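The fixed-length byte convention in this table can be sketched in plain Python and NumPy. to_fixed_bytes is a hypothetical helper, and the assumption that the paired length field stores the length before truncation is inferred from the check_overflow description. None of this is the library's code.

```python
import numpy as np


def to_fixed_bytes(text, length, encoding="utf-8"):
    """Hypothetical helper: pad or truncate to `length` bytes.

    Returns the fixed-width bytes and the original encoded length.
    """
    raw = text.encode(encoding)
    return raw[:length].ljust(length, b"\x00"), len(raw)


dt = np.dtype([("name", "S4"), ("name_length", "i4")])
sa = np.zeros(2, dtype=dt)
for row, text in enumerate(["Bob", "Alice"]):
    raw, original_length = to_fixed_bytes(text, 4)
    sa["name"][row] = raw
    sa["name_length"][row] = original_length

# A row overflowed if its original length exceeds the field width.
overflow = sa["name_length"] > sa.dtype["name"].itemsize

# NumPy strips trailing nulls from 'S' values; decode the column to str.
decoded = np.char.decode(sa["name"], "utf-8")

# NumPy dtypes can carry metadata, which is where an encoding can be stored.
tagged = np.dtype("S32", metadata={"encoding": "latin-1"})
```

Truncating at a byte boundary can split a multi-byte character in encodings such as UTF-8, so overflowed rows may not decode cleanly.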

Dependencies
