
PyArrow extension types for storing fixed-shape and ragged tensors


ndarrow

PyArrow extension types for efficiently storing tensors with variable or fixed shapes.

Overview

ndarrow provides two complementary PyArrow-native storage formats for tensor data:

  • TensorArray — every element has exactly the same shape; the entire batch is backed by a single contiguous buffer.
  • RaggedTensorArray — each element is a tensor whose leading dimension may vary, with a fixed inner_shape shared by all elements.
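The two layouts described above can be pictured with plain NumPy. This is only a conceptual sketch: `pack_ragged` and `unpack_ragged` are hypothetical helpers written for this illustration, and the flat-values-plus-offsets scheme is an assumption about how ragged data is commonly stored, not a description of ndarrow's internals.

```python
import numpy as np


def pack_ragged(tensors):
    """Hypothetical helper: concatenate along axis 0 and record row offsets."""
    inner_shape = tensors[0].shape[1:]
    for t in tensors:
        if t.shape[1:] != inner_shape:
            raise ValueError(f"expected inner shape {inner_shape}, got {t.shape[1:]}")
    lengths = [t.shape[0] for t in tensors]
    offsets = np.zeros(len(tensors) + 1, dtype=np.int64)
    offsets[1:] = np.cumsum(lengths)
    values = np.concatenate(tensors, axis=0)  # shape (sum(lengths), *inner_shape)
    return values, offsets


def unpack_ragged(values, offsets):
    """Hypothetical helper: slice the flat buffer back into per-element views."""
    return [values[start:stop] for start, stop in zip(offsets[:-1], offsets[1:])]


rng = np.random.default_rng(0)

# Fixed shape: every element is (64,), so the batch is one contiguous (3, 64) block.
fixed = np.stack([rng.standard_normal(64) for _ in range(3)])

# Ragged: leading dimension varies (6, 9, 3), inner shape (64,) is shared.
tensors = [rng.standard_normal((n, 64)) for n in (6, 9, 3)]
values, offsets = pack_ragged(tensors)
restored = unpack_ragged(values, offsets)
```

Here `values` has shape `(18, 64)` and `offsets` is `[0, 6, 15, 18]`, so element `i` is the row range `offsets[i]:offsets[i + 1]` of the flat buffer.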

Both types store the element shape and NumPy dtype in the Arrow extension metadata, so they round-trip correctly through IPC and Parquet without any extra configuration.

Installation

pip install ndarrow

Usage

import numpy as np
import pyarrow as pa
import pyarrow.parquet as pq
from ndarrow import TensorArray, RaggedTensorArray

# Fixed-shape: one 64-dim embedding per sentence
sentence_embeddings = TensorArray.from_numpy(np.random.randn(3, 64))
# also accepts a list of arrays:
# sentence_embeddings = TensorArray.from_numpy([
#     np.random.randn(64)
#     for _ in range(3)
# ])

# Ragged: each sentence has a variable number of tokens, each with a 64-dim embedding
token_embeddings = RaggedTensorArray.from_numpy([
    np.random.randn(6, 64),
    np.random.randn(9, 64),
    np.random.randn(3, 64),
])

table = pa.table({"sentence_embeddings": sentence_embeddings, "token_embeddings": token_embeddings})
print(table.schema)
# sentence_embeddings: extension<ndarrow.tensor<TensorType>>
# token_embeddings:    extension<ndarrow.ragged_tensor<RaggedTensorType>>

# Round-trip through Parquet — type metadata is preserved
pq.write_table(table, "data.parquet")
table2 = pq.read_table("data.parquet")

embeddings_np = table2.column("sentence_embeddings").chunk(0).to_numpy()  # shape (3, 64)
tokens_list   = table2.column("token_embeddings").chunk(0).to_numpy()     # list of 3 arrays of shape (?, 64)


