PyArrow extension types for storing fixed-shape and ragged tensors
ndarrow
PyArrow extension types for efficiently storing tensors with variable or fixed shapes.
Overview
ndarrow provides two complementary PyArrow-native storage formats for tensor data:
- TensorArray: every element has exactly the same shape; the entire batch is backed by a single contiguous buffer.
- RaggedTensorArray: each element is a tensor whose leading dimension may vary, with a fixed inner_shape shared by all elements.
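To make the ragged layout described above concrete, here is a minimal pure-Python sketch of one common way to store variable-length elements: a single flat value buffer plus an offsets list over the leading dimension. This is an illustration only and is not taken from ndarrow's source; the helper names `pack_ragged` and `unpack_ragged` are hypothetical, and the library's actual buffer layout may differ.

```python
from itertools import accumulate


def pack_ragged(elements):
    """Flatten elements of shape (n_i, d) into one value list plus row offsets.

    Hypothetical helper for illustration; not part of ndarrow.
    """
    lengths = [len(rows) for rows in elements]
    # offsets[i]:offsets[i + 1] is the row range that belongs to element i
    offsets = [0, *accumulate(lengths)]
    flat = [value for rows in elements for row in rows for value in row]
    return flat, offsets


def unpack_ragged(flat, offsets, inner_size):
    """Rebuild the list of (n_i, inner_size) elements from the flat buffer."""
    elements = []
    for start, stop in zip(offsets, offsets[1:]):
        rows = [flat[r * inner_size:(r + 1) * inner_size] for r in range(start, stop)]
        elements.append(rows)
    return elements


# Three elements with 2, 1, and 3 rows; the inner shape is (2,) for all of them
elements = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0]],
    [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]],
]
flat, offsets = pack_ragged(elements)
restored = unpack_ragged(flat, offsets, inner_size=2)
```

The fixed-shape case is the special case where every element has the same number of rows, so the offsets become redundant and the flat buffer alone suffices. The offsets idea is similar in spirit to Arrow's variable-size list layout.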
Both types store the element shape and NumPy dtype in the Arrow extension metadata, so they round-trip correctly through IPC and Parquet without any extra configuration.
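PyArrow extension types persist their parameters as bytes through the `__arrow_ext_serialize__` and `__arrow_ext_deserialize__` hooks, which is what lets a type survive IPC and Parquet. The sketch below shows the kind of payload such a hook might produce for a shape and dtype. The JSON encoding and the function names are assumptions made for illustration; the format ndarrow actually uses is not documented here.

```python
import json


def serialize_tensor_metadata(shape, dtype):
    """Encode shape and dtype name as bytes (assumed JSON format, not ndarrow's)."""
    return json.dumps({"shape": list(shape), "dtype": dtype}).encode("utf-8")


def deserialize_tensor_metadata(payload):
    """Decode the bytes produced by serialize_tensor_metadata."""
    meta = json.loads(payload.decode("utf-8"))
    return tuple(meta["shape"]), meta["dtype"]


# A 64-dim float64 embedding, matching what np.random.randn(64) would produce
payload = serialize_tensor_metadata((64,), "float64")
shape, dtype = deserialize_tensor_metadata(payload)
```

Because the payload travels with the schema, a reader can reconstruct the element shape and dtype without any side-channel configuration, provided the extension type is registered in the reading process.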
Installation
pip install ndarrow
Usage
import numpy as np
import pyarrow as pa
import pyarrow.parquet as pq
from ndarrow import TensorArray, RaggedTensorArray
# Fixed-shape: one 64-dim embedding per sentence
sentence_embeddings = TensorArray.from_numpy(np.random.randn(3, 64))
# also accepts a list of arrays:
# sentence_embeddings = TensorArray.from_numpy([
#     np.random.randn(64)
#     for _ in range(3)
# ])
# Ragged: each sentence has a variable number of tokens, each with a 64-dim embedding
token_embeddings = RaggedTensorArray.from_numpy([
    np.random.randn(6, 64),
    np.random.randn(9, 64),
    np.random.randn(3, 64),
])
table = pa.table({"sentence_embeddings": sentence_embeddings, "token_embeddings": token_embeddings})
print(table.schema)
# sentence_embeddings: extension<ndarrow.tensor<TensorType>>
# token_embeddings: extension<ndarrow.ragged_tensor<RaggedTensorType>>
# Round-trip through Parquet — type metadata is preserved
pq.write_table(table, "data.parquet")
table2 = pq.read_table("data.parquet")
embeddings_np = table2.column("sentence_embeddings").chunk(0).to_numpy() # shape (3, 64)
tokens_list = table2.column("token_embeddings").chunk(0).to_numpy() # list of 3 arrays of shape (?, 64)