Efficient RaggedBuffer datatype that implements 3D arrays with variable-length 2nd dimension.
ENN Ragged Buffer
This Python package implements an efficient RaggedBuffer datatype that is similar to a 3D numpy array, but allows for variable sequence length in the second dimension. It was created primarily for use in enn-trainer and currently supports only a small selection of the numpy array methods.
User Guide
Install the package with pip install ragged-buffer.
The package currently supports three RaggedBuffer variants: RaggedBufferF32, RaggedBufferI64, and RaggedBufferBool.
Creating a RaggedBuffer
There are three ways to create a RaggedBuffer:
RaggedBufferF32(features: int) creates an empty RaggedBuffer with the specified number of features.
RaggedBufferF32.from_flattened(flattened: np.ndarray, lengths: np.ndarray) creates a RaggedBuffer from a flattened 2D numpy array and a 1D numpy array of lengths.
RaggedBufferF32.from_array creates a RaggedBuffer (with equal sequence lengths) from a 3D numpy array.
Creating an empty buffer and pushing each row:
import numpy as np
from ragged_buffer import RaggedBufferF32
# Create an empty RaggedBuffer with a feature size of 3
buffer = RaggedBufferF32(3)
# Push sequences with 3, 5, 0, and 1 elements
buffer.push(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.float32))
buffer.push(np.array([[10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20, 21], [22, 23, 24]], dtype=np.float32))
buffer.push(np.array([], dtype=np.float32)) # Alternative: `buffer.push_empty()`
buffer.push(np.array([[25, 25, 27]], dtype=np.float32))
Creating a RaggedBuffer from a flat 2D numpy array which combines the first and second dimension, and an array of sequence lengths:
import numpy as np
from ragged_buffer import RaggedBufferF32
buffer = RaggedBufferF32.from_flattened(
    np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20, 21], [22, 23, 24], [25, 25, 27]], dtype=np.float32),
    np.array([3, 5, 0, 1], dtype=np.int64),
)
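To picture how the flat array and the lengths array fit together, here is a minimal pure-Python sketch of the layout. It is an illustration only, not the library's implementation, and the helper name split_sequences is hypothetical.

```python
from itertools import accumulate


def split_sequences(flattened, lengths):
    """Split a flat list of feature rows into sequences of the given lengths."""
    if sum(lengths) != len(flattened):
        raise ValueError("lengths must sum to the number of flattened rows")
    # Offsets mark where each sequence starts and ends,
    # e.g. [0, 3, 8, 8, 9] for lengths [3, 5, 0, 1].
    offsets = [0, *accumulate(lengths)]
    return [flattened[start:end] for start, end in zip(offsets, offsets[1:])]


# Nine rows with three features each, grouped into four sequences.
rows = [[float(3 * i + j) for j in range(3)] for i in range(9)]
sequences = split_sequences(rows, [3, 5, 0, 1])
print([len(seq) for seq in sequences])  # [3, 5, 0, 1]
```

The zero-length sequence occupies no rows in the flat data; only its entry in the lengths records that it exists.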
Creating a RaggedBuffer from a 3D numpy array (all sequences have the same length):
import numpy as np
from ragged_buffer import RaggedBufferF32
buffer = RaggedBufferF32.from_array(np.zeros((4, 5, 3), dtype=np.float32))
Get size
The size0, size1, and size2 methods return the number of sequences, the number of elements in a sequence, and the number of features respectively.
import numpy as np
from ragged_buffer import RaggedBufferF32
buffer = RaggedBufferF32.from_flattened(
    np.zeros((9, 64), dtype=np.float32),
    np.array([3, 5, 0, 1], dtype=np.int64),
)
# Get size of the first/batch dimension.
assert buffer.size0() == 4
# Get size of individual sequences.
assert buffer.size1(1) == 5
assert buffer.size1(2) == 0
# Get size of the last/feature dimension.
assert buffer.size2() == 64
Convert to numpy array
as_array converts a RaggedBuffer to a flat 2D numpy array that combines the first and second dimension.
import numpy as np
from ragged_buffer import RaggedBufferI64
buffer = RaggedBufferI64(1)
buffer.push(np.array([[1], [1], [1]], dtype=np.int64))
buffer.push(np.array([[2], [2]], dtype=np.int64))
assert np.all(buffer.as_array() == np.array([[1], [1], [1], [2], [2]], dtype=np.int64))
Indexing
You can index a RaggedBuffer with a single integer (returning a RaggedBuffer with a single sequence), or with a numpy array of integers selecting/permuting multiple sequences.
import numpy as np
from ragged_buffer import RaggedBufferF32
# Create a new `RaggedBufferF32`
buffer = RaggedBufferF32.from_flattened(
    np.arange(0, 36, dtype=np.float32).reshape(9, 4),
np.array([3, 5, 0, 1], dtype=np.int64)
)
# Retrieve the first sequence.
assert np.all(
buffer[0].as_array() ==
np.array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]], dtype=np.float32)
)
# Get a RaggedBuffer with 2 randomly selected sequences.
buffer[np.random.permutation(4)[:2]]
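Indexing with an array of integers can be thought of as gathering whole sequences from the flat data. The following pure-Python sketch models that on plain lists; it is not how the library is implemented, and select_sequences is a hypothetical helper.

```python
from itertools import accumulate


def select_sequences(flattened, lengths, indices):
    """Gather the sequences at `indices` and return new (flattened, lengths)."""
    offsets = [0, *accumulate(lengths)]
    new_flattened = []
    new_lengths = []
    for i in indices:
        # Copy every row belonging to sequence i, in order.
        new_flattened.extend(flattened[offsets[i]:offsets[i + 1]])
        new_lengths.append(lengths[i])
    return new_flattened, new_lengths


rows = [[4 * i + j for j in range(4)] for i in range(9)]
lengths = [3, 5, 0, 1]
# Select sequence 3 followed by sequence 0.
picked_rows, picked_lengths = select_sequences(rows, lengths, [3, 0])
print(picked_lengths)  # [1, 3]
print(picked_rows[0])  # [32, 33, 34, 35]
```

Because the index list may repeat or reorder entries, the same routine covers both selection and permutation.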
Addition
You can add two RaggedBuffers with the + operator if they have the same number of sequences, sequence lengths, and features. You can also add a RaggedBuffer where all sequences have a length of 1 to a RaggedBuffer with variable length sequences, broadcasting along each sequence.
import numpy as np
from ragged_buffer import RaggedBufferI64
# Create ragged buffer with dimensions (3, [1, 3, 2], 1)
rb3 = RaggedBufferI64(1)
rb3.push(np.array([[0]], dtype=np.int64))
rb3.push(np.array([[0], [1], [2]], dtype=np.int64))
rb3.push(np.array([[0], [5]], dtype=np.int64))
# Create ragged buffer with dimensions (3, [1, 1, 1], 1)
rb4 = RaggedBufferI64.from_array(np.array([0, 3, 10], dtype=np.int64).reshape(3, 1, 1))
# Add rb3 and rb4, broadcasting along the sequence dimension.
rb5 = rb3 + rb4
assert np.all(
rb5.as_array() == np.array([[0], [3], [4], [5], [10], [15]], dtype=np.int64)
)
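The broadcasting rule can be spelled out on nested lists. This is a sketch of the semantics described above rather than the library's code; add_broadcast is a hypothetical name, and the check on the number of sequences is an assumption of the sketch.

```python
def add_broadcast(ragged, singles):
    """Add each length-1 sequence in `singles` to every row of the matching sequence."""
    if len(ragged) != len(singles):
        raise ValueError("both buffers need the same number of sequences")
    result = []
    for sequence, single in zip(ragged, singles):
        (offset_row,) = single  # each sequence in `singles` holds exactly one row
        result.append([[x + y for x, y in zip(row, offset_row)] for row in sequence])
    return result


# Same values as rb3 and rb4 in the example above.
rb3 = [[[0]], [[0], [1], [2]], [[0], [5]]]
rb4 = [[[0]], [[3]], [[10]]]
rb5 = add_broadcast(rb3, rb4)
flat = [row for sequence in rb5 for row in sequence]
print(flat)  # [[0], [3], [4], [5], [10], [15]]
```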
Concatenation
The extend method can be used to mutate a RaggedBuffer by appending another RaggedBuffer to it.
import numpy as np
from ragged_buffer import RaggedBufferF32
rb1 = RaggedBufferF32.from_array(np.zeros((4, 5, 3), dtype=np.float32))
rb2 = RaggedBufferF32.from_array(np.zeros((2, 5, 3), dtype=np.float32))
rb1.extend(rb2)
assert rb1.size0() == 6
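In terms of the flat layout, appending one buffer to another amounts to appending its rows and its lengths. A minimal sketch on plain lists follows; extend_flat is a hypothetical helper and the feature-size check is an assumption, not documented library behavior.

```python
def extend_flat(rows, lengths, other_rows, other_lengths):
    """Append another buffer's sequences in place (flat rows plus lengths)."""
    if rows and other_rows and len(rows[0]) != len(other_rows[0]):
        raise ValueError("feature sizes differ")
    rows.extend(other_rows)
    lengths.extend(other_lengths)


# Mirrors the shapes (4, 5, 3) and (2, 5, 3) used above.
rows1 = [[0.0] * 3 for _ in range(4 * 5)]
lengths1 = [5] * 4
rows2 = [[0.0] * 3 for _ in range(2 * 5)]
lengths2 = [5] * 2
extend_flat(rows1, lengths1, rows2, lengths2)
print(len(lengths1), len(rows1))  # 6 30
```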
Clear
The clear method removes all elements from a RaggedBuffer without deallocating the underlying memory.
import numpy as np
from ragged_buffer import RaggedBufferF32
rb = RaggedBufferF32.from_array(np.zeros((4, 5, 3), dtype=np.float32))
rb.clear()
assert rb.size0() == 0
License
ENN Ragged Buffer is dual-licensed under Apache-2.0 and MIT.