PyTorch extension for handling deeply nested sequences of variable length

FoldedTensor: PyTorch extension for handling deeply nested sequences of variable length

foldedtensor is a PyTorch extension that provides efficient handling of tensors containing deeply nested sequences of variable sizes. It enables the flattening/unflattening (or unfolding/folding) of data dimensions based on an inner structure of sequence lengths. This library is particularly useful when working with data that can be split in different ways, and it lets you avoid committing to a single fixed representation.

Installation

The library can be installed with pip:

pip install foldedtensor

Features

Support for arbitrary numbers of nested dimensions
No computational overhead when dealing with already padded tensors
Dynamic re-padding (or refolding) of data based on stored inner lengths
Automatic mask generation and updating whenever the tensor is refolded
C++ optimized code for fast data loading from Python lists and refolding
Flexibility in data representation, making it easy to switch between different layouts when needed

Examples

At its simplest, foldedtensor can be used to convert nested Python lists into a PyTorch tensor:

from foldedtensor import as_folded_tensor

ft = as_folded_tensor(
    [
        [0, 1, 2],
        [3],
    ],
)
# FoldedTensor([[0, 1, 2],
#               [3, 0, 0]])
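To make the padding in the output above concrete, here is a minimal, library-independent sketch of turning ragged lists into a rectangular layout together with a mask. The helper name `pad_ragged` and its `pad_value` parameter are hypothetical and used only for illustration; this is not how foldedtensor implements it internally.

```python
def pad_ragged(rows, pad_value=0):
    """Pad variable-length lists to a rectangle and build a matching mask.

    Hypothetical helper for illustration only, not part of foldedtensor.
    """
    width = max((len(row) for row in rows), default=0)
    padded = [list(row) + [pad_value] * (width - len(row)) for row in rows]
    # True marks a real element, False marks padding.
    mask = [[i < len(row) for i in range(width)] for row in rows]
    return padded, mask


padded, mask = pad_ragged([[0, 1, 2], [3]])
print(padded)  # [[0, 1, 2], [3, 0, 0]]
print(mask)    # [[True, True, True], [True, False, False]]
```

The mask is what lets downstream code tell the padding zeros apart from genuine zero values.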

You can also specify names and flattened/unflattened dimensions at the time of creation:

import torch
from foldedtensor import as_folded_tensor

# Creating a folded tensor from a nested list
# There are 2 samples, the first with 5 lines, the second with 1 line.
# Each line contains between 0 and 2 words.
ft = as_folded_tensor(
    [
        [[1], [], [], [], [2, 3]],
        [[4, 3]],
    ],
    data_dims=("samples", "words"),
    full_names=("samples", "lines", "words"),
    dtype=torch.long,
)
print(ft)
# FoldedTensor([[1, 2, 3],
#               [4, 3, 0]])
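The example above relies on an inner structure of sequence lengths. As a rough sketch of what such a structure could look like, the following pure-Python code separates the same nested list into a flat list of values and one list of lengths per nesting level. The function `flatten_with_lengths` is a hypothetical name, and the layout of `lengths` is an assumption made for illustration; the library may store this information differently.

```python
def flatten_with_lengths(nested, depth):
    """Return (flat_values, lengths) for a list nested `depth` levels deep.

    lengths[k] holds the size of every list found at nesting level k.
    Hypothetical helper for illustration only.
    """
    lengths = [[] for _ in range(depth)]
    flat = []

    def visit(item, level):
        if level == depth:
            flat.append(item)  # reached a leaf value
            return
        lengths[level].append(len(item))
        for child in item:
            visit(child, level + 1)

    visit(nested, 0)
    return flat, lengths


data = [
    [[1], [], [], [], [2, 3]],
    [[4, 3]],
]
flat, lengths = flatten_with_lengths(data, depth=3)
print(flat)     # [1, 2, 3, 4, 3]
print(lengths)  # [[2], [5, 1], [1, 0, 0, 0, 2, 2]]
```

Here `lengths[1]` gives the number of lines per sample and `lengths[2]` the number of words per line, which is enough to rebuild any of the layouts without keeping the nested lists around.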

Once created, you can change the shape of the tensor by refolding it:

# Refold on the lines and words dims (flatten the samples dim)
print(ft.refold(("lines", "words")))
# FoldedTensor([[1, 0],
#               [0, 0],
#               [0, 0],
#               [0, 0],
#               [2, 3],
#               [4, 3]])

# Refold on the words dim only: flatten everything
print(ft.refold(("words",)))
# FoldedTensor([1, 2, 3, 4, 3])
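Conceptually, refolding only needs the flat values and the stored lengths. The sketch below reproduces the layouts shown above in plain Python, under the assumption that the lengths are kept as "lines per sample" and "words per line"; the helper names `split_by_lengths` and `pad` are hypothetical, and the library's optimized C++ code may work quite differently.

```python
def split_by_lengths(flat, lengths):
    """Cut a flat list into consecutive chunks of the given lengths."""
    chunks, start = [], 0
    for n in lengths:
        chunks.append(flat[start:start + n])
        start += n
    return chunks


def pad(rows, pad_value=0):
    """Right-pad every row to the length of the longest row."""
    width = max((len(row) for row in rows), default=0)
    return [row + [pad_value] * (width - len(row)) for row in rows]


# Assumed storage: flat values plus lengths at each nesting level.
flat_words = [1, 2, 3, 4, 3]
lines_per_sample = [5, 1]
words_per_line = [1, 0, 0, 0, 2, 2]

# ("lines", "words"): one row per line, the samples dim is flattened away.
lines_words = pad(split_by_lengths(flat_words, words_per_line))

# ("samples", "words"): sum the words of each sample's lines first.
words_per_sample = [
    sum(group) for group in split_by_lengths(words_per_line, lines_per_sample)
]
samples_words = pad(split_by_lengths(flat_words, words_per_sample))

print(lines_words)    # [[1, 0], [0, 0], [0, 0], [0, 0], [2, 3], [4, 3]]
print(samples_words)  # [[1, 2, 3], [4, 3, 0]]
```

Because every layout is derived from the same flat values, switching between them never requires choosing one fixed representation up front.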

The tensor can be further used with standard PyTorch operations:

# Working with PyTorch operations
embedder = torch.nn.Embedding(10, 16)
embedding = embedder(ft.refold(("words",)))
print(embedding.shape)
# torch.Size([5, 16]) # 5 words total, 16 dims

refolded_embedding = embedding.refold(("samples", "words"))
print(refolded_embedding.shape)
# torch.Size([2, 3, 16]) # 2 samples, 3 words max, 16 dims

Benchmarks

Comparisons of foldedtensor against various alternatives are available in docs/benchmarks.

Comparison with alternatives

Unlike other ragged or nested tensor implementations, a FoldedTensor does not enforce a specific structure on the nested data, and does not require padding all dimensions. This provides the user with greater flexibility when working with data that can be arranged in multiple ways depending on the data transformation. Moreover, the C++ optimization ensures high performance, making it ideal for handling deeply nested tensors efficiently.

Here is a comparison with other common implementations for handling nested sequences of variable length:

Feature	NestedTensor	MaskedTensor	FoldedTensor
Inner data structure	Flat	Padded	Arbitrary
Max nesting level	1	1	∞
From nested python lists	No	No	Yes
Layout conversion	To padded	No	Any
Reduction ops w/o padding	Yes	No	No
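The last row of the table suggests that reductions over a FoldedTensor operate on padded data, so padding has to be excluded explicitly, for example with a mask. The following library-independent sketch shows why this matters for a per-row mean; `masked_mean` is a hypothetical helper, and the data mirrors the ("samples", "words") layout from the examples.

```python
def masked_mean(padded, mask):
    """Mean of each row, counting only positions where the mask is True.

    Hypothetical helper for illustration only.
    """
    means = []
    for row, row_mask in zip(padded, mask):
        kept = [value for value, keep in zip(row, row_mask) if keep]
        means.append(sum(kept) / len(kept) if kept else 0.0)
    return means


padded = [[1, 2, 3], [4, 3, 0]]
mask = [[True, True, True], [True, True, False]]

naive = [sum(row) / len(row) for row in padded]  # padding counted as data
correct = masked_mean(padded, mask)

print(correct)  # [2.0, 3.5]
```

The naive mean of the second row is pulled down by the padding zero, while the masked version averages only the two real values.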

Project details

Release history

0.3.5 (Sep 16, 2024)
0.3.4 (May 12, 2024)
0.3.3 (Feb 14, 2024), this version
0.3.2 (Oct 12, 2023)
0.3.1 (Aug 30, 2023)
0.3.0 (Jul 7, 2023)
0.2.2 (Jun 5, 2023)
0.2.1.post0 (May 23, 2023)
0.2.1 (May 23, 2023)