High-performance sequence packing library for LLM training with Rust core and Python bindings

SeqPacker

High-performance sequence packing for LLM training, written in Rust with Python bindings.


Training LLMs on variable-length sequences? Naive padding wastes 20-40% of GPU compute. SeqPacker packs sequences into fixed-size bins, achieving 95-99% utilization with 11 bin-packing algorithms — from O(n) streaming to near-optimal offline.

  • 11 algorithms — NF, FF, BF, WF, FFD, BFD, FFS, MFFD, OBFD, OBFDP, HK
  • Streaming API — bounded-space packing with incremental output
  • PyTorch integration — GPU-ready tensors out of the box
  • NumPy zero-copy — pass arrays directly, no conversion overhead
  • Cross-platform — Linux, macOS, Windows; Python 3.9-3.13
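To make the padding-waste figure concrete, here is a minimal sketch in plain Python (it does not use SeqPacker) of how utilization can be computed for a batch padded to its longest sequence. The helper name `padded_utilization` and the sample lengths are illustrative assumptions, not part of the library.

```python
def padded_utilization(lengths):
    """Fraction of token slots holding real tokens when every
    sequence is padded to the longest one in the batch."""
    if not lengths:
        return 0.0
    slots = max(lengths) * len(lengths)
    return sum(lengths) / slots

# Hypothetical batch of sequence lengths, chosen only for illustration.
batch = [1000, 900, 800, 700, 600]
util = padded_utilization(batch)
print(f"utilization: {util:.0%}, wasted: {1 - util:.0%}")
# utilization: 80%, wasted: 20%
```

Packing attacks exactly this gap: short sequences share a bin instead of each being padded out on its own.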

Installation

# Python (pip)
pip install seqpacker

# Python (uv)
uv add seqpacker

# Rust
cargo add seqpacker

Quick Start

Python

from seqpacker import pack_sequences

lengths = [1000, 800, 600, 500, 400, 300, 200, 100]
result = pack_sequences(lengths, capacity=1024)

print(result.bins)        # [[0], [1, 7], [2, 4], [3, 5, 6]]
print(result.efficiency)  # 0.952...
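The efficiency value can be checked by hand. The sketch below, in plain Python, recomputes it from the bins printed above, assuming efficiency means total tokens divided by total bin capacity. That definition is an assumption that happens to match the 0.952 shown, not a statement about the library's internals, and `efficiency_of` is a hypothetical helper name.

```python
def efficiency_of(bins, lengths, capacity):
    """Total real tokens divided by total slots across all bins."""
    used = sum(lengths[i] for b in bins for i in b)
    return used / (len(bins) * capacity)

lengths = [1000, 800, 600, 500, 400, 300, 200, 100]
bins = [[0], [1, 7], [2, 4], [3, 5, 6]]  # copied from the output above

eff = efficiency_of(bins, lengths, capacity=1024)
print(round(eff, 3))  # 0.952  (3900 tokens / 4096 slots)
```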

Rust

use seqpacker::{Packer, PackStrategy};

let packer = Packer::new(1024)
    .with_strategy(PackStrategy::OptimizedBestFitDecreasing);

let result = packer.pack_lengths(&[1000, 800, 600, 500, 400, 300, 200, 100]).unwrap();
println!("Efficiency: {:.2}%", result.metrics.efficiency * 100.0);

Algorithms

11 bin-packing algorithms, from O(n) online heuristics to near-optimal offline ones:

| Algorithm          | Short | Time       | Approx. Ratio | Best For                        |
|--------------------|-------|------------|---------------|---------------------------------|
| NextFit            | nf    | O(n)       | 2.0           | Memory-constrained streaming    |
| FirstFit           | ff    | O(n log B) | 1.7           | Online baseline                 |
| BestFit            | bf    | O(n log B) | 1.7           | Tighter online packing          |
| WorstFit           | wf    | O(n log B) | 2.0           | Even distribution               |
| FirstFitDecreasing | ffd   | O(n log n) | 1.22          | Good offline default            |
| BestFitDecreasing  | bfd   | O(n log n) | 1.22          | Tighter offline packing         |
| FirstFitShuffle    | ffs   | O(n log n) | ~1.3          | Training randomness             |
| ModifiedFFD        | mffd  | O(n log n) | 1.18          | Mixed-size distributions        |
| OptimizedBFD       | obfd  | O(n log n) | 1.22          | Default (recommended)           |
| ParallelOBFD       | obfdp | O(n log n) | 1.22          | Large datasets (multi-threaded) |
| Harmonic-K         | hk    | O(n)       | ~1.69         | Bounded-space online            |

from seqpacker import Packer

# Use any algorithm by short name (default: obfd)
packer = Packer(capacity=2048, strategy="obfd")
result = packer.pack([500, 600, 400, 1000])

# List all available strategies
print(Packer.strategies())
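For readers unfamiliar with the offline heuristics in the table, the following is a minimal pure-Python sketch of first-fit decreasing. It is an illustration of the textbook technique, not SeqPacker's implementation: the function name `first_fit_decreasing` is made up here, and the inner linear scan makes it O(n x bins) rather than the O(n log n) listed above.

```python
def first_fit_decreasing(lengths, capacity):
    """Return bins as lists of original indices, using first-fit decreasing."""
    # Visit items longest first, remembering each item's original index.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    bins, free = [], []  # free[j] is the remaining space in bins[j]
    for i in order:
        size = lengths[i]
        if size > capacity:
            raise ValueError(f"item {i} (length {size}) exceeds capacity {capacity}")
        for j, space in enumerate(free):
            if size <= space:  # first bin with enough room wins
                bins[j].append(i)
                free[j] -= size
                break
        else:  # no existing bin fits, so open a new one
            bins.append([i])
            free.append(capacity - size)
    return bins

bins = first_fit_decreasing([1000, 800, 600, 500, 400, 300, 200, 100], 1024)
print(len(bins))  # 4
```

The exact grouping can differ from what a best-fit variant such as obfd returns, even when the bin count is the same.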

Usage Modes

Batch Packing

Pack all sequences at once. Best for offline dataset preprocessing. All 11 algorithms available.

from seqpacker import Packer

packer = Packer(capacity=2048, strategy="obfd")
result = packer.pack(sequence_lengths)

for pack in result.packs:
    print(pack.sequence_ids, pack.lengths, pack.used)

print(f"Efficiency: {result.efficiency:.2%}")
print(f"Packs: {result.num_bins}")

Streaming

Feed sequences one at a time; completed packs are emitted incrementally. Only bounded-space algorithms are supported in this mode: NextFit (nf) and Harmonic-K (hk).

from seqpacker import StreamPacker

sp = StreamPacker(capacity=2048, strategy="nf")

for length in dataset_lengths:
    for pack in sp.add(length):
        process(pack)  # completed packs emitted as they fill

for pack in sp.finish():
    process(pack)      # flush remaining
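To show why next-fit suits streaming, here is a toy pure-Python version that keeps a single open bin and emits it as soon as the next item does not fit. The class `NextFitStream` is a hypothetical illustration that mirrors the add/finish shape used above; it is not the library's StreamPacker.

```python
class NextFitStream:
    """Toy next-fit packer: one open bin, closed as soon as an item does not fit."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.current = []   # indices of items in the open bin
        self.used = 0       # tokens already placed in the open bin
        self.next_id = 0    # index assigned to the next item

    def add(self, length):
        """Add one item; return a list of bins completed by this call."""
        if length > self.capacity:
            raise ValueError("item longer than capacity")
        done = []
        if self.used + length > self.capacity:
            done.append(self.current)          # emit the full bin
            self.current, self.used = [], 0    # open a fresh one
        self.current.append(self.next_id)
        self.used += length
        self.next_id += 1
        return done

    def finish(self):
        """Flush the open bin, if it holds anything."""
        done = [self.current] if self.current else []
        self.current, self.used = [], 0
        return done

stream = NextFitStream(capacity=10)
emitted = []
for length in [6, 5, 4, 7]:
    emitted.extend(stream.add(length))
emitted.extend(stream.finish())
print(emitted)  # [[0], [1, 2], [3]]
```

Because only one bin is ever open, memory use stays constant regardless of how many sequences pass through, at the cost of looser packing than the offline algorithms.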

Buffer + Batch

Accumulate sequences into a buffer and pack periodically. Requires no special library support -- just call pack() on each buffer. All algorithms available.

from seqpacker import Packer

def buffered_packs(dataset_stream, buffer_size=10_000):
    """Yield packs from a stream, packing one full buffer at a time."""
    packer = Packer(capacity=2048, strategy="obfd")
    buffer = []

    for sample in dataset_stream:
        buffer.append(len(sample["input_ids"]))
        if len(buffer) >= buffer_size:
            yield from packer.pack(buffer).packs
            buffer.clear()

    # Flush whatever is left once the stream ends
    if buffer:
        yield from packer.pack(buffer).packs

PyTorch Integration

seqpacker.torch_utils provides helpers for converting pack results into GPU-ready tensors. Torch is not a dependency -- import only when you need it.

from seqpacker.torch_utils import packed_collate_fn
from torch.utils.data import DataLoader

collate = packed_collate_fn(capacity=2048, strategy="obfd")
loader = DataLoader(dataset, collate_fn=collate, batch_size=256)

for batch in loader:
    outputs = model(
        input_ids=batch.input_ids,
        position_ids=batch.position_ids,
        labels=batch.labels,
    )

Or convert a PackResult directly:

from seqpacker import pack_sequences
from seqpacker.torch_utils import pack_result_to_tensors

result = pack_sequences(lengths, capacity=2048)
batch = pack_result_to_tensors(result=result, token_ids=token_ids)
# batch.input_ids, batch.cu_seqlens, batch.position_ids, batch.labels, batch.attention_mask
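As a rough guide to two of those fields, the sketch below builds cu_seqlens and position_ids for a single pack in plain Python. It assumes the fields follow the common convention for packed attention (cumulative sequence boundaries, and positions that restart at zero for each sequence); the helper names are hypothetical and the library's actual tensors may be laid out differently.

```python
from itertools import accumulate

def cu_seqlens_for(pack_lengths):
    """Cumulative boundaries: sequence k spans [cu[k], cu[k+1])."""
    return [0, *accumulate(pack_lengths)]

def position_ids_for(pack_lengths):
    """Positions restart at 0 at the start of each packed sequence."""
    return [p for n in pack_lengths for p in range(n)]

pack_lengths = [3, 2, 4]  # hypothetical pack holding three short sequences
print(cu_seqlens_for(pack_lengths))    # [0, 3, 5, 9]
print(position_ids_for(pack_lengths))  # [0, 1, 2, 0, 1, 0, 1, 2, 3]
```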

NumPy Support

Both Python lists and NumPy arrays are accepted as input; NumPy arrays are passed through without copying:

import numpy as np
from seqpacker import Packer

packer = Packer(capacity=2048)
lengths = np.array([500, 600, 400, 1000], dtype=np.int64)
result = packer.pack(lengths)

# Flat NumPy output for maximum performance
items_flat, bin_offsets = packer.pack_flat(lengths)
bins = np.split(items_flat, bin_offsets)
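The flat layout can be pictured without NumPy. The sketch below is a pure-Python analogue of np.split, assuming the offsets are the interior split points (the index where each bin after the first begins), which is what np.split expects. The helper `split_flat` and the sample data are made up for illustration.

```python
def split_flat(items_flat, split_points):
    """Pure-Python analogue of np.split(items_flat, split_points)."""
    bounds = [0, *split_points, len(items_flat)]
    return [items_flat[a:b] for a, b in zip(bounds, bounds[1:])]

items_flat = [0, 1, 7, 2, 4, 3, 5, 6]   # item indices, bin after bin (made-up example)
split_points = [1, 3, 5]                # where bins 1, 2 and 3 begin
print(split_flat(items_flat, split_points))
# [[0], [1, 7], [2, 4], [3, 5, 6]]
```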

Performance

SeqPacker matches the packing efficiency of competing libraries while running significantly faster:

| Comparison             | Speedup          | Efficiency     |
|------------------------|------------------|----------------|
| vs LightBinPack (C++)  | ~1.2-1.5x faster | Equal (98.76%) |
| vs greedy_ffd (Python) | ~400x faster     | Equal          |
| vs binpacking (Python) | ~1,700x faster   | Equal          |
| vs prtpy (Python)      | ~1,900x faster   | Equal          |

Benchmarked on 10,000 sequences across real-world datasets (Alpaca, UltraChat, C4). See the interactive benchmark dashboard for detailed results.

Contributing

See CONTRIBUTING.md for setup instructions and development workflow.

make install       # Install dependencies
make build-dev     # Build the Rust extension
make test          # Run all tests (400 Rust + 249 Python)
make help          # See all commands

License

MIT
