An itertools-inspired toolkit for cached iterator and data-structure processing

Project description

itertoolkit

Functions creating iterators and cached data pipelines for efficient looping.

itertoolkit is an itertools-inspired wrapper focused on practical data processing. It keeps the lazy, composable style of iterator algebra and adds cache-aware helpers, so repeated list and data-structure transformations can reuse earlier results instead of recomputing them.

The goal is simple:

Keep memory usage low with lazy iterators.
Speed up repeated workloads with caching.
Make iterator pipelines readable and reusable.

Installation

pip install itertoolkit

Quick Start

from itertools import count, islice

# The distribution is installed as itertoolkit, but the package is
# currently imported as bm_preprocessing (see Compatibility Note).
from bm_preprocessing import IR, DM

# Example: base itertools stream
stream = (x * x for x in count(1))
print(list(islice(stream, 5)))  # [1, 4, 9, 16, 25]

# Example: cached computation workflow (concept)
# result = itertoolkit.cached_map(expensive_fn, dataset, cache_key="v1")

Why It Is Faster

itertoolkit performance comes from combining:

Lazy iteration, so intermediate materialization is avoided.
Cache-first wrappers, so repeated transformations are reused.
Composable pipelines, so complex loops stay compact and multi-step processing runs in a single pass over the data.

In repeated analytics or feature-building jobs, the first pass computes and stores results, and later passes can fetch from cache instead of recomputing every step.
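The interplay of lazy iteration and caching can be illustrated with the standard library alone. The sketch below is not itertoolkit's API: it assumes `functools.lru_cache` as the cache, and `expensive_square`, `squares`, and `call_count` are hypothetical names used only for illustration.

```python
from functools import lru_cache
from itertools import islice

call_count = 0  # counts how often the expensive function really runs


@lru_cache(maxsize=None)
def expensive_square(x):
    """Stand-in for a costly, deterministic transformation (hypothetical)."""
    global call_count
    call_count += 1
    return x * x


def squares(values):
    # Generator expression: nothing is computed until a consumer asks for items.
    return (expensive_square(v) for v in values)


data = [1, 2, 3, 4, 5]

# First pass consumes only three items, so only three values are computed.
first_pass = list(islice(squares(data), 3))
calls_after_first = call_count

# Second pass reuses the cached results for 1, 2, 3 and computes only 4 and 5.
second_pass = list(squares(data))
calls_after_second = call_count
```

Here `calls_after_first` is 3 and `calls_after_second` is 5, even though eight items were requested in total across the two passes.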

Core Iterator Families

General iterators

Iterator concept	Input	Output shape	Typical use
Running reduction	iterable, func	incremental totals	rolling stats
Batching	iterable, n	tuples of up to n items (the last may be shorter)	chunk processing
Chaining	multiple iterables	one continuous stream	merging sources
Selection	data + selectors	filtered stream	mask-based filtering
Windowing	iterable	adjacent pairs/windows	transition analysis
Truncation	iterable + predicate or slice bounds	bounded output	safe handling of infinite streams

Combinatoric iterators

Iterator concept	Output
Cartesian products	all pairings across inputs
Permutations	order-sensitive tuples
Combinations	order-insensitive unique tuples
Combinations with replacement	tuples allowing repeated values
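The four combinatoric families can be compared side by side on a small input. This sketch uses the standard `itertools` functions rather than any itertoolkit-specific wrapper, and the variable names are illustrative.

```python
from itertools import (
    combinations,
    combinations_with_replacement,
    permutations,
    product,
)

items = ["a", "b", "c"]

# Cartesian product: every pairing across the two inputs
pairs = list(product([0, 1], ["x", "y"]))

# Permutations: order matters, so ("a", "b") and ("b", "a") both appear
perms = list(permutations(items, 2))

# Combinations: order does not matter, no element is reused
combs = list(combinations(items, 2))

# Combinations with replacement: an element may be paired with itself
combs_rep = list(combinations_with_replacement(items, 2))
```

For three items taken two at a time, this gives 6 permutations, 3 combinations, and 6 combinations with replacement.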

Pipeline Pattern

Use this pattern when processing large lists, tables, graphs, or text records:

Start from one or more iterables.
Chain filtering, mapping, grouping, and batching.
Add cache boundaries around expensive stages.
Materialize only where needed (list, tuple, DataFrame, model input).

from itertools import chain

sources = [[1, 2, 3], [4, 5], [6]]
pipeline = (x * 10 for x in chain.from_iterable(sources) if x % 2 == 0)
print(list(pipeline))  # [20, 40, 60]
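The example above covers the first two steps of the pattern. A fuller sketch that also adds a cache boundary and a batching stage might look like the following; it assumes `functools.lru_cache` as the cache, and `normalize` and `batches` are hypothetical helpers, not itertoolkit functions.

```python
from functools import lru_cache
from itertools import chain, islice


@lru_cache(maxsize=1024)
def normalize(record):
    """Hypothetical expensive stage; cached because it is deterministic."""
    return record.strip().lower()


def batches(iterable, size):
    """Yield tuples of up to `size` items from any iterable."""
    it = iter(iterable)
    while chunk := tuple(islice(it, size)):
        yield chunk


sources = [[" Alpha", "beta "], ["", "GAMMA"], ["beta "]]

records = chain.from_iterable(sources)          # 1. start from iterables
non_empty = (r for r in records if r.strip())   # 2. filter lazily
cleaned = (normalize(r) for r in non_empty)     # 3. cache boundary
result = list(batches(cleaned, 2))              # 4. materialize at the end

cache_stats = normalize.cache_info()
```

Nothing runs until `list(...)` is called in the last step, and the repeated `"beta "` record is served from the cache rather than normalized twice.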

Caching Strategy

Recommended caching behavior for data-heavy workloads:

Key by transformation signature and input fingerprint.
Keep deterministic steps cacheable.
Invalidate cache on function/version changes.
Persist long-running results between sessions.

This makes repeated preprocessing and feature extraction cheaper, because only stages whose inputs or code have changed need to be recomputed.
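One way these recommendations could fit together is sketched below. It is an assumption-laden illustration, not itertoolkit's implementation: `cache_key`, `cached_call`, and `total_length` are hypothetical names, the input fingerprint is a SHA-256 over the pickled data (which only suits simple, picklable inputs), and results are persisted as pickle files in a temporary directory.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def cache_key(func, data, version):
    """Combine the function's identity, a version tag, and an input fingerprint."""
    h = hashlib.sha256()
    h.update(f"{func.__module__}.{func.__qualname__}:{version}".encode())
    h.update(pickle.dumps(data))  # assumption: data pickles deterministically
    return h.hexdigest()


def cached_call(func, data, *, version, cache_dir):
    """Return (result, was_cache_hit), persisting results on disk."""
    path = Path(cache_dir) / f"{cache_key(func, data, version)}.pkl"
    if path.exists():
        return pickle.loads(path.read_bytes()), True
    result = func(data)
    path.write_bytes(pickle.dumps(result))
    return result, False


def total_length(words):
    """Deterministic stand-in for an expensive transformation."""
    return sum(len(w) for w in words)


cache_dir = tempfile.mkdtemp()
words = ["lazy", "cached", "pipeline"]

r1, hit1 = cached_call(total_length, words, version="v1", cache_dir=cache_dir)
r2, hit2 = cached_call(total_length, words, version="v1", cache_dir=cache_dir)
# Bumping the version changes the key, so the old entry is bypassed.
r3, hit3 = cached_call(total_length, words, version="v2", cache_dir=cache_dir)
```

The first call computes and stores the result, the second is a cache hit, and the third recomputes because the version tag changed. Because entries live on disk, they would also survive a new session if `cache_dir` pointed at a permanent location.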

Compatibility Note

The package distribution name is itertoolkit.

Current code in this repository still exposes the import path bm_preprocessing for compatibility with existing users. If needed, a follow-up release can add a top-level itertoolkit import alias as well.

License

MIT

Project details

Release history

1.5.9

Apr 15, 2026

1.5.5

Apr 14, 2026

1.5.4

Apr 14, 2026

1.5.3

Apr 13, 2026

1.5.2

Apr 13, 2026

1.5.1

Apr 13, 2026

1.5.0

Apr 13, 2026

This version

1.4.9

Apr 13, 2026

Download files

Source Distribution

itertoolkit-1.4.9.tar.gz (53.8 kB view details)

Uploaded Apr 13, 2026 Source

File details

Details for the file itertoolkit-1.4.9.tar.gz.

File metadata

Download URL: itertoolkit-1.4.9.tar.gz
Upload date: Apr 13, 2026
Size: 53.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.2

File hashes

Hashes for itertoolkit-1.4.9.tar.gz
Algorithm	Hash digest
SHA256	`b2eefeda85185983799ff48cc6b33674a34c4e049e4db4694901d74ac01cd13c`
MD5	`7361e78a7a70d8779ac17c2513403511`
BLAKE2b-256	`9af0e046f6e1efe5e31258fb9115a4d3138ea2e41caedde6fa4db4da669f976c`
