An itertools-inspired toolkit for cached iterator and data-structure processing

itertoolkit

Functions creating iterators and cached data pipelines for efficient looping.

itertoolkit is an itertools-inspired wrapper focused on practical data processing. It keeps the lazy, composable style of iterator algebra, then adds cache-aware helpers so repeated list and data-structure transformations run faster.

The goal is simple:

  • Keep memory usage low with lazy iterators.
  • Speed up repeated workloads with caching.
  • Make iterator pipelines readable and reusable.

Installation

pip install itertoolkit

Quick Start

from itertools import count, islice

# Install name: itertoolkit
# Current import path in this repo remains bm_preprocessing
from bm_preprocessing import IR, DM

# Example: base itertools stream
stream = (x * x for x in count(1))
print(list(islice(stream, 5)))  # [1, 4, 9, 16, 25]

# Example: cached computation workflow (concept only; cached_map is illustrative)
# result = itertoolkit.cached_map(expensive_fn, dataset, cache_key="v1")

Why It Is Faster

itertoolkit performance comes from combining:

  • Lazy iteration, so intermediate materialization is avoided.
  • Cache-first wrappers, so repeated transformations are reused.
  • Composable pipelines, so complex loops stay compact and can be optimized stage by stage.

In repeated analytics or feature-building jobs, the first pass computes and stores results, and later passes can fetch from cache instead of recomputing every step.
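The first-pass and later-pass behavior described above can be illustrated with the standard library alone. This is a minimal sketch, not itertoolkit's API: `expensive_square` and `run_pass` are hypothetical names, and `functools.lru_cache` stands in for whatever cache the package actually uses.

```python
from functools import lru_cache

calls = 0  # counts how often the expensive function body actually runs


@lru_cache(maxsize=None)
def expensive_square(x):
    """Hypothetical stand-in for a costly, deterministic transformation."""
    global calls
    calls += 1
    return x * x


def run_pass(data):
    # map() is lazy; sum() consumes it without building an intermediate list.
    return sum(map(expensive_square, data))


data = [1, 2, 3, 2, 1]
first = run_pass(data)       # computes each distinct input once
calls_after_first = calls
second = run_pass(data)      # served entirely from the cache
calls_after_second = calls

print(first, second, calls_after_first, calls_after_second)  # 19 19 3 3
```

The function body runs three times in total, once per distinct input, no matter how many passes follow.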

Core Iterator Families

General iterators

| Iterator concept  | Input              | Output shape           | Typical use                       |
|-------------------|--------------------|------------------------|-----------------------------------|
| Running reduction | iterable, func     | incremental totals     | rolling stats                     |
| Batching          | iterable, n        | tuples of size n       | chunk processing                  |
| Chaining          | multiple iterables | one continuous stream  | merging sources                   |
| Selection         | data + selectors   | filtered stream        | mask-based filtering              |
| Windowing         | iterable           | adjacent pairs/windows | transition analysis               |
| Truncation        | predicate/slice    | bounded output         | safe handling of infinite streams |
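Each concept in the table has a close counterpart in the standard `itertools` module. The sketch below uses only the standard library; `batches` and `adjacent_pairs` are hypothetical helper names written for illustration and are not confirmed itertoolkit functions.

```python
import operator
from itertools import accumulate, chain, compress, count, islice, takewhile, tee


def batches(iterable, n):
    """Yield tuples of up to n items (the last batch may be shorter)."""
    it = iter(iterable)
    while True:
        batch = tuple(islice(it, n))
        if not batch:
            return
        yield batch


def adjacent_pairs(iterable):
    """Yield overlapping pairs: (a, b), (b, c), ..."""
    first, second = tee(iterable)
    next(second, None)  # advance the second copy by one item
    return zip(first, second)


data = [1, 2, 3, 4, 5]

running = list(accumulate(data, operator.add))    # [1, 3, 6, 10, 15]
batched = list(batches(data, 2))                  # [(1, 2), (3, 4), (5,)]
chained = list(chain([1, 2], [3], [4, 5]))        # [1, 2, 3, 4, 5]
selected = list(compress(data, [1, 0, 1, 0, 1]))  # [1, 3, 5]
pairs = list(adjacent_pairs(data))                # [(1, 2), (2, 3), (3, 4), (4, 5)]

# Truncation makes an infinite stream safe to materialize.
squares = (x * x for x in count(1))
bounded = list(takewhile(lambda x: x < 20, squares))  # [1, 4, 9, 16]
```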

Combinatoric iterators

| Iterator concept              | Output                          |
|-------------------------------|---------------------------------|
| Cartesian products            | all pairings across inputs      |
| Permutations                  | order-sensitive tuples          |
| Combinations                  | order-insensitive unique tuples |
| Combinations with replacement | tuples allowing repeated values |
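The four combinatoric concepts correspond directly to standard `itertools` functions. The example below shows what each one yields for a small input, using only the standard library, so it makes no assumptions about itertoolkit's own wrappers.

```python
from itertools import (
    combinations,
    combinations_with_replacement,
    permutations,
    product,
)

items = ["a", "b", "c"]

# Cartesian product: every pairing across the two inputs.
pairs = list(product([0, 1], ["x", "y"]))
# [(0, 'x'), (0, 'y'), (1, 'x'), (1, 'y')]

# Permutations: order matters, so ('a', 'b') and ('b', 'a') both appear.
perms = list(permutations(items, 2))

# Combinations: order does not matter, each pair appears once.
combos = list(combinations(items, 2))
# [('a', 'b'), ('a', 'c'), ('b', 'c')]

# Combinations with replacement: an item may pair with itself.
combos_rep = list(combinations_with_replacement(items, 2))

print(len(pairs), len(perms), len(combos), len(combos_rep))  # 4 6 3 6
```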

Pipeline Pattern

Use this pattern when processing large lists, tables, graphs, or text records:

  1. Start from one or more iterables.
  2. Chain filtering, mapping, grouping, and batching.
  3. Add cache boundaries around expensive stages.
  4. Materialize only where needed (list, tuple, DataFrame, model input).

from itertools import chain

sources = [[1, 2, 3], [4, 5], [6]]
pipeline = (x * 10 for x in chain.from_iterable(sources) if x % 2 == 0)
print(list(pipeline))  # [20, 40, 60]
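The four steps of the pattern can also be sketched end to end, with a cache boundary on the costly stage. This is an illustration under stated assumptions: `normalize` and `batches` are hypothetical helpers, and `functools.lru_cache` is used in place of itertoolkit's own caching helpers, whose API is not shown here.

```python
from functools import lru_cache
from itertools import chain, islice


@lru_cache(maxsize=1024)
def normalize(record):
    """Hypothetical expensive stage; cacheable because it is deterministic."""
    return record.strip().lower()


def batches(iterable, n):
    """Yield tuples of up to n items from any iterable."""
    it = iter(iterable)
    while batch := tuple(islice(it, n)):
        yield batch


sources = [["  Alpha", "beta "], ["", "ALPHA  "], ["Gamma"]]

records = chain.from_iterable(sources)         # 1. start from several iterables
non_empty = (r for r in records if r.strip())  # 2. filter lazily
cleaned = map(normalize, non_empty)            # 3. cache boundary on the costly step
result = list(batches(cleaned, 2))             # 4. materialize only at the end

print(result)  # [('alpha', 'beta'), ('alpha', 'gamma')]
```

Nothing is computed until `list()` pulls items through the chain, so only one batch of intermediate values exists at a time.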

Caching Strategy

Recommended caching behavior for data-heavy workloads:

  • Key by transformation signature and input fingerprint.
  • Keep deterministic steps cacheable.
  • Invalidate cache on function/version changes.
  • Persist long-running results between sessions.

With these rules in place, repeated preprocessing and feature-extraction runs reuse stored results instead of recomputing them.
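One way to combine these recommendations is a small on-disk cache keyed by function identity, a version tag, and a hash of the input. The sketch below is an assumption-laden illustration, not itertoolkit's implementation: `cache_key`, `cached_call`, and `total_of_squares` are hypothetical names, inputs are assumed to be JSON-serializable, and results are stored with `pickle`.

```python
import hashlib
import json
import pickle
import tempfile
from pathlib import Path


def cache_key(func, version, data):
    """Combine function identity, a version tag, and an input fingerprint."""
    payload = json.dumps(data, sort_keys=True).encode("utf-8")
    fingerprint = hashlib.sha256(payload).hexdigest()
    signature = f"{func.__module__}.{func.__qualname__}:{version}"
    return hashlib.sha256(f"{signature}:{fingerprint}".encode("utf-8")).hexdigest()


def cached_call(func, data, version, cache_dir):
    """Return (result, was_cached); results persist as pickle files on disk."""
    path = Path(cache_dir) / f"{cache_key(func, version, data)}.pkl"
    if path.exists():
        return pickle.loads(path.read_bytes()), True
    result = func(data)
    path.write_bytes(pickle.dumps(result))
    return result, False


def total_of_squares(values):
    """Deterministic step, so it is safe to cache."""
    return sum(v * v for v in values)


with tempfile.TemporaryDirectory() as cache_dir:
    first = cached_call(total_of_squares, [1, 2, 3], "v1", cache_dir)
    second = cached_call(total_of_squares, [1, 2, 3], "v1", cache_dir)
    bumped = cached_call(total_of_squares, [1, 2, 3], "v2", cache_dir)

print(first, second, bumped)  # (14, False) (14, True) (14, False)
```

Bumping the version tag changes the key, so stale entries are ignored without being deleted. Because `pickle` can execute code on load, a cache directory like this should only hold files the application wrote itself.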

Compatibility Note

The package distribution name is itertoolkit.

Current code in this repository still exposes the import path bm_preprocessing for compatibility with existing users. If needed, a follow-up release can add a top-level itertoolkit import alias as well.
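Until such an alias exists, calling code can guard against the name difference itself. The helper below is a hypothetical sketch (`load_first_available` is not part of the package); it imports the first module name that resolves. Standard-library names are used in the demonstration so it runs without either package installed.

```python
from importlib import import_module
from importlib.util import find_spec


def load_first_available(names):
    """Import and return the first top-level module in `names` that is installed."""
    for name in names:
        if find_spec(name) is not None:
            return import_module(name)
    raise ImportError(f"none of {names!r} is installed")


# In a real project the candidates would be ("itertoolkit", "bm_preprocessing").
# Placeholder names are used here only so the sketch runs anywhere.
module = load_first_available(("no_such_module_xyz", "json"))
print(module.__name__)  # json
```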

License

MIT
