High-performance parallel iterators for Python 3.14+ free-threaded mode
# FastIter

Parallel iterators for Python 3.14+, built on the free-threaded mode (no GIL).

```python
from fastiter import par_range

result = par_range(0, 3_000_000).map(lambda x: x * x).sum()
```
## Features

- Parallel processing for CPU-bound work on large datasets
- Familiar iterator API (`map`, `filter`, `reduce`, `sum`, etc.)
- Requires Python 3.14 free-threaded build (`python3.14t`)
- 40 tests
## Installation

```shell
pip install fastiter
# or
uv add fastiter
```

Requirements: Python 3.14+ free-threaded build (`python3.14t`).
⚠️ With the GIL enabled, FastIter is slower than sequential code for CPU-bound work - threads contend on the GIL and add overhead with no benefit. Free-threaded mode is not optional if you want real speedups.
## Quick Start

```python
from fastiter import par_range, into_par_iter, set_num_threads

# Configure threads (optional, auto-detects CPU count)
set_num_threads(4)

# Process ranges in parallel
total = par_range(0, 1_000_000).sum()

# Chain operations
evens = (
    par_range(0, 10_000)
    .filter(lambda x: x % 2 == 0)
    .map(lambda x: x ** 2)
    .collect()
)

# Work with lists
data = list(range(100_000))
result = into_par_iter(data).map(lambda x: x * 2).sum()
```
## Performance

Measured on a 10-core system with Python 3.14t (GIL disabled); speedups are relative to sequential execution:
| Threads | Simple Sum (3M items) | CPU-Intensive Work |
|---|---|---|
| 2 | 1.9x | 1.9x |
| 4 | 3.7x | 2.3x |
| 8 | 4.2x | 3.9x |
| 10 | 5.6x | 3.7x |
Sweet spot: 4 threads gives the best efficiency per thread; beyond that, speedups keep growing but with diminishing returns.
## When to use FastIter

✅ Works great:

- Large datasets (500k+ items)
- CPU-bound computations
- Simple numeric operations
- Pure functions without shared state

❌ Not recommended:

- Small datasets (<100k items) - overhead dominates
- I/O-bound operations - use `asyncio` instead
- Heavy lambda usage - Python function call overhead
## API

Create parallel iterators:

```python
par_range(start, stop, step=1)  # Parallel range
into_par_iter(iterable)         # Convert any iterable
```

Operations:

```python
.map(func)          # Transform each element
.filter(predicate)  # Keep matching elements
.sum()              # Sum all elements
.count()            # Count elements
.min() / .max()     # Find min/max
.any() / .all()     # Test predicates
.reduce(id, op)     # Custom reduction (identity value, binary operator)
.collect()          # Gather to list
.for_each(func)     # Execute function on each element
```
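For readers new to the API, each operation has a straightforward sequential equivalent. The sketch below is illustrative only and uses plain Python, not FastIter:

```python
# Sequential equivalents of the parallel operations above (illustrative only;
# FastIter computes the same results across threads).
from functools import reduce

data = list(range(10))

map_sum = sum(x * x for x in data)                   # .map(lambda x: x * x).sum()
filter_count = len([x for x in data if x % 2 == 0])  # .filter(...).count()
reduced = reduce(lambda a, b: a + b, data, 0)        # .reduce(0, lambda a, b: a + b)

print(map_sum, filter_count, reduced)  # → 285 5 45
```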
## Configuration

```python
from fastiter import set_num_threads

set_num_threads(4)  # Set thread count
# Or via environment variable: export FASTITER_NUM_THREADS=4
```
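Resolving a thread count from an environment variable with a CPU-count fallback can be done entirely with the standard library. This helper is a hypothetical sketch (`resolve_num_threads` is not part of FastIter's public API):

```python
import os

def resolve_num_threads(default=None):
    """Pick a thread count: env var first, then an explicit default, then CPU count."""
    env = os.environ.get("FASTITER_NUM_THREADS")
    if env is not None:
        return max(1, int(env))  # guard against 0 or negative values
    return default or os.cpu_count() or 1

os.environ["FASTITER_NUM_THREADS"] = "4"
print(resolve_num_threads())  # → 4
```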
## Examples

CPU-intensive work:

```python
def expensive_computation(x):
    result = x
    for _ in range(20):
        result = (result * 1.1 + 1) % 1_000_000
    return int(result)

# 3.9x faster with 8 threads
result = par_range(0, 200_000).map(expensive_computation).sum()
```

Simple aggregations:

```python
# Sum of squares
total = par_range(0, 1_000_000).map(lambda x: x * x).sum()

# Count evens
count = par_range(0, 1_000_000).filter(lambda x: x % 2 == 0).count()

# Find maximum
maximum = into_par_iter([1, 5, 3, 9, 2]).max()
```

Complex pipelines:

```python
result = (
    par_range(0, 1_000_000)
    .filter(lambda x: x % 2 == 0)
    .map(lambda x: x * x)
    .filter(lambda x: x > 1000)
    .sum()
)
```
## How It Works

FastIter uses a divide-and-conquer approach:

1. Split: data is recursively divided into chunks
2. Distribute: chunks are processed across threads
3. Reduce: results are combined back together

Adaptive depth limiting prevents thread-pool exhaustion.
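The three steps above can be sketched with the standard library's `ThreadPoolExecutor`. This is an illustrative toy, not FastIter's actual implementation (the name `parallel_reduce` and the simple one-level split are assumptions; FastIter splits recursively with depth limiting):

```python
from concurrent.futures import ThreadPoolExecutor
from operator import add

def parallel_reduce(data, map_fn, reduce_op, identity, num_threads=4):
    """Split data into chunks, map/reduce each chunk in a thread, then combine."""
    # 1. Split: one contiguous slice per chunk
    chunk_size = max(1, len(data) // num_threads)
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    def process_chunk(chunk):
        acc = identity
        for item in chunk:
            acc = reduce_op(acc, map_fn(item))
        return acc

    # 2. Distribute: each chunk runs on a worker thread
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partials = list(pool.map(process_chunk, chunks))

    # 3. Reduce: combine the per-chunk partial results
    acc = identity
    for partial in partials:
        acc = reduce_op(acc, partial)
    return acc

# Sum of squares over 0..99_999
total = parallel_reduce(range(100_000), lambda x: x * x, add, 0)
```

With the GIL disabled, the chunks genuinely run in parallel; with it enabled, the same code serializes and only pays thread overhead.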
## Benchmarks

Run your own benchmarks (must use the free-threaded build):

```shell
# Full benchmark suite
uv run --python 3.14t python benchmarks/benchmark.py

# Quick demo
uv run --python 3.14t python main.py

# Run tests
uv run --python 3.14t pytest tests/ -v
```

Running benchmarks with `python3.14` (GIL enabled) will show worse-than-sequential numbers - that's expected and correct behavior, not a bug.
## Architecture

```
fastiter/
├── protocols.py  # Producer/Consumer abstractions
├── core.py       # ParallelIterator with 15+ operations
├── bridge.py     # Work distribution (adaptive depth limiting)
├── consumers.py  # Map, Filter, Reduce, etc.
└── producers.py  # Range, List, Tuple data sources
```

See GUIDE.md for implementation details.
## Requirements
Python 3.14+ free-threaded build (python3.14t) is required for speedups.
FastIter uses ThreadPoolExecutor under the hood. With the GIL enabled, Python threads cannot run CPU-bound bytecode simultaneously - they serialize and add overhead, making parallel execution slower than sequential for the workloads FastIter targets. Free-threading is not a recommendation; it is what makes the library work.
```shell
# Install the free-threaded build
uv python install 3.14t  # or: pyenv install 3.14t

# Verify the GIL is disabled
python3.14t -c "import sys; print('GIL disabled:', not sys._is_gil_enabled())"
# Should print: GIL disabled: True
```

If you import FastIter with the GIL enabled, you will see a RuntimeWarning.
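That warning behaviour can be reproduced with a small stdlib check. This is a hedged sketch of the idea, not FastIter's actual code (`check_free_threading` is hypothetical):

```python
import sys
import warnings

def check_free_threading():
    """Return True when running on a free-threaded build with the GIL disabled."""
    # sys._is_gil_enabled() exists only on 3.13+; assume the GIL is on elsewhere.
    gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)()
    if gil_enabled:
        warnings.warn(
            "GIL is enabled: thread-based parallelism will be slower than "
            "sequential code for CPU-bound work.",
            RuntimeWarning,
            stacklevel=2,
        )
    return not gil_enabled
```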
What happens without free-threading?

| Mode | Result |
|---|---|
| `python3.14t` (GIL off) | ✅ 2–5.6x speedup |
| `python3.14` (GIL on) | ❌ Slower than sequential (thread overhead + GIL contention) |
| Python < 3.14 | ❌ Not supported |
## FAQ

### Why threads instead of multiprocessing.Pool?
Processes require pickling every argument and result across a process boundary (Python docs: Exchanging objects between processes). For fine-grained numeric operations on large datasets, that serialisation cost dominates — you spend more time copying data than computing it. Threads share memory directly, so the only overhead is task submission and result collection. With the GIL gone, threads get true parallel CPU execution with none of the process spawn (~50–100ms per worker) or pickle cost.
| | multiprocessing.Pool | FastIter (threads) |
|---|---|---|
| Spawn cost | ~50–100ms per worker | ~0.1ms per thread |
| Data serialisation | Pickle on every call (docs) | None (shared memory) |
| Memory per worker | Full process copy | Shared |
| Sweet spot | Few coarse, long-running tasks | Many fine-grained ops on large datasets |
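A quick stdlib experiment makes the serialisation point concrete: this measures the pickle round-trip that multiprocessing pays on every task submission and threads avoid entirely (timings vary by machine; the snippet is illustrative, not a rigorous benchmark):

```python
import pickle
import time

data = list(range(1_000_000))

t0 = time.perf_counter()
blob = pickle.dumps(data)      # what multiprocessing does when submitting a task
restored = pickle.loads(blob)  # and again on the worker side
elapsed = time.perf_counter() - t0

print(f"pickle round-trip of 1M ints: {elapsed * 1000:.1f} ms")
```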
You can measure this directly with the included benchmark:

```shell
uv run --python 3.14t python benchmarks/benchmark_vs_multiprocessing.py
```

If you have a handful of coarse, long-running tasks (seconds each, not microseconds), `multiprocessing.Pool` is still the right tool.
### Isn't free-threaded Python still experimental?

"Experimental" describes the ecosystem catch-up, not the feature itself. The free-threaded build (3.14t) is a fully supported CPython release variant - it ships with the same test suite and the same stability guarantees, and `sys._is_gil_enabled()` is a documented API. The real risk lies with C extensions that aren't thread-safe yet; FastIter has no C extension dependencies, so that risk doesn't apply here.

PEP 703 is accepted, and each release moves the GIL closer to being fully optional.
## Contributing
We welcome contributions! See CONTRIBUTING.md for:
- Development setup
- Code style guidelines
- How to add new operations
- Testing requirements
## License
MIT License - see LICENSE
## Inspiration
- Rayon - Rust's data parallelism library
- PEP 703 - Making the GIL optional
- Tutorial - Implementing parallel iterators
## Version
v0.1.0 - Experimental / locally tested
- 40 passing tests
- 2-5.6x measured speedups
- Complete documentation