High-performance parallel dataframe and array processing with Arrow-backed storage
FrameX
FrameX is an Arrow-backed Python library for parallel dataframe and array processing on a single machine.
It combines:
- Pandas-like tabular APIs (DataFrame, Series, GroupBy)
- NumPy-compatible chunked arrays (NDArray with NumPy protocol support)
- Arrow-native storage/interop (to_arrow, Parquet/IPC I/O)
- Eager execution with optional lazy pipelines (.lazy().collect())
- Runtime backends for local threads/processes plus optional Ray/Dask executors
Why FrameX
FrameX is aimed at local analytics workflows that are bigger than comfortable single-threaded scripts but do not yet require distributed infrastructure.
Typical fit:
- ETL and analytics pipelines on medium-to-large local datasets
- feature engineering workflows that mix table and array operations
- migration paths from Pandas scripts where API familiarity matters
Installation
From PyPI:
pip install pyframe-xpy
From source:
git clone https://github.com/aeiwz/FrameX.git
cd FrameX
pip install -e .
Requirements:
- Python >=3.10
- Core dependencies: pyarrow, numpy
- Optional compatibility: pandas (pip install pyframe-xpy[pandas_compat])
Quick Start
import framex as fx

df = fx.DataFrame(
    {
        "group": ["a", "a", "b"],
        "value": [10, 20, 30],
        "is_refund": [False, True, False],
    }
)

result = (
    df.filter(~df["is_refund"])
    .groupby("group")
    .agg({"value": ["sum", "mean", "count"]})
    .sort("value_sum", ascending=False)
)

print(result.to_pandas())
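To sanity-check the Quick Start output, it can help to see the same computation without FrameX. The sketch below is plain Python: it drops refund rows, groups by `group`, and computes the sum, mean, and count of `value`. The `summarize` helper is hypothetical, and the `value_sum` / `value_mean` / `value_count` column names are inferred from the `sort("value_sum", ...)` call above, not confirmed against FrameX's output.

```python
from collections import defaultdict

# Same three rows as the Quick Start, in row-oriented form.
rows = [
    {"group": "a", "value": 10, "is_refund": False},
    {"group": "a", "value": 20, "is_refund": True},
    {"group": "b", "value": 30, "is_refund": False},
]


def summarize(rows):
    """Filter out refunds, then aggregate `value` per group."""
    buckets = defaultdict(list)
    for row in rows:
        if not row["is_refund"]:
            buckets[row["group"]].append(row["value"])

    out = [
        {
            "group": group,
            "value_sum": sum(values),
            "value_mean": sum(values) / len(values),
            "value_count": len(values),
        }
        for group, values in buckets.items()
    ]
    # Largest sum first, mirroring sort("value_sum", ascending=False).
    return sorted(out, key=lambda r: r["value_sum"], reverse=True)


reference = summarize(rows)
for record in reference:
    print(record)
```

Only the non-refund rows contribute, so group `a` is aggregated over a single value.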
Core API
Top-level imports:
import framex as fx
Main objects and helpers:
- fx.DataFrame, fx.Series, fx.Index, fx.LazyFrame
- fx.NDArray, fx.array(...)
- fx.read_parquet, fx.write_parquet, fx.read_ipc, fx.write_ipc, fx.read_csv, fx.write_csv
- fx.read_json, fx.write_json, fx.read_ndjson, fx.write_ndjson
- fx.read_file, fx.write_file for format auto-detection
  Compression:
  - transparent extension-based compression for read_file/write_file
  - supported wrappers: .gz, .bz2, .xz, .zip, and .zst/.zstd (when zstandard is installed)
- fx.from_pandas, fx.from_dask, fx.from_ray, fx.from_dataframe
- fx.get_config, fx.set_backend, fx.set_workers, fx.set_serializer, fx.set_kernel_backend
- fx.set_array_backend for auto/NumExpr/Numba/JAX/PyTorch/CuPy acceleration modes
- fx.recommend_best_performance_config() to inspect hardware-tuned settings
- fx.auto_configure_hardware() to apply best-performance config automatically
- fx.StreamProcessor for micro-batch streaming pipelines
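Extension-based compression usually means looking at the last suffix of the path, choosing a matching opener, and treating the suffix underneath as the data format. The following is a minimal standard-library sketch of that idea, not FrameX's implementation: `open_by_extension` and `inner_format` are hypothetical helpers, and `.zip` and `.zst`/`.zstd` are left out to keep it short.

```python
import bz2
import gzip
import lzma
from pathlib import Path

# Hypothetical mapping from compression suffix to a stdlib opener.
_OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_by_extension(path, mode="rt"):
    """Open `path`, (de)compressing transparently based on its last suffix."""
    opener = _OPENERS.get(Path(path).suffix.lower(), open)
    return opener(path, mode)


def inner_format(path):
    """Return the data-format suffix beneath any compression wrapper."""
    p = Path(path)
    if p.suffix.lower() in _OPENERS:
        p = p.with_suffix("")  # strip the wrapper, e.g. data.csv.gz -> data.csv
    return p.suffix.lower()
```

With these helpers, `inner_format("events.csv.gz")` yields `".csv"`, which a reader could then use to pick the CSV parser, while `open_by_extension` handles the gzip layer.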
Acceleration extras:
pip install pyframe-xpy[accel] # numexpr + numba
pip install pyframe-xpy[gpu] # cupy (CUDA)
pip install pyframe-xpy[ml_accel] # jax + pytorch
pip install pyframe-xpy[pandas_fast] # modin backend
pip install pyframe-xpy[distributed] # Dask + Ray distributed/HPC backends
pip install zstandard # .zst/.zstd file compression
Backend notes:
- fx.set_backend("threads" | "processes" | "ray" | "dask" | "hpc")
- Ray and Dask execution backends require their respective runtimes to be installed/available.
- HPC mode ("hpc") uses cluster-oriented execution via Dask or Ray:
  - FRAMEX_HPC_ENGINE=dask|ray
  - FRAMEX_DASK_SCHEDULER_ADDRESS=<tcp://...> to connect existing Dask clusters
  - FRAMEX_RAY_ADDRESS=<ray://...> to connect existing Ray clusters
  - optional SLURM bootstrap: FRAMEX_DASK_SLURM=1 (requires dask-jobqueue)
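The HPC variables above can be assembled in Python before selecting the backend. This is a sketch under assumptions: `hpc_env` is a hypothetical helper, the scheduler address is a placeholder, and the rule that the SLURM flag only applies to the Dask engine is inferred from the variable name. It is also an assumption that FrameX reads these variables when `fx.set_backend("hpc")` is called rather than at import time, so setting them before importing `framex` is the safer order.

```python
import os


def hpc_env(engine, address=None, slurm=False):
    """Build the FRAMEX_* variables for HPC mode as a plain dict."""
    if engine not in ("dask", "ray"):
        raise ValueError("engine must be 'dask' or 'ray'")

    env = {"FRAMEX_HPC_ENGINE": engine}
    if address is not None:
        key = (
            "FRAMEX_DASK_SCHEDULER_ADDRESS"
            if engine == "dask"
            else "FRAMEX_RAY_ADDRESS"
        )
        env[key] = address
    if slurm:
        # Assumption: the SLURM bootstrap is Dask-only, as the name suggests.
        if engine != "dask":
            raise ValueError("FRAMEX_DASK_SLURM applies to the dask engine")
        env["FRAMEX_DASK_SLURM"] = "1"
    return env


# Placeholder address; replace with your cluster's scheduler endpoint.
env = hpc_env("dask", address="tcp://127.0.0.1:8786")
os.environ.update(env)

# Afterwards, with FrameX and Dask installed:
#   import framex as fx
#   fx.set_backend("hpc")
```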
Test support notes:
- Some tests are optional-backend gated and intentionally skipped when deps are not installed.
- Typical skip reasons: missing dask.distributed, dask.dataframe, ray, or ray.data.
- Run full optional matrix locally:
pip install pyframe-xpy[distributed]
pytest -q
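Before reading skip counts, it can be useful to check which of the optional modules listed above are importable in the current environment. The sketch below uses only the standard library; `missing_optional_modules` is a hypothetical helper and not part of FrameX or its test suite.

```python
from importlib.util import find_spec

# Module names taken from the skip reasons listed above.
OPTIONAL_MODULES = ("dask.distributed", "dask.dataframe", "ray", "ray.data")


def missing_optional_modules(names=OPTIONAL_MODULES):
    """Return the subset of `names` that cannot be located for import."""
    missing = []
    for name in names:
        try:
            found = find_spec(name) is not None
        except ImportError:
            # Raised for dotted names when the parent package is absent
            # (or fails to import), e.g. `dask` for `dask.distributed`.
            found = False
        if not found:
            missing.append(name)
    return missing


print("missing optional modules:", missing_optional_modules())
```

An empty list suggests the optional-backend tests should run rather than skip.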
Documentation
Canonical docs are in docs/documents:
- Overview
- Features
- Getting Started
- Installation
- Tutorial: ETL Pipeline
- Tutorial: NumPy NDArray Interop
- Use Cases
- Configuration Guide
- Performance Test
- Architecture
- API Reference
- Roadmap
- FAQ
Website (Docs UI)
The docs website lives in website (Next.js App Router).
Main docs routes:
- http://localhost:3000/docs/features
- http://localhost:3000/docs/tutorial_etl_pipeline
- http://localhost:3000/docs/use_cases
- http://localhost:3000/docs/configuration_guide
- http://localhost:3000/docs/performance_test
Run locally:
cd website
npm install
npm run dev
Production build:
npm run build
npm run start
Development
Install dev dependencies:
pip install -e .[dev]
Run tests:
pytest
Benchmarks
Benchmark code and generated reports are in benchmarks.
Run the full benchmark suite (includes in-terminal progress bar and report generation):
python3 -m benchmarks.benchmark_suite
Run workload capability matrix checks:
python3 -m benchmarks.check_framex_workloads
Benchmark outputs are written to benchmarks/results:
- benchmark_results.json
- benchmark_results.csv
- benchmark_report.md
- framex_workload_check.json
- performance_speedup.png
- parallel_processing_scaling.png
- multiprocessing_scaling.png
- memory_peak_rss.png
Project Status
FrameX is pre-1.0 (0.1.2) and in active development.
- APIs are usable and documented
- compatibility/performance behavior will continue to evolve
- pin versions for production-critical workloads
License