High-performance parallel dataframe and array processing with Arrow-backed storage

FrameX

FrameX is an Arrow-backed Python library for parallel dataframe and array processing on a single machine.

It combines:

  • Pandas-like tabular APIs (DataFrame, Series, GroupBy)
  • NumPy-compatible chunked arrays (NDArray with NumPy protocol support)
  • Arrow-native storage/interop (to_arrow, Parquet/IPC I/O)
  • Eager execution with optional lazy pipelines (.lazy().collect())
  • Runtime backends for local threads/processes plus optional Ray/Dask executors

Why FrameX

FrameX targets local analytics workflows that have outgrown single-threaded scripts but do not yet need distributed infrastructure.

Typical fit:

  • ETL and analytics pipelines on medium-to-large local datasets
  • feature engineering workflows that mix table and array operations
  • migration paths from Pandas scripts where API familiarity matters

Installation

From PyPI:

pip install pyframe-xpy

From source:

git clone https://github.com/aeiwz/FrameX.git
cd FrameX
pip install -e .

Requirements:

  • Python >=3.10
  • Core dependencies: pyarrow, numpy
  • Optional compatibility: pandas (pip install "pyframe-xpy[pandas_compat]")

Quick Start

import framex as fx

df = fx.DataFrame(
    {
        "group": ["a", "a", "b"],
        "value": [10, 20, 30],
        "is_refund": [False, True, False],
    }
)

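# Drop refunded rows, aggregate value per group, then sort by the summed value.
result = (
    df.filter(~df["is_refund"])
      .groupby("group")
      .agg({"value": ["sum", "mean", "count"]})
      .sort("value_sum", ascending=False)
)

# to_pandas() needs pandas, which is an optional dependency (see Requirements).
print(result.to_pandas())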
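
As a cross-check on what the pipeline above should compute, the same filter, group, aggregate, and sort steps can be written out in plain Python. This sketch does not use FrameX and makes no claim about the exact shape or column order of the FrameX result; the `value_sum`, `value_mean`, and `value_count` keys are an assumption that simply mirrors the `value_sum` column name used in the `sort` call.

```python
from collections import defaultdict
from statistics import mean

# Same three rows as the Quick Start DataFrame.
rows = [
    {"group": "a", "value": 10, "is_refund": False},
    {"group": "a", "value": 20, "is_refund": True},
    {"group": "b", "value": 30, "is_refund": False},
]

# Keep non-refund rows and bucket their values by group.
values_by_group = defaultdict(list)
for row in rows:
    if not row["is_refund"]:
        values_by_group[row["group"]].append(row["value"])

# Aggregate each group, then sort by the sum in descending order.
summary = sorted(
    (
        {
            "group": group,
            "value_sum": sum(values),
            "value_mean": mean(values),
            "value_count": len(values),
        }
        for group, values in values_by_group.items()
    ),
    key=lambda record: record["value_sum"],
    reverse=True,
)

for record in summary:
    print(record)
```

Group b (sum 30) sorts ahead of group a (sum 10), because the refunded value of 20 is filtered out before aggregation.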

Core API

Top-level imports:

import framex as fx

Main objects and helpers:

  • fx.DataFrame, fx.Series, fx.Index, fx.LazyFrame
  • fx.NDArray, fx.array(...)
  • fx.read_parquet, fx.write_parquet, fx.read_ipc, fx.write_ipc, fx.read_csv, fx.write_csv
  • fx.read_json, fx.write_json, fx.read_ndjson, fx.write_ndjson
  • fx.read_file, fx.write_file for format auto-detection

  • fx.from_pandas, fx.from_dask, fx.from_ray, fx.from_dataframe
  • fx.get_config, fx.set_backend, fx.set_workers, fx.set_serializer, fx.set_kernel_backend
  • fx.set_array_backend for auto/NumExpr/Numba/JAX/PyTorch/CuPy acceleration modes
  • fx.recommend_best_performance_config() to inspect hardware-tuned settings
  • fx.auto_configure_hardware() to apply best-performance config automatically
  • fx.StreamProcessor for micro-batch streaming pipelines

Compression:

  • transparent extension-based compression for read_file / write_file
  • supported wrappers: .gz, .bz2, .xz, .zip, and .zst/.zstd (when zstandard is installed)

Acceleration extras:

pip install "pyframe-xpy[accel]"        # numexpr + numba
pip install "pyframe-xpy[gpu]"          # cupy (CUDA)
pip install "pyframe-xpy[ml_accel]"     # jax + pytorch
pip install "pyframe-xpy[pandas_fast]"  # modin backend
pip install "pyframe-xpy[distributed]"  # Dask + Ray distributed/HPC backends
pip install zstandard                   # .zst/.zstd file compression

Backend notes:

  • fx.set_backend(...) accepts one of "threads", "processes", "ray", "dask", or "hpc".
  • Ray and Dask execution backends require their respective runtimes to be installed/available.
  • HPC mode ("hpc") uses cluster-oriented execution via Dask or Ray:
    • FRAMEX_HPC_ENGINE=dask|ray
    • FRAMEX_DASK_SCHEDULER_ADDRESS=<tcp://...> to connect existing Dask clusters
    • FRAMEX_RAY_ADDRESS=<ray://...> to connect existing Ray clusters
    • optional SLURM bootstrap: FRAMEX_DASK_SLURM=1 (requires dask-jobqueue)
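
The environment variables above could be assembled in Python before selecting the "hpc" backend. The sketch below only builds and validates the variables. `hpc_environment` is a hypothetical helper, the scheduler address is a placeholder, and it is an assumption that FrameX reads these variables when `fx.set_backend("hpc")` is called and that FRAMEX_DASK_SLURM only applies to the Dask engine.

```python
import os


def hpc_environment(engine, address=None, slurm=False):
    """Return the FRAMEX_* variables for HPC mode as a dict of strings."""
    if engine not in ("dask", "ray"):
        raise ValueError("engine must be 'dask' or 'ray'")
    env = {"FRAMEX_HPC_ENGINE": engine}
    if address is not None:
        # Each engine has its own address variable.
        key = (
            "FRAMEX_DASK_SCHEDULER_ADDRESS"
            if engine == "dask"
            else "FRAMEX_RAY_ADDRESS"
        )
        env[key] = address
    if slurm:
        # Assumption: the SLURM bootstrap is Dask-only (it requires dask-jobqueue).
        if engine != "dask":
            raise ValueError("FRAMEX_DASK_SLURM applies to the Dask engine")
        env["FRAMEX_DASK_SLURM"] = "1"
    return env


# Placeholder address; substitute the address of an existing Dask scheduler.
env = hpc_environment("dask", address="tcp://127.0.0.1:8786")
os.environ.update(env)

# With FrameX and Dask installed, the backend would then be selected with:
#   import framex as fx
#   fx.set_backend("hpc")
```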

Test support notes:

  • Some tests are optional-backend gated and intentionally skipped when deps are not installed.
  • Typical skip reasons: missing dask.distributed, dask.dataframe, ray, or ray.data.
  • Run the full optional matrix locally:

pip install "pyframe-xpy[distributed]"
pytest -q
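
Before running the suite, it can help to see which of the optional modules named above are importable, since those are the ones that trigger skips. This is a standard-library sketch, not part of FrameX; `is_importable` is a hypothetical helper.

```python
import importlib.util

# Modules listed above as typical skip reasons.
OPTIONAL_MODULES = ("dask.distributed", "dask.dataframe", "ray", "ray.data")


def is_importable(name):
    """Return True if `name` can be located without fully importing it.

    For dotted names, find_spec imports the parent package first and raises
    ModuleNotFoundError when the parent is missing, so that case is treated
    as "not importable".
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


missing = [name for name in OPTIONAL_MODULES if not is_importable(name)]
print("missing optional modules:", missing or "none")
```

Any module reported as missing corresponds to tests that would be skipped rather than failed.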

Documentation

Canonical docs are in docs/documents.

Website (Docs UI)

The docs website lives in the website directory (Next.js App Router).

Main docs routes:

  • http://localhost:3000/docs/features
  • http://localhost:3000/docs/tutorial_etl_pipeline
  • http://localhost:3000/docs/use_cases
  • http://localhost:3000/docs/configuration_guide
  • http://localhost:3000/docs/performance_test

Run locally:

cd website
npm install
npm run dev

Production build:

npm run build
npm run start

Development

Install dev dependencies:

pip install -e ".[dev]"

Run tests:

pytest

Benchmarks

Benchmark code and generated reports are in the benchmarks directory.

Run the full benchmark suite (includes in-terminal progress bar and report generation):

python3 -m benchmarks.benchmark_suite

Run workload capability matrix checks:

python3 -m benchmarks.check_framex_workloads

Benchmark outputs are written to benchmarks/results:

  • benchmark_results.json
  • benchmark_results.csv
  • benchmark_report.md
  • framex_workload_check.json
  • performance_speedup.png
  • parallel_processing_scaling.png
  • multiprocessing_scaling.png
  • memory_peak_rss.png
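
After a run, a short script can confirm that the listed outputs were actually written. This is a sketch using only the standard library; `missing_outputs` is a hypothetical helper, and it does not assume which command produces which file.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_OUTPUTS = (
    "benchmark_results.json",
    "benchmark_results.csv",
    "benchmark_report.md",
    "framex_workload_check.json",
    "performance_speedup.png",
    "parallel_processing_scaling.png",
    "multiprocessing_scaling.png",
    "memory_peak_rss.png",
)


def missing_outputs(results_dir="benchmarks/results"):
    """Return the expected output file names not present in `results_dir`."""
    root = Path(results_dir)
    return [name for name in EXPECTED_OUTPUTS if not (root / name).is_file()]


# Run from the repository root; an empty list means every output exists.
print(missing_outputs())
```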

Project Status

FrameX is pre-1.0 (0.1.0) and in active development.

  • APIs are usable and documented
  • compatibility/performance behavior will continue to evolve
  • pin versions for production-critical workloads

License

MIT
