High-performance parallel dataframe and array processing with Arrow-backed storage

FrameX

FrameX is an Arrow-backed Python library for parallel dataframe and array processing on a single machine.

It combines:

  • Pandas-like tabular APIs (DataFrame, Series, GroupBy)
  • NumPy-compatible chunked arrays (NDArray with NumPy protocol support)
  • Arrow-native storage/interop (to_arrow, Parquet/IPC I/O)
  • Eager execution with optional lazy pipelines (.lazy().collect())
  • Runtime backends for local threads/processes plus optional Ray/Dask executors
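
The chunked, thread-backed execution listed above can be pictured with a small standard-library sketch. This is not FrameX's implementation; `chunked` and `parallel_sum` are hypothetical names used only to show the general shape of splitting data into chunks, reducing each chunk on a worker, and combining the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(values, size):
    """Yield consecutive slices of `values` holding at most `size` items."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def parallel_sum(values, chunk_size=1000, workers=4):
    """Sum each chunk on a worker thread, then add the partial sums."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(sum, chunked(values, chunk_size))
        return sum(partials)


total = parallel_sum(list(range(10_000)))
print(total)
```

Pure-Python work like this gains little from threads because of the interpreter lock; the pattern pays off when each chunk is handled by native code that releases it, which is typically the case for Arrow and NumPy kernels.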

Why FrameX

FrameX targets local analytics workloads that have outgrown single-threaded scripts but do not yet need distributed infrastructure.

Typical fit:

  • ETL and analytics pipelines on medium-to-large local datasets
  • feature engineering workflows that mix table and array operations
  • migration paths from Pandas scripts where API familiarity matters

Installation

From PyPI:

pip install pyframe-xpy

From source:

git clone https://github.com/aeiwz/FrameX.git
cd FrameX
pip install -e .

Requirements:

  • Python >=3.10
  • Core dependencies: pyarrow, numpy
  • Optional compatibility: pandas (pip install "pyframe-xpy[pandas_compat]")

Quick Start

import framex as fx

df = fx.DataFrame(
    {
        "group": ["a", "a", "b"],
        "value": [10, 20, 30],
        "is_refund": [False, True, False],
    }
)

result = (
    df.filter(~df["is_refund"])
      .groupby("group")
      .agg({"value": ["sum", "mean", "count"]})
      .sort("value_sum", ascending=False)
)

# to_pandas() needs the optional pandas dependency (see Requirements).
print(result.to_pandas())
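
For readers who want to check what the Quick Start pipeline computes, here is a rough plain-Python equivalent of the same filter, group, aggregate, and sort steps. It does not use FrameX. The `value_mean` and `value_count` column names are an assumption made by analogy with the `value_sum` column used in the sort call.

```python
from collections import defaultdict
from statistics import mean

rows = [
    {"group": "a", "value": 10, "is_refund": False},
    {"group": "a", "value": 20, "is_refund": True},
    {"group": "b", "value": 30, "is_refund": False},
]

# Filter: keep only rows that are not refunds.
kept = [row for row in rows if not row["is_refund"]]

# Group: collect the remaining values under each group key.
values_by_group = defaultdict(list)
for row in kept:
    values_by_group[row["group"]].append(row["value"])

# Aggregate: one summary row per group (column names are assumed).
summary = [
    {
        "group": group,
        "value_sum": sum(values),
        "value_mean": mean(values),
        "value_count": len(values),
    }
    for group, values in values_by_group.items()
]

# Sort: largest sum first, matching sort("value_sum", ascending=False).
summary.sort(key=lambda row: row["value_sum"], reverse=True)

for row in summary:
    print(row)
```

Group "b" comes first with a sum of 30, followed by group "a" with a sum of 10, because the refunded row is dropped before aggregation.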

Core API

Top-level imports:

import framex as fx

Main objects and helpers:

  • fx.DataFrame, fx.Series, fx.Index, fx.LazyFrame
  • fx.NDArray, fx.array(...)
  • fx.read_parquet, fx.write_parquet, fx.read_ipc, fx.write_ipc, fx.read_csv, fx.write_csv
  • fx.read_json, fx.write_json, fx.read_ndjson, fx.write_ndjson
  • fx.read_file, fx.write_file for format auto-detection
  • fx.from_pandas, fx.from_dask, fx.from_ray, fx.from_dataframe
  • fx.get_config, fx.set_backend, fx.set_workers, fx.set_serializer, fx.set_kernel_backend
  • fx.set_array_backend for auto/NumExpr/Numba/JAX/PyTorch/CuPy acceleration modes
  • fx.recommend_best_performance_config() to inspect hardware-tuned settings
  • fx.auto_configure_hardware() to apply best-performance config automatically
  • fx.StreamProcessor for micro-batch streaming pipelines

Compression:

  • transparent extension-based compression for read_file / write_file
  • supported wrappers: .gz, .bz2, .xz, .zip, and .zst/.zstd (when zstandard is installed)
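
Extension-based compression, as described for read_file / write_file, generally means picking an opener from the file suffix. The sketch below shows that idea using only the standard library. It is an illustration, not FrameX's code: `open_by_extension` and `OPENERS` are hypothetical names, and .zip and .zst/.zstd are left out to keep it short.

```python
import bz2
import gzip
import lzma
import os
import tempfile

# Map a file suffix to the standard-library function that opens it.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_by_extension(path, mode="rt"):
    """Open `path` in a text mode, decompressing based on its suffix."""
    suffix = os.path.splitext(path)[1].lower()
    opener = OPENERS.get(suffix, open)  # fall back to a plain file
    return opener(path, mode, encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.csv.gz")
    with open_by_extension(path, "wt") as handle:
        handle.write("group,value\na,10\n")
    with open_by_extension(path) as handle:
        round_trip = handle.read()

print(round_trip)
```

The caller never names the codec: a path ending in .gz is written and read through gzip, while a plain .csv path would go through the built-in `open`.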

Acceleration extras:

pip install "pyframe-xpy[accel]"        # numexpr + numba
pip install "pyframe-xpy[gpu]"          # cupy (CUDA)
pip install "pyframe-xpy[ml_accel]"     # jax + pytorch
pip install "pyframe-xpy[pandas_fast]"  # modin backend
pip install "pyframe-xpy[distributed]"  # Dask + Ray distributed/HPC backends
pip install zstandard  # .zst/.zstd file compression

Backend notes:

  • fx.set_backend(name) accepts "threads", "processes", "ray", "dask", or "hpc".
  • Ray and Dask execution backends require their respective runtimes to be installed/available.
  • HPC mode ("hpc") uses cluster-oriented execution via Dask or Ray:
    • FRAMEX_HPC_ENGINE=dask|ray
    • FRAMEX_DASK_SCHEDULER_ADDRESS=<tcp://...> to connect existing Dask clusters
    • FRAMEX_RAY_ADDRESS=<ray://...> to connect existing Ray clusters
    • optional SLURM bootstrap: FRAMEX_DASK_SLURM=1 (requires dask-jobqueue)
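
As a sketch of how the HPC variables above might be set from Python, the helper below builds them for one engine. `hpc_environment` is a hypothetical helper, not part of FrameX, and the scheduler address is a placeholder. Whether the variables must be set before `framex` is imported is not stated here, so setting them first is the conservative assumption.

```python
import os


def hpc_environment(engine, address=None):
    """Return the FRAMEX_* variables for HPC mode with the given engine."""
    if engine not in ("dask", "ray"):
        raise ValueError("engine must be 'dask' or 'ray'")
    env = {"FRAMEX_HPC_ENGINE": engine}
    if address is not None:
        # Each engine reads its cluster address from its own variable.
        if engine == "dask":
            env["FRAMEX_DASK_SCHEDULER_ADDRESS"] = address
        else:
            env["FRAMEX_RAY_ADDRESS"] = address
    return env


# Placeholder address; replace it with your own scheduler endpoint.
os.environ.update(hpc_environment("dask", "tcp://127.0.0.1:8786"))

# With FrameX installed, the backend would then be selected with:
#   import framex as fx
#   fx.set_backend("hpc")
```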

Test support notes:

  • Some tests are gated on optional backends and are intentionally skipped when those dependencies are not installed.
  • Typical skip reasons: missing dask.distributed, dask.dataframe, ray, or ray.data.
  • To run the full optional matrix locally:
pip install "pyframe-xpy[distributed]"
pytest -q

Documentation

Canonical docs are in docs/documents.

Website (Docs UI)

The docs website lives in the website directory (Next.js App Router).

Main docs routes (available once the dev server is running locally):

  • http://localhost:3000/docs/features
  • http://localhost:3000/docs/tutorial_etl_pipeline
  • http://localhost:3000/docs/use_cases
  • http://localhost:3000/docs/configuration_guide
  • http://localhost:3000/docs/performance_test

Run locally:

cd website
npm install
npm run dev

Production build:

npm run build
npm run start

Development

Install dev dependencies:

pip install -e ".[dev]"

Run tests:

pytest

Benchmarks

Benchmark code and generated reports are in the benchmarks directory.

Run the full benchmark suite (includes in-terminal progress bar and report generation):

python3 -m benchmarks.benchmark_suite

Run workload capability matrix checks:

python3 -m benchmarks.check_framex_workloads

Benchmark outputs are written to benchmarks/results:

  • benchmark_results.json
  • benchmark_results.csv
  • benchmark_report.md
  • framex_workload_check.json
  • performance_speedup.png
  • parallel_processing_scaling.png
  • multiprocessing_scaling.png
  • memory_peak_rss.png

Project Status

FrameX is pre-1.0 (0.1.1) and in active development.

  • APIs are usable and documented
  • compatibility/performance behavior will continue to evolve
  • pin versions for production-critical workloads

License

MIT

