Bridge between pandas, cudf, modin, dask, dask-modin, dask-cudf, spark or spark+rapids and between numpy, cupy and dask.array

Project description

Virtual DataFrame

Motivation

Do you want to write code against a pandas-like dataframe or a NumPy-like array, and choose the framework only at the end? Do you want to pick the best framework simply by measuring performance? This framework unifies multiple pandas-compatible and NumPy-compatible components, so that a single code base is compatible with all of them.

Do you want to use different architectures at different times of the year to be "green" and cheaper? Do you want to use a GPU only for Black Friday?

Synopsis

With a few parameters and virtual classes, it is possible to write the code once and execute it:

  • With or without multicore
  • With or without cluster (multi nodes)
  • With or without GPU

To do that, we create some virtual classes, add some methods to other classes, etc.
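As a rough illustration of how the three switches could select a framework, here is a minimal sketch. The `Target` type, the `choose_framework` function and the mapping itself are hypothetical and are not part of this project's API; the pairing of switches to frameworks is only one plausible reading of the list of supported frameworks.

```python
from typing import NamedTuple


class Target(NamedTuple):
    """The three execution switches: multicore, cluster, GPU (hypothetical type)."""
    multicore: bool = False
    cluster: bool = False
    gpu: bool = False


# Illustrative mapping only: an assumed pairing of switches and frameworks.
_FRAMEWORKS = {
    Target(): "pandas",
    Target(multicore=True): "modin",
    Target(cluster=True): "dask",
    Target(multicore=True, cluster=True): "dask-modin",
    Target(gpu=True): "cudf",
    Target(cluster=True, gpu=True): "dask-cudf",
}


def choose_framework(target: Target) -> str:
    """Return the framework name for a target, or raise if none is mapped."""
    try:
        return _FRAMEWORKS[target]
    except KeyError:
        raise ValueError(f"no framework mapped for {target}") from None


if __name__ == "__main__":
    print(choose_framework(Target(cluster=True, gpu=True)))
```

The point of the sketch is that the application code never names a framework directly; only the target description changes between runs.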

It is difficult to combine frameworks that use the same class names with similar semantics. For example, if you want to use Dask, cudf, pandas, modin, pyspark or pyspark+rapids in the same program, you must manage:

  • pandas.DataFrame, pandas.Series
  • modin.pandas.DataFrame, modin.pandas.Series
  • cudf.DataFrame, cudf.Series
  • dask.dataframe.DataFrame, dask.dataframe.Series
  • pyspark.pandas.DataFrame, pyspark.pandas.Series

With numpy, you must manage:

  • numpy.ndarray
  • cupy.ndarray
  • dask.array

With cudf or cupy, the code must call .to_pandas() or asnumpy(). With dask, the code must call .compute(), and can use @delayed or dask.distributed.Client, etc.
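To make the burden concrete, here is a minimal sketch of the kind of helper a program needs when it handles these frameworks by hand. The function `to_local` is hypothetical, and the two stub classes are stand-ins so the example runs without dask or cudf installed. It relies only on duck typing: lazy collections expose `.compute()` and GPU dataframes expose `.to_pandas()`.

```python
def to_local(obj):
    """Bring a result back to local, in-memory form (hypothetical helper)."""
    # Lazy collections (dask-style) must be computed first.
    if hasattr(obj, "compute"):
        obj = obj.compute()
    # GPU dataframes (cudf-style) must then be copied to host memory.
    if hasattr(obj, "to_pandas"):
        obj = obj.to_pandas()
    return obj


class FakeGpuFrame:
    """Stand-in for a cudf-style object; not a real library class."""

    def __init__(self, rows):
        self.rows = rows

    def to_pandas(self):
        return list(self.rows)


class FakeLazyFrame:
    """Stand-in for a dask-style object wrapping a GPU frame."""

    def __init__(self, rows):
        self.rows = rows

    def compute(self):
        return FakeGpuFrame(self.rows)


if __name__ == "__main__":
    print(to_local(FakeLazyFrame([1, 2, 3])))  # lazy, then GPU, then local
    print(to_local([4, 5]))                    # already local: unchanged
```

The order of the two checks matters: a lazy collection backed by GPU partitions needs both steps, in that sequence.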

We propose to replace all these classes and scenarios with a uniform model, inspired by dask (the most complex API). It then becomes possible to write the code once and use it in different environments and frameworks.

This project is essentially a back-port of Dask+cudf to other frameworks. We try to normalize the APIs of all the frameworks. At runtime, this project weaves your code with the selected framework.
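One way to picture such a back-port is to attach a dask-style `compute()` to an eager class, so that code written for the lazy API also runs on eager data. The sketch below is only an illustration of that idea, not the project's actual implementation: `EagerFrame` is a stand-in for a class such as pandas.DataFrame, and `add_compute` is a hypothetical helper.

```python
class EagerFrame:
    """Stand-in for an eager dataframe class; not a real library class."""

    def __init__(self, rows):
        self.rows = rows


def _compute(self, **kwargs):
    """Eager data is already computed, so return the object unchanged."""
    return self


def add_compute(cls):
    """Attach a no-op compute() to cls unless it already defines one."""
    if not hasattr(cls, "compute"):
        cls.compute = _compute
    return cls


if __name__ == "__main__":
    add_compute(EagerFrame)
    frame = EagerFrame([1, 2, 3])
    # The same call now works on eager data and on lazy collections.
    print(frame.compute().rows)
```

Patching the class rather than wrapping each instance keeps objects created elsewhere in the program compatible, at the cost of modifying a class the program does not own.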
