Bridge between pandas, cudf, modin, dask, dask-modin, dask-cudf, spark or spark+rapids and between numpy, cupy and dask.array
Virtual DataFrame
Motivation
With a pandas-like dataframe or a numpy-like array, do you want to write your code first and choose the framework to use at the end? Do you want to be able to pick the best framework simply by measuring performance? This framework unifies multiple pandas-compatible or numpy-compatible components, so that a single code base is compatible with all of them.
Do you want to use different architectures at different times of the year to be "green" and cheaper? Do you want to use a GPU only for Black Friday?
Synopsis
With a few parameters and virtual classes, you can write the code once and execute it:
- With or without multicore
- With or without cluster (multi nodes)
- With or without GPU
To do that, we create some virtual classes, add some methods to other classes, etc.
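The three switches above can be pictured as inputs to a backend choice. The following is a minimal sketch, not this project's API: the function name `choose_backend` and the mapping itself are assumptions made for illustration, using only framework names that appear in this description.

```python
def choose_backend(multicore: bool = False,
                   cluster: bool = False,
                   gpu: bool = False) -> str:
    """Return a framework name for the requested hardware.

    Hypothetical mapping, for illustration only.
    """
    if cluster and gpu:
        return "dask-cudf"  # several nodes, each with a GPU
    if cluster:
        return "dask"       # several nodes, CPU only
    if gpu:
        return "cudf"       # one node with a GPU
    if multicore:
        return "modin"      # one node, several CPU cores
    return "pandas"         # one node, one core


# The same program can be pointed at different hardware by changing flags only.
print(choose_backend())
print(choose_backend(cluster=True, gpu=True))
```

In practice the flags would probably come from configuration or environment variables rather than from the code itself, so that the program stays unchanged when the architecture changes.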
It is difficult to combine frameworks whose classes share the same names and have similar semantics. For example, to use Dask, cudf, pandas, modin, pyspark or pyspark+rapids in the same program, you must manage:
- pandas.DataFrame, pandas.Series
- modin.pandas.DataFrame, modin.pandas.Series
- cudf.DataFrame, cudf.Series
- dask.dataframe.DataFrame, dask.dataframe.Series
- pyspark.pandas.DataFrame, pyspark.pandas.Series
With numpy, you must manage:
- numpy.ndarray
- cupy.ndarray
- dask.array
With cudf or cupy, the code must call .to_pandas() or asnumpy(). With dask, the code must call .compute(), and can use @delayed or dask.distributed.Client, etc.
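To make the difference concrete, here is a minimal sketch of a helper that hides the framework-specific calls behind one function. It is not taken from this project: `to_local`, `FakeGpuFrame` and `FakeLazyFrame` are hypothetical names, and the two fake classes only stand in for GPU and lazy dataframes so that the example runs without any of the frameworks installed. Cupy arrays are left out, since converting them needs `cupy.asnumpy()`.

```python
def to_local(obj):
    """Bring a result back to the local process and to host memory.

    Duck-typed on purpose: no framework is imported here.
    """
    # Lazy collections (Dask-style) expose .compute().
    if hasattr(obj, "compute"):
        obj = obj.compute()
    # GPU dataframes (cudf-style) expose .to_pandas().
    if hasattr(obj, "to_pandas"):
        obj = obj.to_pandas()
    return obj


class FakeGpuFrame:
    """Stand-in for a GPU dataframe (demo only)."""

    def __init__(self, rows):
        self.rows = rows

    def to_pandas(self):
        return list(self.rows)


class FakeLazyFrame:
    """Stand-in for a lazy, distributed dataframe (demo only)."""

    def __init__(self, rows):
        self.rows = rows

    def compute(self):
        # Computing a distributed GPU frame yields a local GPU frame.
        return FakeGpuFrame(self.rows)


# One call, whatever the object: lazy + GPU, GPU only, or already local.
print(to_local(FakeLazyFrame([1, 2, 3])))
print(to_local(FakeGpuFrame([4, 5])))
print(to_local([6]))
```

A helper like this works but has to be called everywhere, which is why a uniform object model is more convenient than a conversion function.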
We propose to replace all these classes and scenarios with a uniform model inspired by Dask (the most complex API). It then becomes possible to write the code once and use it in different environments and frameworks.
This project is essentially a back-port of Dask+cudf to other frameworks. We try to normalize the API of all frameworks. At runtime, the project weaves your code with the selected framework.
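One way to picture this back-port is to attach Dask-style methods to classes that do not have them. The sketch below is an illustration under assumptions, not this project's implementation: `add_dask_style_methods` and `EagerFrame` are hypothetical names, and `EagerFrame` only stands in for an eager dataframe class.

```python
def add_dask_style_methods(cls):
    """Attach Dask-like no-op methods to an eager class (illustrative only)."""

    def compute(self, **kwargs):
        # An eager object is already computed, so return it unchanged.
        return self

    def persist(self, **kwargs):
        # Nothing to keep in cluster memory for an eager object.
        return self

    for name, method in (("compute", compute), ("persist", persist)):
        if not hasattr(cls, name):  # never override a real implementation
            setattr(cls, name, method)
    return cls


class EagerFrame:
    """Stand-in for an eager dataframe class (demo only)."""

    def __init__(self, rows):
        self.rows = rows


add_dask_style_methods(EagerFrame)

frame = EagerFrame([1, 2, 3])
# Code written in the Dask style now also runs on the eager class.
result = frame.persist().compute()
print(result.rows)
```

With such methods in place, code that always ends with `.compute()` behaves the same whether the underlying object is lazy or eager.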