Fugue, Rapids, BlazingSQL integration
This project extends Fugue to support Rapids cuDF and BlazingSQL.
Installation
You need to install Rapids and BlazingSQL yourself (see the official instructions). Assuming you installed them with conda, pip install this package in the same environment:
conda run -n <your_env> pip install fugue-blazing
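As a quick sanity check after installation, the sketch below reports whether the relevant packages can be found by the Python interpreter of the active environment. It uses only the standard library. The helper name `check_modules` is hypothetical, and the import name `blazingsql` for BlazingSQL is an assumption; `cudf`, `fugue` and `fugue_blazing` are the module names imported in the examples on this page.

```python
from importlib.util import find_spec


def check_modules(names):
    """Return {module_name: bool} telling whether each top-level module can be located."""
    return {name: find_spec(name) is not None for name in names}


# Module names are assumptions based on the imports used in this README.
status = check_modules(["cudf", "blazingsql", "fugue", "fugue_blazing"])
for name, found in status.items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

Run it with the Python interpreter of the same conda environment. This only confirms that the packages are discoverable; it does not verify that a GPU is available.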
How To Use
As a standard Fugue extension, it can be used in two ways: functional APIs and Fugue SQL. Fugue SQL is the preferred way for this extension, because code that runs on a GPU has special requirements. Currently, transform relies on NativeExecutionEngine, which runs on the CPU. For everything other than transform, Fugue relies fully on cuDF and BlazingSQL to do the compute.
Practically, if you don't use transform, SQL may be the better choice for expressing your data pipelines.
Functional APIs
Here is an example Fugue code snippet that illustrates some of the key features of the framework. It partitions the sample data by id, sorts each partition by date, and takes the first row of each partition.
from fugue import FugueWorkflow
from fugue_blazing import CudaExecutionEngine, setup_shortcuts
# Creating sample data
data = [
["A", "2020-01-01", 10],
["A", "2020-01-02", None],
["A", "2020-01-03", 30],
["B", "2020-01-01", 20],
["B", "2020-01-02", None],
["B", "2020-01-03", 40]
]
schema = "id:str,date:date,value:double"
dag = FugueWorkflow()
dag.df(data, schema).partition_by("id", presort="date").take(1).show()
dag.run(CudaExecutionEngine)
# call setup_shortcuts to make your code more expressive
setup_shortcuts()
dag.run("blazing")
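To make the semantics of partition_by("id", presort="date").take(1) concrete, here is a minimal pure-Python sketch of the same logic on the same sample data. It is not how Fugue or cuDF implements the operation; the helper `take_first_per_partition` is hypothetical, it works on plain lists without applying the schema, and the row order of the real output is not guaranteed to match.

```python
from itertools import groupby
from operator import itemgetter

data = [
    ["A", "2020-01-01", 10],
    ["A", "2020-01-02", None],
    ["A", "2020-01-03", 30],
    ["B", "2020-01-01", 20],
    ["B", "2020-01-02", None],
    ["B", "2020-01-03", 40],
]


def take_first_per_partition(rows, key_idx, sort_idx, n=1):
    """Group rows by one column, sort each group by another, keep the first n rows."""
    # Sorting by (key, sort column) puts each partition together in presort order.
    ordered = sorted(rows, key=lambda r: (r[key_idx], r[sort_idx]))
    result = []
    for _, group in groupby(ordered, key=itemgetter(key_idx)):
        result.extend(list(group)[:n])
    return result


first_rows = take_first_per_partition(data, key_idx=0, sort_idx=1)
print(first_rows)  # [['A', '2020-01-01', 10], ['B', '2020-01-01', 20]]
```

The ISO-formatted date strings sort chronologically, which is why a plain string sort is sufficient in this sketch.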
You can also run SQL using the functional API:
from fugue import FugueWorkflow
from fugue_blazing import setup_shortcuts
setup_shortcuts()
data = [
["A", "2020-01-01", 10],
["A", "2020-01-02", None],
["A", "2020-01-03", 30],
["B", "2020-01-01", 20],
["B", "2020-01-02", None],
["B", "2020-01-03", 40]
]
schema = "id:str,date:date,value:double"
with FugueWorkflow("blazing") as dag:
    df = dag.df(data, schema)
    dag.select("* from ", df, " where value>20").show()
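For reference, the following pure-Python sketch shows which rows a filter like value>20 keeps on this sample data, assuming standard SQL semantics in which a comparison against NULL is not true. The helper `where_value_greater_than` is hypothetical and only illustrates the expected outcome, not the engine's implementation.

```python
data = [
    ["A", "2020-01-01", 10],
    ["A", "2020-01-02", None],
    ["A", "2020-01-03", 30],
    ["B", "2020-01-01", 20],
    ["B", "2020-01-02", None],
    ["B", "2020-01-03", 40],
]


def where_value_greater_than(rows, threshold, value_idx=2):
    """Keep rows whose value is present and strictly greater than the threshold."""
    return [
        r for r in rows
        if r[value_idx] is not None and r[value_idx] > threshold
    ]


filtered = where_value_greater_than(data, 20)
print(filtered)  # [['A', '2020-01-03', 30], ['B', '2020-01-03', 40]]
```

Rows with a missing value and the row with value 20 are both dropped, since the comparison is strict.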
For detailed examples, please read Fugue Tutorials
Fugue SQL
Programmatic Approach
from fugue_sql import fsql
from fugue_blazing import setup_shortcuts
import pandas as pd
import cudf
setup_shortcuts()
pdf = pd.DataFrame([
["A", "2020-01-01", 10],
["A", "2020-01-02", None],
["A", "2020-01-03", 30],
["B", "2020-01-01", 20],
["B", "2020-01-02", None],
["B", "2020-01-03", 40]
], columns = ["id", "date", "value"])
result = fsql("""
TAKE 1 ROW FROM df PREPARTITION BY id PRESORT date
YIELD DATAFRAME AS x
""", df=pdf).run("blazing")
# this is how you get outputs from Fugue SQL
assert isinstance(result["x"].native, cudf.DataFrame)
fsql("""
SELECT * FROM best WHERE id='A'
PRINT
SELECT id, COUNT(*) AS ct FROM orig GROUP BY id
PRINT
""", best=result["x"], orig=pdf).run("blazing")
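As a cross-check for the second query above, this small sketch computes what SELECT id, COUNT(*) ... GROUP BY id should return for the sample data, using only the standard library. It assumes standard SQL behaviour, where COUNT(*) counts every row including those whose value is missing; it is an illustration of the expected result, not of how BlazingSQL computes it.

```python
from collections import Counter

rows = [
    ["A", "2020-01-01", 10],
    ["A", "2020-01-02", None],
    ["A", "2020-01-03", 30],
    ["B", "2020-01-01", 20],
    ["B", "2020-01-02", None],
    ["B", "2020-01-03", 40],
]

# COUNT(*) per id: every row counts, regardless of missing values.
counts = Counter(row[0] for row in rows)
for id_, ct in sorted(counts.items()):
    print(id_, ct)
```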
Jupyter Notebook
Before running Jupyter, you first need to install fugue and the notebook extension:
pip install fugue
jupyter nbextension install --sys-prefix --symlink --py fugue_notebook
jupyter nbextension enable --py fugue_notebook
In cell 1
%load_ext fugue_notebook
import pandas as pd
from fugue_blazing import setup_shortcuts
setup_shortcuts()
pdf = pd.DataFrame([
["A", "2020-01-01", 10],
["A", "2020-01-02", None],
["A", "2020-01-03", 30],
["B", "2020-01-01", 20],
["B", "2020-01-02", None],
["B", "2020-01-03", 40]
], columns = ["id", "date", "value"])
In cell 2
%%fsql blazing
TAKE 1 ROW FROM pdf PREPARTITION BY id PRESORT date
YIELD DATAFRAME AS x
In cell 3
%%fsql blazing
SELECT * FROM x WHERE id='A'
PRINT
SELECT id, COUNT(*) AS ct FROM pdf GROUP BY id
PRINT