pylibcudf - Python bindings for libcudf

 cuDF - A GPU-accelerated DataFrame library for tabular data processing

cuDF (pronounced "KOO-dee-eff") is an Apache 2.0 licensed, GPU-accelerated DataFrame library for tabular data processing. The cuDF library is one part of the RAPIDS GPU Accelerated Data Science suite of libraries.

About

cuDF is composed of multiple libraries including:

  • libcudf: A CUDA C++ library with Apache Arrow compliant data structures and fundamental algorithms for tabular data.
  • pylibcudf: A Python library providing Cython bindings for libcudf.
  • cudf: A Python library providing:
    • A DataFrame library mirroring the pandas API.
    • A zero-code change accelerator, cudf.pandas, for existing pandas code.
  • cudf-polars: A Python library providing a GPU engine for Polars.
  • dask-cudf: A Python library providing a GPU backend for Dask DataFrames.

Installation

System Requirements

Operating system, GPU driver, and supported CUDA version information can be found in the RAPIDS Installation Guide.

pip

A stable release of each cuDF library is available on PyPI. When installing from PyPI, append a -cu## suffix to the package name that matches the major version of your installed CUDA toolkit (for example, -cu12 for CUDA 12).

A development version of each library is available as a nightly release. To install it, add the -i https://pypi.anaconda.org/rapidsai-wheels-nightly/simple index option to the pip command.

# CUDA 13
pip install libcudf-cu13
pip install pylibcudf-cu13
pip install cudf-cu13
pip install cudf-polars-cu13
pip install dask-cudf-cu13

# CUDA 12
pip install libcudf-cu12
pip install pylibcudf-cu12
pip install cudf-cu12
pip install cudf-polars-cu12
pip install dask-cudf-cu12
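
The naming rule above can be sketched in a few lines of Python. This is only an illustration of how the -cu## suffix and the nightly index fit together; the helper name pip_install_command is hypothetical and is not part of any cuDF package.

```python
def pip_install_command(library, cuda_major, nightly=False):
    """Build a pip command for one cuDF library and one CUDA major version.

    Hypothetical helper for illustration only.
    """
    command = ["pip", "install"]
    if nightly:
        # Nightly wheels are served from the index named in the text above.
        command += ["-i", "https://pypi.anaconda.org/rapidsai-wheels-nightly/simple"]
    # The package name carries the CUDA major version as a -cu## suffix.
    command.append(f"{library}-cu{cuda_major}")
    return " ".join(command)


print(pip_install_command("pylibcudf", 12))
print(pip_install_command("cudf", 13, nightly=True))
```

The first call prints the stable command for CUDA 12, and the second prints the nightly command for CUDA 13.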

conda

A stable release of each cuDF library can be installed with the conda package manager by specifying the -c rapidsai channel.

A development version of each library is available as a nightly release by specifying the -c rapidsai-nightly channel instead.

conda install -c rapidsai libcudf
conda install -c rapidsai pylibcudf
conda install -c rapidsai cudf
conda install -c rapidsai cudf-polars
conda install -c rapidsai dask-cudf
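
After installing with pip or conda, one way to see which of the libraries Python can find is to look up their import names without importing them. The sketch below uses only the standard library. The import names in MODULES are assumptions (package names with hyphens replaced by underscores), and installed_modules is a hypothetical helper.

```python
import importlib.util

# Assumed import names: package names with hyphens replaced by underscores.
MODULES = ["libcudf", "pylibcudf", "cudf", "cudf_polars", "dask_cudf"]


def installed_modules(names=MODULES):
    """Map each top-level module name to True if Python can locate it.

    find_spec only searches for the module; it does not import it,
    so this check does not need a GPU.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, found in installed_modules().items():
    print(f"{name}: {'installed' if found else 'not found'}")
```

A module reported as installed can still fail at import time if the GPU driver or CUDA version does not meet the system requirements.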

source

To install cuDF from source, please follow the contribution guide, which details how to set up the build environment.

Examples

The following examples show how to read a Parquet file, drop rows that contain null values, and perform a groupby aggregation on the data.

cudf

Import cudf; its APIs are largely similar to those of pandas.

import cudf

df = cudf.read_parquet("data.parquet")
df.dropna().groupby(["A", "B"]).mean()
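
To try the same pipeline without a Parquet file, a small in-memory frame is enough. The sketch below is an illustration with made-up data; the function mean_by_group and the column values are hypothetical. It takes the DataFrame module as an argument, so it can be called with cudf, or with pandas on a machine without a GPU, on the assumption that both accept this pandas-style code.

```python
def mean_by_group(xdf):
    """Drop rows containing nulls, then average column C per (A, B) group.

    `xdf` is a DataFrame module with a pandas-style API, such as cudf or pandas.
    """
    df = xdf.DataFrame(
        {
            "A": ["x", "x", "y", None],   # the last row has a null key
            "B": [1, 1, 2, 2],
            "C": [1.0, 3.0, None, 5.0],   # the third row has a null value
        }
    )
    # dropna() removes the third and fourth rows; the two remaining rows
    # share the key ("x", 1) and are averaged together.
    return df.dropna().groupby(["A", "B"]).mean()
```

Calling mean_by_group(cudf) after import cudf should leave a single group, ("x", 1), whose mean for C is 2.0.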

cudf.pandas

With a Python file containing pandas code:

import pandas as pd

df = pd.read_parquet("data.parquet")
df.dropna().groupby(["A", "B"]).mean()

Use cudf.pandas by invoking python with -m cudf.pandas:

$ python -m cudf.pandas script.py

If running the pandas code in an interactive Jupyter environment, call %load_ext cudf.pandas before importing pandas.

In [1]: %load_ext cudf.pandas

In [2]: import pandas as pd

In [3]: df = pd.read_parquet("data.parquet")

In [4]: df.dropna().groupby(["A", "B"]).mean()

cudf-polars

Using Polars' lazy API, call collect with engine="gpu" to run the operation on the GPU:

import polars as pl

lf = pl.scan_parquet("data.parquet")
lf.drop_nulls().group_by(["A", "B"]).mean().collect(engine="gpu")

Questions and Discussion

For bug reports or feature requests, please file an issue on the GitHub issue tracker.

For questions or discussion about cuDF and GPU data processing, feel free to post in the RAPIDS Slack workspace.

Contributing

cuDF is open to contributions from the community! Please see our guide for contributing to cuDF for more information.
