
A distributed DataFrame library for large-scale complex data processing.



Welcome to Daft

Daft is a fast, ergonomic, and scalable open-source dataframe library built for Python and Complex Data/Machine Learning workloads.


Installation

Install Daft with pip install getdaft.
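To confirm the install worked, you can query the installed distribution's metadata. Below is a minimal sketch using the standard-library importlib.metadata module (Python 3.8+, so not on Python 3.7); the helper name installed_version is hypothetical. Note that the distribution is named getdaft on PyPI, while the module it provides is imported as daft.

```python
from importlib import metadata  # standard library, Python 3.8+
from typing import Optional


def installed_version(dist_name: str = "getdaft") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("getdaft is not installed; run: pip install getdaft")
    else:
        # The distribution is called "getdaft", but the module is imported as "daft".
        print(f"getdaft {version} is installed")
```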

Documentation

Learn more about Daft in our documentation.

Community

For questions about Daft, please post in our community hosted on GitHub Discussions. We look forward to meeting you there!

Why Daft?

Processing Complex Data such as images, audio, and point clouds often requires accelerated compute for geometric or machine learning algorithms, much of which relies on existing tooling from the Python/C++ ecosystem. However, many workloads, such as analytics, curation of model training data, and general data processing, also require relational query operations: loading, filtering, joining, and aggregating.

Daft marries the two worlds with a DataFrame API, enabling you to run both large analytical queries and powerful Complex Data algorithms from the same interface.
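The combination described above can be illustrated without Daft at all. The following plain-Python sketch is not Daft's API; the table contents, column names, and helper functions are hypothetical. It interleaves relational steps (join, filter, group-by aggregation) with an arbitrary Python function applied to a complex-data column, here tiny grayscale images stored as nested lists. A dataframe library such as Daft expresses the same kind of pipeline through one interface and can distribute it.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy tables. Each image row pairs relational metadata with a
# "complex" payload: a tiny grayscale image stored as a nested list of pixels.
images = [
    {"id": 1, "label": "cat", "pixels": [[0, 255], [255, 0]]},
    {"id": 2, "label": "dog", "pixels": [[10, 20], [30, 40]]},
    {"id": 3, "label": "cat", "pixels": [[100, 100], [100, 100]]},
]
splits = [
    {"id": 1, "split": "train"},
    {"id": 2, "split": "train"},
    {"id": 3, "split": "test"},
]


def inner_join(left, right, key):
    """Relational inner join on `key` (right-side keys assumed unique)."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def brightness(pixels):
    """Arbitrary Python logic over complex data: the mean pixel value."""
    return mean(value for row in pixels for value in row)


# 1. Relational: join split assignments onto images, keep the training split.
train = [row for row in inner_join(images, splits, "id") if row["split"] == "train"]

# 2. Complex data: run the Python function on each image payload.
for row in train:
    row["brightness"] = brightness(row["pixels"])

# 3. Relational: aggregate the new column per label.
per_label = defaultdict(list)
for row in train:
    per_label[row["label"]].append(row["brightness"])
avg_brightness = {label: mean(values) for label, values in per_label.items()}
print(avg_brightness)
```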

  1. Python-first: Python and Jupyter notebooks are first-class citizens. Daft natively handles Python libraries and data structures, so you can use any Python library, such as NumPy, OpenCV, or PyTorch, for Complex Data processing.

  2. Laptop to Cloud: Daft runs as easily on your laptop for interactive development as it does on your own Ray cluster or Eventual deployment for terabyte-scale production workloads.

  3. Open Data Formats: Daft loads from and writes to open data formats such as Apache Parquet and Apache Iceberg. It also supports object storage from all major cloud vendors, so you can easily integrate with your existing storage solutions.
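The "Laptop to Cloud" idea rests on keeping the per-partition logic fixed while swapping the execution backend. This sketch is a hypothetical illustration, not how Daft's runners are implemented: map_partitions runs the same function either serially or through any concurrent.futures Executor, with a thread pool standing in for a distributed backend such as a Ray cluster.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_partitions(
    fn: Callable[[List[T]], R],
    partitions: Sequence[List[T]],
    executor: Optional[Executor] = None,
) -> List[R]:
    """Apply fn to every partition, in order, serially or via an executor."""
    if executor is None:
        # Local, single-process path: handy for interactive development.
        return [fn(p) for p in partitions]
    # Same per-partition logic, dispatched to whatever backend the executor wraps.
    return list(executor.map(fn, partitions))


def split_into_partitions(rows: List[T], n: int) -> List[List[T]]:
    """Round-robin rows into n partitions."""
    return [rows[i::n] for i in range(n)]


def count_even(partition: List[int]) -> int:
    return sum(1 for x in partition if x % 2 == 0)


partitions = split_into_partitions(list(range(100)), 4)
serial = sum(map_partitions(count_even, partitions))
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = sum(map_partitions(count_even, partitions, executor=pool))
print(serial, parallel)  # 50 50
```

Because the partitioning and the per-partition function never change, moving from the serial path to a pool (or, hypothetically, a cluster-backed executor) only changes the executor argument.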


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

getdaft-0.0.13.tar.gz (141.4 kB): source

Built Distributions

getdaft-0.0.13-cp310-cp310-manylinux_2_17_x86_64.whl (1.7 MB): CPython 3.10, manylinux (glibc 2.17+), x86-64

getdaft-0.0.13-cp310-cp310-macosx_11_0_x86_64.whl (285.2 kB): CPython 3.10, macOS 11.0+, x86-64

getdaft-0.0.13-cp310-cp310-macosx_11_0_arm64.whl (272.0 kB): CPython 3.10, macOS 11.0+, ARM64

getdaft-0.0.13-cp39-cp39-manylinux_2_17_x86_64.whl (1.7 MB): CPython 3.9, manylinux (glibc 2.17+), x86-64

getdaft-0.0.13-cp39-cp39-macosx_11_0_x86_64.whl (285.7 kB): CPython 3.9, macOS 11.0+, x86-64

getdaft-0.0.13-cp39-cp39-macosx_11_0_arm64.whl (272.4 kB): CPython 3.9, macOS 11.0+, ARM64

getdaft-0.0.13-cp38-cp38-manylinux_2_17_x86_64.whl (1.7 MB): CPython 3.8, manylinux (glibc 2.17+), x86-64

getdaft-0.0.13-cp38-cp38-macosx_11_0_arm64.whl (272.2 kB): CPython 3.8, macOS 11.0+, ARM64

getdaft-0.0.13-cp38-cp38-macosx_10_16_x86_64.whl (285.7 kB): CPython 3.8, macOS 10.16+, x86-64

getdaft-0.0.13-cp37-cp37m-manylinux_2_17_x86_64.whl (1.7 MB): CPython 3.7m, manylinux (glibc 2.17+), x86-64

getdaft-0.0.13-cp37-cp37m-macosx_10_16_x86_64.whl (285.1 kB): CPython 3.7m, macOS 10.16+, x86-64
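Wheel filenames follow the PEP 427 pattern {name}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl, so the tags in the filenames above indicate which file matches your interpreter and operating system. Below is a simplified, hypothetical parser; pip itself performs a much fuller compatibility check (for example, it accepts a manylinux_2_17 wheel on any system with glibc 2.17 or newer).

```python
import sys
from typing import NamedTuple


class WheelTags(NamedTuple):
    name: str
    version: str
    python: str
    abi: str
    platform: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its PEP 427 components (build tag dropped)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:  # an optional build tag sits between version and python tag
        name, version, _build, python, abi, platform = parts
    elif len(parts) == 5:
        name, version, python, abi, platform = parts
    else:
        raise ValueError(f"unexpected wheel filename layout: {filename}")
    return WheelTags(name, version, python, abi, platform)


def current_cpython_tag() -> str:
    """Python tag such as 'cp310' for the running interpreter (assumes CPython)."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


def matches_current_python(wheel: WheelTags) -> bool:
    """Simplified: compares only the CPython tag, so generic tags like 'py3' are not recognized."""
    return current_cpython_tag() in wheel.python.split(".")


wheel = parse_wheel_filename("getdaft-0.0.13-cp310-cp310-macosx_11_0_arm64.whl")
print(wheel.python, wheel.abi, wheel.platform)  # cp310 cp310 macosx_11_0_arm64
print(matches_current_python(wheel))  # True only on CPython 3.10
```

In practice, pip install getdaft selects the matching file automatically; the parser only makes the naming scheme explicit.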
