Baloo

Implementing the bare necessities of Pandas with the lazily evaluating and optimizing Weld framework.

Documentation here

Install

pip install baloo

Note that Baloo has so far been tested only on Python 3.5.2, though any Python 3 version should be fine.

Benchmarks

Benchmark results over seeded, randomized data are shown below. The data consists of 4 NumPy array columns: 2 of float64, 1 of int64, and 1 of int32. The first 2 plots run the following operations over 56MB and 420MB of total data, respectively:

df = df[(df['col1'] > 0) & (df['col2'] >= 10) & (df['col3'] < 30)]              # filter
df = df.agg(['min', 'prod', 'mean', 'std'])                                     # 4x agg
df['col4'] = df['col1'] * 2 + 1 - 23                                            # 3x op
df['col5'] = df['col1'].apply(np.exp)                                           # udf
df = df.groupby(['col2', 'col4']).var()                                         # groupby*
df = df[['col3', 'col1']].join(df[['col3', 'col2']], on='col3', rsuffix='_r')   # join*

* Note that the groupby and join implementations are simplified in Baloo. For instance, the groupby result is not sorted in Baloo as it is in Pandas. The join implementation in Baloo currently relies on the data in the on column being sorted and distinct; the sortedness requirement is expected to be patched soon.
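
To illustrate why a join can be kept simple when the key data is sorted and distinct, here is a minimal sketch of a merge join in plain Python. It is not Baloo's or Weld's implementation: the function name, the (key, value) pair representation, and the choice of an inner join are all assumptions made for illustration.

```python
def merge_join_sorted_distinct(left, right):
    """Inner-join two lists of (key, value) pairs.

    Hypothetical helper, not part of Baloo. Assumes both inputs are
    sorted by key and that keys are distinct within each input.
    """
    result = []
    i = j = 0
    # Walk both inputs once; sortedness lets us advance whichever side
    # currently holds the smaller key.
    while i < len(left) and j < len(right):
        lkey, lval = left[i]
        rkey, rval = right[j]
        if lkey == rkey:
            # Distinct keys mean one match per key, so both sides advance.
            result.append((lkey, lval, rval))
            i += 1
            j += 1
        elif lkey < rkey:
            i += 1
        else:
            j += 1
    return result


left = [(1, 'a'), (3, 'b'), (5, 'c')]
right = [(3, 'x'), (4, 'y'), (5, 'z')]
joined = merge_join_sorted_distinct(left, right)
print(joined)
```

With unsorted keys the single pass would miss matches, and with duplicate keys it would emit only one pairing per key, which is why both properties matter for this style of join.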

[Figure: benchmark results]

The last graph shows the execution time of the 3x op benchmark over varying dataset sizes:

[Figure: benchmark scalability]

Weld is, indeed, expected to scale well due to features such as vectorization; for small datasets, however, the compilation time outweighs the improved computation time. Note also that Baloo currently supports only a limited subset of Pandas. More work coming soon!
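
The idea behind lazy evaluation of something like the 3x op benchmark can be sketched in plain Python. This is a toy model only, assuming a hypothetical LazyColumn class that is not part of Baloo or Weld: operations are recorded rather than executed, and the whole chain runs in a single pass over the data when evaluate() is called. The compilation to optimized, vectorized code that the paragraph above attributes to Weld is not modelled here.

```python
class LazyColumn:
    """Hypothetical lazy column: records element-wise ops and runs them later."""

    def __init__(self, data, ops=()):
        self.data = data
        self.ops = tuple(ops)

    def _with(self, fn):
        # Return a new lazy column with one more pending operation.
        return LazyColumn(self.data, self.ops + (fn,))

    def __mul__(self, k):
        return self._with(lambda x: x * k)

    def __add__(self, k):
        return self._with(lambda x: x + k)

    def __sub__(self, k):
        return self._with(lambda x: x - k)

    def evaluate(self):
        # Apply all recorded ops per element in one pass over the data,
        # instead of materializing one intermediate column per op.
        out = []
        for x in self.data:
            for fn in self.ops:
                x = fn(x)
            out.append(x)
        return out


col1 = LazyColumn([1.0, 2.0, 3.0])
expr = col1 * 2 + 1 - 23      # nothing is computed yet
pending_ops = len(expr.ops)   # three recorded operations
result = expr.evaluate()      # computation happens here
print(pending_ops, result)
```

Deferring the work this way is what gives an optimizer the chance to see the whole expression at once; the price, as noted above, is a fixed up-front cost that only pays off on larger inputs.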

The scripts used to run the benchmarks are available in the relevant folder.

Version 0.0.5, released Jan 13, 2019.
