Dynamic rewriting of pandas code

What is Dias?

Dias is an automatic rewriter of pandas code for Jupyter (IPython) notebooks. It rewrites pandas code to semantically equivalent but faster versions, on-the-fly, transparently and correctly. Dias is extremely lightweight: it incurs virtually no extra runtime or memory overhead. At the same time, Dias can provide 100x or even 1000x speedups (see example below).

Dias identifies rewrite opportunities automatically and leaves the rest of the code untouched, so you do not have to change a single line of your pandas code to use it.

Quick Start

Quickstart Colab Notebook

The fastest way to get started is to play around with our Quickstart Google Colab notebook. Otherwise, you can follow the documentation here to experiment locally.

Vanilla Pandas

import pandas as pd
import numpy as np
rand_arr = np.random.rand(2_500_000,20)
df = pd.DataFrame(rand_arr)

%%time
def weighted_rating(x, m=50, C=5.6):
    v = x[0]
    R = x[9]
    return (v/(v+m) * R) + (m/(m+v) * C)
df.apply(weighted_rating, axis=1)

With Dias

import pandas as pd
# Import Dias. Keep everything else the same.
import dias.rewriter
import numpy as np
rand_arr = np.random.rand(2_500_000,20)
df = pd.DataFrame(rand_arr)

%%time
def weighted_rating(x, m=50, C=5.6):
    v = x[0]
    R = x[9]
    return (v/(v+m) * R) + (m/(m+v) * C)
df.apply(weighted_rating, axis=1)

Original: 10.3s
Rewritten: 48.4ms
Speedup: 212x
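
To give a feel for where a speedup of this size can come from, here is a minimal sketch comparing the row-by-row apply with a hand-written column-wise form of the same arithmetic. The column-wise call is an assumption about the general style of such a rewrite, not Dias' literal output, and the frame is much smaller than the one above so that the slow path finishes quickly.

```python
import numpy as np
import pandas as pd

# Smaller than the 2_500_000-row frame above, so the slow path stays quick.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10_000, 20)))

def weighted_rating(x, m=50, C=5.6):
    v = x[0]
    R = x[9]
    return (v / (v + m) * R) + (m / (m + v) * C)

# Row-by-row: pandas calls the Python function once per row.
slow = df.apply(weighted_rating, axis=1)

# Column-wise (illustrative, not Dias' actual output): passing the whole
# frame makes x[0] and x[9] entire columns, so the arithmetic is vectorized.
fast = weighted_rating(df)

# Both forms compute the same values.
assert np.allclose(slow.to_numpy(), fast.to_numpy())
```

Timing both forms with %%time in a notebook shows the gap on your own machine; the exact ratio depends on the data size and hardware.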


Installation

pip install dias

Usage

Make sure that you are using a Jupyter/IPython notebook.

First import the package... That's it!

import dias.rewriter

Examples

Our Quickstart notebook contains many examples in a single place. You can also see our examples directory, which lists self-contained examples that showcase different use cases of Dias.

FAQ

How lightweight is Dias?

Dias is extremely lightweight. In terms of memory overheads, anything that runs with vanilla pandas, runs with Dias enabled too. Dias is just a code rewriter, so it does not alter the way pandas stores data and its internal state is minimal.

Dias' runtime overheads are minimal too. In our experiments, the maximum overhead of Dias is 23ms. You may also want to take a look at this example, where even though the original cell is quick, it is still worth using Dias.

Can I inspect the rewritten version?

Yes. Dias' output is standard Python code, and so, for example, you do not need to know anything about Dias to know why you got a speedup. Similarly, you can just copy Dias' output and use it as any other Python code.

To inspect the rewritten version, add the comment # DIAS_VERBOSE at the beginning of your cell (right after any magic functions). See this example.
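
As a rough illustration, a cell that asks for verbose output could look like the sketch below. It assumes that import dias.rewriter already ran in an earlier cell of the same notebook; the DataFrame and its price column are made up for the example. If the cell starts with a magic such as %%time, the comment goes on the line right after it. Outside a notebook with Dias loaded, the comment is an ordinary Python comment and has no effect.

```python
# DIAS_VERBOSE
import pandas as pd

# Hypothetical data, only here to give the cell something to compute.
df = pd.DataFrame({"price": [3.0, 1.0, 2.0]})

# A sort_values() followed by head(), the kind of pattern Dias looks for.
cheapest = df.sort_values("price").head(2)
```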

Is Dias a replacement for pandas?

No. Dias is a rewriter, which inspects and possibly rewrites pandas code. Your code still runs on pandas itself, so Dias inherently does not suffer from a lack of API support.

Does Dias work with a standard Python interpreter?

No. Dias currently uses IPython features.
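
If you are unsure whether your code is running under IPython, a small check like the following can tell you. This is a sketch: running_in_ipython is a hypothetical helper written for this example, not part of Dias, and it relies only on IPython's own get_ipython function.

```python
def running_in_ipython() -> bool:
    """Return True when called from inside an IPython/Jupyter session."""
    try:
        from IPython import get_ipython
    except ImportError:
        # IPython is not installed, so this cannot be an IPython session.
        return False
    # get_ipython() returns None when no interactive shell is active.
    return get_ipython() is not None

in_notebook = running_in_ipython()
```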

When does Dias rewrite code?

Dias looks for certain patterns, and upon recognizing one, it rewrites the code to a faster version. Thus, Dias will rewrite the code if it contains one of the patterns it is programmed to look for. Consider this example. One pattern Dias looks for is any expression followed by sort_values(), followed by head(). Upon recognizing this pattern, it rewrites the code to use nsmallest(). You can take a look at the paper for more information.
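
The sort_values() and head() pattern mentioned above can be checked by hand. The sketch below uses a made-up DataFrame (the price and qty columns are hypothetical) and compares the original form with a manual nsmallest() call. It illustrates why the two forms can be interchanged on this data; it is not Dias' actual output.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# The prices are distinct, so tie-breaking between equal values never comes up.
df = pd.DataFrame({
    "price": rng.permutation(1_000),
    "qty": rng.integers(0, 10, 1_000),
})

# Original form: sort the whole frame, then keep the first 5 rows.
by_sort = df.sort_values("price").head(5)

# Manual equivalent: select the 5 smallest rows without a full sort.
by_nsmallest = df.nsmallest(5, "price")

# Same rows, same order, same index labels.
assert by_sort.equals(by_nsmallest)
```

The nsmallest() form avoids sorting every row just to keep a handful of them, which is where the time saving comes from on large frames.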

Dias is still in early but active development, so expect more patterns to be added soon!

Is Dias probabilistic? Is Dias an assistant?

No and no. Dias is not probabilistic; if it rewrites code, the rewrite is always correct (barring implementation bugs). Dias is also not intended to be an assistant. First, it is intended to be quieter than an assistant: if Dias does its job correctly, you should never have to think about it. Second, while you can inspect the rewritten code, Dias does not offer any explanations of why the rewritten version is faster.

How to contribute

Dias is an ongoing research project by the ADAPT group @ UIUC. You can help us by sending us notebooks that you want to speed up, and we will do our best to make Dias do it automatically (send us an email with either the notebook or a Colab link)! Moreover, if you are aware of a pattern that can be rewritten to a faster version, please consider submitting an issue. You can use our template.

We also welcome feedback from all backgrounds, including industry specialists, data analysts and academics. Please reach out to sb54@illinois.edu to share your opinion!

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dias-0.1.2.tar.gz (45.0 kB)

Uploaded Source

Built Distribution

dias-0.1.2-py3-none-any.whl (43.3 kB)

Uploaded Python 3

File details

Details for the file dias-0.1.2.tar.gz.

File metadata

  • Download URL: dias-0.1.2.tar.gz
  • Upload date:
  • Size: 45.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.12

File hashes

Hashes for dias-0.1.2.tar.gz
Algorithm Hash digest
SHA256 65143602c901d23e322681b44100c2fc3494f8e418c0cd685468f576f1fe4137
MD5 677488dd5cd28fe3f7d1a5c2c4f89121
BLAKE2b-256 adea7eab0958da30793cf371e986012d598597928365c0a958a58c3ca9b02207

See more details on using hashes here.

File details

Details for the file dias-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: dias-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 43.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.12

File hashes

Hashes for dias-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 2d0701c862b4e4d29487fb169c79c2f0781993c90143a2c42cf2a10dcaaf7354
MD5 060f479ae14e5ce6d3f25e093f7b9bf6
BLAKE2b-256 2f9506594e26d1ca099a757bbfc5687ec2041138c2c4cce19d4d3d10b9febd90
