IPython wrapper to more easily manipulate Pandas dataframes.

pandacell

Author: Eirik B. Stavestrand

Introduces a %df line magic (and its %%df cell-magic form) for use in Jupyter notebooks and the IPython console.

The magic evaluates the contents of a line or cell against a Pandas DataFrame.

Description

Pandas is great and all, but writing Pandas code can be tedious. For example, when simply summing two columns:

    In [1]: df["a"] + df["b"]

It might not look like such a big deal, but all those brackets and quotation marks add up. Using pandacell, the above syntax can be written as:

    In [2]: %df a + b

Under the hood, this is accomplished by passing the cell contents as a string to Pandas' DataFrame.eval method. This isn't very complex, but it provides a fair amount of functionality and makes the code a whole lot more readable.
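
In plain pandas terms, that equivalence can be sketched as follows. The toy columns `a` and `b` are made up for illustration; the point is only that `DataFrame.eval("a + b")` produces the same values as the bracket-and-quote version.

```python
import pandas as pd

# Toy data; the column names a and b are illustrative only.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# What `%df a + b` is described as doing: hand the text to DataFrame.eval.
via_eval = df.eval("a + b")

# The equivalent bracket-and-quote version.
via_brackets = df["a"] + df["b"]

print(via_eval.tolist())  # [11, 22, 33]
```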

If you wish to store the result in a new column, use regular assignment along with the -i (or --inplace) flag:

    In [3]: %df -i c = a + b
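
Assuming the -i flag maps onto the `inplace` argument of `DataFrame.eval` (an assumption based on the flag's name and the description above), the difference looks like this in plain pandas: without it, `eval` returns a modified copy; with it, the dataframe itself gains the column.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Without inplace, eval returns a modified copy and leaves df unchanged.
copy_with_c = df.eval("c = a + b")
print(list(df.columns))           # ['a', 'b']
print(list(copy_with_c.columns))  # ['a', 'b', 'c']

# With inplace=True (what -i is assumed to map to), df itself gains
# column c and eval returns None.
returned = df.eval("c = a + b", inplace=True)
print(df)
```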

It also works with multiple assignments:

    In [4]: %%df -i
       ...: c = a + b
       ...: f = c - a
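
`DataFrame.eval` itself accepts multi-line input in which every line is an assignment, and later lines can use columns created by earlier ones. A cell like the one above plausibly becomes a single call; the toy data is made up, and -i is again assumed to map to `inplace=True`.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# One assignment per line; f uses the column c created on the line before.
cell = """
c = a + b
f = c - a
"""
df.eval(cell, inplace=True)
print(df)
```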

You can use Pandas' various accessors and Series methods:

    In [5]: %%df -i
       ...: name_upper = name.str.upper()
       ...: yr = timestamp.dt.year
       ...: lower_cased = species.where(cond=species.str[0].str.islower(), other=None)
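
In plain pandas, the first two lines of that cell would correspond to something like the sketch below. The toy data is made up, and `engine="python"` is passed as an assumption: it makes pandas evaluate the accessor calls as ordinary Python, and the README does not say which engine pandacell uses.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["setosa", "Virginica"],
    "timestamp": pd.to_datetime(["2020-09-19", "2021-01-01"]),
})

cell = """
name_upper = name.str.upper()
yr = timestamp.dt.year
"""
# No inplace here, so eval returns a new DataFrame with both added columns.
out = df.eval(cell, engine="python")
print(out[["name_upper", "yr"]])
```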

Since bare names are assumed to refer to columns in the dataframe, regular variables in the local/global namespace can be accessed by prefixing them with @:

    In [6]: a = 1
       ...: %df a = @a + 1

    In [7]: def myfunc(value):
       ...:     return value + 43
       ...: %df b = a.apply(@myfunc)
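
The @ prefix is a feature of `DataFrame.eval` itself: bare names resolve to columns, while `@name` resolves to a Python variable visible where `eval` is called. A minimal sketch in plain pandas, where the function and column names are made up and `engine="python"` is again an assumption for the method call:

```python
import pandas as pd


def add_offset(frame, offset):
    # Bare `a` is read as a column of `frame`; `@offset` is read from the
    # Python variables visible where eval is called (here, this function).
    return frame.eval("b = a + @offset")


def apply_local_function(frame):
    def plus_43(value):
        # Series.apply calls this once per element of the column.
        return value + 43

    return frame.eval("b = a.apply(@plus_43)", engine="python")


df = pd.DataFrame({"a": [1, 2, 3]})
print(add_offset(df, 10))
print(apply_local_function(df))
```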

By default, pandacell operates on the dataframe named df. A different dataframe can be selected with the -n (or --name) flag:

    In [8]: %df -n=df_in c = a + b
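
The README does not show how the flags fit together internally. One plausible shape, written as a plain function rather than a registered IPython magic, is sketched below. `run_df_magic` and its parameters are hypothetical names, not pandacell's API: `user_ns` stands in for the notebook namespace (IPython exposes it as `shell.user_ns`), and passing it as pandas' `local_dict` is what lets `@name` resolve against that namespace instead of the helper's own variables.

```python
import pandas as pd


def run_df_magic(expr, user_ns, name="df", inplace=False, query=False):
    """Hypothetical stand-in for the %df magic (not pandacell's actual code).

    expr:    the text after the flags, e.g. "c = a + b"
    user_ns: the namespace holding the dataframe and any @-variables
    name:    mirrors -n/--name
    inplace: mirrors -i/--inplace
    query:   mirrors -q/--query
    """
    df = user_ns[name]  # look the dataframe up by name
    # local_dict: '@var' resolves against user_ns, not this function's locals.
    # engine="python" is an assumption so that method calls such as
    # .str.upper() are evaluated as plain Python.
    kwargs = dict(inplace=inplace, local_dict=user_ns, engine="python")
    if query:
        return df.query(expr, **kwargs)
    return df.eval(expr, **kwargs)


# Example: roughly what "%df -n=df_in c = a + b" would do on a toy namespace.
namespace = {"df_in": pd.DataFrame({"a": [1, 2], "b": [3, 4]})}
print(run_df_magic("c = a + b", namespace, name="df_in"))
```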

You can also display a subset of a dataframe with the -q (or --query) flag:

    In [9]: %df -q species == "setosa"
    Out[9]:
        sepal_length  sepal_width  petal_length  petal_width species  a
    0            5.1          3.5           1.4          0.2  setosa  0
    1            4.9          3.0           1.4          0.2  setosa  0


    In [10]: %df -q species.isna() # check for missing values
    Out[10]:
    Empty DataFrame
    Columns: [sepal_length, sepal_width, petal_length, petal_width, species]
    Index: []
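
The -q flag presumably corresponds to `DataFrame.query`, which keeps the rows where a boolean expression is true. A sketch of the two queries above on made-up data; `engine="python"` is an assumption, chosen so that the `.isna()` method call is evaluated as plain Python.

```python
import pandas as pd

df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0],
    "species": ["setosa", "setosa", "versicolor"],
})

setosa = df.query('species == "setosa"', engine="python")
missing = df.query("species.isna()", engine="python")
print(setosa)
print(missing)  # empty: no species values are missing in this toy data
```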

This can be combined with the -i flag to subset the dataframe in place:

    In [10]: %df -q -i species == "setosa"
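
Assuming -q -i maps to `DataFrame.query(..., inplace=True)`, the in-place variant replaces the dataframe's contents with the filtered rows and returns None. A sketch on made-up data, with `engine="python"` again an assumption:

```python
import pandas as pd

df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0],
    "species": ["setosa", "setosa", "versicolor"],
})

# inplace=True filters df itself instead of returning a new frame.
returned = df.query('species == "setosa"', inplace=True, engine="python")
print(returned)  # None
print(df)
```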

Pandacell even supports comments:

    In [11]: %%df -i
       ...: # Line comment
       ...: c = a + b # Comment at end of line
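
The README does not say how comments are handled. One plausible approach, sketched here with a hypothetical `strip_comments` helper, is to drop them with Python's standard `tokenize` module before calling `DataFrame.eval`. Tokenizing, rather than splitting each line on `#`, leaves a `#` inside a string literal alone.

```python
import io
import tokenize

import pandas as pd


def strip_comments(source):
    """Hypothetical pre-processing step: remove '#' comments from a cell.

    Uses tokenize so that '#' inside string literals is not treated as a
    comment. pandacell may handle comments differently.
    """
    lines = source.split("\n")
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            row, col = tok.start  # rows are 1-based
            lines[row - 1] = lines[row - 1][:col].rstrip()
    # Lines that held only a comment are now empty; drop them.
    return "\n".join(line for line in lines if line.strip())


cell = """# Line comment
c = a + b # Comment at end of line
"""
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df.eval(strip_comments(cell), inplace=True)
print(df)
```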

Inspired by

https://github.com/catherinedevlin/ipython-sql

Development

https://github.com/eirki/pandacell

Download files

Source distribution: pandacell-2020.9.19.tar.gz (3.5 kB)

Built distribution: pandacell-2020.9.19-py3-none-any.whl (3.3 kB, Python 3)
