IPython wrapper to more easily manipulate Pandas dataframes.
pandacell
Author: Eirik B. Stavestrand
Introduces a %df (or %%df) magic which can be used in Jupyter notebooks and the IPython console.
The magic executes the contents of a cell on a Pandas DataFrame.
Description
Pandas is great and all, but writing Pandas code can be tedious. For example, when simply summing two columns:
In [1]: df["a"] + df["b"]
It might not look like such a big deal, but all those brackets and quotation marks add up. Using pandacell, the above syntax can be written as:
In [2]: %df a + b
Under the hood, this is accomplished simply by passing the cell contents as a string to Pandas' df.eval function.
This isn't very complex, but it does provide a fair deal of functionality and adds a whole lot of readability.
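To make the equivalence concrete, here is a minimal sketch in plain pandas, outside of any magic. The toy dataframe is made up for illustration, and the snippet only shows the DataFrame.eval call that the description refers to, not pandacell's actual source:

```python
import pandas as pd

# Toy data, invented for this illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Plain pandas: index each column explicitly.
explicit = df["a"] + df["b"]

# The string form that a cell such as `%df a + b` would hand to DataFrame.eval.
via_eval = df.eval("a + b")
```

Both expressions produce the same element-wise sum; the eval form simply lets bare names stand for columns.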
If you wish to store the results to a new column, use regular assignment along with the -i (or --inplace) flag:
In [3]: %df -i c = a + b
It also works with multiple assignments:
In [4]: %%df -i
...: c = a + b
...: f = c - a
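In plain pandas, the multi-assignment cell above corresponds roughly to a multi-line DataFrame.eval call with inplace=True. The sketch below uses made-up data and assumes that -i maps onto the inplace argument, which the description suggests but does not state outright:

```python
import pandas as pd

# Toy data, invented for this illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# One assignment per line; later lines may use columns created by earlier ones.
cell_body = """
c = a + b
f = c - a
"""

# inplace=True adds the new columns to df instead of returning a copy.
df.eval(cell_body, inplace=True)
```

After the call, df carries the new columns c and f alongside a and b.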
You can use Pandas' various accessors and series method calls:
In [5]: %%df -i
...: name_upper = name.str.upper()
...: yr = timestamp.dt.year
...: lower_cased = species.where(cond=species.str[0].str.islower(), other=None)
Since variable names are assumed to be columns in the dataframe, regular variables in the local/global namespace are accessed by prefixing them with @:
In [6]: a = 1
...: %df a = @a + 1
In [7]: def myfunc(row):
...: return row + 43
...: %df b = a.apply(@myfunc)
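The @ prefix is the same convention that DataFrame.eval itself uses to tell Python variables apart from columns. A minimal sketch with made-up data, where offset is a hypothetical variable name chosen for this example:

```python
import pandas as pd

# Toy data, invented for this illustration.
df = pd.DataFrame({"a": [1, 2, 3]})

# An ordinary Python variable, not a column of df.
offset = 1

# Bare `a` resolves to the column; `@offset` resolves to the Python variable.
shifted = df.eval("a + @offset")
```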
By default, pandacell operates on any dataframe named df. This can be overridden with the -n (or --name) flag:
In [8]: %df -n=df_in c = a + b
You can also print a subset of a dataframe with the -q (or --query) flag:
In [9]: %df -q species == "setosa"
Out[9]:
   sepal_length  sepal_width  petal_length  petal_width species  a
0           5.1          3.5           1.4          0.2  setosa  0
1           4.9          3.0           1.4          0.2  setosa  0
In [10]: %df -q species.isna() # check for missing values
Out[10]:
Empty DataFrame
Columns: [sepal_length, sepal_width, petal_length, petal_width, species]
Index: []
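For comparison, the same kind of row filter can be written directly with DataFrame.query. That the -q flag delegates to this method is an assumption based on its name, and the three-row dataframe below is a toy stand-in for the iris data:

```python
import pandas as pd

# Toy stand-in for the iris data, invented for this illustration.
df = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 7.0],
        "species": ["setosa", "setosa", "versicolor"],
    }
)

# Keep only the rows where the boolean expression holds.
subset = df.query('species == "setosa"')
```

The original df is left untouched; query returns a new dataframe with the matching rows.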
This can be combined with the -i flag to subset the dataframe in-place:
In [11]: %df -q -i species == "setosa"
Pandacell even supports comments:
In [12]: %%df -i
...: # Line comment
...: c = a + b # Comment at end of line
Inspired by
https://github.com/catherinedevlin/ipython-sql