A port of dplyr and other related R packages to Python, using pipda.
datar
A Grammar of Data Manipulation in python
Documentation | Reference Maps | Notebook Examples | API | Blog
datar is a re-imagining of the APIs of data manipulation libraries in Python (currently only pandas is supported), so that you can manipulate your data with it as you would with dplyr in R.
datar is an in-depth port of tidyverse packages, such as dplyr, tidyr, forcats and tibble, as well as some functions from base R.
Installation
pip install -U datar
# install pdtypes support
pip install -U datar[pdtypes]
# install dependencies for modin as backend
pip install -U datar[modin]
# you may also need to install dependencies for modin engines
# pip install -U modin[ray]
Example usage
from datar import f
from datar.dplyr import mutate, filter, if_else
from datar.tibble import tibble
# or
# from datar.all import f, mutate, filter, if_else, tibble
df = tibble(
    x=range(4),  # or f[:4]
    y=['zero', 'one', 'two', 'three']
)
df >> mutate(z=f.x)
"""# output:
        x        y       z
  <int64> <object> <int64>
0       0     zero       0
1       1      one       1
2       2      two       2
3       3    three       3
"""
df >> mutate(z=if_else(f.x>1, 1, 0))
"""# output:
        x        y       z
  <int64> <object> <int64>
0       0     zero       0
1       1      one       0
2       2      two       1
3       3    three       1
"""
df >> filter(f.x>1)
"""# output:
        x        y
  <int64> <object>
0       2      two
1       3    three
"""
df >> mutate(z=if_else(f.x>1, 1, 0)) >> filter(f.z==1)
"""# output:
        x        y       z
  <int64> <object> <int64>
0       2      two       1
1       3    three       1
"""
# works with plotnine
# example grabbed from https://github.com/has2k1/plydata
import numpy
from datar.base import sin, pi
from plotnine import ggplot, aes, geom_line, theme_classic
df = tibble(x=numpy.linspace(0, 2*pi, 500))
(df >>
    mutate(y=sin(f.x), sign=if_else(f.y>=0, "positive", "negative")) >>
    ggplot(aes(x='x', y='y')) +
    theme_classic() +
    geom_line(aes(color='sign'), size=1.2))
# easy to integrate with other libraries
# for example: klib
import klib
from datar.core.factory import verb_factory
from datar.datasets import iris
from datar.dplyr import pull
dist_plot = verb_factory(func=klib.dist_plot)
iris >> pull(f.Sepal_Length) >> dist_plot()
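The `>>` piping used in the examples above relies on Python's reflected right-shift operator. The following is a minimal, standard-library-only sketch of that general idea, not the actual implementation in pipda or datar. The names `Verb`, `make_verb`, `add_column` and `keep_if` are hypothetical and exist only for this illustration; plain lists of dicts stand in for data frames.

```python
class Verb:
    """A deferred call: stores a function and its extra arguments
    until some data is piped into it with `>>`."""

    def __init__(self, func, args, kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __rrshift__(self, data):
        # `data >> verb` falls back to verb.__rrshift__(data) when the
        # type of `data` does not handle `>>` for this operand itself.
        return self.func(data, *self.args, **self.kwargs)


def make_verb(func):
    """Hypothetical decorator: calling the wrapped function returns a
    Verb that waits for its first argument (the data)."""
    def wrapper(*args, **kwargs):
        return Verb(func, args, kwargs)
    return wrapper


@make_verb
def add_column(rows, name, compute):
    # Return new rows, each with one extra key computed from the row.
    return [{**row, name: compute(row)} for row in rows]


@make_verb
def keep_if(rows, predicate):
    # Keep only the rows for which the predicate is true.
    return [row for row in rows if predicate(row)]


rows = [{"x": i} for i in range(4)]
result = (
    rows
    >> add_column("z", lambda r: int(r["x"] > 1))
    >> keep_if(lambda r: r["z"] == 1)
)
print(result)
```

Because `>>` is left-associative, each step receives the result of the previous one, which is what lets a chain read top to bottom like a dplyr pipeline.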
See also some advanced examples from my answers on StackOverflow:
- Compare 2 DataFrames and drop rows that do not contain corresponding ID variables
- count by id with dynamic criteria
- counting the frequency in python size vs count
- Pandas equivalent of R/dplyr group_by summarise concatenation
- ntiles over columns in python using R's "mutate(across(cols = ..."
- Replicate R Solution in Python for Calculating Monthly CRR
- Best/Concise Way to Conditionally Concat two Columns in Pandas DataFrame
- how to transform R dataframe to rows of indicator values
- Left join on multiple columns
- Python: change column of strings with None to 0/1
- Comparing 2 data frames and finding values are not in 2nd data frame
- How to compare two Pandas DataFrames based on specific columns in Python?
- expand.grid equivalent to get pandas data frame for prediction in Python
- Python pandas equivalent to R's group_by, mutate, and ifelse
- How to convert a list of dictionaries to a Pandas Dataframe with one of the values as column name?
- Moving window on a Standard Deviation & Mean calculation
- Python: creating new "interpolated" rows based on a specific field in Pandas
- How would I extend a Pandas DataFrame such as this?
- How to define new variable based on multiple conditions in Pandas - dplyr case_when equivalent
- What is the Pandas equivalent of top_n() in dplyr?
- Equivalent of fct_lump in pandas
- pandas equivalent of fct_reorder
- Is there a way to find out the 2 X 2 contingency table consisting of the count of values by applying a condition from two dataframe
- Count if array in pandas
- How to create a new column for transposed data
- How to create new DataFrame based on conditions from another DataFrame
- Refer to column of a data frame that is being defined
- How to use regex in mutate dplython to add new column
- Multiplying a row by the previous row (with a certain name) in Pandas
- Create dataframe from rows under a row with a certain condition
- pandas data frame, group by multiple cols and put other columns' contents in one
- Pandas custom aggregate function with condition on group, is it possible?
- multiply different values to pandas column with combination of other columns
- Vectorized column-wise regex matching in pandas
- Iterate through and conditionally append string values in a Pandas dataframe
- Groupby mutate equivalent in pandas/python using tidydata principles
- More ...