Port of dplyr and other related R packages in python, using pipda.
datar
A Grammar of Data Manipulation in python
Documentation | Reference Maps | Notebook Examples | API | Blog
datar is a re-imagining of the APIs of data manipulation libraries in Python (currently only pandas is supported), so that you can manipulate your data as you would with dplyr in R.
datar is an in-depth port of tidyverse packages, such as dplyr, tidyr, forcats and tibble, as well as some functions from R itself.
Installation
pip install -U datar
# install pdtypes support
pip install -U datar[pdtypes]
# install dependencies for modin as backend
pip install -U datar[modin]
# you may also need to install dependencies for modin engines
# pip install -U modin[ray]
Example usage
from datar import f
from datar.dplyr import mutate, filter, if_else
from datar.tibble import tibble
# or
# from datar.all import f, mutate, filter, if_else, tibble
df = tibble(
    x=range(4),  # or f[:4]
    y=['zero', 'one', 'two', 'three']
)
df >> mutate(z=f.x)
"""# output:
        x        y       z
  <int64> <object> <int64>
0       0     zero       0
1       1      one       1
2       2      two       2
3       3    three       3
"""
df >> mutate(z=if_else(f.x>1, 1, 0))
"""# output:
        x        y       z
  <int64> <object> <int64>
0       0     zero       0
1       1      one       0
2       2      two       1
3       3    three       1
"""
df >> filter(f.x>1)
"""# output:
        x        y
  <int64> <object>
0       2      two
1       3    three
"""
df >> mutate(z=if_else(f.x>1, 1, 0)) >> filter(f.z==1)
"""# output:
        x        y       z
  <int64> <object> <int64>
0       2      two       1
1       3    three       1
"""
# works with plotnine
# example grabbed from https://github.com/has2k1/plydata
import numpy
from datar.base import sin, pi
from plotnine import ggplot, aes, geom_line, theme_classic
df = tibble(x=numpy.linspace(0, 2*pi, 500))
(df >>
    mutate(y=sin(f.x), sign=if_else(f.y>=0, "positive", "negative")) >>
    ggplot(aes(x='x', y='y')) +
    theme_classic() +
    geom_line(aes(color='sign'), size=1.2))
# easy to integrate with other libraries
# for example: klib
import klib
from datar.core.factory import verb_factory
from datar.datasets import iris
from datar.dplyr import pull
dist_plot = verb_factory(func=klib.dist_plot)
iris >> pull(f.Sepal_Length) >> dist_plot()
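To give a feel for how the `df >> verb(...)` piping and the `f.x` column references in the examples above can work, here is a minimal pure-Python sketch. It is not how datar or pipda are actually implemented: the names `Expr`, `Symbol`, `Verb`, `verb` and `filter_rows` are hypothetical, and a plain dict of column lists stands in for a pandas DataFrame. The idea is that `f.x > 1` builds a deferred expression, and a verb call returns an object whose `__rrshift__` receives the data on the left of `>>`.

```python
import operator


class Expr:
    """A deferred expression, evaluated later against a dict of column lists."""

    def __init__(self, func):
        self._func = func  # callable: data -> list of values

    def evaluate(self, data):
        return self._func(data)

    def _binary(self, other, op):
        def run(data):
            left = self.evaluate(data)
            if isinstance(other, Expr):
                right = other.evaluate(data)
            else:
                right = [other] * len(left)  # broadcast a scalar
            return [op(a, b) for a, b in zip(left, right)]
        return Expr(run)

    def __gt__(self, other):
        return self._binary(other, operator.gt)

    def __eq__(self, other):
        return self._binary(other, operator.eq)


class Symbol:
    """Hypothetical stand-in for `f`: attribute access refers to a column."""

    def __getattr__(self, name):
        return Expr(lambda data: list(data[name]))


f = Symbol()


class Verb:
    """A verb call that is still waiting for its data."""

    def __init__(self, func, args, kwargs):
        self.func, self.args, self.kwargs = func, args, kwargs

    def __rrshift__(self, data):
        # Called for `data >> Verb(...)` because dict has no `__rshift__`.
        return self.func(data, *self.args, **self.kwargs)


def verb(func):
    """Turn `func(data, ...)` into a pipeable verb (an assumed helper)."""
    def wrapper(*args, **kwargs):
        return Verb(func, args, kwargs)
    return wrapper


@verb
def mutate(data, **columns):
    out = dict(data)  # shallow copy, the input is left untouched
    for name, expr in columns.items():
        out[name] = expr.evaluate(out)  # later columns can use earlier ones
    return out


@verb
def filter_rows(data, condition):
    keep = condition.evaluate(data)
    return {
        name: [value for value, flag in zip(values, keep) if flag]
        for name, values in data.items()
    }


# Any function taking the data first can be wrapped the same way.
total = verb(lambda data, column: sum(data[column]))

df = {"x": [0, 1, 2, 3], "y": ["zero", "one", "two", "three"]}
result = df >> mutate(z=f.x) >> filter_rows(f.z > 1)
print(result)  # {'x': [2, 3], 'y': ['two', 'three'], 'z': [2, 3]}
print(df >> total("x"))  # 6
```

Wrapping a plain function with the hypothetical `verb` helper plays the same role here as the `verb_factory` call in the klib example, although the real factory handles far more cases.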
See also some advanced examples from my answers on StackOverflow:
- Compare 2 DataFrames and drop rows that do not contain corresponding ID variables
- count by id with dynamic criteria
- counting the frequency in python size vs count
- Pandas equivalent of R/dplyr group_by summarise concatenation
- ntiles over columns in python using R's "mutate(across(cols = ..."
- Replicate R Solution in Python for Calculating Monthly CRR
- Best/Concise Way to Conditionally Concat two Columns in Pandas DataFrame
- how to transform R dataframe to rows of indicator values
- Left join on multiple columns
- Python: change column of strings with None to 0/1
- Comparing 2 data frames and finding values are not in 2nd data frame
- How to compare two Pandas DataFrames based on specific columns in Python?
- expand.grid equivalent to get pandas data frame for prediction in Python
- Python pandas equivalent to R's group_by, mutate, and ifelse
- How to convert a list of dictionaries to a Pandas Dataframe with one of the values as column name?
- Moving window on a Standard Deviation & Mean calculation
- Python: creating new "interpolated" rows based on a specific field in Pandas
- How would I extend a Pandas DataFrame such as this?
- How to define new variable based on multiple conditions in Pandas - dplyr case_when equivalent
- What is the Pandas equivalent of top_n() in dplyr?
- Equivalent of fct_lump in pandas
- pandas equivalent of fct_reorder
- Is there a way to find out the 2 X 2 contingency table consisting of the count of values by applying a condition from two dataframe
- Count if array in pandas
- How to create a new column for transposed data
- How to create new DataFrame based on conditions from another DataFrame
- Refer to column of a data frame that is being defined
- How to use regex in mutate dplython to add new column
- Multiplying a row by the previous row (with a certain name) in Pandas
- Create dataframe from rows under a row with a certain condition
- pandas data frame, group by multiple cols and put other columns' contents in one
- Pandas custom aggregate function with condition on group, is it possible?
- multiply different values to pandas column with combination of other columns
- Vectorized column-wise regex matching in pandas
- Iterate through and conditionally append string values in a Pandas dataframe
- Groupby mutate equivalent in pandas/python using tidydata principles
- More ...
File details
Details for the file datar-0.7.0.tar.gz.
File metadata
- Download URL: datar-0.7.0.tar.gz
- Upload date:
- Size: 10.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.13 CPython/3.10.2 Linux/5.11.0-1028-azure
File hashes
Algorithm | Hash digest
---|---
SHA256 | 44ce2362a8b421022d41e300ae99ed3217a44a5cd920179e6e5316c0f3380801
MD5 | 4ba82949f9839d1f7bd816953cebdaa5
BLAKE2b-256 | 97f029be088bc579d528b250926274fa86f16282f8887100f8c467a83bf00a88
File details
Details for the file datar-0.7.0-py3-none-any.whl.
File metadata
- Download URL: datar-0.7.0-py3-none-any.whl
- Upload date:
- Size: 10.3 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.13 CPython/3.10.2 Linux/5.11.0-1028-azure
File hashes
Algorithm | Hash digest
---|---
SHA256 | 941e4de2d31c76ff553f8cbd92a0316b7242f9fa801cc82bb836a04883591520
MD5 | 654a53564d5fcf396217e2e7e07fc503
BLAKE2b-256 | df40f31137399224ffb628a76cb111eaac2072c7371313d9c457e41be1c875a2