An extension to tidypolars4sci with additional functionalities for scientific data analysis.
tidypolars-extra
tidypolars-extra is an extension of tidypolars4sci, which provides Tidyverse-like functions for data manipulation and analysis in Python using Polars as the backend.
This project builds upon the original tidypolars4sci by adding extra functionalities and improvements while maintaining the same familiar API.
Features
- Tidyverse-style API for Polars DataFrames
- Scientific research utilities including LaTeX table generation
- Fast data manipulation powered by Polars
- Familiar R-like syntax for Python users
- Joins: inner_join, left_join, full_join, semi_join, anti_join, cross_join
- Data reshaping: pivot_longer, pivot_wider, separate, unite, complete, nest, unnest
- String manipulation (stringr-style): str_detect, str_extract, str_replace, str_count, str_split, str_pad, str_squish, str_to_title, and more
- Date/time utilities (lubridate-style): year, month, floor_date, ceiling_date, difftime, today, now, duration constructors
- Statistics: mean, sd, cor, rank, scale, cumsum, ntile, weighted_mean, iqr, mad, and more
- Factor manipulation (forcats-style): fct_infreq, fct_lump, fct_recode, fct_collapse, fct_rev
- Data quality: describe, glimpse, get_dupes, assert_no_nulls, assert_unique, clean_names
- Multi-format I/O: CSV, Excel, Stata, SPSS, RDS/RData, Parquet, JSON, Google Sheets
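The joins listed above follow dplyr semantics. As a rough, library-independent illustration (plain Python over lists of row dicts, not the tidypolars-extra API), the sketch below contrasts inner_join with the filtering joins semi_join and anti_join. The helper functions are hypothetical stand-ins that only mimic the row-level behavior.

```python
def inner_join(left, right, on):
    """Combine matching rows; a left row repeats once per matching right row."""
    out = []
    for lrow in left:
        for rrow in right:
            if lrow[on] == rrow[on]:
                out.append({**lrow, **rrow})
    return out


def semi_join(left, right, on):
    """Keep rows of `left` whose key appears in `right`; no columns are added."""
    keys = {row[on] for row in right}
    return [row for row in left if row[on] in keys]


def anti_join(left, right, on):
    """Keep rows of `left` whose key does NOT appear in `right`."""
    keys = {row[on] for row in right}
    return [row for row in left if row[on] not in keys]


left = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}, {"id": 3, "x": "c"}]
right = [{"id": 1, "y": 10}, {"id": 3, "y": 30}, {"id": 3, "y": 31}]

print(inner_join(left, right, "id"))  # id 3 appears twice (two matches)
print(semi_join(left, right, "id"))   # id 3 appears once
print(anti_join(left, right, "id"))   # only id 2 survives
```

The key difference: semi_join keeps each left row at most once even when its key matches several right rows, whereas inner_join repeats it and brings in the right-hand columns.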
Installation
You can install tidypolars-extra with pip:
pip install tidypolars-extra
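To confirm which version ended up in the active environment, the standard library's importlib.metadata can be queried. This is a small sketch, not a feature of the package; the helper name installed_version is hypothetical. Note that the distribution name is tidypolars-extra while the import name is tidypolars_extra.

```python
from importlib import metadata


def installed_version(dist_name="tidypolars-extra"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print(installed_version())  # e.g. a version string, or None if not installed
```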
Basic usage
tidypolars-extra methods are designed to work like tidyverse functions:
import tidypolars_extra as tp

# create a tibble data frame
df = tp.tibble(x=range(3),
               y=range(3, 6),
               z=['a', 'a', 'b'])

(
    df
    .select('x', 'y', 'z')
    .filter(tp.col('x') < 4, tp.col('y') > 1)
    .arrange(tp.desc('z'), 'x')
    .mutate(double_x=tp.col('x') * 2,
            x_plus_y=tp.col('x') + tp.col('y'))
)
┌─────┬─────┬─────┬──────────┬──────────┐
│ x ┆ y ┆ z ┆ double_x ┆ x_plus_y │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╪══════════╪══════════╡
│ 2 ┆ 5 ┆ b ┆ 4 ┆ 7 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 0 ┆ 3 ┆ a ┆ 0 ┆ 3 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 1 ┆ 4 ┆ a ┆ 2 ┆ 5 │
└─────┴─────┴─────┴──────────┴──────────┘
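The row order in the output follows from arrange(tp.desc('z'), 'x'): rows are sorted by z descending, with ties broken by x ascending. The plain-Python sketch below (no tidypolars-extra involved) reproduces the same result to make each verb's semantics explicit; it is an illustration of the logic, not how the library executes the pipeline.

```python
# Rebuild the example tibble as a list of row dicts.
rows = [
    {"x": x, "y": y, "z": z}
    for x, y, z in zip(range(3), range(3, 6), ["a", "a", "b"])
]

# select('x', 'y', 'z') keeps every column here, so it is a no-op.

# filter(): every condition must hold, so they are combined with AND.
rows = [r for r in rows if r["x"] < 4 and r["y"] > 1]

# arrange(desc('z'), 'x'): sort by the tie-breaker first, then by the
# primary key; Python's sort is stable (also with reverse=True), so rows
# with equal z keep their ascending-x order.
rows.sort(key=lambda r: r["x"])
rows.sort(key=lambda r: r["z"], reverse=True)

# mutate(): add columns derived from existing ones.
for r in rows:
    r["double_x"] = r["x"] * 2
    r["x_plus_y"] = r["x"] + r["y"]

for r in rows:
    print(r)
```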
Converting to/from pandas and polars data frames
If you need to use a package that requires pandas or polars data frames, you can convert a tidypolars_extra tibble to either of those DataFrame formats.
# convert to pandas...
df_pandas = df.to_pandas()
# ... or convert to polars
df_polars = df.to_polars()
To convert a pandas or polars DataFrame back to a tidypolars_extra tibble:
# convert from pandas...
df = tp.from_pandas(df_pandas)
# ... or convert from polars
df = tp.from_polars(df_polars)
Roadmap
The following features are planned for future releases:
Missing Tidyverse Functions
- dplyr: slice_min/slice_max, rows_insert/rows_update/rows_upsert, consecutive_id, rename_with, expand/nesting, rowwise operations (c_across, row_sums, row_means)
- tidyr: expand, nesting
- stringr: word, str_to_sentence
- forcats: fct_reorder (reorder levels by a summary statistic of another variable)
- purrr-style: map/map2/pmap equivalents for list columns
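The planned fct_reorder mirrors forcats, where levels are ordered by a summary (median by default) of another variable within each level. A plain-Python sketch of that idea follows; the function name and signature here are hypothetical and do not describe the eventual tidypolars-extra API.

```python
import statistics
from collections import defaultdict


def fct_reorder(factor, values, summary=statistics.median):
    """Return the distinct levels of `factor`, ordered by summary(values per level).

    Ties keep first-seen order because sorted() is stable.
    """
    groups = defaultdict(list)
    for level, value in zip(factor, values):
        groups[level].append(value)
    return sorted(groups, key=lambda level: summary(groups[level]))


species = ["a", "b", "a", "c", "b"]
size = [10, 1, 12, 5, 3]
print(fct_reorder(species, size))  # medians: a=11, b=2, c=5
```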
Statistical & Scientific Computing
- ceiling (complement to the existing floor), exp, log2
- se (standard error of the mean)
- pmin/pmax (parallel min/max across columns)
- winsorize (cap extreme values at percentiles)
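To pin down what these planned statistics compute, here is a plain-Python sketch of se, pmin/pmax and winsorize. The names mirror the roadmap, but the signatures are hypothetical, and the percentile rule used by winsorize (linear interpolation between order statistics) is an assumption; the eventual implementation may use a different quantile definition.

```python
import math
import statistics


def se(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    values = list(values)
    return statistics.stdev(values) / math.sqrt(len(values))


def pmin(*columns):
    """Element-wise (parallel) minimum across equal-length columns."""
    return [min(row) for row in zip(*columns)]


def pmax(*columns):
    """Element-wise (parallel) maximum across equal-length columns."""
    return [max(row) for row in zip(*columns)]


def winsorize(values, lower=0.05, upper=0.95):
    """Clip values to the lower/upper quantiles (assumed: linear interpolation)."""
    ordered = sorted(values)

    def quantile(p):
        pos = p * (len(ordered) - 1)
        lo, hi = math.floor(pos), math.ceil(pos)
        return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

    lo_cut, hi_cut = quantile(lower), quantile(upper)
    return [min(max(v, lo_cut), hi_cut) for v in values]


print(se([1, 3]))
print(pmin([1, 5, 3], [4, 2, 6]), pmax([1, 5, 3], [4, 2, 6]))
print(winsorize(list(range(11)), lower=0.1, upper=0.9))
```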
Data Quality & Exploration
- skim() (richer type-aware summary inspired by R's skimr)
- assert_type/assert_range (additional data validation assertions)
- Type hints across all public functions for IDE support and static analysis
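The planned assert_type/assert_range checks could behave like the existing assertion helpers: pass the data through unchanged when valid and fail loudly otherwise. The sketch below shows that pattern on plain Python lists; the signatures, the choice to skip None as a missing value, and the error messages are all assumptions.

```python
def assert_range(values, lower, upper, name="column"):
    """Raise AssertionError if any non-missing value is outside [lower, upper]."""
    bad = [v for v in values if v is not None and not (lower <= v <= upper)]
    if bad:
        raise AssertionError(
            f"{name}: {len(bad)} value(s) outside [{lower}, {upper}]: {bad[:5]}"
        )
    return values  # returned unchanged so the check can sit inside a pipeline


def assert_type(values, expected, name="column"):
    """Raise AssertionError if any non-missing value is not an instance of `expected`."""
    bad = [v for v in values if v is not None and not isinstance(v, expected)]
    if bad:
        raise AssertionError(
            f"{name}: {len(bad)} value(s) not of type {expected.__name__}: {bad[:5]}"
        )
    return values


ages = assert_range(assert_type([34, 51, None, 27], int), 0, 120, name="age")
print(ages)
```

Note that isinstance(True, int) is true in Python, so a stricter type check would need to exclude bool explicitly.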
Interoperability
- to_arrow() (explicit PyArrow Table export)
- DuckDB integration (to_duckdb/from_duckdb)
- write_rds in save_data (complete R round-trip)
Code Quality
- Add comprehensive type hints throughout codebase
- Configure mypy/pyright for static type checking in CI
- Add test coverage measurement and thresholds
- Expand edge case testing (empty DataFrames, NaN, Inf)
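As a hypothetical illustration of the type hints and edge-case tests listed above, here is a typed helper with explicitly defined behavior for empty input, NaN and infinities. safe_mean is not part of tidypolars-extra; it only shows the kind of cases such tests might pin down.

```python
from __future__ import annotations

import math
from collections.abc import Iterable


def safe_mean(values: Iterable[float], *, skip_nan: bool = True) -> float | None:
    """Mean with defined behavior for empty input, NaN and infinities.

    - empty input (or all-NaN input when skip_nan=True) returns None
    - NaN values are dropped when skip_nan=True, otherwise they propagate
    - infinities follow IEEE 754 arithmetic (e.g. the mean of [1, inf] is inf)
    """
    data = [v for v in values if not (skip_nan and math.isnan(v))]
    if not data:
        return None
    return sum(data) / len(data)


print(safe_mean([]), safe_mean([1.0, float("nan"), 3.0]), safe_mean([1.0, math.inf]))
```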
Acknowledgments
This project is an extension of:
- tidypolars4sci by Diogo Ferrari
- tidypolars — the original starting point
File details
Details for the file tidypolars_extra-0.2.0.tar.gz.
File metadata
- Download URL: tidypolars_extra-0.2.0.tar.gz
- Upload date:
- Size: 706.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c4f302124b888173d3353e8270b334ba4ac2cbc7bf869f2ce777e648b28751b4 |
| MD5 | c89d8fc6012224e36a8ba2bdb9b78913 |
| BLAKE2b-256 | fa369e3a506ef3056af6ce3f72265f761ffcd1f88f569ac472590ef04fc5b422 |
Provenance
The following attestation bundles were made for tidypolars_extra-0.2.0.tar.gz:
Publisher: publish.yml on mdmanurung/tidypolars-extra
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tidypolars_extra-0.2.0.tar.gz
- Subject digest: c4f302124b888173d3353e8270b334ba4ac2cbc7bf869f2ce777e648b28751b4
- Sigstore transparency entry: 1252555511
- Sigstore integration time:
- Permalink: mdmanurung/tidypolars-extra@3f0976c9301dbdfcca4e767b07fe1e4dd0b579c4
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/mdmanurung
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@3f0976c9301dbdfcca4e767b07fe1e4dd0b579c4
- Trigger Event: release
File details
Details for the file tidypolars_extra-0.2.0-py3-none-any.whl.
File metadata
- Download URL: tidypolars_extra-0.2.0-py3-none-any.whl
- Upload date:
- Size: 737.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a2e52ad147598b5f7d3ea3e698774ecbf5dcdce05f236314b70ee9a65e94ef6b |
| MD5 | 3d329c124acc1e0da90bc45949fe493c |
| BLAKE2b-256 | 1eb020f0756cf7fd5c0ac7fd8a7a8460847c5195a305c9c71217c1b41724923a |
Provenance
The following attestation bundles were made for tidypolars_extra-0.2.0-py3-none-any.whl:
Publisher: publish.yml on mdmanurung/tidypolars-extra
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tidypolars_extra-0.2.0-py3-none-any.whl
- Subject digest: a2e52ad147598b5f7d3ea3e698774ecbf5dcdce05f236314b70ee9a65e94ef6b
- Sigstore transparency entry: 1252555520
- Sigstore integration time:
- Permalink: mdmanurung/tidypolars-extra@3f0976c9301dbdfcca4e767b07fe1e4dd0b579c4
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/mdmanurung
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@3f0976c9301dbdfcca4e767b07fe1e4dd0b579c4
- Trigger Event: release