A Rust-powered, tidyverse-inspired DataFrame manipulation library for Python.
📦 crowley-frame
crowley-frame brings the ergonomics of dplyr/tidyr to Python—backed by Rust for safety, speed, and expressive semantics.
If you know R’s tidyverse, this feels natural. If you know pandas, this gives you a more composable, readable syntax with a proper grammar of data manipulation.
✅ Features Proven by the Test Suite (18 Tests Passed)
The following features are not theoretical — they are fully implemented and validated through the test suite.
🔍 Column Selection + Tidy Selectors
(From test_select_and_col.py)
Supports:
- selecting by name
- col.starts_with()
- col.ends_with()
- col.contains()
- col.matches(regex)
- mixing names + selectors
Example
```python
from crowley_frame import df, col

cf = df({"user_id": [1, 2], "score_a": [10, 20], "score_b": [5, 7]})
cf.select(col("user_id"), col.starts_with("score")).to_pandas()
```
Output
```
   user_id  score_a  score_b
0        1       10        5
1        2       20        7
```
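How crowley-frame resolves selectors internally isn't described here. A useful mental model is that a tidy selector is a predicate over column names, and select() expands names and predicates, in order, into a de-duplicated column list. The stdlib-only sketch below assumes that model; `resolve`, `starts_with`, `matches`, and the rest are hypothetical stand-ins, not the library's API. It uses `re.search` for matches(), which is consistent with the README anchoring its pattern as `^year_` in the pivot_longer example.

```python
import re
from typing import Callable, List, Union

# Hypothetical model: a selector is either a column name or a predicate on names.
Predicate = Callable[[str], bool]
Selector = Union[str, Predicate]


def starts_with(prefix: str) -> Predicate:
    return lambda name: name.startswith(prefix)


def ends_with(suffix: str) -> Predicate:
    return lambda name: name.endswith(suffix)


def contains(fragment: str) -> Predicate:
    return lambda name: fragment in name


def matches(pattern: str) -> Predicate:
    regex = re.compile(pattern)
    # search() finds the pattern anywhere in the name; anchor with ^ or $ to pin it.
    return lambda name: regex.search(name) is not None


def resolve(columns: List[str], *selectors: Selector) -> List[str]:
    """Expand names and predicates, in order, into a de-duplicated column list."""
    chosen: List[str] = []
    for sel in selectors:
        if isinstance(sel, str):
            if sel not in columns:
                raise KeyError(f"unknown column: {sel!r}")
            hits = [sel]
        else:
            hits = [c for c in columns if sel(c)]  # keep the frame's column order
        for c in hits:
            if c not in chosen:  # a column selected twice appears once
                chosen.append(c)
    return chosen


cols = ["user_id", "score_a", "score_b"]
print(resolve(cols, "user_id", starts_with("score")))  # ['user_id', 'score_a', 'score_b']
```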
✨ mutate(), lag(), lead(), rolling_mean()
(From test_mutate_lag_lead_rolling.py)
You can:
- create new columns with expressions
- compute window offsets (lag, lead)
- compute rolling window statistics (e.g., rolling mean)
Example
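```python
from crowley_frame import df, lag, rolling_mean  # assumes lag/rolling_mean are top-level exports

cf = df({"x": [1, 2, 3, 4, 5]})
cf.mutate(
    double="x * 2",              # string expression evaluated per row
    lag_x=lag("x", 1),           # previous row's value
    roll3=rolling_mean("x", 3),  # trailing 3-row mean
).to_pandas()
```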
Output
```
   x  double  lag_x  roll3
0  1       2    NaN    NaN
1  2       4    1.0    NaN
2  3       6    2.0    2.0
3  4       8    3.0    3.0
4  5      10    4.0    4.0
```
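The output above implies specific window semantics: lag and lead shift values by n rows and leave the vacated rows missing, and rolling_mean uses a trailing window that stays missing until a full window is available. A stdlib-only sketch of those semantics follows, using None in place of NaN. The function names mirror the README, but these are hypothetical re-implementations that assume n >= 0 and window >= 1, not crowley-frame's Rust kernels.

```python
from typing import List, Optional, Sequence

Value = Optional[float]  # None plays the role of NaN / NA


def lag(values: Sequence[float], n: int = 1) -> List[Value]:
    """Shift down by n rows (n >= 0); the first n rows become missing."""
    k = min(n, len(values))
    return [None] * k + list(values[: len(values) - k])


def lead(values: Sequence[float], n: int = 1) -> List[Value]:
    """Shift up by n rows (n >= 0); the last n rows become missing."""
    k = min(n, len(values))
    return list(values[k:]) + [None] * k


def rolling_mean(values: Sequence[float], window: int) -> List[Value]:
    """Trailing mean over `window` rows; missing until a full window exists."""
    out: List[Value] = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


x = [1, 2, 3, 4, 5]
print(lag(x, 1))           # [None, 1, 2, 3, 4]
print(rolling_mean(x, 3))  # [None, None, 2.0, 3.0, 4.0]
```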
🔗 Pipe Syntax (>>) + group_by() → summarise()
(From test_groupby_summarise_pipe.py)
Yes — you can actually do tidyverse pipes in Python.
Example
```python
from crowley_frame import df, pipe

cf = df({"user_id": [1, 2, 1], "score": [5, 7, 9]})
result = (
    cf
    >> pipe.group_by("user_id")
    >> pipe.summarise(
        mean_score=("score", "mean"),
        n=("score", "count"),
    )
).to_pandas()
result
```
Output
```
   user_id  mean_score  n
0        1         7.0  2
1        2         7.0  1
```
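Python has no native pipe operator, so a >> chain like the one above is typically built on operator overloading: when the left operand's type doesn't implement >>, `frame >> verb` falls back to `verb.__rrshift__(frame)`. The sketch below assumes that design (crowley-frame's actual mechanism may differ) and uses a toy dict-of-lists `Frame`; `Verb`, `group_by`, and `summarise` here are hypothetical. Groups come out in sorted key order, which matches the output above but is a choice of this sketch.

```python
from collections import defaultdict
from statistics import fmean
from typing import Any, Callable, Dict, List, Tuple

Table = Dict[str, List[Any]]

# Aggregations addressable by name, as in ("score", "mean").
AGGS: Dict[str, Callable[[List[Any]], Any]] = {"mean": fmean, "count": len, "sum": sum}


class Frame:
    """Toy column-oriented frame carrying optional grouping keys."""

    def __init__(self, data: Table, groups: Tuple[str, ...] = ()):
        self.data = data
        self.groups = groups


class Verb:
    """A deferred step. Frame has no __rshift__, so `frame >> verb`
    falls back to verb.__rrshift__(frame)."""

    def __init__(self, fn: Callable[[Frame], Frame]):
        self.fn = fn

    def __rrshift__(self, frame: Frame) -> Frame:
        return self.fn(frame)


def group_by(*keys: str) -> Verb:
    # Grouping only records the keys; the work happens in summarise.
    return Verb(lambda f: Frame(f.data, keys))


def summarise(**specs: Tuple[str, str]) -> Verb:
    def run(f: Frame) -> Frame:
        n_rows = len(next(iter(f.data.values()), []))
        buckets: Dict[Tuple[Any, ...], List[int]] = defaultdict(list)
        for i in range(n_rows):
            buckets[tuple(f.data[k][i] for k in f.groups)].append(i)
        out: Table = {k: [] for k in f.groups}
        out.update({name: [] for name in specs})
        for key in sorted(buckets):  # sorted group order (a choice of this sketch)
            rows = buckets[key]
            for k, v in zip(f.groups, key):
                out[k].append(v)
            for name, (col, agg) in specs.items():
                out[name].append(AGGS[agg]([f.data[col][i] for i in rows]))
        return Frame(out)  # the result is ungrouped

    return Verb(run)


cf = Frame({"user_id": [1, 2, 1], "score": [5, 7, 9]})
res = cf >> group_by("user_id") >> summarise(
    mean_score=("score", "mean"), n=("score", "count")
)
print(res.data)  # {'user_id': [1, 2], 'mean_score': [7.0, 7.0], 'n': [2, 1]}
```

Because each verb is just a function wrapped in an object, new verbs can be added without touching the frame class.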
🔢 count(), Proportions, Row Counting
(From test_count_prop.py)
count():
- with no arguments → counts rows
- with columns → frequency tables
- add prop=True for proportions
- add sort=True to order by descending count
Example
```python
from crowley_frame import df

cf = df({"grp": [1, 1, 2, 2, 2]})
cf.count("grp", prop=True, sort=True).to_pandas()
```
Output
```
   grp  n  prop
0    2  3  0.60
1    1  2  0.40
```
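A plain-Python sketch of the counting semantics shown above: rows are keyed by the selected columns and counted, optionally sorted by descending n (ties keep first-appearance order), and prop is n divided by the total row count. With no columns, every row shares the empty key, which yields a single total count. The dict-of-lists input and this signature are assumptions for illustration, not crowley-frame's implementation.

```python
from collections import Counter
from typing import Any, Dict, List

Table = Dict[str, List[Any]]


def count(data: Table, *cols: str, prop: bool = False, sort: bool = False) -> Table:
    """Frequency table over `cols`; with no cols, a single total row count."""
    n_rows = len(next(iter(data.values()), []))
    keys = [tuple(data[c][i] for c in cols) for i in range(n_rows)]
    items = list(Counter(keys).items())  # first-appearance order
    if sort:
        # list.sort is stable (also with reverse=True), so ties keep their order
        items.sort(key=lambda kv: kv[1], reverse=True)
    out: Table = {c: [key[j] for key, _ in items] for j, c in enumerate(cols)}
    out["n"] = [n for _, n in items]
    if prop:
        out["prop"] = [n / n_rows for n in out["n"]]
    return out


print(count({"grp": [1, 1, 2, 2, 2]}, "grp", prop=True, sort=True))
# {'grp': [2, 1], 'n': [3, 2], 'prop': [0.6, 0.4]}
```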
✂️ slice(), head(), tail()
(From test_slice.py)
Example
```python
from crowley_frame import df

cf = df({"x": [10, 20, 30, 40]})
cf.slice(1, 3).to_pandas()  # rows 1 and 2; the stop index is exclusive
```
Output
```
    x
1  20
2  30
```
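The output above suggests that slice(start, stop) is half-open, like Python list slicing: slice(1, 3) returns rows 1 and 2. That differs from Polars' own `DataFrame.slice(offset, length)`, whose second argument is a row count. Below is a hypothetical dict-of-lists sketch of half-open slicing with head and tail built on top; the default n=5 is an assumption borrowed from pandas and Polars, not something this README documents.

```python
from typing import Any, Dict, List

Table = Dict[str, List[Any]]


def n_rows(data: Table) -> int:
    return len(next(iter(data.values()), []))


def slice_rows(data: Table, start: int, stop: int) -> Table:
    """Rows in the half-open range [start, stop), like Python list slicing."""
    return {c: v[start:stop] for c, v in data.items()}


def head(data: Table, n: int = 5) -> Table:
    return slice_rows(data, 0, n)


def tail(data: Table, n: int = 5) -> Table:
    # Compute the start explicitly: v[-0:] would return every row.
    size = n_rows(data)
    return slice_rows(data, max(size - n, 0), size)


d = {"x": [10, 20, 30, 40]}
print(slice_rows(d, 1, 3))  # {'x': [20, 30]}
print(tail(d, 2))           # {'x': [30, 40]}
```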
🔄 pivot_longer() and pivot_wider()
(From test_pivot_longer_wider_basic.py, test_tidyr.py)
pivot_longer
```python
from crowley_frame import df, col

cf = df({
    "id": [1, 2],
    "year_2023": [10, 30],
    "year_2024": [11, 31],
})
cf.pivot_longer(
    col.matches("^year_"),
    names_to="year",
    values_to="value",
).to_pandas()
```
Output
```
   id       year  value
0   1  year_2023     10
1   2  year_2023     30
2   1  year_2024     11
3   2  year_2024     31
```
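Conceptually, pivot_longer keeps the unselected columns as identifiers and emits one row per (selected column, original row) pair. The row order in the output above is column-major (all year_2023 rows first), the same order pandas' `melt` produces. A stdlib sketch under those assumptions, taking a regex string in place of a col.matches() selector; the function and its signature are hypothetical.

```python
import re
from typing import Any, Dict, List

Table = Dict[str, List[Any]]


def pivot_longer(data: Table, pattern: str,
                 names_to: str = "name", values_to: str = "value") -> Table:
    """Stack columns whose names match `pattern` into name/value pairs."""
    pivot = [c for c in data if re.search(pattern, c)]
    ids = [c for c in data if c not in pivot]  # everything else identifies a row
    n = len(next(iter(data.values()), []))
    out: Table = {c: [] for c in ids + [names_to, values_to]}
    for col in pivot:  # column-major: every row of the first matched column first
        for i in range(n):
            for c in ids:
                out[c].append(data[c][i])
            out[names_to].append(col)
            out[values_to].append(data[col][i])
    return out


wide = {"id": [1, 2], "year_2023": [10, 30], "year_2024": [11, 31]}
print(pivot_longer(wide, "^year_", names_to="year", values_to="value"))
```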
pivot_wider
```python
long = cf.pivot_longer(...)  # same arguments as the pivot_longer example above
long.pivot_wider(names_from="year", values_from="value").to_pandas()
```
Output
```
   id  year_2023  year_2024
0   1         10         11
1   2         30         31
```
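pivot_wider inverts that reshaping: every column other than names_from and values_from identifies an output row, each distinct names_from value becomes a new column, and missing combinations become missing values. In this hypothetical sketch a duplicate (id, name) pair raises an error; tidyr instead warns and builds list-columns, and crowley-frame's behavior for that case isn't documented here.

```python
from typing import Any, Dict, List, Tuple

Table = Dict[str, List[Any]]


def pivot_wider(data: Table, names_from: str, values_from: str) -> Table:
    """Spread `values_from` into one column per distinct `names_from` value."""
    ids = [c for c in data if c not in (names_from, values_from)]
    new_cols: List[Any] = []  # new column names, in first-seen order
    cells: Dict[Tuple[Any, ...], Dict[Any, Any]] = {}  # id tuple -> {name: value}
    for i in range(len(data[names_from])):
        key = tuple(data[c][i] for c in ids)
        name = data[names_from][i]
        if name not in new_cols:
            new_cols.append(name)
        row = cells.setdefault(key, {})
        if name in row:
            raise ValueError(f"duplicate value for id {key} and name {name!r}")
        row[name] = data[values_from][i]
    out: Table = {c: [key[j] for key in cells] for j, c in enumerate(ids)}
    for name in new_cols:
        out[str(name)] = [row.get(name) for row in cells.values()]  # None if absent
    return out


long = {"id": [1, 2, 1, 2],
        "year": ["year_2023", "year_2023", "year_2024", "year_2024"],
        "value": [10, 30, 11, 31]}
print(pivot_wider(long, "year", "value"))
# {'id': [1, 2], 'year_2023': [10, 30], 'year_2024': [11, 31]}
```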
🔬 separate() & unite() with Proper NA Semantics
(From test_separate_unite.py)
unite()
```python
from crowley_frame import df

cf = df({
    "first": ["Ada", None, "Charlie"],
    "last": ["Lovelace", "Smith", None],
})
cf.unite("full", ["first", "last"], sep=" ").to_pandas()
```
Output
```
           full
0  Ada Lovelace
1          <NA>
2          <NA>
```
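The NA rule in the output above is that a missing piece makes the united value missing, whereas tidyr's unite() with its default na.rm = FALSE pastes the literal text "NA". A minimal sketch of that propagation rule; the signature, the `remove` flag, and the "_" default separator are assumptions modeled on tidyr, and this version simply appends the new column at the end.

```python
from typing import Any, Dict, List, Optional

Table = Dict[str, List[Any]]


def unite(data: Table, new: str, cols: List[str],
          sep: str = "_", remove: bool = True) -> Table:
    """Join `cols` row-wise into `new`; any missing piece makes the result missing."""
    n = len(data[cols[0]])
    joined: List[Optional[str]] = []
    for i in range(n):
        parts = [data[c][i] for c in cols]
        if any(p is None for p in parts):
            joined.append(None)  # NA propagates instead of becoming the text "None"
        else:
            joined.append(sep.join(str(p) for p in parts))
    out = {c: v for c, v in data.items() if not (remove and c in cols)}
    out[new] = joined
    return out


people = {"first": ["Ada", None, "Charlie"], "last": ["Lovelace", "Smith", None]}
print(unite(people, "full", ["first", "last"], sep=" "))
# {'full': ['Ada Lovelace', None, None]}
```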
separate()
```python
from crowley_frame import df

cf = df({"full": ["Ada Lovelace", "John Smith"]})
cf.separate("full", into=["first", "last"], sep=" ").to_pandas()
```
Output
```
  first      last
0   Ada  Lovelace
1  John     Smith
```
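separate() is the inverse of unite(). The README only shows the clean two-piece case, so the edge-case rules in this hypothetical sketch are assumptions: it splits on a literal separator (tidyr's sep is a regular expression), keeps any extra text in the last piece, pads short results with None, and maps a missing input to missing outputs.

```python
from typing import Any, Dict, List, Optional

Table = Dict[str, List[Any]]


def separate(data: Table, col: str, into: List[str],
             sep: str, remove: bool = True) -> Table:
    """Split `col` on a literal `sep` into the columns named in `into`."""
    pieces: Dict[str, List[Optional[str]]] = {name: [] for name in into}
    for value in data[col]:
        if value is None:
            parts: List[Optional[str]] = [None] * len(into)  # NA in, NA out
        else:
            split = value.split(sep, len(into) - 1)  # extra text stays in the last piece
            parts = split + [None] * (len(into) - len(split))  # pad short rows
        for name, part in zip(into, parts):
            pieces[name].append(part)
    out = {c: v for c, v in data.items() if not (remove and c == col)}
    out.update(pieces)
    return out


print(separate({"full": ["Ada Lovelace", "John Smith"]}, "full", ["first", "last"], sep=" "))
# {'first': ['Ada', 'John'], 'last': ['Lovelace', 'Smith']}
```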
📥 Installation
For contributors (local dev)
```bash
maturin develop --release
```
Future PyPI install
```bash
pip install crowley-frame
```
🚀 Usage Overview
Create a DataFrame
```python
from crowley_frame import df, col, pipe

cf = df({"x": [1, 2, 3], "y": [10, 20, 30]})
```
Select columns
```python
cf.select(col.starts_with("y")).to_pandas()
```
Output:
```
    y
0  10
1  20
2  30
```
Mutate
```python
cf.mutate(z="x + y").to_pandas()
```
Output:
```
   x   y   z
0  1  10  11
1  2  20  22
2  3  30  33
```
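mutate() takes expressions as strings ("x * 2", "x + y"). How crowley-frame parses them isn't documented here. One common approach is to parse the string with Python's `ast` module and evaluate only a whitelist of arithmetic nodes, never calling eval(). The sketch below takes that approach over a dict-of-lists table; `mutate` and `_eval` are hypothetical. As in dplyr, a later expression can refer to a column created earlier in the same call.

```python
import ast
import operator
from typing import Any, Callable, Dict, List

Table = Dict[str, List[Any]]

# Only these binary operators are allowed; anything else is rejected.
_BINOPS: Dict[type, Callable[[Any, Any], Any]] = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}


def _eval(node: ast.AST, row: Dict[str, Any]) -> Any:
    """Evaluate a whitelisted expression tree against one row's values."""
    if isinstance(node, ast.Expression):
        return _eval(node.body, row)
    if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
        return _BINOPS[type(node.op)](_eval(node.left, row), _eval(node.right, row))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand, row)
    if isinstance(node, ast.Name):
        return row[node.id]  # column reference
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")


def mutate(data: Table, **exprs: str) -> Table:
    out = {c: list(v) for c, v in data.items()}
    n_rows = len(next(iter(data.values()), []))
    for name, src in exprs.items():
        tree = ast.parse(src, mode="eval")
        # The whole new column is computed before it is stored.
        out[name] = [_eval(tree, {c: out[c][i] for c in out}) for i in range(n_rows)]
    return out


cf = {"x": [1, 2, 3], "y": [10, 20, 30]}
print(mutate(cf, z="x + y"))  # {'x': [1, 2, 3], 'y': [10, 20, 30], 'z': [11, 22, 33]}
```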
Group + summarise with pipes
```python
(cf >> pipe.group_by("x") >> pipe.summarise(sum_y=("y", "sum"))).to_pandas()
```
Output:
```
   x  sum_y
0  1     10
1  2     20
2  3     30
```
Reshape: pivot_longer
```python
cf.pivot_longer(col.starts_with("y"), names_to="year", values_to="value").to_pandas()
```
Output:
```
   x year  value
0  1    y     10
1  2    y     20
2  3    y     30
```
🧭 Roadmap (Next Milestones)
- More window functions (rolling_sum, rolling_sd, rolling_min/max)
- Lazy backend (like dplyr/dbplyr or polars-lazy)
- More expressive mutate expression engine
- Arrow-native memory and zero-copy interfaces
- SIMD and GPU-accelerated Rust kernels
- Better type inference + schema evolution
📄 License
MIT License — free to use, modify, and distribute.
Tidyverse-style data manipulation for Python, powered by Rust and Polars.
File details
Details for the file crowleyframe-0.1.0.tar.gz.
File metadata
- Download URL: crowleyframe-0.1.0.tar.gz
- Upload date:
- Size: 31.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7648c1f82308d1c7c69a9e4d3b0eaca601cb4c1b9c52f028b7e1bafe0859eab2 |
| MD5 | 0f34e2e367245464a4372aaea8d0571c |
| BLAKE2b-256 | 6b87783c9efad8f39fa28d891ac32638b852f1f59ee4195ec48df51a64ba5468 |
File details
Details for the file crowleyframe-0.1.0-cp310-cp310-win_amd64.whl.
File metadata
- Download URL: crowleyframe-0.1.0-cp310-cp310-win_amd64.whl
- Upload date:
- Size: 4.8 MB
- Tags: CPython 3.10, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 15b0a647ac2b47b97fef39b4d109b4a1656ee15ecacc9bbef63823d929bfb424 |
| MD5 | 4b7824b15c4fad01405a6fa15ab4a106 |
| BLAKE2b-256 | 70b314c3383bf36a57d066b3c742e04ed52e95e8afbece6c7d637354467e3e0a |