A Rust-powered, tidyverse-inspired DataFrame manipulation library for Python.

These details have not been verified by PyPI

Project links

Project description

Here’s an updated README.md you can drop in as-is, with:

A “What’s new in v0.2” section
All verbs from v0.1 + v0.2 described with syntax, behavior, and examples
Examples aligned with your current tests

📦 crowley-frame

A Rust-powered, tidyverse-inspired DataFrame manipulation library for Python

crowley-frame brings the ergonomics of dplyr/tidyr to Python—backed by Rust for safety, speed, and expressive semantics.

If you know R’s tidyverse, this feels natural. If you know pandas, this gives you a more composable, readable syntax with a proper grammar of data manipulation.

📌 Status

Version: 0.2.x (early but already useful)
Backend: Rust + Polars core (exposed via a Python façade)
Tests: ✅ 22 tests passing across:
- column selection & selectors
- mutate + lag/lead + rolling
- count
- slice_* verbs
- pivot_longer / pivot_wider
- separate / unite
- pipes + group_by / summarise
- joins (left_join, inner_join)

🆕 What’s new in v0.2

Building on the v0.1 core (select, mutate, filter, basic pivoting), v0.2 adds a lot of real tidyverse ergonomics:

New verbs / capabilities:

rename() – rename columns by mapping new_name="old_name".
relocate() – move columns before/after others or to the front.
distinct() – row de-duplication by columns (with keep= semantics).
count() – frequency tables + prop= (proportions) + sort=.
Slice family: slice_head(), slice_tail(), slice_sample(), slice_max(), slice_min().
Tidyr-style:
- separate() – split one column into several.
- unite() – combine several columns into one, with NA semantics.
Reshaping:
- pivot_longer() – long format via tidy-style selectors.
- pivot_wider() – wide format, round-tripping from pivot_longer.
Joins:
- left_join() and inner_join() matching pandas’ behavior, including suffixes and NaN key quirks, locked down via tests.
Selectors expanded:
- col.where_numeric(), col.where_string(), col.everything() in addition to starts_with, ends_with, contains, matches.
Pipe ergonomics:
- Frame.__rshift__ + pipe.* namespace lets you write actual tidyverse-style pipelines in Python, including group_by + summarise.

✅ Features Proven by the Test Suite (v0.2, 22 tests)

The following features are not theoretical — they are fully implemented and validated by your test suite.

🔍 Column Selection + Tidy Selectors

Supported selectors (via crowley_frame.col):

col("name") / plain strings
col.starts_with("prefix")
col.ends_with("suffix")
col.contains("substring")
col.matches(r"regex")
col.where_numeric() – all numeric columns
col.where_string() – object/string columns
col.everything() – all columns

Syntax

from crowley_frame import df, col

cf.select(
    "user_id",
    col.starts_with("score"),
    col.where_numeric(),
)

Example

cf = df({"user_id": [1, 2, 3],
         "score_a": [10, 20, 30],
         "score_b": [5, 7, 9],
         "group": ["a", "b", "a"]})

num_out = cf.select(col.where_numeric()).to_pandas()

Possible output

   user_id  score_a  score_b
0        1       10        5
1        2       20        7
2        3       30        9
``` :contentReference[oaicite:11]{index=11}  

---

## ✨ `mutate()`, `lag()`, `lead()`, `rolling_mean()`

### `mutate(**new_columns)`

- **Strings** are evaluated as pandas expressions in DataFrame context.
- Non-strings are treated as literal sequences/scalars.

```python
cf = df({"x": [1, 2, 3, 4, 5]})

out = cf.mutate(
    double="x * 2",
    z="x ** 2 + 1",
).to_pandas()

Example output

   x  double   z
0  1       2   2
1  2       4   5
2  3       6  10
3  4       8  17
4  5      10  26
``` :contentReference[oaicite:12]{index=12}  

### `lag(col, n=1, default=None)` / `lead(col, n=1, default=None)`

Helpers that return `pd.Series` to be plugged into `mutate` or used directly.

```python
cf = df({"val": [10, 20, 30]})

cf.mutate(
    lag_val=cf.lag("val", 1),
    lead_val=cf.lead("val", 1),
).to_pandas()

Output

   val  lag_val  lead_val
0   10      NaN      20.0
1   20     10.0      30.0
2   30     20.0       NaN
``` :contentReference[oaicite:13]{index=13}  

### `rolling_mean(col, window, min_periods=None)`

```python
cf = df({"val": [1.0, 2.0, 3.0, 4.0]})

cf.mutate(
    roll3=cf.rolling_mean("val", window=3, min_periods=2),
).to_pandas()

Output

   val  roll3
0  1.0    NaN
1  2.0    1.5
2  3.0    2.0
3  4.0    3.0
``` :contentReference[oaicite:14]{index=14}  

---

## 🔗 Pipe Syntax (`>>`) + `group_by()` → `summarise()`

You get **real tidyverse pipes** in Python via `pipe.*` and `Frame.__rshift__`. :contentReference[oaicite:15]{index=15}  

**Syntax**

```python
from crowley_frame import df, pipe

(
    cf
    >> pipe.group_by("user_id")
    >> pipe.summarise(
        mean_score=("score", "mean"),
        n=("score", "count"),
    )
).to_pandas()

Example

cf = df({"user_id": [1, 2, 1],
         "score":   [5, 7, 9]})

out = (
    cf
    >> pipe.group_by("user_id")
    >> pipe.summarise(
        mean_score=("score", "mean"),
        n=("score", "count"),
    )
).to_pandas()

Output

   user_id  mean_score  n
0        1         7.0  2
1        2         7.0  1
``` :contentReference[oaicite:16]{index=16}  

> **Note:** Use parentheses around the whole pipe chain before calling `.to_pandas()`, so Python’s precedence doesn’t bind `.to_pandas()` to the last pipe function.

---

## 🔢 `count()` – frequencies and proportions

**Syntax**

```python
Frame.count(
    *cols: str,
    sort: bool = False,
    prop: bool = False,
    name: str = "n",
)

No cols → total row count.
With cols → grouped counts.
prop=True → add prop column with relative frequencies.
sort=True → sort by n descending.

Example

cf = df({"grp": ["a", "a", "b", "b", "b"]})

cf.count("grp", prop=True, sort=True).to_pandas()

Output

  grp  n  prop
0   b  3  0.60
1   a  2  0.40
``` :contentReference[oaicite:18]{index=18}  

---

## ✂️ Slice verbs: `slice_head`, `slice_tail`, `slice_sample`, `slice_max`, `slice_min`

These mirror dplyr / tidyr slice semantics. :contentReference[oaicite:19]{index=19}  

### `slice_head(n=5)` / `slice_tail(n=5)`

```python
cf = df({"x": [1, 2, 3, 4, 5]})

head3 = cf.slice_head(3).to_pandas()
tail2 = cf.slice_tail(2).to_pandas()

head3

tail2

   x
0  4
1  5
``` :contentReference[oaicite:20]{index=20}  

### `slice_sample(n=None, prop=None, replace=False, random_state=None)`

- Provide **exactly one** of `n` or `prop`.

```python
cf = df({"x": list(range(10))})

s1 = cf.slice_sample(n=3, random_state=42).to_pandas()
s2 = cf.slice_sample(prop=0.3, random_state=42).to_pandas()

`slice_max(order_by, n=5)` / `slice_min(order_by, n=5)`

Uses nlargest / nsmallest to get top/bottom rows by a column.

cf = df({"x": [1, 5, 3, 9, 2],
         "y": ["a", "b", "c", "d", "e"]})

max2 = cf.slice_max("x", n=2).to_pandas()
min2 = cf.slice_min("x", n=2).to_pandas()

max2

   x  y
0  9  d
1  5  b

min2

   x  y
0  1  a
1  2  e
``` :contentReference[oaicite:22]{index=22}  

---

## 🔄 `pivot_longer()` and `pivot_wider()`

Round-trippable reshaping modeled after tidyr. :contentReference[oaicite:23]{index=23}  

### `pivot_longer`

**Syntax**

```python
Frame.pivot_longer(
    *cols,                      # tidy selectors
    cols: Sequence[Any] | None = None,
    names_to: str = "name",
    values_to: str = "value",
    names_prefix: str | None = None,
)

You can pass selectors positionally:

cf = df({
    "id": [1, 2],
    "year_2023": [10, 30],
    "year_2024": [11, 31],
})

long = cf.pivot_longer(
    col.matches(r"^year_"),
    names_to="year",
    values_to="value",
).to_pandas()

Output

   id       year  value
0   1  year_2023     10
1   2  year_2023     30
2   1  year_2024     11
3   2  year_2024     31
``` :contentReference[oaicite:24]{index=24}  

### `pivot_wider`

**Syntax**

```python
Frame.pivot_wider(
    names_from: str,
    values_from: str,
    values_fill: Any = None,
    sep: str = "_",
)

wide = (
    long
    .pivot_wider(
        names_from="year",
        values_from="value",
        values_fill=None,
    )
    .to_pandas()
    .sort_values("id")
    .reset_index(drop=True)
)

Output

   id  year_2023  year_2024
0   1         10         11
1   2         30         31
``` :contentReference[oaicite:25]{index=25}  

---

## 🔬 `separate()` & `unite()` with proper NA semantics

Modeled on tidyr’s `separate` and `unite`, including behavior around missing values. :contentReference[oaicite:26]{index=26}  

### `separate(col, into, sep=r"\s+", remove=True, convert=False)`

```python
cf = df({
    "id": [1, 2, 3],
    "coords": ["1,2", "10,20", "5,7"],
})

out = (
    cf
    .separate("coords", into=["x", "y"], sep=",")
    .to_pandas()
    .sort_values("id")
    .reset_index(drop=True)
)

Output

   id   x   y
0   1   1   2
1   2  10  20
2   3   5   7
``` :contentReference[oaicite:27]{index=27}  

### `unite(col, cols, sep="_", remove=True, na_rm=False)`

```python
cf = df({
    "first": ["Ada", None, "Charlie"],
    "last":  ["Lovelace", "Smith", None],
})

out_default = cf.unite("full_name", ["first", "last"], sep=" ").to_pandas()

Output (default na_rm=False)

      full_name
0  Ada Lovelace
1          <NA>
2          <NA>
``` :contentReference[oaicite:28]{index=28}  

With `na_rm=True`, NAs are treated as empty strings and trimmed from ends.

---

## 🧱 `rename()`, `relocate()`, and `distinct()`

### `rename(**mapping)`

```python
cf = df({"user_id": [1, 2, 3], "user_score": [5, 7, 9]})

cf.rename(user="user_id", score="user_score").to_pandas()

Output

   user  score
0     1      5
1     2      7
2     3      9
``` :contentReference[oaicite:29]{index=29}  

### `relocate(*cols, before=None, after=None)`

Moves one or more columns relative to others.

```python
cf = df({
    "user_id": [1, 2, 3],
    "user_score": [5, 7, 9],
    "other": [0, 0, 1],
})

cf.relocate("user_score", before="user_id").to_pandas()

Output

   user_score  user_id  other
0           5        1      0
1           7        2      0
2           9        3      1
``` :contentReference[oaicite:30]{index=30}  

If neither `before` nor `after` is given, selected columns are moved to the **front**.

### `distinct(*cols, keep="first" | "last" | False)`

```python
cf = df({
    "user": [1, 1, 2, 3],
    "score": [5, 5, 7, 7],
    "other": [0, 1, 0, 1],
})

# distinct users, keep first occurrence
cf.distinct("user").to_pandas()

Output

   user  score  other
0     1      5      0
1     2      7      0
2     3      7      1
``` :contentReference[oaicite:31]{index=31}  

---

## 🤝 Joins: `left_join()` and `inner_join()`

Joins are defined in `Frame` and exercised heavily in `test_joins.py`. :contentReference[oaicite:32]{index=32}  

**Syntax**

```python
Frame.left_join(
    other: Frame,
    on: str | Sequence[str] | None = None,
    left_on: str | Sequence[str] | None = None,
    right_on: str | Sequence[str] | None = None,
    suffixes: tuple[str, str] = ("_x", "_y"),
    validate: str | None = None,
)

Frame.inner_join( ...same signature... )

Basic left join

from crowley_frame import df

left = df({"id": [1, 2, 3], "x": [10, 20, 30]})
right = df({"id": [2, 3, 4], "y": [200, 300, 400]})

out = left.left_join(right, on="id").to_pandas()

Output

   id   x      y
0   1  10    NaN
1   2  20  200.0
2   3  30  300.0
``` :contentReference[oaicite:33]{index=33}  

### Basic inner join

```python
inner = left.inner_join(right, on="id").to_pandas()

Output

   id   x    y
0   2  20  200
1   3  30  300
``` :contentReference[oaicite:34]{index=34}  

### Overlapping column names + suffixes

```python
left = df({"id": [1, 2], "val": [10, 20]})
right = df({"id": [2, 3], "val": [200, 300]})

out = left.left_join(right, on="id", suffixes=("_left", "_right")).to_pandas()

Gives val_left / val_right columns, matching pandas.

NaN key behavior

Joins with NaN keys are explicitly locked to whatever pandas does on the version you’re running:

left_pdf = pd.DataFrame({"id": [1.0, float("nan"), 3.0], "x": [10, 20, 30]})
right_pdf = pd.DataFrame({"id": [float("nan"), 3.0], "y": [200, 300]})

left = df(left_pdf)
right = df(right_pdf)

out_left = left.left_join(right, on="id").to_pandas()
out_inner = left.inner_join(right, on="id").to_pandas()

Both are tested against pd.merge with the same options.

📥 Installation

For contributors (local dev)

maturin develop --release

(Future) PyPI install

pip install crowley-frame

🚀 Usage Overview (Quick Tour)

from crowley_frame import df, col, pipe
import pandas as pd

pdf = pd.DataFrame({"user_id":[1,2,1,3],
                    "user_score":[5,7,9,7],
                    "other":[0,0,1,1]})

cf = df(pdf)

1. Select

cf.select("user_id", col.starts_with("user_")).to_pandas()

2. Mutate

cf.mutate(
    z="(user_score - user_score.mean()) / user_score.std()"
).to_pandas()

3. Group + summarise with pipes

(
    cf
    >> pipe.group_by("user_id")
    >> pipe.summarise(
        mean_score=("user_score", "mean"),
        n=("user_score", "count"),
    )
).to_pandas()

4. Reshape

wide = df({
    "id":[1,2],
    "year_2023":[10,30],
    "year_2024":[11,31],
})

long = wide.pivot_longer(
    col.matches(r"^year_"),
    names_to="year",
    values_to="value",
)

roundtrip = long.pivot_wider(
    names_from="year",
    values_from="value",
)

5. Join

left  = df({"id":[1,2,3], "x":[10,20,30]})
right = df({"id":[2,3,4], "y":[200,300,400]})

left.left_join(right, on="id").to_pandas()

🧭 Roadmap

Planned next steps (v0.2.x → v0.3):

Complete join family: right_join, full_join, semi_join, anti_join.
More tidyverse verbs:
- across(), case_when(), if_else()
- drop_na, fill, complete, expand, nest / unnest
Crowley-specific features:
- Stronger expression engine for mutate / filter.
- Optional lazy mode over Polars / Arrow.
Rust side:
- More kernels pushed down into _crowley.Frame for performance.
- SIMD and parallelization for heavy verbs.

📄 License

MIT License — free to use, modify, and distribute.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.2.0

Dec 13, 2025

0.1.0

Dec 11, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

crowleyframe-0.2.0.tar.gz (37.7 kB view details)

Uploaded Dec 13, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

crowleyframe-0.2.0-cp310-cp310-win_amd64.whl (4.8 MB view details)

Uploaded Dec 13, 2025 CPython 3.10Windows x86-64

File details

Details for the file crowleyframe-0.2.0.tar.gz.

File metadata

Download URL: crowleyframe-0.2.0.tar.gz
Upload date: Dec 13, 2025
Size: 37.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: maturin/1.10.2

File hashes

Hashes for crowleyframe-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`a1b104ceeae32af9d68ab3886eb8bd4c5aa061187e3687b0ffba5d6a0a9c3779`
MD5	`74a9af606607097573680f3765e80a4a`
BLAKE2b-256	`6e67b22953432e13c0908b1f8d2f99213cb94a1be5c23d841be5ddbd47ca9d1d`

See more details on using hashes here.

File details

Details for the file crowleyframe-0.2.0-cp310-cp310-win_amd64.whl.

File metadata

Download URL: crowleyframe-0.2.0-cp310-cp310-win_amd64.whl
Upload date: Dec 13, 2025
Size: 4.8 MB
Tags: CPython 3.10, Windows x86-64
Uploaded using Trusted Publishing? No
Uploaded via: maturin/1.10.2

File hashes

Hashes for crowleyframe-0.2.0-cp310-cp310-win_amd64.whl
Algorithm	Hash digest
SHA256	`28930e7bc2f2e54fa1d1ae8770a699bc73612bb3a71f7fc295228d8f5dc48bf5`
MD5	`aa2d81984dd0495786c60a7915fd2410`
BLAKE2b-256	`541990019edcc45c19af62701de7d4bccf55295e7b57fa87ed13c2c144afbeeb`

See more details on using hashes here.

crowleyframe 0.2.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

📦 crowley-frame

A Rust-powered, tidyverse-inspired DataFrame manipulation library for Python

📌 Status

🆕 What’s new in v0.2

✅ Features Proven by the Test Suite (v0.2, 22 tests)

🔍 Column Selection + Tidy Selectors

slice_max(order_by, n=5) / slice_min(order_by, n=5)

Basic left join

NaN key behavior

📥 Installation

For contributors (local dev)

(Future) PyPI install

🚀 Usage Overview (Quick Tour)

1. Select

2. Mutate

3. Group + summarise with pipes

4. Reshape

5. Join

🧭 Roadmap

📄 License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

`slice_max(order_by, n=5)` / `slice_min(order_by, n=5)`