
A Rust-powered, tidyverse-inspired DataFrame manipulation library for Python.



📦 crowley-frame


crowley-frame brings the ergonomics of dplyr/tidyr to Python—backed by Rust for safety, speed, and expressive semantics.

If you know R’s tidyverse, this feels natural. If you know pandas, this gives you a more composable, readable syntax with a proper grammar of data manipulation.


✅ Features Covered by the Test Suite (18 Tests Passed)

Each feature below is implemented and exercised by the test file named under its heading.


🔍 Column Selection + Tidy Selectors

(From test_select_and_col.py)

Supports:

  • selecting by name
  • col.starts_with()
  • col.ends_with()
  • col.contains()
  • col.matches(regex)
  • mixing names + selectors

Example

cf = df({"user_id": [1,2], "score_a": [10,20], "score_b": [5,7]})
cf.select(col("user_id"), col.starts_with("score")).to_pandas()

Output

   user_id  score_a  score_b
0        1       10        5
1        2       20        7
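The selector rules listed above can be sketched in plain Python. The helpers below (Selector, starts_with, ends_with, contains, matches, resolve_selection) are hypothetical stand-ins, not crowley-frame's col API. They assume that selectors keep the frame's column order and that a column picked by both a name and a selector appears only once, as in dplyr.

```python
import re


class Selector:
    """A hypothetical tidy selector: a predicate over column names."""

    def __init__(self, predicate):
        self.predicate = predicate

    def resolve(self, columns):
        # Keep the frame's own column order for every name the predicate accepts.
        return [c for c in columns if self.predicate(c)]


def starts_with(prefix):
    return Selector(lambda c: c.startswith(prefix))


def ends_with(suffix):
    return Selector(lambda c: c.endswith(suffix))


def contains(text):
    return Selector(lambda c: text in c)


def matches(pattern):
    regex = re.compile(pattern)
    # re.search: an unanchored pattern may match anywhere in the name.
    return Selector(lambda c: regex.search(c) is not None)


def resolve_selection(columns, *items):
    """Expand plain names and selectors into an ordered, de-duplicated list."""
    chosen = []
    for item in items:
        if isinstance(item, str):
            if item not in columns:
                raise KeyError(f"no column named {item!r}")
            names = [item]
        else:
            names = item.resolve(columns)
        for name in names:
            if name not in chosen:  # a column picked twice is kept once
                chosen.append(name)
    return chosen


columns = ["user_id", "score_a", "score_b"]
print(resolve_selection(columns, "user_id", starts_with("score")))
# ['user_id', 'score_a', 'score_b']
```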

✨ mutate(), lag(), lead(), rolling_mean()

(From test_mutate_lag_lead_rolling.py)

You can:

  • create new columns with expressions
  • compute window offsets (lag, lead)
  • compute rolling window statistics (e.g., rolling mean)

Example

cf = df({"x": [1,2,3,4,5]})
cf.mutate(
    double="x * 2",
    lag_x=lag("x", 1),
    roll3=rolling_mean("x", 3),
).to_pandas()

Output

   x  double  lag_x  roll3
0  1       2    NaN    NaN
1  2       4    1.0    NaN
2  3       6    2.0    2.0
3  4       8    3.0    3.0
4  5      10    4.0    4.0
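To make the window semantics in the output concrete, here is a plain-Python sketch of lag, lead, and a trailing rolling mean over a list, using None where the table shows NaN. These are illustrative functions, not crowley-frame's Rust kernels. The example above does not call lead, so its behavior here is assumed by symmetry with lag.

```python
def lag(values, n=1):
    """Shift values down n rows (n assumed >= 0); the first n rows get None."""
    n = min(n, len(values))
    return [None] * n + list(values[: len(values) - n])


def lead(values, n=1):
    """Shift values up n rows (n assumed >= 0); the last n rows get None."""
    n = min(n, len(values))
    return list(values[n:]) + [None] * n


def rolling_mean(values, window):
    """Trailing mean of the current row and the window - 1 rows before it.

    Rows without a full window get None, matching the NaN rows in the table.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


x = [1, 2, 3, 4, 5]
print(lag(x, 1))           # [None, 1, 2, 3, 4]
print(lead(x, 1))          # [2, 3, 4, 5, None]
print(rolling_mean(x, 3))  # [None, None, 2.0, 3.0, 4.0]
```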

🔗 Pipe Syntax (>>) + group_by() → summarise()

(From test_groupby_summarise_pipe.py)

Verbs from pipe chain together with >>, much like %>% in R’s tidyverse.

Example

cf = df({"user_id": [1,2,1], "score":[5,7,9]})

result = (
    cf
    >> pipe.group_by("user_id")
    >> pipe.summarise(
        mean_score=("score", "mean"),
        n=("score", "count"),
    )
).to_pandas()
result

Output

   user_id  mean_score  n
0        1         7.0  2
1        2         7.0  1
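One way a >> pipe could work is for each verb to return a function and for the frame's __rshift__ to apply it. The sketch below assumes that design. Frame, group_by, summarise, and the AGGREGATES table are hypothetical, and the real pipe module may be implemented differently. Groups come out sorted by key, which matches the output above.

```python
from statistics import fmean


class Frame:
    """Hypothetical minimal frame: a dict of equal-length column lists."""

    def __init__(self, columns, groups=()):
        self.columns = columns
        self.groups = list(groups)

    def __rshift__(self, verb):
        # `frame >> verb` just calls the verb, so verbs chain left to right.
        return verb(self)


def group_by(*keys):
    # Returns a verb that records the grouping keys without touching the data.
    return lambda frame: Frame(frame.columns, keys)


AGGREGATES = {"mean": fmean, "count": len, "sum": sum}


def summarise(**specs):
    """Each spec is (column, aggregate name), e.g. mean_score=("score", "mean")."""

    def verb(frame):
        keys = frame.groups
        n_rows = len(next(iter(frame.columns.values())))
        buckets = {}  # group key tuple -> row positions
        for i in range(n_rows):
            key = tuple(frame.columns[k][i] for k in keys)
            buckets.setdefault(key, []).append(i)
        out = {name: [] for name in [*keys, *specs]}
        for key in sorted(buckets):  # one output row per group, sorted by key
            for k, v in zip(keys, key):
                out[k].append(v)
            for name, (column, how) in specs.items():
                values = [frame.columns[column][i] for i in buckets[key]]
                out[name].append(AGGREGATES[how](values))
        return Frame(out)

    return verb


cf = Frame({"user_id": [1, 2, 1], "score": [5, 7, 9]})
result = cf >> group_by("user_id") >> summarise(
    mean_score=("score", "mean"),
    n=("score", "count"),
)
print(result.columns)
# {'user_id': [1, 2], 'mean_score': [7.0, 7.0], 'n': [2, 1]}
```

Because each verb is just a callable, adding a new verb needs no change to Frame itself.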

🔢 count(), Proportions, Row Counting

(From test_count_prop.py)

count():

  • with no arguments → counts rows
  • with columns → frequency tables
  • add prop=True for proportions
  • add sort=True to list the most frequent values first

Example

cf = df({"grp":[1,1,2,2,2]})
cf.count("grp", prop=True, sort=True).to_pandas()

Output

   grp  n  prop
0    2  3  0.60
1    1  2  0.40
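The count() output can be reproduced with collections.Counter. This hypothetical count function returns plain tuples instead of a frame. It assumes that sort=True orders rows by descending n and that prop is n divided by the total number of rows, which is what the example output shows.

```python
from collections import Counter


def count(values, prop=False, sort=False):
    """Frequency table as (value, n) or (value, n, prop) tuples."""
    items = list(Counter(values).items())  # first-appearance order
    if sort:
        # Most frequent first; ties keep first-appearance order (stable sort).
        items.sort(key=lambda pair: pair[1], reverse=True)
    if not prop:
        return items
    total = len(values)
    return [(value, n, n / total) for value, n in items]


print(count([1, 1, 2, 2, 2], prop=True, sort=True))
# [(2, 3, 0.6), (1, 2, 0.4)]
```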

✂️ slice(), head(), tail()

(From test_slice.py)

Example

cf = df({"x":[10,20,30,40]})
cf.slice(1,3).to_pandas()

Output

    x
1  20
2  30
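The slice(1, 3) output, with rows labelled 1 and 2, is consistent with 0-based, end-exclusive positions like Python slicing, with the original row labels kept. The helpers below sketch that reading and extend it to head and tail. The names and the (label, value) return shape are assumptions for illustration.

```python
def slice_rows(values, start, stop):
    """(label, value) pairs for positions start..stop-1, keeping row labels.

    Assumes non-negative positions; a stop past the end is clamped.
    """
    stop = min(stop, len(values))
    return [(i, values[i]) for i in range(start, stop)]


def head(values, n):
    return slice_rows(values, 0, n)


def tail(values, n):
    return slice_rows(values, max(len(values) - n, 0), len(values))


x = [10, 20, 30, 40]
print(slice_rows(x, 1, 3))  # [(1, 20), (2, 30)]
print(head(x, 2))           # [(0, 10), (1, 20)]
print(tail(x, 2))           # [(2, 30), (3, 40)]
```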

🔄 pivot_longer() and pivot_wider()

(From test_pivot_longer_wider_basic.py, test_tidyr.py)

pivot_longer

cf = df({
    "id":[1,2],
    "year_2023":[10,30],
    "year_2024":[11,31],
})

cf.pivot_longer(
    col.matches("^year_"),
    names_to="year",
    values_to="value",
).to_pandas()

Output

   id       year  value
0   1  year_2023     10
1   2  year_2023     30
2   1  year_2024     11
3   2  year_2024     31
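pivot_longer stacks the selected columns one after another, which is why both year_2023 rows come before the year_2024 rows above. Here is a minimal sketch over a dict of column lists. The function name mirrors the README, but this signature is hypothetical, and the selected columns are passed as an explicit list instead of a col selector.

```python
import re


def pivot_longer(columns, cols, names_to, values_to):
    """Stack the columns listed in `cols` into (name, value) pairs."""
    id_cols = [c for c in columns if c not in cols]
    n_rows = len(next(iter(columns.values())))
    out = {c: [] for c in [*id_cols, names_to, values_to]}
    for name in cols:  # one block of rows per source column
        for i in range(n_rows):
            for c in id_cols:
                out[c].append(columns[c][i])  # repeat the id values
            out[names_to].append(name)
            out[values_to].append(columns[name][i])
    return out


wide = {"id": [1, 2], "year_2023": [10, 30], "year_2024": [11, 31]}
# A list comprehension stands in for col.matches("^year_").
cols = [c for c in wide if re.match(r"^year_", c)]
long = pivot_longer(wide, cols, "year", "value")
print(long["year"])   # ['year_2023', 'year_2023', 'year_2024', 'year_2024']
print(long["value"])  # [10, 30, 11, 31]
```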

pivot_wider

long = cf.pivot_longer(...)  # same pivot_longer call as in the example above

long.pivot_wider(names_from="year", values_from="value").to_pandas()

Output

   id  year_2023  year_2024
0   1         10         11
1   2         30         31
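pivot_wider is the inverse: each distinct value in names_from becomes a column, keyed by the remaining id columns. This sketch assumes first-appearance order for both rows and new columns and fills missing combinations with None. It also lets a duplicate (id, name) pair silently overwrite the earlier value; how crowley-frame handles duplicates is not documented here.

```python
def pivot_wider(columns, names_from, values_from):
    """Spread (name, value) pairs into one column per distinct name."""
    id_cols = [c for c in columns if c not in (names_from, values_from)]
    rows = {}       # id tuple -> {new column name: value}, first-appearance order
    new_cols = []   # distinct names_from values, first-appearance order
    for i in range(len(columns[names_from])):
        key = tuple(columns[c][i] for c in id_cols)
        name = columns[names_from][i]
        if name not in new_cols:
            new_cols.append(name)
        # In this sketch a later duplicate (id, name) pair overwrites the earlier one.
        rows.setdefault(key, {})[name] = columns[values_from][i]
    out = {c: [] for c in [*id_cols, *new_cols]}
    for key, cells in rows.items():
        for c, v in zip(id_cols, key):
            out[c].append(v)
        for name in new_cols:
            out[name].append(cells.get(name))  # None where a combination is absent
    return out


long = {
    "id": [1, 2, 1, 2],
    "year": ["year_2023", "year_2023", "year_2024", "year_2024"],
    "value": [10, 30, 11, 31],
}
print(pivot_wider(long, "year", "value"))
# {'id': [1, 2], 'year_2023': [10, 30], 'year_2024': [11, 31]}
```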

🔬 separate() & unite() with Proper NA Semantics

(From test_separate_unite.py)

unite() (if any input is missing, the united value is <NA>)

cf = df({
    "first":["Ada", None, "Charlie"],
    "last":["Lovelace", "Smith", None],
})

cf.unite("full", ["first","last"], sep=" ").to_pandas()

Output

           full
0  Ada Lovelace
1          <NA>
2          <NA>
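The NA rule shown by unite() can be written as a short function: join the inputs with sep unless any of them is missing. This is a plain-Python illustration with hypothetical names. It drops the source columns, mirroring the output above.

```python
def unite(columns, new, cols, sep):
    """Join `cols` row by row with `sep`; a missing input makes the result missing."""
    n_rows = len(columns[cols[0]])
    united = []
    for i in range(n_rows):
        parts = [columns[c][i] for c in cols]
        if any(p is None for p in parts):
            united.append(None)  # NA propagates instead of being pasted as text
        else:
            united.append(sep.join(str(p) for p in parts))
    out = {c: v for c, v in columns.items() if c not in cols}  # drop the inputs
    out[new] = united
    return out


people = {"first": ["Ada", None, "Charlie"], "last": ["Lovelace", "Smith", None]}
print(unite(people, "full", ["first", "last"], sep=" "))
# {'full': ['Ada Lovelace', None, None]}
```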

separate()

cf = df({"full":["Ada Lovelace", "John Smith"]})
cf.separate("full", into=["first","last"], sep=" ").to_pandas()

Output

    first      last
0     Ada  Lovelace
1    John     Smith
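separate() splits one string column into several. The sketch below assumes behavior the README does not spell out: surplus pieces stay joined in the last column, short splits are padded with None, and a missing input yields None in every output column. The function is a hypothetical illustration, not crowley-frame's implementation.

```python
def separate(columns, col, into, sep):
    """Split string column `col` on `sep` into the columns named in `into`."""
    out = {c: v for c, v in columns.items() if c != col}  # drop the source column
    for name in into:
        out[name] = []
    for value in columns[col]:
        if value is None:
            parts = [None] * len(into)  # missing in, missing out
        else:
            # At most len(into) pieces: any surplus stays in the last column.
            parts = value.split(sep, len(into) - 1)
            parts += [None] * (len(into) - len(parts))  # pad short splits
        for name, part in zip(into, parts):
            out[name].append(part)
    return out


names = {"full": ["Ada Lovelace", "John Smith"]}
print(separate(names, "full", into=["first", "last"], sep=" "))
# {'first': ['Ada', 'John'], 'last': ['Lovelace', 'Smith']}
```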

📥 Installation

For contributors (local development build; requires a Rust toolchain and an activated virtual environment)

pip install maturin
maturin develop --release

From PyPI

pip install crowleyframe

🚀 Usage Overview

Create a DataFrame

from crowley_frame import df, col, pipe
cf = df({"x":[1,2,3], "y":[10,20,30]})

Select columns

cf.select(col.starts_with("y")).to_pandas()

Output:

    y
0  10
1  20
2  30

Mutate

cf.mutate(z="x + y").to_pandas()

Output:

   x   y   z
0  1  10  11
1  2  20  22
2  3  30  33
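The string expressions passed to mutate() ("x + y", "x * 2") imply a small arithmetic language. One way to evaluate such strings safely without eval() is to walk Python's ast module. The sketch below assumes that approach and applies + - * / element-wise over column lists, broadcasting numeric constants. eval_expr and mutate here are hypothetical; crowley-frame's own expression engine runs in Rust and may support a different grammar.

```python
import ast
import operator

# Binary operators this sketch understands; anything else is rejected.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def eval_expr(expr, columns):
    """Evaluate an arithmetic string such as "x + y" element-wise over columns."""
    n_rows = len(next(iter(columns.values())))

    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Name):  # a column reference
            return list(columns[node.id])
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return [node.value] * n_rows  # broadcast a scalar to every row
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            op = OPS[type(node.op)]
            return [op(a, b) for a, b in zip(ev(node.left), ev(node.right))]
        raise ValueError(f"unsupported expression: {ast.dump(node)}")

    return ev(ast.parse(expr, mode="eval"))


def mutate(columns, **exprs):
    out = dict(columns)
    for name, expr in exprs.items():
        out[name] = eval_expr(expr, out)  # later expressions can use earlier ones
    return out


print(mutate({"x": [1, 2, 3], "y": [10, 20, 30]}, z="x + y"))
# {'x': [1, 2, 3], 'y': [10, 20, 30], 'z': [11, 22, 33]}
```

Rejecting every node type outside the whitelist means strings such as function calls or attribute access raise ValueError instead of running.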

Group + summarise with pipes

(cf >> pipe.group_by("x") >> pipe.summarise(sum_y=("y","sum"))).to_pandas()

Output:

   x  sum_y
0  1     10
1  2     20
2  3     30

Reshape: pivot_longer

cf.pivot_longer(col.starts_with("y"), names_to="year", values_to="value").to_pandas()

Output:

   x  year  value
0  1     y     10
1  2     y     20
2  3     y     30

🧭 Roadmap (Next Milestones)

  • More window functions (rolling_sum, rolling_sd, rolling_min/max)
  • Lazy backend (like dplyr/dbplyr or polars-lazy)
  • More expressive mutate expression engine
  • Arrow-native memory and zero-copy interfaces
  • SIMD and GPU-accelerated Rust kernels
  • Better type inference + schema evolution

📄 License

MIT License — free to use, modify, and distribute.


Tidyverse-style data manipulation for Python, powered by Rust and Polars.



Download files


Source Distribution

crowleyframe-0.1.0.tar.gz (31.7 kB)

Built Distribution


crowleyframe-0.1.0-cp310-cp310-win_amd64.whl (4.8 MB)

CPython 3.10, Windows x86-64

File details

Details for the file crowleyframe-0.1.0.tar.gz.

File metadata

  • Download URL: crowleyframe-0.1.0.tar.gz
  • Size: 31.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.10.2

File hashes

Hashes for crowleyframe-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       7648c1f82308d1c7c69a9e4d3b0eaca601cb4c1b9c52f028b7e1bafe0859eab2
MD5          0f34e2e367245464a4372aaea8d0571c
BLAKE2b-256  6b87783c9efad8f39fa28d891ac32638b852f1f59ee4195ec48df51a64ba5468


File details

Details for the file crowleyframe-0.1.0-cp310-cp310-win_amd64.whl.


File hashes

Hashes for crowleyframe-0.1.0-cp310-cp310-win_amd64.whl
Algorithm    Hash digest
SHA256       15b0a647ac2b47b97fef39b4d109b4a1656ee15ecacc9bbef63823d929bfb424
MD5          4b7824b15c4fad01405a6fa15ab4a106
BLAKE2b-256  70b314c3383bf36a57d066b3c742e04ed52e95e8afbece6c7d637354467e3e0a

