Extremely lightweight compatibility layer between pandas, Polars, cuDF, and Modin

Narwhals

Extremely lightweight compatibility layer between Polars, pandas, and more.

Seamlessly support both, without depending on either!

  • Just use a subset of the Polars API, no need to learn anything new
  • No dependencies (not even Polars), keep your library lightweight
  • ✅ Separate Lazy and Eager APIs
  • ✅ Use Polars Expressions

Note: this is a work in progress and a bit of an experiment, so don't take it too seriously.

Installation

pip install narwhals

Or just vendor it: it's only a bunch of pure-Python files.

Usage

There are four steps to writing dataframe-agnostic code using Narwhals (the second is optional):

  1. use narwhals.translate_frame to wrap a pandas or Polars DataFrame in a Narwhals DataFrame

  2. (optional) use narwhals.get_namespace to get a namespace object

  3. use the subset of the Polars API defined in https://github.com/MarcoGorelli/narwhals/blob/main/narwhals/spec/__init__.py. Some methods are only available if you called narwhals.translate_frame with is_eager=True

  4. use narwhals.to_native to return an object to the user in their original dataframe flavour. For example:

    • if you started with pandas, you'll get pandas back
    • if you started with Polars, you'll get Polars back
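
To make the wrap, operate, unwrap pattern in the steps above concrete, here is a toy sketch in plain Python. It is not how Narwhals is implemented: `ToyFrame`, `toy_translate_frame`, `toy_to_native` and `heavy_parts` are hypothetical names invented for this illustration, and two plain container layouts (a dict of columns and a list of row dicts) stand in for pandas and Polars.

```python
from dataclasses import dataclass


@dataclass
class ToyFrame:
    """Hypothetical wrapper: common row storage plus the original flavour."""

    columns: list
    rows: list  # list of dicts, one per row
    flavour: str  # "columns" or "rows": what the caller handed in

    def filter(self, predicate):
        # Operations return a new wrapper and never touch the native object.
        kept = [row for row in self.rows if predicate(row)]
        return ToyFrame(self.columns, kept, self.flavour)


def toy_translate_frame(native):
    """First step: wrap either stand-in format in a ToyFrame."""
    if isinstance(native, dict):  # column-oriented input
        columns = list(native)
        rows = [dict(zip(columns, values)) for values in zip(*native.values())]
        return ToyFrame(columns, rows, "columns")
    rows = [dict(row) for row in native]  # row-oriented input
    columns = list(rows[0]) if rows else []
    return ToyFrame(columns, rows, "rows")


def toy_to_native(frame):
    """Last step: hand back the flavour the caller started with."""
    if frame.flavour == "rows":
        return frame.rows
    return {name: [row[name] for row in frame.rows] for name in frame.columns}


def heavy_parts(native):
    # Agnostic function: wrap, operate on the wrapper, unwrap.
    frame = toy_translate_frame(native)
    return toy_to_native(frame.filter(lambda row: row["weight"] > 14))


print(heavy_parts({"p": ["P1", "P6"], "weight": [12.0, 19.0]}))
# {'p': ['P6'], 'weight': [19.0]}
print(heavy_parts([{"p": "P1", "weight": 12.0}, {"p": "P6", "weight": 19.0}]))
# [{'p': 'P6', 'weight': 19.0}]
```

The caller of `heavy_parts` never sees `ToyFrame`: whatever layout goes in is the layout that comes out, which is the same contract the steps above describe for pandas and Polars.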

Example

Here's an example of a dataframe-agnostic function:

from typing import TypeVar
import pandas as pd
import polars as pl

from narwhals import translate_frame, get_namespace, to_native

AnyDataFrame = TypeVar("AnyDataFrame")


def my_agnostic_function(
    suppliers_native: AnyDataFrame,
    parts_native: AnyDataFrame,
) -> AnyDataFrame:
    suppliers = translate_frame(suppliers_native)
    parts = translate_frame(parts_native)
    pl = get_namespace(suppliers)

    result = (
        suppliers.join(parts, left_on="city", right_on="city")
        .filter(
            pl.col("color").is_in(["Red", "Green"]),
            pl.col("weight") > 14,
        )
        .group_by("s", "p")
        .agg(
            weight_mean=pl.col("weight").mean(),
        )
    )
    return to_native(result)

You can pass in a pandas or Polars dataframe, and the output will be the same (up to row order)! Let's try it out:

suppliers = {
    "s": ["S1", "S2", "S3", "S4", "S5"],
    "sname": ["Smith", "Jones", "Blake", "Clark", "Adams"],
    "status": [20, 10, 30, 20, 30],
    "city": ["London", "Paris", "Paris", "London", "Athens"],
}
parts = {
    "p": ["P1", "P2", "P3", "P4", "P5", "P6"],
    "pname": ["Nut", "Bolt", "Screw", "Screw", "Cam", "Cog"],
    "color": ["Red", "Green", "Blue", "Red", "Blue", "Red"],
    "weight": [12.0, 17.0, 17.0, 14.0, 12.0, 19.0],
    "city": ["London", "Paris", "Oslo", "London", "Paris", "London"],
}

print("pandas output:")
print(
    my_agnostic_function(
        pd.DataFrame(suppliers),
        pd.DataFrame(parts),
    )
)
print("\nPolars output:")
print(
    my_agnostic_function(
        pl.DataFrame(suppliers),
        pl.DataFrame(parts),
    )
)
print("\nPolars lazy output:")
print(
    my_agnostic_function(
        pl.LazyFrame(suppliers),
        pl.LazyFrame(parts),
    ).collect()
)

pandas output:
    s   p  weight_mean
0  S1  P6         19.0
1  S2  P2         17.0
2  S3  P2         17.0
3  S4  P6         19.0

Polars output:
shape: (4, 3)
┌─────┬─────┬─────────────┐
│ s   ┆ p   ┆ weight_mean │
│ --- ┆ --- ┆ ---         │
│ str ┆ str ┆ f64         │
╞═════╪═════╪═════════════╡
│ S1  ┆ P6  ┆ 19.0        │
│ S3  ┆ P2  ┆ 17.0        │
│ S4  ┆ P6  ┆ 19.0        │
│ S2  ┆ P2  ┆ 17.0        │
└─────┴─────┴─────────────┘

Polars lazy output:
shape: (4, 3)
┌─────┬─────┬─────────────┐
│ s   ┆ p   ┆ weight_mean │
│ --- ┆ --- ┆ ---         │
│ str ┆ str ┆ f64         │
╞═════╪═════╪═════════════╡
│ S1  ┆ P6  ┆ 19.0        │
│ S3  ┆ P2  ┆ 17.0        │
│ S4  ┆ P6  ┆ 19.0        │
│ S2  ┆ P2  ┆ 17.0        │
└─────┴─────┴─────────────┘

Magic! 🪄
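
If you'd like to convince yourself that the numbers above are right without installing any dataframe library, the same join, filter and group-by can be spelled out in plain Python. This is only a cross-check sketch using the standard library; the helper `to_rows` and the variable names are made up for the illustration, and only the columns the pipeline touches are copied from the data above.

```python
from collections import defaultdict
from statistics import mean

suppliers = {
    "s": ["S1", "S2", "S3", "S4", "S5"],
    "city": ["London", "Paris", "Paris", "London", "Athens"],
}
parts = {
    "p": ["P1", "P2", "P3", "P4", "P5", "P6"],
    "color": ["Red", "Green", "Blue", "Red", "Blue", "Red"],
    "weight": [12.0, 17.0, 17.0, 14.0, 12.0, 19.0],
    "city": ["London", "Paris", "Oslo", "London", "Paris", "London"],
}


def to_rows(columns):
    """Turn a dict of equal-length columns into a list of row dicts."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


# Inner join on "city": keep every supplier/part pair in the same city.
joined = [
    {**s, **p}
    for s in to_rows(suppliers)
    for p in to_rows(parts)
    if s["city"] == p["city"]
]

# Same two filter conditions as in my_agnostic_function.
kept = [
    row for row in joined
    if row["color"] in ("Red", "Green") and row["weight"] > 14
]

# Group by ("s", "p") and average the weights in each group.
groups = defaultdict(list)
for row in kept:
    groups[(row["s"], row["p"])].append(row["weight"])

# Sort the keys so the result does not depend on iteration order.
weight_mean = {key: mean(weights) for key, weights in sorted(groups.items())}
print(weight_mean)
# {('S1', 'P6'): 19.0, ('S2', 'P2'): 17.0, ('S3', 'P2'): 17.0, ('S4', 'P6'): 19.0}
```

The four groups match the four rows printed above. The pandas and Polars outputs list them in a different order, so compare results by content rather than by position.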

Scope

  • Do you maintain a dataframe-consuming library?
  • Is there a Polars function which you'd like Narwhals to have, which would make your job easier?

If so, I'd love to hear from you!

Note: You might suspect that this is a secret ploy to infiltrate the Polars API everywhere. Indeed, you may suspect that.

Why "Narwhals"?

Because they are so awesome.

Download files

Source Distribution

narwhals-0.3.0.tar.gz (131.6 kB)

Built Distribution

narwhals-0.3.0-py3-none-any.whl (25.5 kB)
