Extremely lightweight compatibility layer between pandas, Polars, cuDF, and Modin

Narwhals

Extremely lightweight compatibility layer between Polars, pandas, and more.

Seamlessly support both, without depending on either!

  • ✅ Just use a subset of the Polars API, no need to learn anything new
  • ✅ No dependencies (not even Polars), keep your library lightweight
  • ✅ Separate Lazy and Eager APIs
  • ✅ Use Polars Expressions

Note: this is work-in-progress, and a bit of an experiment, don't take it too seriously.

Installation

pip install narwhals

Or just vendor it: it's only a bunch of pure-Python files.

Usage

There are four steps to writing dataframe-agnostic code using Narwhals:

  1. use narwhals.translate_frame to wrap a pandas or Polars DataFrame in a Narwhals DataFrame

  2. (optional) use narwhals.get_namespace to get a namespace object

  3. use the subset of the Polars API defined in https://github.com/MarcoGorelli/narwhals/blob/main/narwhals/spec/__init__.py. Some methods are only available if you called narwhals.translate_frame with is_eager=True

  4. use narwhals.to_native to return an object to the user in their original dataframe flavour. For example:

    • if you started with pandas, you'll get pandas back
    • if you started with Polars, you'll get Polars back
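
The steps above follow a wrap, operate, unwrap pattern. The sketch below illustrates that pattern only, using plain-Python stand-ins. `ListFrame`, `DictFrame`, `Wrapped`, `wrap`, `unwrap` and `heavy_parts` are hypothetical names invented for this illustration; they are not part of Narwhals, pandas or Polars, and this is not how Narwhals is implemented.

```python
from dataclasses import dataclass
from typing import Any, Callable


# Two toy "native" containers standing in for two dataframe libraries.
# Both are hypothetical; neither is a real pandas or Polars type.
@dataclass
class ListFrame:
    rows: list[dict[str, Any]]  # row-oriented storage


@dataclass
class DictFrame:
    columns: dict[str, list[Any]]  # column-oriented storage


@dataclass
class Wrapped:
    """Common interface that remembers which native type it came from."""

    rows: list[dict[str, Any]]
    origin: type

    def filter(self, predicate: Callable[[dict[str, Any]], bool]) -> "Wrapped":
        return Wrapped([r for r in self.rows if predicate(r)], self.origin)


def wrap(native: Any) -> Wrapped:
    """Step 1: convert a native object to the common interface."""
    if isinstance(native, ListFrame):
        return Wrapped(list(native.rows), ListFrame)
    if isinstance(native, DictFrame):
        names = list(native.columns)
        rows = [dict(zip(names, values)) for values in zip(*native.columns.values())]
        return Wrapped(rows, DictFrame)
    raise TypeError(f"unsupported type: {type(native).__name__}")


def unwrap(frame: Wrapped) -> Any:
    """Step 4: hand back the flavour the caller started with."""
    if frame.origin is ListFrame:
        return ListFrame(frame.rows)
    # Column names are taken from the first row, so they are lost when no
    # rows remain; acceptable for a sketch, not for a real library.
    names = list(frame.rows[0]) if frame.rows else []
    return DictFrame({n: [r[n] for r in frame.rows] for n in names})


def heavy_parts(native: Any) -> Any:
    """Step 3: library logic written once against the common interface."""
    return unwrap(wrap(native).filter(lambda r: r["weight"] > 14))


row_result = heavy_parts(
    ListFrame([{"p": "P1", "weight": 12.0}, {"p": "P2", "weight": 17.0}])
)
col_result = heavy_parts(DictFrame({"p": ["P1", "P2"], "weight": [12.0, 17.0]}))
```

`heavy_parts` never checks which container it received, yet `row_result` is a `ListFrame` and `col_result` is a `DictFrame`: the caller gets back the type it passed in.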

Example

Here's an example of a dataframe-agnostic function:

from typing import TypeVar
import pandas as pd
import polars as pl

from narwhals import translate_frame, get_namespace, to_native

AnyDataFrame = TypeVar("AnyDataFrame")


def my_agnostic_function(
    suppliers_native: AnyDataFrame,
    parts_native: AnyDataFrame,
) -> AnyDataFrame:
    suppliers = translate_frame(suppliers_native)
    parts = translate_frame(parts_native)
    pl = get_namespace(suppliers)

    result = (
        suppliers.join(parts, left_on="city", right_on="city")
        .filter(
            pl.col("color").is_in(["Red", "Green"]),
            pl.col("weight") > 14,
        )
        .group_by("s", "p")
        .agg(
            weight_mean=pl.col("weight").mean(),
            weight_max=pl.col("weight").max(),
        )
    )
    return to_native(result)

You can pass in a pandas or Polars dataframe; the output will be the same! Let's try it out:

suppliers = {
    "s": ["S1", "S2", "S3", "S4", "S5"],
    "sname": ["Smith", "Jones", "Blake", "Clark", "Adams"],
    "status": [20, 10, 30, 20, 30],
    "city": ["London", "Paris", "Paris", "London", "Athens"],
}
parts = {
    "p": ["P1", "P2", "P3", "P4", "P5", "P6"],
    "pname": ["Nut", "Bolt", "Screw", "Screw", "Cam", "Cog"],
    "color": ["Red", "Green", "Blue", "Red", "Blue", "Red"],
    "weight": [12.0, 17.0, 17.0, 14.0, 12.0, 19.0],
    "city": ["London", "Paris", "Oslo", "London", "Paris", "London"],
}

print("pandas output:")
print(
    my_agnostic_function(
        pd.DataFrame(suppliers),
        pd.DataFrame(parts),
    )
)
print("\nPolars output:")
print(
    my_agnostic_function(
        pl.DataFrame(suppliers),
        pl.DataFrame(parts),
    )
)
print("\nPolars lazy output:")
print(
    my_agnostic_function(
        pl.LazyFrame(suppliers),
        pl.LazyFrame(parts),
    ).collect()
)

pandas output:
    s   p  weight_mean  weight_max
0  S1  P6         19.0        19.0
1  S2  P2         17.0        17.0
2  S3  P2         17.0        17.0
3  S4  P6         19.0        19.0

Polars output:
shape: (4, 4)
┌─────┬─────┬─────────────┬────────────┐
│ s   ┆ p   ┆ weight_mean ┆ weight_max │
│ --- ┆ --- ┆ ---         ┆ ---        │
│ str ┆ str ┆ f64         ┆ f64        │
╞═════╪═════╪═════════════╪════════════╡
│ S1  ┆ P6  ┆ 19.0        ┆ 19.0       │
│ S3  ┆ P2  ┆ 17.0        ┆ 17.0       │
│ S4  ┆ P6  ┆ 19.0        ┆ 19.0       │
│ S2  ┆ P2  ┆ 17.0        ┆ 17.0       │
└─────┴─────┴─────────────┴────────────┘

Polars lazy output:
shape: (4, 4)
┌─────┬─────┬─────────────┬────────────┐
│ s   ┆ p   ┆ weight_mean ┆ weight_max │
│ --- ┆ --- ┆ ---         ┆ ---        │
│ str ┆ str ┆ f64         ┆ f64        │
╞═════╪═════╪═════════════╪════════════╡
│ S1  ┆ P6  ┆ 19.0        ┆ 19.0       │
│ S3  ┆ P2  ┆ 17.0        ┆ 17.0       │
│ S4  ┆ P6  ┆ 19.0        ┆ 19.0       │
│ S2  ┆ P2  ┆ 17.0        ┆ 17.0       │
└─────┴─────┴─────────────┴────────────┘

Magic! 🪄
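
To see where those four rows come from, the same join, filter and group-by can be traced by hand. The following is a plain-Python sketch over the `suppliers` and `parts` dictionaries from the example (repeated here, with the unused columns omitted, so the block runs on its own). It does not use Narwhals, and the helper name `to_rows` is invented for illustration.

```python
from collections import defaultdict
from statistics import mean

suppliers = {
    "s": ["S1", "S2", "S3", "S4", "S5"],
    "city": ["London", "Paris", "Paris", "London", "Athens"],
}
parts = {
    "p": ["P1", "P2", "P3", "P4", "P5", "P6"],
    "color": ["Red", "Green", "Blue", "Red", "Blue", "Red"],
    "weight": [12.0, 17.0, 17.0, 14.0, 12.0, 19.0],
    "city": ["London", "Paris", "Oslo", "London", "Paris", "London"],
}


def to_rows(columns):
    """Turn a dict of columns into a list of row dicts."""
    return [dict(zip(columns, values)) for values in zip(*columns.values())]


# Inner join on "city": pair every supplier with every part in the same city.
joined = [
    {**s, **p}
    for s in to_rows(suppliers)
    for p in to_rows(parts)
    if s["city"] == p["city"]
]

# Keep Red or Green parts that weigh more than 14.
kept = [r for r in joined if r["color"] in ("Red", "Green") and r["weight"] > 14]

# Group by (s, p) and average the weights in each group.
groups = defaultdict(list)
for r in kept:
    groups[(r["s"], r["p"])].append(r["weight"])

weight_mean = {key: mean(weights) for key, weights in sorted(groups.items())}
for (s, p), value in weight_mean.items():
    print(s, p, value)
```

Only P2 and P6 pass the filter (P4 weighs exactly 14, so `> 14` excludes it). Each surviving (s, p) group holds a single row, so its mean is just that row's weight.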

Scope

  • Do you maintain a dataframe-consuming library?
  • Is there a Polars function which you'd like Narwhals to have, which would make your job easier?

If so, I'd love to hear from you!

Note: You might suspect that this is a secret ploy to infiltrate the Polars API everywhere. Indeed, you may suspect that.

Why "Narwhals"?

Because they are so awesome.


