Skip to main content

Extremely lightweight compatibility layer between pandas, Polars, cuDF, and Modin

Project description

Narwhals

Extremely lightweight compatibility layer between Polars, pandas, cuDF, and Modin.

Seamlessly support all four, without depending on any of them!

  • Just use a subset of the Polars API, no need to learn anything new
  • No dependencies (not even Polars), keep your library lightweight
  • ✅ Separate Lazy and Eager APIs
  • ✅ Use Polars Expressions

Note: this is work-in-progress, and a bit of an experiment, don't take it too seriously.

Installation

pip install narwhals

Or just vendor it, it's only a bunch of pure-Python files.

Usage

There are three steps to writing dataframe-agnostic code using Narwhals:

  1. use narwhals.to_polars_api to wrap a pandas, Polars, cuDF, or Modin dataframe in the Polars API

  2. use the subset of the Polars API defined in https://github.com/MarcoGorelli/narwhals/blob/main/narwhals/spec/__init__.py.

  3. use narwhals.to_original_object to return an object to the user in their original dataframe flavour. For example:

    • if you started with pandas, you'll get pandas back
    • if you started with Polars, you'll get Polars back
    • if you started with Modin, you'll get Modin back
    • if you started with cuDF, you'll get cuDF back (and computation will happen natively on the GPU!)

Example

Here's an example of a dataframe agnostic function:

from typing import TypeVar
import pandas as pd
import polars as pl

from narwhals import to_polars_api, to_original_object

AnyDataFrame = TypeVar("AnyDataFrame")


def my_agnostic_function(
    suppliers_native: AnyDataFrame,
    parts_native: AnyDataFrame,
) -> AnyDataFrame:
    suppliers, pl = to_polars_api(suppliers_native, version="0.20")
    parts, _ = to_polars_api(parts_native, version="0.20")
    result = (
        suppliers.join(parts, left_on="city", right_on="city")
        .filter(
            pl.col("color").is_in(["Red", "Green"]),
            pl.col("weight") > 14,
        )
        .group_by("s", "p")
        .agg(
            weight_mean=pl.col("weight").mean(),
            weight_max=pl.col("weight").max(),
        )
    )
    return to_original_object(result.collect())

You can pass in a pandas, Polars, cuDF, or Modin dataframe, the output will be the same! Let's try it out:

suppliers = {
    "s": ["S1", "S2", "S3", "S4", "S5"],
    "sname": ["Smith", "Jones", "Blake", "Clark", "Adams"],
    "status": [20, 10, 30, 20, 30],
    "city": ["London", "Paris", "Paris", "London", "Athens"],
}
parts = {
    "p": ["P1", "P2", "P3", "P4", "P5", "P6"],
    "pname": ["Nut", "Bolt", "Screw", "Screw", "Cam", "Cog"],
    "color": ["Red", "Green", "Blue", "Red", "Blue", "Red"],
    "weight": [12.0, 17.0, 17.0, 14.0, 12.0, 19.0],
    "city": ["London", "Paris", "Oslo", "London", "Paris", "London"],
}

print("pandas output:")
print(
    my_agnostic_function(
        pd.DataFrame(suppliers),
        pd.DataFrame(parts),
    )
)
print("\nPolars output:")
print(
    my_agnostic_function(
        pl.LazyFrame(suppliers),
        pl.LazyFrame(parts),
    )
)
pandas output:
    s   p  weight_mean  weight_max
0  S1  P6         19.0        19.0
1  S2  P2         17.0        17.0
2  S3  P2         17.0        17.0
3  S4  P6         19.0        19.0

Polars output:
shape: (4, 4)
┌─────┬─────┬─────────────┬────────────┐
│ s   ┆ p   ┆ weight_mean ┆ weight_max │
│ --- ┆ --- ┆ ---         ┆ ---        │
│ str ┆ str ┆ f64         ┆ f64        │
╞═════╪═════╪═════════════╪════════════╡
│ S1  ┆ P6  ┆ 19.0        ┆ 19.0       │
│ S3  ┆ P2  ┆ 17.0        ┆ 17.0       │
│ S4  ┆ P6  ┆ 19.0        ┆ 19.0       │
│ S2  ┆ P2  ┆ 17.0        ┆ 17.0       │
└─────┴─────┴─────────────┴────────────┘

Magic! 🪄

Scope

  • Do you maintain a dataframe-consuming library?
  • Is there a Polars function which you'd like Narwhals to have, which would make your job easier?

If, I'd love to hear from you!

Note: this is not a "Dataframe Standard" project. It just translates a subset of the Polars API to pandas-like libraries.

Why "Narwhals"?

Because they are so awesome.

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

narwhals-0.1.12.tar.gz (128.5 kB view details)

Uploaded Source

Built Distribution

narwhals-0.1.12-py3-none-any.whl (20.9 kB view details)

Uploaded Python 3

File details

Details for the file narwhals-0.1.12.tar.gz.

File metadata

  • Download URL: narwhals-0.1.12.tar.gz
  • Upload date:
  • Size: 128.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.2 CPython/3.11.8

File hashes

Hashes for narwhals-0.1.12.tar.gz
Algorithm Hash digest
SHA256 1bea23e4099d0e66a1d02c28372757474411b76efc7947ed78821ac415ab4a90
MD5 4c60db7c4eb256dfb97e629a40c7b5c1
BLAKE2b-256 abf9aaec68b00c00ef7b09f2ac3ce8887ac6de39e599fc739a0ba4acbf10acfc

See more details on using hashes here.

File details

Details for the file narwhals-0.1.12-py3-none-any.whl.

File metadata

  • Download URL: narwhals-0.1.12-py3-none-any.whl
  • Upload date:
  • Size: 20.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.2 CPython/3.11.8

File hashes

Hashes for narwhals-0.1.12-py3-none-any.whl
Algorithm Hash digest
SHA256 c8782ec24fbe52dc7eb0b484e45d5ee0ec76689c5482b7fed0cd5f699b9361b4
MD5 87db8c5a548fceedeaa16f35669b7b2c
BLAKE2b-256 424ee2ae537575cda9d7f3df6e9f3660fbf9425160a5981fb3069b238e073f32

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page