
Extremely lightweight compatibility layer between pandas, Polars, cuDF, and Modin

Project description

Narwhals

Extremely lightweight compatibility layer between Polars, pandas, cuDF, and Modin.

Seamlessly support all four, without depending on any of them!

  • Just use a subset of the Polars API, no need to learn anything new
  • No dependencies (not even Polars), keep your library lightweight
  • ✅ Separate Lazy and Eager APIs
  • ✅ Use Polars Expressions API
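Polars expressions such as pl.col("weight") > 14 describe a computation without touching any data. Because of that, the same expression can later be evaluated by whichever backend holds the data. As a rough illustration only, here is a toy, columnar expression object in plain Python. Expr, col and filter_frame are hypothetical names; neither Polars nor Narwhals implements expressions this way.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable

Frame = dict[str, list[Any]]  # column name -> column values


@dataclass(frozen=True)
class Expr:
    """Toy expression (hypothetical): a deferred function from a frame to a column."""

    evaluate: Callable[[Frame], list[Any]]

    def __gt__(self, other: Any) -> Expr:
        # Builds a new expression; nothing is computed yet.
        return Expr(lambda df: [v > other for v in self.evaluate(df)])

    def is_in(self, values: list[Any]) -> Expr:
        allowed = set(values)
        return Expr(lambda df: [v in allowed for v in self.evaluate(df)])


def col(name: str) -> Expr:
    # The column lookup only happens when the expression is evaluated.
    return Expr(lambda df: list(df[name]))


def filter_frame(df: Frame, *predicates: Expr) -> Frame:
    # Evaluate each predicate to a boolean mask, keep rows where all masks are True.
    masks = [p.evaluate(df) for p in predicates]
    n_rows = len(next(iter(df.values()), []))
    keep = [i for i in range(n_rows) if all(mask[i] for mask in masks)]
    return {name: [values[i] for i in keep] for name, values in df.items()}


# The predicates are built first, with no data in sight...
heavy_red_or_green = (col("color").is_in(["Red", "Green"]), col("weight") > 14)

# ...and only evaluated once a frame is supplied.
parts = {
    "p": ["P1", "P2", "P3", "P4", "P5", "P6"],
    "color": ["Red", "Green", "Blue", "Red", "Blue", "Red"],
    "weight": [12.0, 17.0, 17.0, 14.0, 12.0, 19.0],
}
print(filter_frame(parts, *heavy_red_or_green))
# {'p': ['P2', 'P6'], 'color': ['Green', 'Red'], 'weight': [17.0, 19.0]}
```

Since the predicates are ordinary values until evaluated, the same objects could be handed to different evaluators. That is roughly the idea behind writing one query for several backends.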

Note: this is a work in progress, and a bit of an experiment, so don't take it too seriously.

Installation

pip install narwhals

Or just vendor it: it's only a handful of pure-Python files.

Usage

There are three steps to writing dataframe-agnostic code using Narwhals:

  1. use narwhals.to_polars_api to wrap a pandas, Polars, cuDF, or Modin dataframe in the Polars API; it returns the wrapped dataframe together with a namespace that you use in place of pl

  2. use the subset of the Polars API defined in https://github.com/MarcoGorelli/narwhals/blob/main/narwhals/spec/__init__.py.

  3. use narwhals.to_original_object to return an object to the user in their original dataframe flavour. For example:

    • if you started with pandas, you'll get pandas back
    • if you started with Polars, you'll get Polars back
    • if you started with Modin, you'll get Modin back
    • if you started with cuDF, you'll get cuDF back (and computation will happen natively on the GPU!)

Example

Here's an example of a dataframe-agnostic function:

from typing import TypeVar
import pandas as pd
import polars as pl

from narwhals import to_polars_api, to_original_object

AnyDataFrame = TypeVar("AnyDataFrame")


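def my_agnostic_function(
    suppliers_native: AnyDataFrame,
    parts_native: AnyDataFrame,
) -> AnyDataFrame:
    # Step 1: wrap the native dataframes. The second return value is a
    # namespace used like `pl`; inside this function it shadows the
    # `polars` import above.
    suppliers, pl = to_polars_api(suppliers_native, version="0.20")
    parts, _ = to_polars_api(parts_native, version="0.20")
    # Step 2: write the query using the supported subset of the Polars API.
    result = (
        suppliers.join(parts, left_on="city", right_on="city")
        .filter(
            pl.col("color").is_in(["Red", "Green"]),
            pl.col("weight") > 14,
        )
        .group_by("s", "p")
        .agg(
            weight_mean=pl.col("weight").mean(),
            weight_max=pl.col("weight").max(),
        )
    )
    # Step 3: collect the result and hand it back in the caller's original flavour.
    return to_original_object(result.collect())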

You can pass in a pandas, Polars, cuDF, or Modin dataframe, and the result will be the same, up to row order! Let's try it out:

suppliers = {
    "s": ["S1", "S2", "S3", "S4", "S5"],
    "sname": ["Smith", "Jones", "Blake", "Clark", "Adams"],
    "status": [20, 10, 30, 20, 30],
    "city": ["London", "Paris", "Paris", "London", "Athens"],
}
parts = {
    "p": ["P1", "P2", "P3", "P4", "P5", "P6"],
    "pname": ["Nut", "Bolt", "Screw", "Screw", "Cam", "Cog"],
    "color": ["Red", "Green", "Blue", "Red", "Blue", "Red"],
    "weight": [12.0, 17.0, 17.0, 14.0, 12.0, 19.0],
    "city": ["London", "Paris", "Oslo", "London", "Paris", "London"],
}

print("pandas output:")
print(
    my_agnostic_function(
        pd.DataFrame(suppliers),
        pd.DataFrame(parts),
    )
)
print("\nPolars output:")
print(
    my_agnostic_function(
        pl.LazyFrame(suppliers),
        pl.LazyFrame(parts),
    )
)

pandas output:
    s   p  weight_mean  weight_max
0  S1  P6         19.0        19.0
1  S2  P2         17.0        17.0
2  S3  P2         17.0        17.0
3  S4  P6         19.0        19.0

Polars output:
shape: (4, 4)
┌─────┬─────┬─────────────┬────────────┐
│ s   ┆ p   ┆ weight_mean ┆ weight_max │
│ --- ┆ --- ┆ ---         ┆ ---        │
│ str ┆ str ┆ f64         ┆ f64        │
╞═════╪═════╪═════════════╪════════════╡
│ S1  ┆ P6  ┆ 19.0        ┆ 19.0       │
│ S3  ┆ P2  ┆ 17.0        ┆ 17.0       │
│ S4  ┆ P6  ┆ 19.0        ┆ 19.0       │
│ S2  ┆ P2  ┆ 17.0        ┆ 17.0       │
└─────┴─────┴─────────────┴────────────┘

Magic! 🪄
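To see where these four rows come from, and why pandas and Polars list them in different orders, here is the same pipeline written in plain Python using only the standard library. It is a sketch for checking the expected result, not how Narwhals executes anything (each backend runs the query itself). The helper names to_rows and reference_pipeline are made up for this illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Any

# Only the columns the query touches, copied from the example data above.
suppliers = {
    "s": ["S1", "S2", "S3", "S4", "S5"],
    "city": ["London", "Paris", "Paris", "London", "Athens"],
}
parts = {
    "p": ["P1", "P2", "P3", "P4", "P5", "P6"],
    "color": ["Red", "Green", "Blue", "Red", "Blue", "Red"],
    "weight": [12.0, 17.0, 17.0, 14.0, 12.0, 19.0],
    "city": ["London", "Paris", "Oslo", "London", "Paris", "London"],
}


def to_rows(columns: dict[str, list[Any]]) -> list[dict[str, Any]]:
    """Turn a dict of equal-length columns into a list of row dicts."""
    return [dict(zip(columns, values)) for values in zip(*columns.values())]


def reference_pipeline(suppliers, parts):
    # join(left_on="city", right_on="city"): inner join on matching cities
    joined = [
        {**s, **p}
        for s in to_rows(suppliers)
        for p in to_rows(parts)
        if s["city"] == p["city"]
    ]
    # filter(...): both predicates must hold
    kept = [r for r in joined if r["color"] in {"Red", "Green"} and r["weight"] > 14]
    # group_by("s", "p"): collect the weights of each (s, p) pair
    groups = defaultdict(list)
    for r in kept:
        groups[(r["s"], r["p"])].append(r["weight"])
    # agg(...), with groups sorted by key as pandas' groupby does by default
    return [
        {"s": s, "p": p, "weight_mean": mean(ws), "weight_max": max(ws)}
        for (s, p), ws in sorted(groups.items())
    ]


for row in reference_pipeline(suppliers, parts):
    print(row)
```

This yields the same four (s, p) pairs as the tables above, in pandas' order. The final sort stands in for pandas' default groupby(sort=True); Polars' group_by makes no ordering guarantee unless maintain_order=True is passed, which is why the Polars table lists the same rows in a different order.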

Scope

If you maintain a dataframe-consuming library, then any function from the Polars API that you'd like to be able to use is in scope, as long as it can be supported without too much difficulty for at least pandas, cuDF, and Modin.

Feature requests are more than welcome!

Related Projects

  • This is not Ibis. Narwhals lets each backend do its own optimisations and only provides a lightweight (~30 kilobytes) compatibility layer over the Polars API. Ibis, by contrast, applies its own optimisations to different backends, is a heavyweight dependency (~400 MB), and defines its own API.

  • This is not intended as a DataFrame Standard. See the Consortium for Python Data API Standards for a more general and more ambitious project. Only consider using Narwhals if you need to support just Polars and pandas-like dataframes, and specifically want to tap into Polars' lazy and expressions features (which are out of scope for the Consortium's Standard).

Why "Narwhals"?

Because they are so awesome.

Download files


Source Distribution

narwhals-0.1.8.tar.gz (18.9 kB)

Built Distribution

narwhals-0.1.8-py3-none-any.whl (17.9 kB)

File details

Details for the file narwhals-0.1.8.tar.gz.

File metadata

  • Download URL: narwhals-0.1.8.tar.gz
  • Size: 18.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.2 CPython/3.11.8

File hashes

Hashes for narwhals-0.1.8.tar.gz
Algorithm Hash digest
SHA256 6fea7b7c744fcfb56ae9ee7c556038aff2af18446869fc95e6e1319b3a2f82f4
MD5 412cf7796945ace4af0dc2a704b13cda
BLAKE2b-256 7844fea5714b02b80a3bb07a5db0cd8f8080ce5080636248e073d42de89c7e8f


File details

Details for the file narwhals-0.1.8-py3-none-any.whl.

File metadata

  • Download URL: narwhals-0.1.8-py3-none-any.whl
  • Size: 17.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.2 CPython/3.11.8

File hashes

Hashes for narwhals-0.1.8-py3-none-any.whl
Algorithm Hash digest
SHA256 8032d0db3d12ae576d1dc6d6c7665115a9c628d58bf86d26b463fb74b2ab31ed
MD5 4c415f89938fe412a5433c2874a128d6
BLAKE2b-256 bdb5a3bcfebf0473816ba3c460cf2d3002bd2edcb89cd4f192393a6abd5de988

