Narwhals
Extremely lightweight compatibility layer between Polars, pandas, cuDF, and Modin.
Seamlessly support all four, without depending on any of them!
- ✅ Just use a subset of the Polars API, no need to learn anything new
- ✅ No dependencies (not even Polars), keep your library lightweight
- ✅ Separate Lazy and Eager APIs
- ✅ Use Polars Expressions API
Note: this is a work in progress and a bit of an experiment, so don't take it too seriously.
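The "no dependencies" point raises the question of how a layer can tell dataframe libraries apart without importing any of them. One possible approach, shown here as a minimal sketch and not necessarily how Narwhals does it, is to inspect the module name recorded on the object's type. The names `backend_name`, `describe`, and the handler labels are hypothetical, and standard-library types stand in for real dataframes so the snippet runs without installing anything.

```python
from collections import OrderedDict
from fractions import Fraction


def backend_name(obj: object) -> str:
    """Return the top-level package that defines obj's type, without importing it."""
    return type(obj).__module__.split(".")[0]


def describe(obj: object) -> str:
    # Hypothetical dispatch table keyed by package name. A real layer would
    # presumably map names such as "pandas" or "polars" to backend-specific code.
    handlers = {
        "fractions": "rational-number backend",
        "collections": "ordered-mapping backend",
    }
    return handlers.get(backend_name(obj), "unsupported")


print(describe(Fraction(1, 2)))  # rational-number backend
print(describe(OrderedDict()))   # ordered-mapping backend
print(describe(3.5))             # unsupported
```

Because only `type(obj).__module__` is read, the library being detected never has to be imported by the layer itself; the caller has already imported it in order to build the object.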
Installation
pip install narwhals
Or just vendor it: it's only a bunch of pure-Python files.
Usage
There are three steps to writing dataframe-agnostic code using Narwhals:
- use narwhals.to_polars_api to wrap a pandas, Polars, cuDF, or Modin dataframe in the Polars API
- use the subset of the Polars API defined in https://github.com/MarcoGorelli/narwhals/blob/main/narwhals/spec/__init__.py
- use narwhals.to_original_object to return an object to the user in their original dataframe flavour. For example:
  - if you started with pandas, you'll get pandas back
  - if you started with Polars, you'll get Polars back
  - if you started with Modin, you'll get Modin back
  - if you started with cuDF, you'll get cuDF back (and computation will happen natively on the GPU!)
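The wrap, operate, unwrap shape of these three steps can be sketched with toy containers. Everything below is hypothetical and uses only the standard library: `RowFrame` and `ColumnFrame` stand in for two different dataframe libraries, and `wrap`, `unwrap`, and `Wrapped.filter` are illustrative names, not the Narwhals API. The sketch assumes the wrapper delegates to whichever container it holds, so the caller gets back the flavour they passed in.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class RowFrame:
    """Toy row-oriented container (stand-in for one dataframe library)."""
    rows: list[dict[str, Any]]


@dataclass
class ColumnFrame:
    """Toy column-oriented container (stand-in for another library)."""
    columns: dict[str, list[Any]]


class Wrapped:
    """Common interface; each method delegates to the wrapped container."""

    def __init__(self, native: Any) -> None:
        self.native = native

    def filter(self, column: str, predicate: Callable[[Any], bool]) -> "Wrapped":
        native = self.native
        if isinstance(native, RowFrame):
            kept = [row for row in native.rows if predicate(row[column])]
            return Wrapped(RowFrame(kept))
        if isinstance(native, ColumnFrame):
            mask = [predicate(value) for value in native.columns[column]]
            kept_columns = {
                name: [v for v, keep in zip(values, mask) if keep]
                for name, values in native.columns.items()
            }
            return Wrapped(ColumnFrame(kept_columns))
        raise TypeError(f"unsupported container: {type(native).__name__}")


def wrap(native: Any) -> Wrapped:
    return Wrapped(native)           # step 1: wrap in the common interface


def unwrap(wrapped: Wrapped) -> Any:
    return wrapped.native            # step 3: return the original flavour


def heavy_parts(native: Any) -> Any:
    frame = wrap(native)
    result = frame.filter("weight", lambda w: w > 14)  # step 2: common API only
    return unwrap(result)


print(heavy_parts(RowFrame([{"p": "P1", "weight": 12.0}, {"p": "P2", "weight": 17.0}])))
print(heavy_parts(ColumnFrame({"p": ["P1", "P2"], "weight": [12.0, 17.0]})))
```

The function body of `heavy_parts` never mentions either container type, which is the property the three steps are meant to give you.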
Example
Here's an example of a dataframe-agnostic function:
from typing import TypeVar

import pandas as pd
import polars as pl

from narwhals import to_polars_api, to_original_object

AnyDataFrame = TypeVar("AnyDataFrame")


def my_agnostic_function(
    suppliers_native: AnyDataFrame,
    parts_native: AnyDataFrame,
) -> AnyDataFrame:
    # Inside this function, `pl` is the namespace returned by Narwhals
    # (it shadows the module-level `polars` import).
    suppliers, pl = to_polars_api(suppliers_native, version="0.20")
    parts, _ = to_polars_api(parts_native, version="0.20")
    result = (
        suppliers.join(parts, left_on="city", right_on="city")
        .filter(
            pl.col("color").is_in(["Red", "Green"]),
            pl.col("weight") > 14,
        )
        .group_by("s", "p")
        .agg(
            weight_mean=pl.col("weight").mean(),
            weight_max=pl.col("weight").max(),
        )
    )
    # Hand the result back in the caller's original dataframe flavour.
    return to_original_object(result.collect())
You can pass in a pandas, Polars, cuDF, or Modin dataframe, and the output will be the same (row order aside)! Let's try it out:
suppliers = {
    "s": ["S1", "S2", "S3", "S4", "S5"],
    "sname": ["Smith", "Jones", "Blake", "Clark", "Adams"],
    "status": [20, 10, 30, 20, 30],
    "city": ["London", "Paris", "Paris", "London", "Athens"],
}
parts = {
    "p": ["P1", "P2", "P3", "P4", "P5", "P6"],
    "pname": ["Nut", "Bolt", "Screw", "Screw", "Cam", "Cog"],
    "color": ["Red", "Green", "Blue", "Red", "Blue", "Red"],
    "weight": [12.0, 17.0, 17.0, 14.0, 12.0, 19.0],
    "city": ["London", "Paris", "Oslo", "London", "Paris", "London"],
}

print("pandas output:")
print(
    my_agnostic_function(
        pd.DataFrame(suppliers),
        pd.DataFrame(parts),
    )
)
print("\nPolars output:")
print(
    my_agnostic_function(
        pl.LazyFrame(suppliers),
        pl.LazyFrame(parts),
    )
)
pandas output:
    s   p  weight_mean  weight_max
0  S1  P6         19.0        19.0
1  S2  P2         17.0        17.0
2  S3  P2         17.0        17.0
3  S4  P6         19.0        19.0

Polars output:
shape: (4, 4)
┌─────┬─────┬─────────────┬────────────┐
│ s   ┆ p   ┆ weight_mean ┆ weight_max │
│ --- ┆ --- ┆ ---         ┆ ---        │
│ str ┆ str ┆ f64         ┆ f64        │
╞═════╪═════╪═════════════╪════════════╡
│ S1  ┆ P6  ┆ 19.0        ┆ 19.0       │
│ S3  ┆ P2  ┆ 17.0        ┆ 17.0       │
│ S4  ┆ P6  ┆ 19.0        ┆ 19.0       │
│ S2  ┆ P2  ┆ 17.0        ┆ 17.0       │
└─────┴─────┴─────────────┴────────────┘
Magic! 🪄
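To see where those four rows come from, the same join, filter, and group-by can be traced by hand. The following is a plain-Python sketch using only the standard library and the `suppliers` and `parts` data from the example; the helper `to_rows` is a hypothetical name, and none of this is how Narwhals or any backend computes the result. Rows are sorted at the end because, as the two outputs above show, row order can differ between backends.

```python
from collections import defaultdict
from statistics import mean

suppliers = {
    "s": ["S1", "S2", "S3", "S4", "S5"],
    "sname": ["Smith", "Jones", "Blake", "Clark", "Adams"],
    "status": [20, 10, 30, 20, 30],
    "city": ["London", "Paris", "Paris", "London", "Athens"],
}
parts = {
    "p": ["P1", "P2", "P3", "P4", "P5", "P6"],
    "pname": ["Nut", "Bolt", "Screw", "Screw", "Cam", "Cog"],
    "color": ["Red", "Green", "Blue", "Red", "Blue", "Red"],
    "weight": [12.0, 17.0, 17.0, 14.0, 12.0, 19.0],
    "city": ["London", "Paris", "Oslo", "London", "Paris", "London"],
}


def to_rows(columns):
    """Turn a dict of equal-length columns into a list of row dicts."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


# Inner join on "city": pair every supplier with every part in the same city.
joined = [
    {**s, **p}
    for s in to_rows(suppliers)
    for p in to_rows(parts)
    if s["city"] == p["city"]
]

# Keep red or green parts heavier than 14.
kept = [r for r in joined if r["color"] in ("Red", "Green") and r["weight"] > 14]

# Group by (s, p) and aggregate the weights in each group.
groups = defaultdict(list)
for r in kept:
    groups[(r["s"], r["p"])].append(r["weight"])

result = sorted((s, p, mean(w), max(w)) for (s, p), w in groups.items())
for row in result:
    print(row)
# ('S1', 'P6', 19.0, 19.0)
# ('S2', 'P2', 17.0, 17.0)
# ('S3', 'P2', 17.0, 17.0)
# ('S4', 'P6', 19.0, 19.0)
```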
Scope
If you maintain a dataframe-consuming library, then any function from the Polars API which you'd like to be able to use is in scope, so long as it can be supported without too much difficulty for at least pandas, cuDF, and Modin.
Feature requests are more than welcome!
Related Projects
- This is not Ibis. Narwhals lets each backend do its own optimisations, and only provides a lightweight (~30 kilobytes) compatibility layer with the Polars API. Ibis applies its own optimisations to different backends, is a heavyweight dependency (~400 MB), and defines its own API.
- This is not intended as a DataFrame Standard. See the Consortium for Python Data API Standards for a more general and more ambitious project. Please only consider using Narwhals if you only need to support Polars and pandas-like dataframes, and specifically want to tap into Polars' lazy and expressions features (which are out of scope for the Consortium's Standard).
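For readers new to the idea, an "expression" in this sense is a description of a computation that is built first and evaluated later against whatever data it is given. Here is a minimal sketch of that idea in plain Python. The `Expr` class, `col`, and `filter_rows` are hypothetical toys that only mimic the look of `col("weight") > 14`; they are not the Polars or Narwhals implementation.

```python
from typing import Any, Callable, Iterable


class Expr:
    """Deferred computation over a single row (toy illustration only)."""

    def __init__(self, evaluate: Callable[[dict], Any]) -> None:
        self.evaluate = evaluate

    def __gt__(self, other: Any) -> "Expr":
        # Build a new expression; nothing is computed yet.
        return Expr(lambda row: self.evaluate(row) > other)

    def is_in(self, values: Iterable[Any]) -> "Expr":
        allowed = set(values)
        return Expr(lambda row: self.evaluate(row) in allowed)


def col(name: str) -> Expr:
    """Expression that looks up a column by name when evaluated."""
    return Expr(lambda row: row[name])


def filter_rows(rows: list[dict], *predicates: Expr) -> list[dict]:
    """Keep rows for which every predicate expression evaluates to True."""
    return [row for row in rows if all(p.evaluate(row) for p in predicates)]


parts = [
    {"p": "P1", "color": "Red", "weight": 12.0},
    {"p": "P2", "color": "Green", "weight": 17.0},
    {"p": "P3", "color": "Blue", "weight": 17.0},
]

# The two expressions are plain objects until filter_rows evaluates them.
is_red_or_green = col("color").is_in(["Red", "Green"])
is_heavy = col("weight") > 14

kept = filter_rows(parts, is_red_or_green, is_heavy)
print([row["p"] for row in kept])  # ['P2']
```

Because the expression is just an object, the code that builds it does not need to know which kind of dataframe will eventually evaluate it.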
Why "Narwhals"?
Because they are so awesome.