nitro-pandas

A pandas-like API wrapper around Polars for high-performance data manipulation

These details have not been verified by PyPI

Project description

A high-performance pandas-like DataFrame library powered by Polars

Combine the familiar pandas API with Polars' blazing-fast performance

✨ Features

🐼 Pandas-like API — Use familiar pandas syntax without learning a new library
⚡ Polars Backend — Leverage Polars' optimized Rust engine for maximum performance
📊 Comprehensive I/O — Read/write CSV, Parquet, JSON, and Excel files
🎯 Automatic Fallback — Seamless fallback to pandas for unimplemented methods
🔬 Built-in Profiler — Line-by-line comparison of your pandas code vs nitro-pandas with profile_compare
🔧 Type Safety — Support for pandas-like type casting and schema inference

🎯 Why nitro-pandas?

nitro-pandas bridges the gap between pandas' user-friendly API and Polars' exceptional performance. In most scripts, replacing import pandas as pd with import nitro_pandas as npd (and the pd. prefix with npd.) is the only change needed; the known exceptions are listed under Migration from pandas.

Performance Comparison

Benchmarked on the Books Rating dataset (~3M rows, 10 columns) using npd.profile_compare. All times are averaged wall-clock seconds.

Operation	pandas	nitro-pandas	Speedup
Read CSV	13.04s	1.09s	12.0x ↑
Rename columns	0.11s	0.001s	80x ↑
Drop duplicates	0.52s	0.42s	1.2x ↑
Filter (`df[df["Price"] > 0]`)	0.028s	0.009s	3.1x ↑
Chained filters	0.008–0.014s	0.001–0.002s	5–8x ↑
GroupBy + mean	0.030s	0.004s	7.5x ↑
GroupBy + count	0.032s	0.004s	8.9x ↑
nlargest (top-N)	0.027–0.029s	0.007–0.009s	3–4x ↑
String filter (`str.contains`)	0.15–0.20s	0.02–0.03s	5–9x ↑
pivot_table	0.008s	0.003s	2.8x ↑
sample (50k rows)	0.010s	0.001s	8.0x ↑
describe	0.012s	0.003s	4.2x ↑
sort_values	0.041s	0.018s	2.3x ↑
TOTAL pipeline	14.47s	1.68s	8.6x ↑

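Summary: 8.6x overall speedup on the 13-step pipeline benchmarked above. Reading the CSV accounts for about 90% of the pandas run time (13.04s of 14.47s), so the 12x I/O gain drives most of the total. The largest per-operation gains are on column renaming (80x), CSV reading (12x), groupby (7.5–8.9x), sampling (8x), and string filtering (5–9x).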

Results may vary based on data size and hardware.
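As a quick sanity check on the table, the Speedup column is the pandas time divided by the nitro-pandas time. The sketch below recomputes it for a few rows from the rounded figures shown above. The `speedup` helper and the `timings` dictionary are illustrative only and are not part of nitro-pandas.

```python
def speedup(pandas_s: float, nitro_s: float) -> float:
    """Return how many times faster the nitro-pandas run was."""
    return pandas_s / nitro_s

# (pandas seconds, nitro-pandas seconds), copied from the table above
timings = {
    "Read CSV": (13.04, 1.09),
    "Drop duplicates": (0.52, 0.42),
    "TOTAL pipeline": (14.47, 1.68),
}

for name, (pandas_s, nitro_s) in timings.items():
    print(f"{name}: {speedup(pandas_s, nitro_s):.1f}x")

# Absolute time saved over the whole pipeline
total_pandas, total_nitro = timings["TOTAL pipeline"]
print(f"Saved: {total_pandas - total_nitro:.2f}s")
```

Rows with very small timings (for example Rename columns, shown as 0.11s vs 0.001s) are sensitive to rounding, so recomputing them from the table's rounded values will not reproduce the published factor exactly.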

📦 Installation

# Using uv (recommended)
uv add nitro-pandas

# Using pip
pip install nitro-pandas

Requirements

Python 3.11+
Dependencies (automatically installed):
- polars>=1.30.0 — High-performance DataFrame engine
- pandas>=2.2.3 — For fallback methods
- line-profiler>=5.0.2 — For profile_compare
- fastexcel>=0.7.0 — Fast Excel reading
- openpyxl>=3.1.5 — Excel file support
- pyarrow>=20.0.0 — Parquet file support
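
If an install misbehaves, it can help to see which of these dependencies are actually present in the environment. The following is a minimal sketch using only the standard library. `installed_versions` is a hypothetical helper, and it only reports versions; it does not compare them against the minimums.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the requirements list above
REQUIRED = {
    "polars": "1.30.0",
    "pandas": "2.2.3",
    "line-profiler": "5.0.2",
    "fastexcel": "0.7.0",
    "openpyxl": "3.1.5",
    "pyarrow": "20.0.0",
}

def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

print("Python 3.11+:", sys.version_info >= (3, 11))
for name, got in installed_versions(REQUIRED).items():
    print(f"{name}: installed={got}, required>={REQUIRED[name]}")
```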

🚀 Quick Start

Basic Usage

import nitro_pandas as npd

# Create a DataFrame (pandas-like syntax)
df = npd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['Paris', 'London', 'New York']
})

# Filter data
filtered = df[df['age'] > 30]

# GroupBy
result = df.groupby('city')['age'].mean()

Reading Files

# Read CSV
df = npd.read_csv('data.csv')

# Read with lazy evaluation (optimized for large files)
lf = npd.read_csv_lazy('large_data.csv')
df = lf.query('id > 1000').collect()

# Other formats
df_parquet = npd.read_parquet('data.parquet')
df_excel   = npd.read_excel('data.xlsx')
df_json    = npd.read_json('data.json')

Data Operations

# GroupBy operations
result = df.groupby('city')['age'].mean()
result = df.groupby(['city', 'category'])['value'].sum()
result = df.groupby('category').agg({'value': 'mean', 'count': 'sum'})

# Sorting, filtering, sampling
df_sorted   = df.sort_values('age', ascending=False)
df_filtered = df.query("age > 25 and city == 'Paris'")
df_sample   = df.sample(n=1000, random_state=42)

# Top-N rows
top10 = df.nlargest(10, 'age')

# Pivot table
pivot = df.pivot_table(values='age', index='city', aggfunc='mean')

# Summary statistics
df.describe()
df.std()
df.median()
df.corr()

Writing Files

df.to_csv('output.csv')
df.to_parquet('output.parquet')
df.to_json('output.json')
df.to_excel('output.xlsx')

🔬 profile_compare

profile_compare runs your pandas code line-by-line under both backends and reports the speedup per line — so you know exactly where nitro-pandas helps.

import nitro_pandas as npd

def my_pipeline(pd):
    df = pd.read_csv("data.csv")
    df = df.rename(columns={"review/score": "score"})
    result = df.groupby("Id")["score"].mean()
    return result

print(npd.profile_compare(my_pipeline))

Output:

------------------------------------------------------------------------------------------
 Line  Source                                               pandas      nitro     Gain  
------------------------------------------------------------------------------------------
    5  df = pd.read_csv("data.csv")                       13.0385s    1.0866s   12.00x  ↑ 
    6  df = df.rename(columns={"review/score": "sco       0.1051s    0.0013s   80.14x  ↑ 
    7  result = df.groupby("Id")["score"].mean()           0.0296s    0.0039s    7.51x  ↑ 
------------------------------------------------------------------------------------------
TOTAL                                                      13.1732s    1.0918s   12.07x
------------------------------------------------------------------------------------------

Options:

npd.profile_compare(
    my_pipeline,
    n_runs=3,           # average over 3 runs
    warmup=1,           # 1 warm-up run discarded
    assert_equal=True,  # raise if results differ between backends
    return_format="dataframe",  # "table" (default) | "dict" | "dataframe"
)

Lines marked ⚠ triggered a pandas fallback — those are candidates for native implementation.
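
To make the n_runs and warmup options concrete, here is a rough sketch of the general idea: run the code a few times untimed, then average the wall-clock time over the remaining runs. This is not how profile_compare is implemented (it reports per-line timings, and the dependency list suggests it builds on line-profiler). `average_runtime` is a hypothetical helper that times a whole callable with the standard library.

```python
import time

def average_runtime(fn, n_runs=3, warmup=1):
    """Call fn() `warmup` times untimed, then return the mean wall-clock
    seconds over `n_runs` timed calls."""
    for _ in range(warmup):
        fn()  # warm-up: result and timing are discarded
    total = 0.0
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        total += time.perf_counter() - start
    return total / n_runs

calls = []  # records one entry per invocation so we can count them
mean_s = average_runtime(lambda: calls.append(sum(range(10_000))), n_runs=3, warmup=1)
print(f"{len(calls)} calls, mean {mean_s:.6f}s per timed run")
```

Discarding warm-up runs keeps one-off costs such as file-system caching or lazy imports out of the average, which matters most for the very short operations in the benchmark table.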

📚 API Reference

DataFrame Operations

Creation

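import polars as pl
import nitro_pandas as npd

df = npd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})   # from a dict of columns
df = npd.DataFrame(pl.DataFrame({'a': [1, 2, 3]}))     # from an existing Polars DataFrame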

Indexing

df['column_name']           # Returns nitro-pandas Series
df[['col1', 'col2']]        # Returns DataFrame
df[df['age'] > 30]          # Boolean filtering
df.loc[df['age'] > 30, 'name']
df.iloc[0:5, 0:2]

Transformations

df.astype({'id': 'int64', 'name': 'str'})
df.rename(columns={'old': 'new'})
df.drop(labels=['col1'], axis=1)
df.fillna({'column': 0})
df.sort_values('age', ascending=False)
df.drop_duplicates(subset=['id'])

Aggregations (native, no pandas fallback)

df.describe()
df.std()
df.median()
df.corr()
df.groupby('col')['val'].mean()
df.groupby('col')['val'].count()
df.nlargest(100, 'col')
df.sample(n=1000, random_state=42)
df.pivot_table(values='val', index='col', aggfunc='mean')

I/O Functions

CSV

df = npd.read_csv('file.csv', sep=',', usecols=['col1', 'col2'], dtype={'id': 'int64'})
lf = npd.read_csv_lazy('file.csv', n_rows=1000)

Parquet

df = npd.read_parquet('file.parquet', columns=['col1', 'col2'])
lf = npd.read_parquet_lazy('file.parquet')

Excel

df = npd.read_excel('file.xlsx', sheet_name=0, usecols=['col1'], nrows=1000)
lf = npd.read_excel_lazy('file.xlsx', sheet_name='Sheet1')

JSON

df = npd.read_json('file.json', dtype={'id': 'int64'})
lf = npd.read_json_lazy('file.json', lines=True)

🔄 Migration from pandas

# Before
import pandas as pd
df = pd.read_csv('data.csv')
result = df.groupby('category')['value'].mean()

# After — same code, faster execution
import nitro_pandas as npd
df = npd.read_csv('data.csv')
result = df.groupby('category')['value'].mean()

Key differences to be aware of:

- df['col'] returns a nitro-pandas Series, not a pandas Series; it is compatible with boolean indexing and most pandas operations.
- There is no inplace parameter: every operation returns a new DataFrame.
- Mixed column types are not supported: each column must have a single consistent type (a Polars requirement).
- Unimplemented methods fall back to pandas automatically and emit a PandasFallbackWarning.

import warnings
from nitro_pandas import PandasFallbackWarning

# Silence fallback warnings if needed
warnings.filterwarnings("ignore", category=PandasFallbackWarning)
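
For readers curious what "fall back with a warning" can look like mechanically, below is a toy sketch of the general pattern: methods defined on the wrapper run natively, and anything else is delegated to a slower backing object after a warning is emitted. It is not nitro-pandas's implementation. `ToyFrame` and `ToyFallbackWarning` are hypothetical names, and a plain Python list stands in for the pandas object.

```python
import warnings

class ToyFallbackWarning(UserWarning):
    """Hypothetical stand-in for a fallback warning category."""

class ToyFrame:
    """Toy wrapper: 'native' methods are defined here; anything else is
    delegated to the backing list with a warning."""

    def __init__(self, values):
        self._values = list(values)

    def total(self):
        # Implemented natively, so no warning is emitted
        return sum(self._values)

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails
        if name.startswith("_"):
            raise AttributeError(name)
        fallback = getattr(self._values, name)  # AttributeError if unknown
        warnings.warn(
            f"{name!r} is not native; using fallback",
            ToyFallbackWarning,
            stacklevel=2,
        )
        return fallback

frame = ToyFrame([3, 1, 2])
print(frame.total())     # native path, no warning
print(frame.count(1))    # delegated to list.count, emits ToyFallbackWarning
```

Because the warning has its own category, callers can silence or escalate it selectively with warnings.filterwarnings, which is the same mechanism the snippet above uses for PandasFallbackWarning.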

🏗️ Project Structure

nitro-pandas/
├── nitro_pandas/
│   ├── __init__.py      # Public API
│   ├── dataframe.py     # DataFrame, Series, GroupBy
│   ├── profiling.py     # profile_compare
│   ├── lazyframe.py     # LazyFrame
│   └── io/              # read_csv, read_parquet, read_excel, read_json
├── tests/
│   ├── test_dataframe.py
│   ├── test_profiling.py
│   ├── test_groupby.py
│   ├── test_io.py
│   └── test_runner.py
├── pyproject.toml
└── CHANGELOG.md

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (git checkout -b feature/AmazingFeature)
3. Commit your changes
4. Push to the branch (git push origin feature/AmazingFeature)
5. Open a Pull Request

Development Setup

git clone https://github.com/Wassim17Labdi/nitro-pandas.git
cd nitro-pandas
uv sync --dev
uv run python tests/test_runner.py

📝 License

This project is licensed under the MIT License — see the LICENSE file for details.

🙏 Acknowledgments

Polars — For the high-performance DataFrame engine
pandas — For the API inspiration and fallback support

Made with ❤️ for the Python data science community

⭐ Star this repo if you find it useful!

Project details

Release history

- 0.2.0 (this version): May 19, 2026
- 0.1.6: Apr 5, 2026
- 0.1.5: Jan 27, 2026
- 0.1.4: Nov 14, 2025
- 0.1.3: Nov 14, 2025
- 0.1.2: Nov 10, 2025
- 0.1.1: Nov 10, 2025
- 0.1.0: Nov 10, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nitro_pandas-0.2.0.tar.gz (1.2 MB)

Uploaded May 19, 2026 Source

Built Distribution

nitro_pandas-0.2.0-py3-none-any.whl (33.5 kB)

Uploaded May 19, 2026 Python 3

File details

Details for the file nitro_pandas-0.2.0.tar.gz.

File metadata

Download URL: nitro_pandas-0.2.0.tar.gz
Upload date: May 19, 2026
Size: 1.2 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for nitro_pandas-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`3a0b17dda7bb4d9ed100af937d6ceea0d6c79ea6739a7381d8de80f3529d53e6`
MD5	`4dd369125a2e747fa4ab4c546e18834e`
BLAKE2b-256	`a3199dcd12143c7827c409557a1914967d3fffa81fdded5eadf30386e001179f`

See more details on using hashes here.
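
One way to use the published digest is to hash the downloaded file locally and compare the two. The following is a minimal sketch with the standard library's hashlib. `sha256_of` and `matches` are hypothetical helper names, and the file path in the usage comment assumes the sdist was saved to the current directory.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches(path, expected_hex):
    """True if the file's SHA256 equals the published digest."""
    return sha256_of(path) == expected_hex.lower()

# Published SHA256 for nitro_pandas-0.2.0.tar.gz (from the table above)
EXPECTED = "3a0b17dda7bb4d9ed100af937d6ceea0d6c79ea6739a7381d8de80f3529d53e6"

# Usage, assuming the file was downloaded to the current directory:
# print(matches("nitro_pandas-0.2.0.tar.gz", EXPECTED))
```

Reading in chunks keeps memory use flat regardless of file size. SHA256 is the digest to prefer for this check; MD5 is listed on the page but is not considered collision-resistant.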

Provenance

The following attestation bundles were made for nitro_pandas-0.2.0.tar.gz:

Publisher: publish.yml on Wassim17Labdi/nitro-pandas

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: nitro_pandas-0.2.0.tar.gz
- Subject digest: 3a0b17dda7bb4d9ed100af937d6ceea0d6c79ea6739a7381d8de80f3529d53e6
- Sigstore transparency entry: 1575551705
- Sigstore integration time: May 19, 2026
Source repository:
- Permalink: Wassim17Labdi/nitro-pandas@a0f8e4f7b09c2f043e04bf953e65513c75faa6a1
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/Wassim17Labdi
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a0f8e4f7b09c2f043e04bf953e65513c75faa6a1
- Trigger Event: release

File details

Details for the file nitro_pandas-0.2.0-py3-none-any.whl.

File metadata

Download URL: nitro_pandas-0.2.0-py3-none-any.whl
Upload date: May 19, 2026
Size: 33.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for nitro_pandas-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d066ebf9cf8f6495a20f51d6faf3fff7fd51a8d7520e04b98f4c8a5485066de9`
MD5	`ed3472da4be0b4d18f1eac43498d4551`
BLAKE2b-256	`20da40b85bc87e978f70c0bd44f404bb315dcd5a5fd733a9a3ce92b012e8301a`

See more details on using hashes here.

Provenance

The following attestation bundles were made for nitro_pandas-0.2.0-py3-none-any.whl:

Publisher: publish.yml on Wassim17Labdi/nitro-pandas

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: nitro_pandas-0.2.0-py3-none-any.whl
- Subject digest: d066ebf9cf8f6495a20f51d6faf3fff7fd51a8d7520e04b98f4c8a5485066de9
- Sigstore transparency entry: 1575551743
- Sigstore integration time: May 19, 2026
Source repository:
- Permalink: Wassim17Labdi/nitro-pandas@a0f8e4f7b09c2f043e04bf953e65513c75faa6a1
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/Wassim17Labdi
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a0f8e4f7b09c2f043e04bf953e65513c75faa6a1
- Trigger Event: release
