nitro-pandas

A pandas-like API wrapper around Polars for high-performance data manipulation

These details have not been verified by PyPI

Project description

A high-performance pandas-like DataFrame library powered by Polars

Combine the familiar pandas API with Polars' blazing-fast performance

✨ Features

🐼 Pandas-like API - Use familiar pandas syntax without learning a new library
⚡ Polars Backend - Leverage Polars' optimized engine for maximum performance
🔄 Lazy Evaluation - Optimize queries with lazy operations before execution
📊 Comprehensive I/O - Read/write CSV, Parquet, JSON, and Excel files
🎯 Automatic Fallback - Seamless fallback to pandas for unimplemented methods
🔧 Type Safety - Support for pandas-like type casting and schema inference

🎯 Why nitro-pandas?

nitro-pandas bridges the gap between pandas' user-friendly API and Polars' performance. If you already know pandas but need faster execution, nitro-pandas lets you keep the pandas syntax while running on the Polars engine.

Performance Comparison

Operation	pandas	nitro-pandas (Polars)	Speedup
Large CSV Read	10s	2s	5x faster
GroupBy Aggregation	5s	0.5s	10x faster
Filter Operations	3s	0.3s	10x faster

Results may vary based on data size and hardware
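
Because results depend on data size and hardware, it may be worth timing your own workload rather than relying on the table. The following is a minimal sketch using only the standard library; `best_of` and `speedup` are hypothetical helper names, not part of nitro-pandas, and the numbers passed to `speedup` are simply the figures from the table above.

```python
import time
from typing import Callable


def best_of(fn: Callable[[], object], repeats: int = 3) -> float:
    """Return the fastest wall-clock time (seconds) over `repeats` calls of fn."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is than the baseline."""
    if candidate_seconds <= 0:
        raise ValueError("candidate_seconds must be positive")
    return baseline_seconds / candidate_seconds


# Figures taken from the table above (pandas time, nitro-pandas time).
csv_speedup = speedup(10.0, 2.0)      # Large CSV Read
groupby_speedup = speedup(5.0, 0.5)   # GroupBy Aggregation
```

To compare on real data, pass a zero-argument callable for each library to `best_of` (for example a lambda wrapping a `read_csv` call) and feed the two results to `speedup`.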

📦 Installation

# Using uv (recommended)
uv add nitro-pandas

# Using pip
pip install nitro-pandas

Requirements

Python 3.11+
Dependencies (automatically installed):
- polars>=1.30.0 - High-performance DataFrame engine
- pandas>=2.2.3 - For fallback methods
- fastexcel>=0.7.0 - Fast Excel reading
- openpyxl>=3.1.5 - Excel file support
- pyarrow>=20.0.0 - Parquet file support
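
If an import fails after installation, a quick check of the environment against the minimums listed above can help. This is a rough sketch using only the standard library; `parse_version`, `is_older`, and `check_environment` are hypothetical helpers, and the simple version parsing ignores pre-release suffixes.

```python
import sys
from importlib import metadata

# Minimum versions as listed in the requirements above.
MIN_PYTHON = (3, 11)
MIN_VERSIONS = {
    "polars": "1.30.0",
    "pandas": "2.2.3",
    "fastexcel": "0.7.0",
    "openpyxl": "3.1.5",
    "pyarrow": "20.0.0",
}


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.30.0' into (1, 30, 0); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_older(installed: str, minimum: str) -> bool:
    """True if `installed` is a lower version than `minimum`."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.30' and '1.30.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def check_environment() -> list[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required")
    for name, minimum in MIN_VERSIONS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if is_older(installed, minimum):
            problems.append(f"{name} {installed} is older than {minimum}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```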

🚀 Quick Start

Basic Usage

import nitro_pandas as npd

# Create a DataFrame (pandas-like syntax)
df = npd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['Paris', 'London', 'New York']
})

# Access columns (returns pandas Series for compatibility)
ages = df['age']
print(ages > 30)  # Boolean Series

# Filter data
filtered = df.loc[df['age'] > 30]
print(filtered)

Reading Files

# Read CSV
df = npd.read_csv('data.csv')

# Read with lazy evaluation (optimized for large files)
lf = npd.read_csv_lazy('large_data.csv')
df = lf.filter(lf['id'] > 1000).collect()

# Read other formats
df_parquet = npd.read_parquet('data.parquet')
df_excel = npd.read_excel('data.xlsx')
df_json = npd.read_json('data.json')

Data Operations

# GroupBy operations (pandas-like syntax, Polars backend)
result = df.groupby('city')['age'].mean()
print(result)

# Multi-column groupby
result = df.groupby(['city', 'category'])['value'].sum()

# Aggregations with dictionaries
result = df.groupby('category').agg({
    'value': 'mean',
    'count': 'sum'
})

# Sorting and filtering
df_sorted = df.sort_values('age', ascending=False)
df_filtered = df.query("age > 25 and city == 'Paris'")

Writing Files

# Write to various formats
df.to_csv('output.csv')
df.to_parquet('output.parquet')
df.to_json('output.json')
df.to_excel('output.xlsx')

📚 API Reference

DataFrame Operations

Creation

# From dictionary
df = npd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# From Polars DataFrame
import polars as pl
df = npd.DataFrame(pl.DataFrame({'a': [1, 2, 3]}))

# Empty DataFrame
df = npd.DataFrame()

Indexing

# Column selection
df['column_name']  # Returns pandas Series
df[['col1', 'col2']]  # Returns DataFrame

# Boolean filtering
df[df['age'] > 30]  # Returns DataFrame

# Label-based indexing
df.loc[df['age'] > 30, 'name']  # Returns Series
df.loc[0:5, ['name', 'age']]  # Returns DataFrame

# Position-based indexing
df.iloc[0:5, 0:2]  # Returns DataFrame

Transformations

# Type casting (pandas-like types)
df = df.astype({'id': 'int64', 'name': 'str'})

# Rename columns
df = df.rename(columns={'old_name': 'new_name'})

# Drop rows/columns
df = df.drop(labels=[0, 1], axis=0)  # Drop rows
df = df.drop(labels=['col1'], axis=1)  # Drop columns

# Fill null values
df = df.fillna({'column': 0})

# Sort values
df = df.sort_values('age', ascending=False)
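
The `astype` call above accepts pandas-style type names even though the data lives in Polars, so some translation between the two naming schemes has to happen. How nitro-pandas does this internally is not documented here; the sketch below only illustrates the idea with a plain lookup table. `PANDAS_TO_POLARS` and `translate_dtypes` are hypothetical names, and the table covers just a few common types.

```python
# Hypothetical lookup table: pandas-style dtype strings -> Polars dtype names.
PANDAS_TO_POLARS = {
    "int64": "Int64",
    "int32": "Int32",
    "float64": "Float64",
    "float32": "Float32",
    "bool": "Boolean",
    "str": "String",
}


def translate_dtypes(mapping: dict[str, str]) -> dict[str, str]:
    """Translate a {column: pandas dtype} dict into {column: Polars dtype name}."""
    translated = {}
    for column, dtype in mapping.items():
        if dtype not in PANDAS_TO_POLARS:
            raise ValueError(f"unsupported dtype {dtype!r} for column {column!r}")
        translated[column] = PANDAS_TO_POLARS[dtype]
    return translated


# Same mapping as the astype() example above.
schema = translate_dtypes({"id": "int64", "name": "str"})
```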

I/O Functions

CSV

# Eager reading
df = npd.read_csv('file.csv', 
                  sep=',',
                  usecols=['col1', 'col2'],
                  dtype={'id': 'int64'})

# Lazy reading
lf = npd.read_csv_lazy('file.csv', n_rows=1000)
df = lf.collect()

Parquet

# Eager reading
df = npd.read_parquet('file.parquet',
                      columns=['col1', 'col2'],
                      n_rows=1000)

# Lazy reading
lf = npd.read_parquet_lazy('file.parquet')
df = lf.collect()

Excel

# Eager reading
df = npd.read_excel('file.xlsx',
                    sheet_name=0,
                    usecols=['col1', 'col2'],
                    nrows=1000)

# Lazy reading
lf = npd.read_excel_lazy('file.xlsx', sheet_name='Sheet1')
df = lf.collect()

JSON

# Eager reading
df = npd.read_json('file.json',
                   dtype={'id': 'int64'},
                   n_rows=1000)

# Lazy reading
lf = npd.read_json_lazy('file.json', lines=True)
df = lf.collect()

LazyFrame Operations

# Create lazy frame
lf = npd.read_csv_lazy('large_file.csv')

# Chain operations (optimized before execution)
result = (lf
          .filter(lf['age'] > 30)
          .groupby('city')
          .agg({'value': 'mean'})
          .sort_values('value', ascending=False))

# Execute query
df = result.collect()
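
To make the "nothing runs until collect()" behaviour concrete, here is a toy illustration in plain Python. It is not how nitro-pandas or Polars is implemented: `ToyLazyFrame` is a hypothetical class that only records steps and replays them on `collect()`, whereas Polars' lazy engine also optimizes the recorded plan (for example by pushing filters down) before executing it.

```python
from typing import Callable, Iterable


class ToyLazyFrame:
    """Records row-level steps and runs them only when collect() is called."""

    def __init__(self, rows: Iterable[dict], steps: tuple = ()):
        self._rows = rows
        self._steps = steps

    def filter(self, predicate: Callable[[dict], bool]) -> "ToyLazyFrame":
        # Nothing is evaluated here; the step is only appended to the plan.
        return ToyLazyFrame(self._rows, self._steps + (("filter", predicate),))

    def select(self, columns: list[str]) -> "ToyLazyFrame":
        return ToyLazyFrame(self._rows, self._steps + (("select", columns),))

    def collect(self) -> list[dict]:
        # Execute the recorded plan in a single pass over the rows.
        result = []
        for row in self._rows:
            keep = True
            for kind, arg in self._steps:
                if kind == "filter":
                    if not arg(row):
                        keep = False
                        break
                else:  # "select"
                    row = {name: row[name] for name in arg}
            if keep:
                result.append(row)
        return result


rows = [
    {"name": "Alice", "age": 25, "city": "Paris"},
    {"name": "Bob", "age": 30, "city": "London"},
    {"name": "Charlie", "age": 35, "city": "New York"},
]

# Building the plan does no work on the rows.
plan = ToyLazyFrame(rows).filter(lambda r: r["age"] > 30).select(["name", "city"])

# The rows are only scanned here.
people = plan.collect()
```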

🔄 Migration from pandas

Migrating from pandas to nitro-pandas is straightforward:

# Before (pandas)
import pandas as pd
df = pd.read_csv('data.csv')
result = df.groupby('category')['value'].mean()

# After (nitro-pandas)
import nitro_pandas as npd
df = npd.read_csv('data.csv')
result = df.groupby('category')['value'].mean()

Most pandas operations work the same way! The main differences:

- Single column selection (df['col']) returns a pandas Series for compatibility
- Comparison operations (df > 2) return pandas DataFrames for boolean indexing
- Unimplemented methods automatically fall back to pandas
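
One common way to get this kind of automatic fallback in Python is attribute delegation through `__getattr__`. The sketch below shows the pattern with toy classes only; `FallbackFrame` and `FallbackBackend` are hypothetical stand-ins, and nothing here is taken from the nitro-pandas source.

```python
class FallbackBackend:
    """Stand-in for the more complete library (pandas in the text above)."""

    def __init__(self, data):
        self._data = list(data)

    def median(self) -> float:
        ordered = sorted(self._data)
        mid = len(ordered) // 2
        if len(ordered) % 2:
            return ordered[mid]
        return (ordered[mid - 1] + ordered[mid]) / 2


class FallbackFrame:
    """Toy wrapper: uses its own methods first, then delegates to the backend."""

    def __init__(self, data):
        self._data = list(data)

    def sum(self) -> float:
        # "Natively implemented" method: handled by the wrapper itself.
        return sum(self._data)

    def _to_fallback(self) -> FallbackBackend:
        # Stand-in for converting to the fallback library's object.
        return FallbackBackend(self._data)

    def __getattr__(self, name: str):
        # Python calls this only when normal attribute lookup fails,
        # i.e. for methods the wrapper does not implement itself.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._to_fallback(), name)


frame = FallbackFrame([3, 1, 2, 4])
total = frame.sum()      # handled by FallbackFrame
middle = frame.median()  # not defined on FallbackFrame, delegated to the backend
```

A practical consequence of this pattern is that delegated calls pay a conversion cost each time, so a method that falls back will not be faster than calling the fallback library directly.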

🎓 Examples

Example 1: Data Analysis Pipeline

import nitro_pandas as npd

# Load data
df = npd.read_csv('sales.csv')

# Clean data
df = df.dropna(subset=['amount'])
df = df.astype({'amount': 'float64'})

# Analyze
summary = df.groupby('region').agg({
    'amount': 'sum',
    'orders': 'count'
})

# Filter top regions
top_regions = summary.sort_values('amount', ascending=False).head(10)

# Export
top_regions.to_excel('top_regions.xlsx')

Example 2: Large File Processing

import nitro_pandas as npd

# Use lazy evaluation for large files
lf = npd.read_csv_lazy('huge_file.csv')

# Build optimized query
result = (lf
          .filter(lf['date'] > '2024-01-01')
          .groupby('category')
          .agg({'sales': 'sum', 'orders': 'count'})
          .sort_values('sales', ascending=False))

# Execute only when needed
df = result.collect()
print(df)

Example 3: Complex GroupBy

import nitro_pandas as npd

df = npd.DataFrame({
    'city': ['Paris', 'Paris', 'Lyon', 'Lyon'],
    'category': ['A', 'B', 'A', 'B'],
    'revenue': [1000, 2000, 1500, 1800]
})

# Multi-column groupby
result = df.groupby(['city', 'category'])['revenue'].sum()
print(result)

# Dictionary-based aggregation
result = df.groupby('city').agg({
    'revenue': 'mean',
    'category': 'count'
})
print(result)

🏗️ Project Structure

nitro-pandas/
├── nitro_pandas/
│   ├── __init__.py          # Package initialization
│   ├── dataframe.py         # DataFrame implementation
│   ├── lazyframe.py         # LazyFrame implementation
│   └── io/
│       ├── __init__.py      # IO module exports
│       ├── csv.py           # CSV I/O
│       ├── parquet.py       # Parquet I/O
│       ├── json.py          # JSON I/O
│       └── excel.py         # Excel I/O
├── tests/
│   ├── test_dataframe.py    # DataFrame tests
│   ├── test_groupby.py      # GroupBy tests
│   ├── test_io.py           # I/O tests
│   └── helpers.py           # Test utilities
├── pyproject.toml           # Project configuration
└── README.md                 # This file

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (git checkout -b feature/AmazingFeature)
3. Commit your changes (git commit -m 'Add some AmazingFeature')
4. Push to the branch (git push origin feature/AmazingFeature)
5. Open a Pull Request

Development Setup

# Clone repository
git clone https://github.com/yourusername/nitro-pandas.git
cd nitro-pandas

# Install development dependencies
uv sync --dev

# Run tests
uv run python tests/test_runner.py

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

The MIT License is a permissive open-source license that allows anyone to:

✅ Use the software for any purpose (commercial or personal)
✅ Modify the software
✅ Distribute the software
✅ Sublicense the software

In short: anyone can use it freely, provided the copyright and license notice are kept in copies of the software.

🙏 Acknowledgments

Polars - For the high-performance DataFrame engine
pandas - For the API inspiration and fallback support

📧 Contact

For questions, suggestions, or support, please open an issue on GitHub.

Made with ❤️ for the Python data science community

⭐ Star this repo if you find it useful!

Project details

Release history

Version	Release date
0.1.6	Apr 5, 2026
0.1.5	Jan 27, 2026
0.1.4	Nov 14, 2025
0.1.3	Nov 14, 2025
0.1.2	Nov 10, 2025
0.1.1 (this version)	Nov 10, 2025
0.1.0	Nov 10, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nitro_pandas-0.1.1.tar.gz (119.9 kB)

Uploaded Nov 10, 2025 Source

Built Distribution

nitro_pandas-0.1.1-py3-none-any.whl (25.7 kB)

Uploaded Nov 10, 2025 Python 3

File details

Details for the file nitro_pandas-0.1.1.tar.gz.

File metadata

Download URL: nitro_pandas-0.1.1.tar.gz
Upload date: Nov 10, 2025
Size: 119.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.7

File hashes

Hashes for nitro_pandas-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`21c5870a7ba45ba67b7c8b5abcdc439e6f6b160d228c40480ae54bfc39cbea31`
MD5	`f51563676e994717c7b0d25e0d331aa4`
BLAKE2b-256	`cb8d7539c637bde843b68a6a817ce9e50f66f81a5652e3604b76d88d3c8d99c4`
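
A downloaded file can be checked against the SHA256 digest in the table above with the standard library. This is a minimal sketch; `sha256_of` and `matches_expected` are hypothetical helper names, and the local file path is whatever you saved the download as.

```python
import hashlib
from pathlib import Path

# SHA256 digest for nitro_pandas-0.1.1.tar.gz, copied from the table above.
EXPECTED_SHA256 = "21c5870a7ba45ba67b7c8b5abcdc439e6f6b160d228c40480ae54bfc39cbea31"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected_hex: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.lower()
```

For example, `matches_expected(Path("nitro_pandas-0.1.1.tar.gz"))` should return `True` for an intact copy of the source distribution.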

File details

Details for the file nitro_pandas-0.1.1-py3-none-any.whl.

File metadata

Download URL: nitro_pandas-0.1.1-py3-none-any.whl
Upload date: Nov 10, 2025
Size: 25.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.7

File hashes

Hashes for nitro_pandas-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7cacefefd72b566082cb9c72e8ee066a722d5c9b533a1e33bcf09e65b115dc13`
MD5	`aaa8ff1a86c727c5daca2b1b0ac0c30d`
BLAKE2b-256	`747ef5555f01df92c2ea2822e7a4be4cd2e3c3429e9b202787b556c62bceae1c`
