# nitro-pandas

A high-performance pandas-like DataFrame library powered by Polars. Combine the familiar pandas API with Polars' blazing-fast performance.
## ✨ Features

- 🐼 **Pandas-like API** - Use familiar pandas syntax without learning a new library
- ⚡ **Polars backend** - Leverage Polars' optimized engine for maximum performance
- 🚀 **Lazy evaluation** - Optimize queries with lazy operations before execution
- 📊 **Comprehensive I/O** - Read/write CSV, Parquet, JSON, and Excel files
- 🎯 **Automatic fallback** - Seamless fallback to pandas for unimplemented methods
- 🔧 **Type safety** - Support for pandas-like type casting and schema inference
## 🎯 Why nitro-pandas?

nitro-pandas bridges the gap between pandas' user-friendly API and Polars' exceptional performance. If you're familiar with pandas but need better performance, nitro-pandas is the perfect solution.
### Performance Comparison

| Operation | pandas | nitro-pandas (Polars) | Speedup |
|---|---|---|---|
| Large CSV read | 10s | 2s | 5x faster |
| GroupBy aggregation | 5s | 0.5s | 10x faster |
| Filter operations | 3s | 0.3s | 10x faster |

*Results may vary based on data size and hardware.*
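To reproduce numbers like these on your own data, a small timing harness is enough. This is a generic sketch (the `benchmark` helper below is illustrative, not part of nitro-pandas); swap in your own pandas and nitro-pandas calls:

```python
import time

def benchmark(label, fn, repeats=3):
    """Run fn() `repeats` times and report the best wall-clock time."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    print(f"{label}: {best:.3f}s")
    return best

# Example usage; replace the callables with e.g.
#   benchmark("pandas read_csv", lambda: pd.read_csv("big.csv"))
#   benchmark("nitro-pandas read_csv", lambda: npd.read_csv("big.csv"))
benchmark("sum of a million ints", lambda: sum(range(1_000_000)))
```

Taking the best of several runs reduces noise from caches and background load.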
## 📦 Installation

```bash
# Using uv (recommended)
uv add nitro-pandas

# Using pip
pip install nitro-pandas
```
### Requirements

- Python 3.11+
- Dependencies (automatically installed):
  - `polars>=1.30.0` - High-performance DataFrame engine
  - `pandas>=2.2.3` - For fallback methods
  - `fastexcel>=0.7.0` - Fast Excel reading
  - `openpyxl>=3.1.5` - Excel file support
  - `pyarrow>=20.0.0` - Parquet file support
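As a quick post-install sanity check, you can confirm the dependencies resolved. A small stdlib-only sketch (the `check_deps` helper is illustrative, not part of nitro-pandas):

```python
# Report which of the required dependencies are installed, and at what
# version, using only the standard library.
from importlib.metadata import version, PackageNotFoundError

def check_deps(packages):
    """Map each package name to its installed version, or 'missing'."""
    report = {}
    for pkg in packages:
        try:
            report[pkg] = version(pkg)
        except PackageNotFoundError:
            report[pkg] = "missing"
    return report

report = check_deps(["polars", "pandas", "fastexcel", "openpyxl", "pyarrow"])
for pkg, ver in report.items():
    print(f"{pkg}: {ver}")
```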
## 🚀 Quick Start

### Basic Usage

```python
import nitro_pandas as npd

# Create a DataFrame (pandas-like syntax)
df = npd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['Paris', 'London', 'New York']
})

# Access columns (returns a pandas Series for compatibility)
ages = df['age']
print(ages > 30)  # Boolean Series

# Filter data
filtered = df.loc[df['age'] > 30]
print(filtered)
```
### Reading Files

```python
# Read CSV
df = npd.read_csv('data.csv')

# Read with lazy evaluation (optimized for large files)
lf = npd.read_csv_lazy('large_data.csv')
df = lf.query('id > 1000').collect()

# Read other formats
df_parquet = npd.read_parquet('data.parquet')
df_excel = npd.read_excel('data.xlsx')
df_json = npd.read_json('data.json')
```
### Data Operations

```python
# GroupBy operations (pandas-like syntax, Polars backend)
result = df.groupby('city')['age'].mean()
print(result)

# Multi-column groupby
result = df.groupby(['city', 'category'])['value'].sum()

# Aggregations with dictionaries
result = df.groupby('category').agg({
    'value': 'mean',
    'count': 'sum'
})

# Sorting and filtering
df_sorted = df.sort_values('age', ascending=False)
df_filtered = df.query("age > 25 and city == 'Paris'")
```
### Writing Files

```python
# Write to various formats
df.to_csv('output.csv')
df.to_parquet('output.parquet')
df.to_json('output.json')
df.to_excel('output.xlsx')
```
## 📚 API Reference

### DataFrame Operations

#### Creation

```python
# From a dictionary
df = npd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# From a Polars DataFrame
import polars as pl
df = npd.DataFrame(pl.DataFrame({'a': [1, 2, 3]}))

# Empty DataFrame
df = npd.DataFrame()
```
#### Indexing

```python
# Column selection
df['column_name']     # Returns a pandas Series
df[['col1', 'col2']]  # Returns a DataFrame

# Boolean filtering
df[df['age'] > 30]    # Returns a DataFrame

# Label-based indexing
df.loc[df['age'] > 30, 'name']  # Returns a Series
df.loc[0:5, ['name', 'age']]    # Returns a DataFrame

# Position-based indexing
df.iloc[0:5, 0:2]     # Returns a DataFrame
```
#### Transformations

```python
# Type casting (pandas-like types)
df = df.astype({'id': 'int64', 'name': 'str'})

# Rename columns
df = df.rename(columns={'old_name': 'new_name'})

# Drop rows/columns
df = df.drop(labels=[0, 1], axis=0)    # Drop rows
df = df.drop(labels=['col1'], axis=1)  # Drop columns

# Fill null values
df = df.fillna({'column': 0})

# Sort values
df = df.sort_values('age', ascending=False)
```
### I/O Functions

#### CSV

```python
# Eager reading
df = npd.read_csv('file.csv',
                  sep=',',
                  usecols=['col1', 'col2'],
                  dtype={'id': 'int64'})

# Lazy reading
lf = npd.read_csv_lazy('file.csv', n_rows=1000)
df = lf.collect()
```
#### Parquet

```python
# Eager reading
df = npd.read_parquet('file.parquet',
                      columns=['col1', 'col2'],
                      n_rows=1000)

# Lazy reading
lf = npd.read_parquet_lazy('file.parquet')
df = lf.collect()
```
#### Excel

```python
# Eager reading
df = npd.read_excel('file.xlsx',
                    sheet_name=0,
                    usecols=['col1', 'col2'],
                    nrows=1000)

# Lazy reading
lf = npd.read_excel_lazy('file.xlsx', sheet_name='Sheet1')
df = lf.collect()
```
#### JSON

```python
# Eager reading
df = npd.read_json('file.json',
                   dtype={'id': 'int64'},
                   n_rows=1000)

# Lazy reading
lf = npd.read_json_lazy('file.json', lines=True)
df = lf.collect()
```
### LazyFrame Operations

```python
# Create a lazy frame
lf = npd.read_csv_lazy('large_file.csv')

# Chain operations (optimized before execution)
result = (lf
          .query('age > 30')
          .groupby('city')
          .agg({'value': 'mean'}))

# Execute the query
df = result.collect()

# Sort after collection if needed
df = df.sort_values('value', ascending=False)
```
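The point of the lazy API is that nothing executes until `collect()` is called; the recorded plan can then be optimized as a whole (Polars applies optimizations such as predicate and projection pushdown). A toy pure-Python sketch of the record-then-execute idea, not Polars' actual machinery:

```python
# Conceptual sketch of lazy evaluation: operations are recorded, not run,
# until collect() is called.
class TinyLazy:
    def __init__(self, rows):
        self._rows = rows
        self._ops = []           # deferred operations

    def filter(self, pred):
        self._ops.append(lambda rows: [r for r in rows if pred(r)])
        return self              # chainable; nothing has executed yet

    def select(self, *keys):
        self._ops.append(lambda rows: [{k: r[k] for k in keys} for r in rows])
        return self

    def collect(self):
        rows = self._rows
        for op in self._ops:     # execute the whole recorded plan at once
            rows = op(rows)
        return rows

rows = [{"id": 1, "v": 10}, {"id": 2, "v": 20}, {"id": 3, "v": 30}]
out = TinyLazy(rows).filter(lambda r: r["id"] > 1).select("v").collect()
print(out)  # [{'v': 20}, {'v': 30}]
```

Because the plan is known before execution, a real engine can reorder and fuse these steps, which is why the lazy readers are recommended for large files.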
## 🔄 Migration from pandas

Migrating from pandas to nitro-pandas is straightforward:

```python
# Before (pandas)
import pandas as pd
df = pd.read_csv('data.csv')
result = df.groupby('category')['value'].mean()

# After (nitro-pandas)
import nitro_pandas as npd
df = npd.read_csv('data.csv')
result = df.groupby('category')['value'].mean()
```
Most pandas operations work the same way! The main differences:

- **Single column selection** (`df['col']`) returns a pandas Series (not a nitro-pandas Series) to maintain compatibility with pandas expressions and boolean indexing.
- **Comparison operations** (`df > 2`) return pandas DataFrames for boolean indexing compatibility.
- **Unimplemented methods**: automatic fallback to pandas is available at both the DataFrame instance level and the package level:

  ```python
  # ✅ Works: fallback on a DataFrame instance
  df = npd.DataFrame({'a': [1, 2, 3]})
  result = df.describe()  # Falls back to the pandas DataFrame method

  # ✅ Works: fallback at the package level
  import pandas as pd
  df_pd = pd.DataFrame({'a': [1, 2, 1], 'b': ['x', 'y', 'x']})
  result = npd.get_dummies(df_pd)                   # Falls back to the pandas module function
  result = npd.date_range('2024-01-01', periods=5)  # Falls back to pandas
  ```

  Note: methods that only exist on DataFrame instances (like `describe()`) are only available via DataFrame instances, not at the package level.
- **Mixed types in columns**: unlike pandas, Polars (and thus nitro-pandas) does not allow mixed types within a single column. Each column must have a consistent type. If your pandas DataFrame has mixed types in a column, Polars will coerce them to a common type (usually `object`/string) or raise an error.

  ```python
  # ✅ Works in pandas, but NOT in Polars/nitro-pandas
  pd.DataFrame({'col': [1, 'text', 3.5]})   # Mixed int, str, float

  # ⚠️ Polars will coerce to string or raise an error
  npd.DataFrame({'col': [1, 'text', 3.5]})  # All values become strings
  ```

- **No `inplace` parameter**: Polars operations are always immutable (they return new DataFrames), so nitro-pandas does not support the `inplace=True` parameter found in pandas. All operations return new DataFrame objects.

  ```python
  # ❌ Works in pandas, but NOT in nitro-pandas
  df.drop(columns=['col'], inplace=True)  # inplace not supported

  # ✅ Always assign the result
  df = df.drop(labels=['col'], axis=1)    # Returns a new DataFrame
  ```
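Automatic fallback of the kind described above is commonly built on `__getattr__` delegation: attribute lookups that the fast backend cannot satisfy are retried on a pandas-backed object. A minimal pure-Python sketch of the mechanism (the `Fast`/`Slow` classes are illustrative stand-ins, not nitro-pandas internals):

```python
class Fast:
    """Stand-in for the Polars-backed core: implements only a few methods."""
    def __init__(self, data):
        self.data = data
    def total(self):
        return sum(self.data)

class Slow:
    """Stand-in for the pandas fallback: implements everything else."""
    def __init__(self, data):
        self.data = data
    def describe(self):
        return {"count": len(self.data), "min": min(self.data), "max": max(self.data)}

class Frame:
    """Prefers the fast backend, silently falls back otherwise."""
    def __init__(self, data):
        self._fast = Fast(data)
        self._slow = Slow(data)
    def __getattr__(self, name):
        # Called only when normal attribute lookup fails on Frame itself.
        try:
            return getattr(self._fast, name)
        except AttributeError:
            return getattr(self._slow, name)  # fallback path

f = Frame([3, 1, 2])
print(f.total())     # fast path -> 6
print(f.describe())  # fallback path -> {'count': 3, 'min': 1, 'max': 3}
```

The caller never sees which backend served the call, which is exactly why fallback is seamless but can hide a performance cliff on unported methods.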
## 🏗️ Project Structure

```
nitro-pandas/
├── nitro_pandas/
│   ├── __init__.py       # Package initialization
│   ├── dataframe.py      # DataFrame implementation
│   ├── lazyframe.py      # LazyFrame implementation
│   └── io/
│       ├── __init__.py   # IO module exports
│       ├── csv.py        # CSV I/O
│       ├── parquet.py    # Parquet I/O
│       ├── json.py       # JSON I/O
│       └── excel.py      # Excel I/O
├── tests/
│   ├── test_dataframe.py # DataFrame tests
│   ├── test_groupby.py   # GroupBy tests
│   ├── test_io.py        # I/O tests
│   └── helpers.py        # Test utilities
├── pyproject.toml        # Project configuration
└── README.md             # This file
```
## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
### Development Setup

```bash
# Clone the repository
git clone https://github.com/yourusername/nitro-pandas.git
cd nitro-pandas

# Install development dependencies
uv sync --dev

# Run tests
uv run python tests/test_runner.py
```
## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

The MIT License is a permissive open-source license that allows anyone to:

- ✅ Use the software for any purpose (commercial or personal)
- ✅ Modify the software
- ✅ Distribute the software
- ✅ Sublicense the software

In short: everyone can use it freely!
## 🙏 Acknowledgments

- Polars - for the high-performance DataFrame engine
- pandas - for the API inspiration and fallback support
## 📧 Contact

For questions, suggestions, or support, please open an issue on GitHub.

Made with ❤️ for the Python data science community.

⭐ Star this repo if you find it useful!