
A pandas-compatible API layer built on top of Polars for high-performance data manipulation


🐼⚡ PolarPandas

The fastest pandas-compatible API you'll ever use


PolarPandas is a blazing-fast, pandas-compatible API built on top of Polars. Write pandas code, get Polars performance. It's that simple.

🚀 Why PolarPandas?

Operation            pandas      PolarPandas   Speedup
DataFrame Creation   224.89 ms   15.95 ms      ⚡ 14.1x faster
Read CSV             8.00 ms     0.88 ms       ⚡ 9.1x faster
Sorting              28.05 ms    3.97 ms       ⚡ 7.1x faster
GroupBy              7.95 ms     2.44 ms       ⚡ 3.3x faster
Filtering            1.26 ms     0.42 ms       ⚡ 3.0x faster

🎯 Overall Performance: 5.2x faster than pandas

✨ Quick Start

import polarpandas as ppd
import polars as pl

# Create a DataFrame (pandas syntax, Polars performance)
df = ppd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "city": ["NYC", "LA", "Chicago"]
})

# All your favorite pandas operations work!
df["age_plus_10"] = df["age"] + 10
df.sort_values("age", inplace=True)
result = df.groupby("city").agg(pl.col("age").mean())

# String operations with .str accessor
df["name_upper"] = df["name"].str.upper()

# Derived column via plain arithmetic (no .dt accessor involved)
df["birth_year"] = 2024 - df["age"]

print(df.head())

Output:

shape: (3, 6)
┌─────────┬─────┬─────────┬─────────────┬────────────┬────────────┐
│ name    ┆ age ┆ city    ┆ age_plus_10 ┆ name_upper ┆ birth_year │
│ ---     ┆ --- ┆ ---     ┆ ---         ┆ ---        ┆ ---        │
│ str     ┆ i64 ┆ str     ┆ i64         ┆ str        ┆ i64        │
╞═════════╪═════╪═════════╪═════════════╪════════════╪════════════╡
│ Alice   ┆ 25  ┆ NYC     ┆ 35          ┆ ALICE      ┆ 1999       │
│ Bob     ┆ 30  ┆ LA      ┆ 40          ┆ BOB        ┆ 1994       │
│ Charlie ┆ 35  ┆ Chicago ┆ 45          ┆ CHARLIE    ┆ 1989       │
└─────────┴─────┴─────────┴─────────────┴────────────┴────────────┘

🎯 What's New in v0.9.0

⚙️ Rolling Apply Compatibility

  • ✅ DataFrame.rolling().apply now leverages Polars' native rolling_map, so pandas-style custom functions Just Work™
  • ✅ Full support for raw=True/False, positional args, keyword arguments (kwargs), weights, centered windows, and min_periods
  • ✅ More predictable results when mixing numeric and object windows thanks to consistent Series wrapping
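The window rules these options control can be modeled in plain Python. The sketch below is a hypothetical `rolling_apply` helper (not part of the PolarPandas API) that assumes pandas' fixed-window conventions: windows end at the current row unless `center=True`, missing values are skipped, and `min_periods` defaults to the window size. `raw` and `weights` are left out to keep it short.

```python
from typing import Any, Callable, List, Optional, Sequence


def rolling_apply(
    values: Sequence[Optional[float]],
    window: int,
    func: Callable[..., Any],
    *args: Any,
    min_periods: Optional[int] = None,
    center: bool = False,
    **kwargs: Any,
) -> List[Any]:
    """Hypothetical pure-Python model of pandas-style rolling().apply()."""
    if min_periods is None:
        min_periods = window  # pandas default: require a full window
    # Centered windows shift the right edge forward by (window - 1) // 2 rows.
    offset = (window - 1) // 2 if center else 0
    n = len(values)
    out: List[Any] = []
    for i in range(n):
        end = min(i + offset + 1, n)              # exclusive right edge
        start = max(i + offset + 1 - window, 0)   # clipped left edge
        win = [v for v in values[start:end] if v is not None]
        if len(win) < min_periods:
            out.append(None)                      # too few observations
        else:
            # Extra positional and keyword arguments are forwarded to func.
            out.append(func(win, *args, **kwargs))
    return out


data = [1, 2, 3, 4, 5]
print(rolling_apply(data, 3, sum))                 # [None, None, 6, 9, 12]
print(rolling_apply(data, 3, sum, center=True))    # [None, 6, 9, 12, None]
print(rolling_apply(data, 3, sum, min_periods=1))  # [1, 3, 6, 9, 12]
```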

🧭 GroupBy Reliability

  • ✅ Grouping by a missing column now raises a clear KeyError, as in pandas; the check runs at aggregation time
  • ✅ Safer attribute access on _GroupBy objects, preventing silent failures in chained operations
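A minimal sketch of deferring key validation until aggregation time, using a dict of lists as a stand-in frame. `DeferredGroupBy` is a hypothetical illustration, not PolarPandas' `_GroupBy` class.

```python
from collections import defaultdict
from typing import Dict, List


class DeferredGroupBy:
    """Hypothetical groupby wrapper that validates column names lazily."""

    def __init__(self, data: Dict[str, list], by: str) -> None:
        # No validation happens here; the keys are only recorded.
        self._data = data
        self._by = by

    def mean(self, column: str) -> Dict[object, float]:
        # Validate at aggregation time and raise a clear KeyError.
        for name in (self._by, column):
            if name not in self._data:
                raise KeyError(f"Column not found: {name!r}")
        groups: Dict[object, List[float]] = defaultdict(list)
        for key, value in zip(self._data[self._by], self._data[column]):
            groups[key].append(value)
        return {key: sum(vals) / len(vals) for key, vals in groups.items()}


frame = {"city": ["NYC", "LA", "NYC"], "age": [25, 30, 35]}
print(DeferredGroupBy(frame, "city").mean("age"))  # {'NYC': 30.0, 'LA': 30.0}

gb = DeferredGroupBy(frame, "country")  # no error yet
try:
    gb.mean("age")                      # the KeyError surfaces here
except KeyError as exc:
    print("KeyError:", exc)
```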

🧪 Quality & Tooling

  • ✅ 1,014 tests passing across the suite, including comprehensive rolling-window scenarios
  • ✅ mypy passes cleanly for src/polarpandas, keeping the public API fully typed
  • ✅ ruff check/ruff format run squeaky clean on the updated codebase

🎯 What's New in v0.8.0

🗄️ Enhanced SQL Support

  • ✅ Primary key support - Create SQL tables with single or composite primary keys
  • ✅ Auto-increment columns - Automatic ID generation for primary keys
  • ✅ Advanced to_sql() method - Enhanced DataFrame.to_sql() and Series.to_sql() with:
    • Primary key specification (primary_key parameter)
    • Auto-increment support (auto_increment parameter)
    • Full if_exists options ('fail', 'replace', 'append')
    • Connection string and SQLAlchemy engine support
  • ✅ Type mapping - Automatic Polars-to-SQL type conversion
  • ✅ Comprehensive SQL utilities - New _sql_utils.py module with SQLAlchemy integration

🧪 Expanded Test Coverage

  • ✅ 1,026 tests passing - Added 33 comprehensive SQL tests
  • ✅ 88% coverage for SQL utilities - Extensive testing of SQL functionality
  • ✅ Edge case testing - Empty DataFrames, nulls, Unicode, large datasets (10K+ rows)
  • ✅ Data type testing - Integer, float, boolean, date, datetime, and string types
  • ✅ Batch operations - Multiple table operations and transaction testing

📦 New Features

  • ✅ Optional SQLAlchemy dependency - Install with pip install polarpandas[sqlalchemy]
  • ✅ Graceful fallback - Informative error messages when SQLAlchemy is not installed
  • ✅ Connection flexibility - Support for connection strings, engines, and connection objects
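The graceful-fallback pattern can be sketched as a small import guard that turns a bare `ImportError` into one that names the extra to install. `require_optional` is a hypothetical helper; PolarPandas' actual message wording may differ.

```python
import importlib
from types import ModuleType


def require_optional(module: str, extra: str) -> ModuleType:
    """Hypothetical helper: import an optional dependency or fail with an install hint."""
    try:
        return importlib.import_module(module)
    except ImportError as exc:  # ModuleNotFoundError is a subclass
        raise ImportError(
            f"{module!r} is required for this feature. "
            f"Install it with: pip install 'polarpandas[{extra}]'"
        ) from exc


math_mod = require_optional("math", "core")  # stdlib module: always importable
print(math_mod.sqrt(16))                     # 4.0

try:
    require_optional("definitely_not_installed_xyz", "sqlalchemy")
except ImportError as err:
    print(err)
```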

🎯 What's New in v0.7.0

🧪 Improved Test Suite

  • ✅ 993 tests passing - Doubled from 498 tests, comprehensive coverage
  • ✅ 48% code coverage - Significant improvement in test coverage
  • ✅ 13 previously skipped tests now passing - Fixed bugs and implemented missing features
  • ✅ No segfaults - Resolved numpy/pandas compatibility issues with Python 3.9+
  • ✅ 72 documented skipped tests - Clear reasons for unimplemented features

🔧 New Features & Bug Fixes

  • ✅ Implemented cut() function - Proper data binning with custom labels support
  • ✅ Fixed Series.sort_index() - Resolved constructor issue
  • ✅ Fixed Series.repeat() - Now works correctly with Polars backend
  • ✅ Fixed Series.where() - Expression evaluation bug resolved
  • ✅ Fixed Series.mask() - Expression evaluation bug resolved
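By default, pandas' `cut()` assigns each value to a right-closed interval `(a, b]` and leaves values outside the bin edges (including the lowest edge itself) missing. A minimal pure-Python model of that rule, assuming PolarPandas follows those defaults (`right=True`, `include_lowest=False`), is shown here; it is illustrative, not the library's implementation.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence


def cut(values: Sequence[float], bins: Sequence[float],
        labels: Sequence[str]) -> List[Optional[str]]:
    """Hypothetical model of pandas.cut with right-closed bins (a, b]."""
    if len(labels) != len(bins) - 1:
        raise ValueError("need exactly one label per bin")
    out: List[Optional[str]] = []
    for x in values:
        # bisect_left returns the first edge >= x, so x lies in (bins[i-1], bins[i]].
        i = bisect_left(bins, x)
        out.append(labels[i - 1] if 0 < i < len(bins) else None)
    return out


print(cut([0, 1, 5, 7, 10, 11], bins=[0, 5, 10], labels=["low", "high"]))
# [None, 'low', 'low', 'high', 'high', None]
```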

🧹 Pandas Removal Infrastructure

  • ✅ Test helpers created - Custom assertion utilities replace pandas testing functions
  • ✅ Expected values generator - Generate test expectations without runtime pandas dependency
  • ✅ First file converted - test_dataframe_statistical.py now runs without pandas (79 pandas calls eliminated)
  • ✅ Clear conversion path - Complete documentation and tooling for removing pandas from all tests
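A custom assertion utility of the kind described might look like the following sketch, which compares plain dict-of-lists frames without importing pandas. The name `assert_columns_equal` and its tolerance handling are assumptions, not the project's actual test helpers.

```python
import math
from typing import Mapping, Sequence


def assert_columns_equal(
    left: Mapping[str, Sequence[object]],
    right: Mapping[str, Sequence[object]],
    rel_tol: float = 1e-9,
) -> None:
    """Hypothetical pandas-free stand-in for pandas.testing.assert_frame_equal.

    Checks column names (in order), column lengths, and values; floats are
    compared with math.isclose so tiny rounding differences do not fail.
    """
    assert list(left) == list(right), f"columns differ: {list(left)} != {list(right)}"
    for name in left:
        lcol, rcol = list(left[name]), list(right[name])
        assert len(lcol) == len(rcol), f"{name!r}: length {len(lcol)} != {len(rcol)}"
        for i, (a, b) in enumerate(zip(lcol, rcol)):
            if isinstance(a, float) and isinstance(b, float):
                same = math.isclose(a, b, rel_tol=rel_tol)
            else:
                same = a == b
            assert same, f"{name!r}[{i}]: {a!r} != {b!r}"


# Passes: 0.1 + 0.2 is not exactly 0.3, but it is within the tolerance.
assert_columns_equal({"a": [1, 2], "b": [0.1 + 0.2, 1.0]},
                     {"a": [1, 2], "b": [0.3, 1.0]})
```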

🏗️ Code Quality

  • ✅ All ruff checks passing - Zero linting errors in src/ and tests/
  • ✅ All pyright checks passing - Zero type errors in new code
  • ✅ Python 3.9+ support - Better compatibility, no segfaults
  • ✅ Comprehensive documentation - Test improvement reports and conversion guides

🎯 What's New in v0.6.0

🚀 Massive API Expansion

  • ✅ 619 pandas-compatible features - Comprehensive pandas API coverage
  • ✅ 69 module-level functions - All major pandas functions implemented
  • ✅ 206 DataFrame methods - Complete DataFrame API support
  • ✅ 186 Series methods - Full Series functionality
  • ✅ 73 Index methods - Complete Index operations
  • ✅ 57 String accessor methods - Full .str accessor support
  • ✅ 28 Datetime accessor methods - Comprehensive .dt accessor support
  • ✅ 91 LazyFrame methods - Complete LazyFrame API (262 total methods tracked including pandas DataFrame comparison)

📊 Enhanced I/O Support

  • ✅ Comprehensive file format support - CSV, JSON, Parquet, Excel, HDF5, HTML, XML, Stata, SPSS, SAS, and more
  • ✅ Enhanced SQL support - Full pandas-compatible to_sql() with primary key and auto-increment support
  • ✅ Optional dependencies - Organized into feature groups (excel, hdf5, html, spss, sas, xarray, clipboard, formatting, sqlalchemy)
  • ✅ Flexible installation - Install only what you need: pip install polarpandas[excel] or pip install polarpandas[all]

🚀 Features (from v0.2.0)

  • LazyFrame Class - Optional lazy execution for maximum performance
  • Lazy I/O Operations - scan_csv(), scan_parquet(), scan_json() for lazy loading
  • Complete I/O operations - Full CSV/JSON read/write support
  • Advanced statistical methods - nlargest(), nsmallest(), rank(), diff(), pct_change()
  • String & datetime accessors - Full .str and .dt accessor support
  • Module-level functions - read_csv(), concat(), merge(), get_dummies()
  • Comprehensive edge cases - Empty DataFrames, null values, mixed types
  • Full type annotations - Complete ty type checking support
  • Comprehensive test coverage - Tests for all core functionality and edge cases

📦 Installation

# Install from source (development)
git clone https://github.com/eddiethedean/polarpandas.git
cd polarpandas
pip install -e .

# Or install from PyPI
pip install polarpandas

# Install with optional features (quotes stop shells such as zsh from expanding the brackets)
pip install "polarpandas[sqlalchemy]"  # For enhanced SQL features (primary keys, auto-increment)
pip install "polarpandas[excel]"       # For Excel file support
pip install "polarpandas[all]"         # Install all optional dependencies

Requirements: Python 3.8+ and Polars

Optional Dependencies:

  • numpy - For passing NumPy dtype objects like np.int64 in schemas
  • sqlalchemy - For enhanced SQL features (primary keys, auto-increment in to_sql())
  • pandas - For certain conversion features and compatibility
  • openpyxl, xlsxwriter - For Excel file I/O
  • lxml, html5lib - For HTML/XML parsing
  • pyreadstat, sas7bdat - For SPSS/SAS file support
  • types-tabulate - Lightweight type stubs to keep tabulate-powered helpers mypy-clean
  • And more; see pyproject.toml for the complete list

🔥 Core Features

⚡ Eager vs Lazy Execution

PolarPandas gives you the best of both worlds:

import polarpandas as ppd
import polars as pl

# 🚀 EAGER EXECUTION (Default - like pandas)
df = ppd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
result = df.filter(df["a"] > 1)  # Executes immediately
print(result)
# Shows results right away:
# shape: (2, 2)
# ┌─────┬─────┐
# │ a   ┆ b   │
# │ --- ┆ --- │
# │ i64 ┆ i64 │
# ╞═════╪═════╡
# │ 2   ┆ 5   │
# │ 3   ┆ 6   │
# └─────┴─────┘

# ⚡ LAZY EXECUTION (Optional - for maximum performance)
lf = df.lazy()  # Convert to LazyFrame
lf_filtered = lf.filter(pl.col("a") > 1)  # Stays lazy
df_result = lf_filtered.collect()  # Materialize when ready

# 📁 LAZY I/O (For large files)
lf = ppd.scan_csv("huge_file.csv")  # Lazy loading
lf_processed = lf.filter(pl.col("value") > 100).select("name", "value")
df_final = lf_processed.collect()  # Execute optimized plan

When to use LazyFrame:

  • 📊 Large datasets (>1M rows)
  • 🔄 Complex operations (multiple filters, joins, aggregations)
  • 💾 Memory constraints (lazy evaluation can reduce peak memory use)
  • ⚡ Performance-critical applications
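The eager/lazy distinction can be illustrated without Polars: an eager frame runs each step as soon as it is called, while a lazy one records a plan and executes it in one pass on `collect()`, which gives an engine room to merge filters and drop unused columns early. `LazyPlan` is a hypothetical toy, not how Polars' query optimizer works internally.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]


@dataclass(frozen=True)
class LazyPlan:
    """Hypothetical toy lazy pipeline: steps are recorded and run only on collect()."""

    source: Callable[[], Iterable[Row]]
    predicates: Tuple[Callable[[Row], bool], ...] = ()
    columns: Tuple[str, ...] = ()

    def filter(self, predicate: Callable[[Row], bool]) -> "LazyPlan":
        # Returns a new plan; no data is read yet.
        return replace(self, predicates=self.predicates + (predicate,))

    def select(self, *columns: str) -> "LazyPlan":
        return replace(self, columns=columns)

    def collect(self) -> List[Row]:
        out: List[Row] = []
        for row in self.source():  # one pass applies every recorded step
            if all(pred(row) for pred in self.predicates):
                # Keep only requested columns (a crude "projection pushdown").
                out.append({c: row[c] for c in self.columns} if self.columns else dict(row))
        return out


reads = []


def read_rows() -> List[Row]:
    reads.append(1)  # count how often the "file" is actually read
    return [{"name": "a", "value": 50, "extra": 0},
            {"name": "b", "value": 150, "extra": 1}]


plan = LazyPlan(read_rows).filter(lambda r: r["value"] > 100).select("name", "value")
print(len(reads))      # 0: nothing has executed yet
print(plan.collect())  # [{'name': 'b', 'value': 150}]
print(len(reads))      # 1
```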

📊 DataFrame Operations

# Initialization
df = ppd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Eager I/O (immediate loading)
df = ppd.read_csv("data.csv")
df = ppd.read_json("data.json")
df = ppd.read_parquet("data.parquet")

# Lazy I/O (for large files)
lf = ppd.scan_csv("large_file.csv")
lf = ppd.scan_parquet("huge_file.parquet")
lf = ppd.scan_json("big_file.json")

# Mutable operations (pandas-style)
df["new_col"] = df["A"] * 2
df.drop("old_col", axis=1, inplace=True)
df.rename(columns={"A": "alpha"}, inplace=True)
df.sort_values("B", inplace=True)

# Advanced operations
import polars as pl
df.groupby("category").agg(pl.col("value").mean())  # Use Polars expressions
df.pivot_table(values="sales", index="region", columns="month")
df.rolling(window=3).mean()

🗄️ Enhanced SQL Operations

PolarPandas now supports full pandas-compatible SQL operations with advanced features:

import polarpandas as ppd
from sqlalchemy import create_engine

# Create database connection
engine = create_engine('sqlite:///mydb.db')

# Basic write (uses Polars' fast write_database)
df = ppd.DataFrame({'id': [1, 2, 3], 'name': ['Alice', 'Bob', 'Charlie']})
df.to_sql('users', engine, if_exists='replace')

# Create table with primary key (requires SQLAlchemy)
df.to_sql('users', engine, if_exists='replace', primary_key='id')

# Create table with auto-incrementing primary key
df.to_sql('users', engine, if_exists='replace',
          primary_key='id', auto_increment=True)

# Composite primary key (every key column must exist in the DataFrame)
df.to_sql('users', engine, if_exists='replace',
          primary_key=['id', 'name'])

# Read back from SQL
result = ppd.read_sql("SELECT * FROM users WHERE id > 1", engine)

Key Features:

  • 🚀 Fast by default - Uses Polars' native write_database() when no special features are needed
  • 🔑 Primary key support - Set single or composite primary keys (requires SQLAlchemy)
  • ⚡ Auto-increment - Enable auto-incrementing IDs (requires SQLAlchemy)
  • 🔄 Smart fallback - Automatically uses Polars for performance, SQLAlchemy for features
  • ✅ Pandas-compatible - Complete pandas to_sql() signature support
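To make the primary-key and auto-increment options concrete, here is a sketch of the DDL such a feature has to emit, run against the standard-library `sqlite3` module instead of SQLAlchemy. The type map, `create_table_sql`, and `uses_sqlalchemy_path` (a guess at the fallback rule above) are illustrative assumptions, not PolarPandas' `_sql_utils` code. Note that SQLite only allows `AUTOINCREMENT` on a single `INTEGER PRIMARY KEY` column.

```python
import sqlite3
from typing import Dict, List, Optional, Sequence, Union

# Assumed mapping from Polars dtype names to SQLite column types.
SQL_TYPES: Dict[str, str] = {
    "Int64": "INTEGER", "Int32": "INTEGER", "Boolean": "INTEGER",
    "Float64": "REAL", "Float32": "REAL", "String": "TEXT", "Utf8": "TEXT",
}


def uses_sqlalchemy_path(primary_key: object, auto_increment: bool) -> bool:
    """Guessed fallback rule: plain bulk write unless a DDL feature is requested."""
    return primary_key is not None or auto_increment


def create_table_sql(
    table: str,
    schema: Dict[str, str],
    primary_key: Optional[Union[str, Sequence[str]]] = None,
    auto_increment: bool = False,
) -> str:
    """Hypothetical builder for CREATE TABLE with an optional (composite) key."""
    keys: List[str] = (
        [primary_key] if isinstance(primary_key, str) else list(primary_key or [])
    )
    if auto_increment and (len(keys) != 1 or schema[keys[0]] not in ("Int64", "Int32")):
        raise ValueError("auto_increment needs exactly one integer primary key")
    cols = []
    for name, dtype in schema.items():
        col = f'"{name}" {SQL_TYPES[dtype]}'
        if auto_increment and name == keys[0]:
            col += " PRIMARY KEY AUTOINCREMENT"  # SQLite's inline form
        cols.append(col)
    if keys and not auto_increment:
        quoted = ", ".join(f'"{k}"' for k in keys)
        cols.append(f"PRIMARY KEY ({quoted})")  # single or composite key
    return f'CREATE TABLE "{table}" ({", ".join(cols)})'


schema = {"id": "Int64", "name": "String"}
print(create_table_sql("users", schema, primary_key=["id", "name"]))
# CREATE TABLE "users" ("id" INTEGER, "name" TEXT, PRIMARY KEY ("id", "name"))

con = sqlite3.connect(":memory:")
con.execute(create_table_sql("users", schema, primary_key="id", auto_increment=True))
con.executemany('INSERT INTO "users" ("name") VALUES (?)', [("Alice",), ("Bob",)])
rows = con.execute('SELECT "id", "name" FROM "users" ORDER BY "id"').fetchall()
print(rows)  # [(1, 'Alice'), (2, 'Bob')]
print(uses_sqlalchemy_path(None, False), uses_sqlalchemy_path("id", False))  # False True
```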

🧩 Schema Conversion (pandas-style to Polars)

PolarPandas accepts schemas in multiple forms and converts them to Polars types automatically:

  • String dtype names: "int64", "float64", "object", "bool", "datetime", "category"
  • NumPy dtypes: np.int64, np.float32, np.uint8, ...
  • pandas dtypes: pd.Int64Dtype(), pd.Float32Dtype(), pd.StringDtype(), ...
  • Polars schema dict or pl.Schema

Constructor usage:

import numpy as np
import polars as pl
import polarpandas as ppd

data = {"a": [1, 2, 3], "b": ["x", "y", "z"]}

# Strings
df = ppd.DataFrame(data, dtype={"a": "int64", "b": "string"})

# NumPy dtypes (requires optional numpy install); "b" holds strings, so only "a" is cast
df = ppd.DataFrame(data, dtype={"a": np.int64})

# pandas dtypes (requires pandas: import pandas as pd)
# df = ppd.DataFrame(data, dtype={"a": pd.Int64Dtype(), "b": pd.StringDtype()})

# Polars schema dict
df = ppd.DataFrame(data, dtype={"a": pl.Int64, "b": pl.Utf8})

I/O functions:

# Eager
df = ppd.read_csv("data.csv", dtype={"id": "int64", "name": "string"})
df = ppd.read_json("data.json", schema={"value": "float64"})
df = ppd.read_parquet("data.parquet", dtype={"id": "uint32"})  # casts after read
df = ppd.read_feather("data.feather", schema={"flag": "bool"})  # casts after read

# Lazy (scan)
lf = ppd.scan_csv("data.csv", schema={"id": "int64"})
lf = ppd.scan_parquet("data.parquet", dtype={"score": "float32"})  # lazy cast
lf = ppd.scan_json("data.json", dtype={"name": "string"})

Notes:

  • When both dtype and schema are provided, schema takes precedence.
  • Parquet/Feather do not accept a schema parameter at read time in Polars; types are cast after reading (or lazily for scans).
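The dtype-name conversion and the precedence rule above can be sketched as a lookup table plus a small resolver. The alias table and `resolve_schema` are hypothetical, and the sketch assumes that "schema takes precedence" means `schema` replaces `dtype` entirely rather than being merged column by column.

```python
from typing import Dict, Optional

# Hypothetical alias table from pandas-style names to Polars dtype names.
DTYPE_ALIASES: Dict[str, str] = {
    "int64": "Int64", "int32": "Int32", "uint32": "UInt32", "uint8": "UInt8",
    "float64": "Float64", "float32": "Float32",
    "bool": "Boolean", "string": "String", "str": "String", "object": "String",
    "datetime": "Datetime", "category": "Categorical",
}


def resolve_schema(
    dtype: Optional[Dict[str, str]] = None,
    schema: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical resolver: normalise names; `schema` wins over `dtype`."""
    # Assumption: when both are given, schema replaces dtype wholesale.
    spec = schema if schema is not None else (dtype or {})
    resolved: Dict[str, str] = {}
    for column, name in spec.items():
        key = name.lower()
        if key not in DTYPE_ALIASES:
            raise TypeError(f"unsupported dtype {name!r} for column {column!r}")
        resolved[column] = DTYPE_ALIASES[key]
    return resolved


print(resolve_schema(dtype={"id": "int64"}, schema={"id": "uint32"}))  # {'id': 'UInt32'}
print(resolve_schema(dtype={"flag": "bool", "name": "object"}))
# {'flag': 'Boolean', 'name': 'String'}
```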

📈 Series Operations

# String operations
df["name"].str.upper()
df["email"].str.contains("@")
df["text"].str.split(" ")

# Datetime operations
df["date"].dt.year
df["timestamp"].dt.floor("D")
df["datetime"].dt.strftime("%Y-%m-%d")

# Statistical methods
df["values"].rank()
df["scores"].nlargest(5)
df["prices"].clip(lower=0, upper=100)

🎯 Advanced Indexing ⚡

All indexing operations now use native Polars implementations for maximum performance - no pandas conversion overhead!

# Label-based indexing (with index set)
df = ppd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "city": ["NYC", "LA", "Chicago"]
}, index=["a", "b", "c"])

# Select rows by label
df.loc["a"]  # Single row (returns Series)
df.loc[["a", "b"], ["name", "age"]]  # Multiple rows and columns
# Output:
# shape: (2, 2)
# ┌───────┬─────┐
# │ name  ┆ age │
# │ ---   ┆ --- │
# │ str   ┆ i64 │
# ╞═══════╪═════╡
# │ Alice ┆ 25  │
# │ Bob   ┆ 30  │
# └───────┴─────┘

# Position-based indexing
df.iloc[0:2, 1:3]  # Slice rows and columns
# Output:
# shape: (2, 2)
# ┌─────┬──────┐
# │ age ┆ city │
# │ --- ┆ ---  │
# │ i64 ┆ str  │
# ╞═════╪══════╡
# │ 25  ┆ NYC  │
# │ 30  ┆ LA   │
# └─────┴──────┘

df.iloc[[0, 2], :]  # Select specific rows, all columns
# Output:
# shape: (2, 3)
# ┌─────────┬─────┬─────────┐
# │ name    ┆ age ┆ city    │
# │ ---     ┆ --- ┆ ---     │
# │ str     ┆ i64 ┆ str     │
# ╞═════════╪═════╪═════════╡
# │ Alice   ┆ 25  ┆ NYC     │
# │ Charlie ┆ 35  ┆ Chicago │
# └─────────┴─────┴─────────┘

# Assignment (native Polars; boolean-mask assignment is 270x faster than the earlier pandas-based fallback)
df.loc["a", "age"] = 26
df.iloc[0, 0] = "Alice Updated"
df.loc[df["age"] > 25, "age"] = 30  # Boolean mask assignment - optimized!
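Boolean-mask assignment such as `df.loc[df["age"] > 25, "age"] = 30` can be done as one whole-column rebuild instead of row-by-row writes; that is the shape of Polars' `pl.when(mask).then(value).otherwise(column)` expression and one plausible route to a native implementation. The sketch models the idea with plain lists; `mask_assign` is hypothetical and says nothing about the measured speedup.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def mask_assign(column: Sequence[T], mask: Sequence[bool], value: T) -> List[T]:
    """Hypothetical helper: `value` where `mask` is True, the old entry elsewhere.

    Mirrors when(mask).then(value).otherwise(column): the whole column is
    rebuilt in one pass and the input is left unmodified.
    """
    if len(column) != len(mask):
        raise ValueError("mask length must match column length")
    return [value if hit else old for old, hit in zip(column, mask)]


ages = [25, 30, 35]
over_28 = [a > 28 for a in ages]      # plays the role of df["age"] > 28
print(mask_assign(ages, over_28, 0))  # [25, 0, 0]
print(ages)                           # [25, 30, 35] (unchanged)
```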

🏗️ Architecture

PolarPandas uses a wrapper pattern that provides:

  • Mutable operations with inplace parameter
  • Index preservation across operations
  • Pandas-compatible API with Polars performance
  • Type safety with comprehensive type hints
  • Error handling that matches pandas behavior

# Internal structure (simplified)
import polars as pl

class DataFrame:
    def __init__(self, data):
        self._df = pl.DataFrame(data)  # Polars backend
        self._index = None              # Pandas-style index
        self._index_name = None         # Index metadata
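The wrapper pattern can be sketched without Polars by standing in a dict of lists for `pl.DataFrame`. `MiniFrame` is a hypothetical, simplified model of the structure above: `inplace=True` rebinds the internal references, and sorting applies one row permutation to both the data and the index so labels stay attached to their rows.

```python
from typing import Dict, Optional


class MiniFrame:
    """Hypothetical wrapper: a column store (stand-in for Polars) plus an index."""

    def __init__(self, data: Dict[str, list], index: Optional[list] = None) -> None:
        n = len(next(iter(data.values()), []))
        self._data = {k: list(v) for k, v in data.items()}  # "backend" columns
        self._index = list(index) if index is not None else list(range(n))

    def sort_values(self, by: str, inplace: bool = False) -> Optional["MiniFrame"]:
        # One row permutation is applied to every column AND the index.
        order = sorted(range(len(self._index)), key=lambda i: self._data[by][i])
        new_data = {k: [v[i] for i in order] for k, v in self._data.items()}
        new_index = [self._index[i] for i in order]
        if inplace:
            # pandas-style mutation: rebind the wrapper's internal references.
            self._data, self._index = new_data, new_index
            return None
        return MiniFrame(new_data, new_index)


df = MiniFrame({"age": [35, 25, 30]}, index=["c", "a", "b"])
out = df.sort_values("age")
print(out._index, out._data)  # ['a', 'b', 'c'] {'age': [25, 30, 35]}
print(df._index)              # ['c', 'a', 'b'] (original untouched)
df.sort_values("age", inplace=True)
print(df._index)              # ['a', 'b', 'c']
```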

📊 Performance Benchmarks

Run benchmarks yourself:

python benchmark_large.py

Large Dataset Performance (1M rows)

Operation            pandas      PolarPandas   Speedup
DataFrame Creation   224.89 ms   15.95 ms      ⚡ 14.1x
Read CSV             8.00 ms     0.88 ms       ⚡ 9.1x
Sorting              28.05 ms    3.97 ms       ⚡ 7.1x
GroupBy              7.95 ms     2.44 ms       ⚡ 3.3x
Filtering            1.26 ms     0.42 ms       ⚡ 3.0x
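The Speedup column is the pandas time divided by the PolarPandas time. The snippet below recomputes it from the table's own numbers (it does not re-run anything) and includes `best_of_ms`, a hypothetical best-of-N timing helper; `benchmark_large.py` may measure differently.

```python
import time
from typing import Callable, Dict, Tuple

# (pandas ms, PolarPandas ms), copied from the table above.
timings: Dict[str, Tuple[float, float]] = {
    "DataFrame Creation": (224.89, 15.95),
    "Read CSV": (8.00, 0.88),
    "Sorting": (28.05, 3.97),
    "GroupBy": (7.95, 2.44),
    "Filtering": (1.26, 0.42),
}

# Speedup = baseline time / new time, rounded to one decimal as in the table.
speedups = {op: round(p / q, 1) for op, (p, q) in timings.items()}
for op, x in speedups.items():
    print(f"{op:<20} {x:.1f}x")


def best_of_ms(fn: Callable[[], object], repeat: int = 5) -> float:
    """Hypothetical helper: fastest of `repeat` runs, in milliseconds."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        best = min(best, (time.perf_counter() - start) * 1000.0)
    return best
```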

Memory Efficiency

  • 50% less memory usage than pandas
  • ⚡ Lazy evaluation for complex operations (LazyFrame)
  • Optimized data types with Polars backend
  • Query optimization with lazy execution plans

🧪 Testing & Quality

✅ Comprehensive Testing

  • 498 tests passing (100% success rate)
  • 54 tests properly skipped (documented limitations)
  • 72% code coverage across all functionality
  • Edge case handling for empty DataFrames, null values, mixed types
  • Comprehensive error handling with proper exception conversion
  • Parallel test execution - Fast test runs with pytest-xdist

✅ Code Quality

  • Zero linting errors with ruff compliance
  • 100% type safety - all ty type errors resolved
  • Fully formatted code with ruff formatter
  • Clean code standards throughout
  • Production-ready code quality

✅ Type Safety

import polarpandas as ppd

# Full type hints support
def process_data(df: ppd.DataFrame) -> ppd.DataFrame:
    return df.groupby("category").agg({"value": "mean"})

# IDE support with autocompletion
df.loc[df["age"] > 25, "name"]  # Type-safe operations

🔧 Development

Running Tests

# All tests
pytest tests/ -v

# With coverage
pytest tests/ --cov=src/polarpandas --cov-report=html

# Specific test file
pytest tests/test_dataframe_core.py -v

# SQL enhanced suite (requires SQLAlchemy extra)
pip install -e '.[test,sqlalchemy]'
pytest -m requires_sqlalchemy tests/test_sql_enhanced.py -v

Code Quality

# Format code
ruff format .

# Check linting
ruff check .

# Type checking
ty check src/polarpandas/

Current Status:

  • ✅ All tests passing (498 passed, 54 skipped)
  • ✅ Zero linting errors (ruff check)
  • ✅ Code fully formatted (ruff format)
  • ✅ Type checked (ty compliance)
  • ✅ Parallel test execution supported

Benchmarks

# Basic benchmarks
python benchmark.py

# Large dataset benchmarks
python benchmark_large.py

# Detailed analysis
python benchmark_detailed.py

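📋 Known Limitations

PolarPandas achieves 100% compatibility for implemented features. The remaining limitations fall into two groups: permanent ones caused by fundamental differences in Polars' architecture, and temporary ones covering features that are not implemented yet.

🔄 Permanent Limitations

  • Correlation/Covariance: Polars has no drop-in equivalent of pandas' DataFrame.corr()/cov() with all of their options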
  • Transpose with mixed types: Polars handles mixed types differently than pandas
  • MultiIndex support: Polars doesn't have native MultiIndex support
  • JSON orient formats: Some pandas JSON orient formats not supported by Polars

🔍 Temporary Limitations

  • Advanced indexing: Some complex pandas indexing patterns not yet implemented
  • Complex statistical methods: Some advanced statistical operations need implementation

Total: 54 tests properly skipped with clear documentation
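The transpose limitation follows from columnar storage: each column has a single dtype, so transposing rows that mix integers and strings forces a common type, whereas pandas falls back to an `object` column holding Python objects. The hypothetical `transpose` below models a Polars-style result by promoting mixed rows to strings (a simplification of real supertype rules) and uses Polars' default `column_0`, `column_1`, ... names.

```python
from typing import Dict, List


def transpose(columns: Dict[str, list]) -> Dict[str, List[object]]:
    """Hypothetical columnar transpose: each original row becomes a column."""
    names = list(columns)
    n_rows = len(columns[names[0]]) if names else 0
    out: Dict[str, List[object]] = {}
    for r in range(n_rows):
        row = [columns[name][r] for name in names]
        # A column needs one type; if the row mixes types, promote to str
        # as a crude stand-in for a supertype.
        if len({type(v) for v in row}) > 1:
            row = [str(v) for v in row]
        out[f"column_{r}"] = row
    return out


print(transpose({"a": [1, 2], "b": [3, 4]}))      # {'column_0': [1, 3], 'column_1': [2, 4]}
print(transpose({"a": [1, 2], "b": ["x", "y"]}))  # {'column_0': ['1', 'x'], 'column_1': ['2', 'y']}
```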

🤝 Contributing

We welcome contributions! Here's how to get started:

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes and add tests
  4. Run the test suite: pytest tests/ -v
  5. Check code quality: ruff check src/polarpandas/
  6. Submit a pull request

Development Setup

git clone https://github.com/eddiethedean/polarpandas.git
cd polarpandas
pip install -e ".[dev,test]"

💡 Running the optional SQL tests? Install the SQLAlchemy extra (pip install -e ".[sqlalchemy]", or rely on the dev/test extras above) and run pytest -m requires_sqlalchemy to include the enhanced SQL suite. Without the extra, those tests are skipped automatically.


🏆 Why Choose PolarPandas?

Feature           pandas        Polars        PolarPandas
Performance       ⭐⭐          ⭐⭐⭐⭐⭐    ⭐⭐⭐⭐⭐
Memory Usage      ⭐⭐          ⭐⭐⭐⭐⭐    ⭐⭐⭐⭐⭐
API Familiarity   ⭐⭐⭐⭐⭐    ⭐⭐          ⭐⭐⭐⭐⭐
Ecosystem         ⭐⭐⭐⭐⭐    ⭐⭐⭐        ⭐⭐⭐⭐
Type Safety       ⭐⭐          ⭐⭐⭐⭐      ⭐⭐⭐⭐

🎯 Best of both worlds: pandas API + Polars performance

📈 Roadmap

v0.6.0

  • ✅ 619 pandas-compatible features - Comprehensive API coverage
  • ✅ Complete Index methods - All 73 Index methods implemented
  • ✅ Full String accessor - All 57 .str methods implemented
  • ✅ Complete Datetime accessor - All 28 .dt methods implemented
  • ✅ 91 LazyFrame methods - Complete LazyFrame API with pandas DataFrame comparison (262 total methods tracked)
  • ✅ Enhanced I/O support - Multiple file formats with optional dependencies
  • ✅ Type checking with ty - Modern, fast type checker integration
  • ✅ API compatibility matrix - Comprehensive tracking of pandas compatibility

v0.4.0

  • ✅ Native Polars Indexing - Replaced all pandas fallbacks with native Polars implementations
  • ✅ Boolean Mask Optimization - 270x performance improvement for boolean mask assignment
  • ✅ Optional Pandas - Pandas is now truly optional, only required for specific conversion features
  • ✅ Enhanced Error Handling - Typo suggestions in error messages
  • ✅ Code Refactoring - Centralized index management and exception utilities
  • ✅ Type Safety - Improved type checking and resolved critical type issues
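Typo suggestions of this kind can be approximated with the standard library's `difflib.get_close_matches`. `missing_column_error` is a hypothetical helper; PolarPandas' real message wording and matching threshold may differ.

```python
import difflib
from typing import Iterable


def missing_column_error(name: str, columns: Iterable[str]) -> KeyError:
    """Hypothetical helper: build a KeyError that suggests the closest column name."""
    matches = difflib.get_close_matches(name, list(columns), n=1, cutoff=0.6)
    hint = f" Did you mean {matches[0]!r}?" if matches else ""
    return KeyError(f"Column {name!r} not found.{hint}")


err = missing_column_error("agee", ["name", "age", "city"])
print(err.args[0])  # Column 'agee' not found. Did you mean 'age'?
```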

v0.3.1

  • ✅ Fixed GitHub Actions workflow dependencies (pytest, pandas, numpy, pyarrow)
  • ✅ Fixed Windows file handling issues in I/O tests (28 tests now passing)
  • ✅ All platforms (Ubuntu, macOS, Windows) now passing all 457 tests

v0.3.0

  • ✅ Comprehensive Documentation - Professional docstrings for all public APIs
  • ✅ LazyFrame Class - Optional lazy execution for maximum performance
  • ✅ Lazy I/O Operations - scan_csv(), scan_parquet(), scan_json()
  • ✅ Eager DataFrame - Default pandas-like behavior
  • ✅ Seamless Conversion - df.lazy() and lf.collect() methods
  • ✅ 100% Type Safety - All ty errors resolved
  • ✅ Comprehensive Testing - 457 tests covering all functionality
  • ✅ Code Quality - Zero linting errors, fully formatted code

Planned (originally targeted for v0.7.0)

  • Advanced MultiIndex support
  • More statistical methods
  • Enhanced I/O formats (additional formats)
  • Further performance optimizations
  • Additional LazyFrame method implementations

Future

  • Machine learning integration
  • Advanced visualization support
  • Distributed computing support
  • GPU acceleration

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Polars - The blazing-fast DataFrame library
  • pandas - The inspiration and API reference
  • Contributors - Everyone who helps make PolarPandas better

Made with ❤️ for the data science community

⭐ Star us on GitHub • 🐛 Report Issues • 💬 Discussions

Download files

Source Distribution

polarpandas-0.9.0.tar.gz (236.9 kB)

Uploaded Source

Built Distribution


polarpandas-0.9.0-py3-none-any.whl (167.0 kB)

Uploaded Python 3

File details

Details for the file polarpandas-0.9.0.tar.gz.

File metadata

  • Download URL: polarpandas-0.9.0.tar.gz
  • Size: 236.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.19

File hashes

Hashes for polarpandas-0.9.0.tar.gz
Algorithm     Hash digest
SHA256        9b28feef84522ca6cbd8cb49af9833bda8bdfa96a7090d31043f38301b31eaf1
MD5           237a0ecda13db6ac9bf71fa8804d4693
BLAKE2b-256   de046dd68c3f19007afeb37304b8cfc4cc7e667e57b6dfcabceea4d6fc2a73f4


File details

Details for the file polarpandas-0.9.0-py3-none-any.whl.

File metadata

  • Download URL: polarpandas-0.9.0-py3-none-any.whl
  • Size: 167.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.19

File hashes

Hashes for polarpandas-0.9.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        eab616eba1bd708ec9598d7342958f1cdc6944a19b11ac05312a420c9debc582
MD5           e5044a342677d644cb9fe4dc89a9b3ea
BLAKE2b-256   1d140936a776ac6b0be4217b3b6f2a83c749372d38a6b9c02923db028680a64a

