Grep-like tool for dataframes using Narwhals
nwgrep
Grep your dataframes
Search and filter dataframes with grep-like patterns. Works with pandas, polars, and any backend supported by Narwhals.
At a Glance
# Find what you're looking for
df.grep("active") # Simple search
df.grep("@gmail.com") # Find patterns
df.grep(r"^\d{3}-\d{4}$", regex=True) # Regex support
Why nwgrep?
- Familiar - grep-like interface for row-based dataframe filtering
- Fast - Backend-agnostic, works with your preferred library
- Simple - Three ways to use: function, pipe, or accessor
- Efficient - Lazy evaluation with polars/daft for large datasets
Quick Start
uv add nwgrep
from nwgrep import nwgrep
import polars as pl
df = pl.DataFrame({
"name": ["Alice", "Bob", "Eve"],
"status": ["active", "locked", "active"],
})
# Find all rows containing "active"
result = nwgrep(df, "active")
# ┌───────┬────────┐
# │ name  ┆ status │
# │ ---   ┆ ---    │
# │ str   ┆ str    │
# ╞═══════╪════════╡
# │ Alice ┆ active │
# │ Eve   ┆ active │
# └───────┴────────┘
Three Ways to Use
Choose the style that fits your workflow:
1. Direct Function
from nwgrep import nwgrep
result = nwgrep(df, "active")
2. Pipe Method
result = (
df
.pipe(nwgrep, "active")
.pipe(nwgrep, "@example.com", columns=["email"])
)
3. Accessor Method
For Polars and Pandas backends, you can register an accessor that adds a .grep method directly to the DataFrame:
from nwgrep import register_grep_accessor
register_grep_accessor()
df.grep("active") # Search all columns
df.grep("ALICE", case_sensitive=False) # Case-insensitive
df.grep("example.com", columns=["email"]) # Specific columns
Powerful Search Options
# Case-insensitive search
df.grep("ACTIVE", case_sensitive=False)
# Invert match (like grep -v)
df.grep("test", invert=True)
# Regex patterns
df.grep(r".*@example\.com", regex=True)
# Multiple patterns (OR logic)
df.grep(["Alice", "Bob"])
# Whole word matching
df.grep("active", whole_word=True)
# Column-specific search
df.grep("pattern", columns=["name", "email"])
# Highlight matching cells in notebooks (pandas/polars)
df.grep("error", highlight=True) # Returns styled output with highlighted cells
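To make the interaction between these options concrete, here is a minimal pure-Python model of the row-matching semantics. It is not nwgrep's implementation: `grep_rows` and the list-of-dicts input are hypothetical, and the sketch assumes substring matching, literal patterns by default, OR logic across patterns, and that a row matches when any searched cell matches.

```python
from __future__ import annotations

import re
from typing import Iterable, Sequence


def grep_rows(
    rows: Iterable[dict],
    patterns: str | Sequence[str],
    *,
    columns: Sequence[str] | None = None,
    case_sensitive: bool = True,
    regex: bool = False,
    whole_word: bool = False,
    invert: bool = False,
) -> list[dict]:
    """Hypothetical helper: keep rows where any searched cell matches any pattern."""
    if isinstance(patterns, str):
        patterns = [patterns]
    flags = 0 if case_sensitive else re.IGNORECASE
    compiled = []
    for pattern in patterns:
        # Literal patterns are escaped so "." or "@" carry no regex meaning.
        body = pattern if regex else re.escape(pattern)
        if whole_word:
            body = rf"\b(?:{body})\b"
        compiled.append(re.compile(body, flags))

    kept = []
    for row in rows:
        cells = [row[c] for c in columns] if columns else list(row.values())
        hit = any(rx.search(str(cell)) for cell in cells for rx in compiled)
        # invert=True keeps the rows that did NOT match (like grep -v).
        if hit != invert:
            kept.append(row)
    return kept


rows = [
    {"name": "Alice", "status": "active"},
    {"name": "Bob", "status": "locked"},
    {"name": "Eve", "status": "inactive"},
]
print([r["name"] for r in grep_rows(rows, "active")])                   # ['Alice', 'Eve']
print([r["name"] for r in grep_rows(rows, "active", whole_word=True)])  # ['Alice']
```

In this model, "inactive" contains "active" as a substring, which is why whole-word matching narrows the result.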
Command Line Interface
Search parquet, feather, and other binary formats directly:
# Install cli
uv tool install "nwgrep[cli]"
# Basic search
nwgrep "error" logfile.parquet
# Case insensitive + regex
nwgrep -i -E "warn(ing)?" data.feather
# Column-specific search
nwgrep --columns email "@gmail.com" users.parquet
# Count matching rows
nwgrep --count "pattern" data.parquet
# List files with matches (like grep -l)
nwgrep -l "error" *.parquet
# Show only matching values (like grep -o)
nwgrep -o "error" data.parquet
# Stream as NDJSON (lazy evaluation)
nwgrep --format ndjson "pattern" huge_file.parquet
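NDJSON output is convenient to consume from another process because each line is a complete JSON document. The following sketch shows one way to read such a stream record by record. `iter_ndjson` is a hypothetical helper, and the sample records are made up for illustration; the exact fields the CLI emits depend on your data and are not shown here.

```python
import json
from typing import Iterable, Iterator


def iter_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty line, without loading everything at once."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Stand-in for lines read from a pipe, e.g. passing sys.stdin to iter_ndjson.
sample_lines = [
    '{"name": "Alice", "status": "active"}',
    "",
    '{"name": "Eve", "status": "active"}',
]
records = list(iter_ndjson(sample_lines))
print(len(records))  # 2
```

Because the helper is a generator, passing `sys.stdin` instead of a list processes records as they arrive.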
Backend Support
Works seamlessly with any dataframe library thanks to Narwhals:
| Backend | Support | Notes |
|---|---|---|
| pandas | Yes | Full support |
| polars | Yes | DataFrame and LazyFrame |
| pyarrow | Yes | Table support |
| dask | Yes | Distributed dataframes |
| daft | Yes | Lazy evaluation |
| cuDF | Yes | GPU acceleration |
| modin | Yes | Parallel pandas |
Same code, any backend. Switch freely without rewriting your filters.
Installation
Basic installation:
uv add nwgrep
# or
pip install nwgrep
With optional extras:
uv add nwgrep # core library
uv add "nwgrep[cli]" # CLI for searching parquet/feather files using polars
uv add "nwgrep[notebook]" # highlighting in notebooks (pandas/polars)
uv add "nwgrep[all]" # include all features (cli + notebook)
Note: nwgrep is designed to be added to an existing environment with a dataframe library (pandas, polars, etc.) already installed. It does not install these backends by default, except for polars when installing the [cli] extra.
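Since no backend is installed by default, it can help to check which dataframe libraries are importable in the current environment. This is a small standard-library sketch, not part of nwgrep; `installed_backends` is a hypothetical helper and the import names listed are assumptions based on the backends named in this README.

```python
from importlib.util import find_spec

# Assumed top-level import names for the backends mentioned in this README.
CANDIDATE_BACKENDS = ("pandas", "polars", "pyarrow", "dask", "daft", "cudf", "modin")


def installed_backends(candidates=CANDIDATE_BACKENDS) -> list[str]:
    """Return the candidate modules that can be found without importing them."""
    return [name for name in candidates if find_spec(name) is not None]


available = installed_backends()
if not available:
    print("No dataframe backend found; install pandas or polars first.")
else:
    print("Available backends:", ", ".join(available))
```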
Features
- Backend agnostic: Write once, run on any dataframe library
- Multiple search modes: Literal, regex, case-sensitive/insensitive
- Column filtering: Search all columns or specific ones
- Lazy evaluation: Efficient with large datasets (polars/daft)
- Familiar interface: grep-like flags and behavior (-i, -v, -E)
- Type safe: Full type hints with ty type checking
- Flexible API: Function, pipe, or accessor - your choice
- CLI included: Search binary formats from the command line
Documentation
Full documentation available at erichutchins.github.io/nwgrep
- Installation Guide - Setup for all backends
- Usage Examples - Comprehensive examples
- API Reference - Complete function reference
- CLI Reference - Command-line usage
Quick Examples
Find Active Users
users = df.grep("active", columns=["status"])
Email Domain Search
gmail_users = df.grep("@gmail.com", columns=["email"])
Log Analysis
errors = df.grep(["ERROR", "CRITICAL"], columns=["level"])
Data Quality Checks
# Find rows without email addresses
missing_email = df.grep(r"\w+@\w+\.\w+", regex=True, invert=True)
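The pattern in this check is deliberately loose, so it is worth seeing what it accepts before relying on it. The sketch below tests the same expression with Python's `re` module; the sample strings are made up, and the regex engine your backend uses may differ from `re` in edge cases.

```python
import re

# Same expression as the data quality check above.
EMAIL_LIKE = re.compile(r"\w+@\w+\.\w+")

samples = [
    "alice@example.com",
    "bob at example.com",
    "first.last@mail.example.org",
    "",
    "eve@localhost",
]
matches = {s: bool(EMAIL_LIKE.search(s)) for s in samples}
for value, found in matches.items():
    print(f"{value!r}: {found}")
```

With invert=True, rows are kept only when none of the searched cells contains an email-like substring, so values such as "eve@localhost" count as missing under this pattern.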
Pipeline Filtering
result = (
df
.grep("active", columns=["status"]) # Active users
.grep("@company.com", columns=["email"]) # Company emails
.grep("admin", invert=True) # Exclude admins
)
Narwhals Integration
nwgrep is a certified Narwhals plugin, enabling truly backend-agnostic code:
import narwhals as nw
from nwgrep import nwgrep
def process_any_dataframe(df_native):
    """Works with pandas, polars, pyarrow, or any Narwhals-supported backend"""
    df = nw.from_native(df_native)
    result = nwgrep(df, "pattern")
    return nw.to_native(result)
Contributing
Contributions welcome! See CONTRIBUTING.md for development setup and guidelines.
License
MIT License - see LICENSE file for details.
Built with Narwhals
File details
Details for the file nwgrep-0.2.0.tar.gz.
File metadata
- Download URL: nwgrep-0.2.0.tar.gz
- Upload date:
- Size: 12.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.9.28 {"installer":{"name":"uv","version":"0.9.28","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f550b8be7bc9ee6913d87aedc45871cf0ad7773ca15d7feb005637c36b2a423e |
| MD5 | 9541158b8d0e745483e71c20fedd3002 |
| BLAKE2b-256 | 394e71ff0ef624cbd8d5fa28ef221d9e93e8c937cf9d52afc80ec47eb456a564 |
File details
Details for the file nwgrep-0.2.0-py3-none-any.whl.
File metadata
- Download URL: nwgrep-0.2.0-py3-none-any.whl
- Upload date:
- Size: 13.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.9.28 {"installer":{"name":"uv","version":"0.9.28","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6e276932d3a3e3907ee8703452fdb855f503473c72279142cbf8ca1087e12b07 |
| MD5 | bbb531b4c44b8f595a09339e5ab734f8 |
| BLAKE2b-256 | 6d83a844072c91e6b56d78c99b34090eb9e9e87a7cae91bb3f82ba29cecbaa6f |