
Grep-like tool for dataframes using Narwhals


nwgrep

Grep your dataframes

Search and filter dataframes with grep-like patterns. Works with pandas, polars, and any backend supported by Narwhals.


At a Glance

# Find what you're looking for
df.grep("active")              # Simple search
df.grep("@gmail.com")          # Find patterns
df.grep(r"^\d{3}-\d{4}$", regex=True)  # Regex support

Why nwgrep?

  • 🔍 Familiar - grep-like interface for row-based dataframe filtering
  • 🚀 Fast - Backend-agnostic, works with your preferred library
  • 🎯 Simple - Three ways to use: function, pipe, or accessor
  • ⚡ Efficient - Lazy evaluation with polars/daft for large datasets

Quick Start

uv add nwgrep

from nwgrep import nwgrep
import polars as pl

df = pl.DataFrame({
    "name": ["Alice", "Bob", "Eve"],
    "status": ["active", "locked", "active"],
})

# Find all rows containing "active"
result = nwgrep(df, "active")

# ┌───────┬────────┐
# │ name  ┆ status │
# │ ---   ┆ ---    │
# │ str   ┆ str    │
# ╞═══════╪════════╡
# │ Alice ┆ active │
# │ Eve   ┆ active │
# └───────┴────────┘
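Conceptually, the search above keeps every row in which at least one cell contains the pattern. The stdlib-only sketch below models that behavior on a list of dicts. `grep_rows` is a hypothetical helper for illustration, not nwgrep's implementation, and it assumes plain substring matching with every value converted via `str()` (whether nwgrep searches non-string columns is not stated here).

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def grep_rows(rows: List[Row], pattern: str) -> List[Row]:
    """Hypothetical model: keep rows where any cell contains `pattern` as a substring."""
    return [row for row in rows if any(pattern in str(value) for value in row.values())]


rows = [
    {"name": "Alice", "status": "active"},
    {"name": "Bob", "status": "locked"},
    {"name": "Eve", "status": "active"},
]

# Mirrors the polars output above: Alice and Eve survive, Bob is dropped.
print(grep_rows(rows, "active"))
```

Because this model is plain substring matching, a value such as "inactive" would also match "active"; the `whole_word` option listed under Powerful Search Options appears to exist for that case.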

Three Ways to Use

Choose the style that fits your workflow:

1. Direct Function

from nwgrep import nwgrep
result = nwgrep(df, "active")

2. Pipe Method

result = (
    df
    .pipe(nwgrep, "active")
    .pipe(nwgrep, "@example.com", columns=["email"])
)
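The pipe style works because `DataFrame.pipe(func, *args, **kwargs)` in both pandas and polars simply returns `func(df, *args, **kwargs)`. The sketch below models that contract with a hypothetical `pipe` function and a stand-in `toy_grep` filter over a list of dicts; both names are illustrative only and substring matching is assumed.

```python
from typing import Any, Callable, Dict, List, Optional


def pipe(obj: Any, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Same contract as DataFrame.pipe in pandas and polars: return func(obj, *args, **kwargs)."""
    return func(obj, *args, **kwargs)


def toy_grep(
    rows: List[Dict[str, str]], pattern: str, columns: Optional[List[str]] = None
) -> List[Dict[str, str]]:
    """Hypothetical stand-in for nwgrep: substring match over the chosen columns."""
    return [
        row
        for row in rows
        if any(pattern in str(row[col]) for col in (columns or list(row)))
    ]


rows = [
    {"status": "active", "email": "alice@example.com"},
    {"status": "active", "email": "bob@other.org"},
    {"status": "locked", "email": "eve@example.com"},
]

# Models df.pipe(nwgrep, "active").pipe(nwgrep, "@example.com", columns=["email"])
step1 = pipe(rows, toy_grep, "active")
result = pipe(step1, toy_grep, "@example.com", columns=["email"])
print(result)  # [{'status': 'active', 'email': 'alice@example.com'}]
```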

3. Accessor Method

For pandas and polars backends, call register_grep_accessor() to add a .grep() method directly to DataFrames:

from nwgrep import register_grep_accessor
register_grep_accessor()

df.grep("active")                          # Search all columns
df.grep("ALICE", case_sensitive=False)     # Case-insensitive
df.grep("example.com", columns=["email"])  # Specific columns
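One common way a library can make `df.grep(...)` available is to attach a function to the DataFrame class at runtime, so every instance, existing or future, gains the method. The toy sketch below shows that general mechanism on a stand-in class. `Frame`, `_grep` and `register_toy_grep` are hypothetical names, and this is not necessarily how `register_grep_accessor()` hooks into pandas or polars.

```python
class Frame:
    """Hypothetical stand-in for a DataFrame: a list of row dicts."""

    def __init__(self, rows):
        self.rows = rows


def _grep(self, pattern, case_sensitive=True):
    """Method body attached to Frame: keep rows where any cell contains `pattern`."""

    def norm(text):
        # casefold() gives a case-insensitive comparison when requested
        return text if case_sensitive else text.casefold()

    needle = norm(pattern)
    return Frame([r for r in self.rows if any(needle in norm(str(v)) for v in r.values())])


def register_toy_grep():
    """Attach `grep` to the class itself; all instances then expose frame.grep(...)."""
    setattr(Frame, "grep", _grep)


register_toy_grep()
df = Frame([{"name": "Alice"}, {"name": "Bob"}])
print(df.grep("ALICE", case_sensitive=False).rows)  # [{'name': 'Alice'}]
```

Patching the class rather than individual objects is what lets a single registration call affect every DataFrame in the session.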

Powerful Search Options

# Case-insensitive search
df.grep("ACTIVE", case_sensitive=False)

# Invert match (like grep -v)
df.grep("test", invert=True)

# Regex patterns
df.grep(r".*@example\.com", regex=True)

# Multiple patterns (OR logic)
df.grep(["Alice", "Bob"])

# Whole word matching
df.grep("active", whole_word=True)

# Column-specific search
df.grep("pattern", columns=["name", "email"])

# Highlight matching cells in notebooks (pandas/polars)
df.grep("error", highlight=True)  # Returns styled output with highlighted cells
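One way to reason about how these options combine is to fold them into a single regular expression: literal patterns are escaped, a list of patterns is OR-ed with `|`, `whole_word` adds `\b` word boundaries, `case_sensitive=False` adds `re.IGNORECASE`, `columns` restricts which cells are searched, and `invert` negates the per-row result. The stdlib sketch below follows that model. `build_matcher` and `grep_rows` are hypothetical, and nwgrep's own translation into backend expressions may differ (for example in how it defines a word boundary).

```python
import re
from typing import Any, Dict, List, Optional, Sequence, Union

Row = Dict[str, Any]


def build_matcher(
    pattern: Union[str, Sequence[str]],
    *,
    regex: bool = False,
    case_sensitive: bool = True,
    whole_word: bool = False,
) -> "re.Pattern[str]":
    """Fold the search options into one compiled regular expression."""
    patterns = [pattern] if isinstance(pattern, str) else list(pattern)
    # Literal mode: escape metacharacters so "." or "+" match themselves.
    parts = patterns if regex else [re.escape(p) for p in patterns]
    if whole_word:
        # \b requires a word/non-word transition on both sides of the match.
        parts = [rf"\b(?:{p})\b" for p in parts]
    # A list of patterns means "match any of them" (OR logic).
    combined = "|".join(f"(?:{p})" for p in parts)
    return re.compile(combined, 0 if case_sensitive else re.IGNORECASE)


def grep_rows(
    rows: List[Row],
    pattern: Union[str, Sequence[str]],
    *,
    columns: Optional[List[str]] = None,
    invert: bool = False,
    **options: bool,
) -> List[Row]:
    """Keep rows where any selected cell matches; invert=True keeps the rest (grep -v)."""
    matcher = build_matcher(pattern, **options)
    kept = []
    for row in rows:
        cols = columns if columns is not None else list(row)
        hit = any(matcher.search(str(row[c])) for c in cols)
        if hit != invert:
            kept.append(row)
    return kept


rows = [
    {"name": "Alice", "status": "active"},
    {"name": "Bob", "status": "inactive"},
    {"name": "test-user", "status": "active"},
]

print([r["name"] for r in grep_rows(rows, "active")])                   # ['Alice', 'Bob', 'test-user']
print([r["name"] for r in grep_rows(rows, "active", whole_word=True)])  # ['Alice', 'test-user']
print([r["name"] for r in grep_rows(rows, ["ALICE", "bob"], case_sensitive=False)])  # ['Alice', 'Bob']
print([r["name"] for r in grep_rows(rows, "test", invert=True)])        # ['Alice', 'Bob']
```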

Command Line Interface

Search parquet, feather, and other binary formats directly:

# Install the CLI
uv tool install "nwgrep[cli]"

# Basic search
nwgrep "error" logfile.parquet

# Case insensitive + regex
nwgrep -i -E "warn(ing)?" data.feather

# Column-specific search
nwgrep --columns email "@gmail.com" users.parquet

# Count matching rows
nwgrep --count "pattern" data.parquet

# List files with matches (like grep -l)
nwgrep -l "error" *.parquet

# Show only matching values (like grep -o)
nwgrep -o "error" data.parquet

# Stream as NDJSON (lazy evaluation)
nwgrep --format ndjson "pattern" huge_file.parquet
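To make two of these output modes concrete, the sketch below models them on in-memory rows: `--count` reports how many rows matched (per the comment above, rows rather than individual matches), and `--format ndjson` writes one JSON object per matching row. The functions are hypothetical models of the described behavior, not the CLI's code; literal substring matching and the exact JSON layout are assumptions.

```python
import io
import json
from typing import Any, Dict, Iterable, Iterator, TextIO

Row = Dict[str, Any]


def matching_rows(rows: Iterable[Row], needle: str) -> Iterator[Row]:
    """Yield matching rows one at a time instead of building a full result list."""
    for row in rows:
        if any(needle in str(value) for value in row.values()):
            yield row


def count_matches(rows: Iterable[Row], needle: str) -> int:
    """Models --count: the number of matching rows, not the number of matches."""
    return sum(1 for _ in matching_rows(rows, needle))


def write_ndjson(rows: Iterable[Row], needle: str, out: TextIO) -> None:
    """Models --format ndjson: one JSON object per line, written as each row is found."""
    for row in matching_rows(rows, needle):
        out.write(json.dumps(row) + "\n")


rows = [
    {"level": "error", "msg": "disk full"},
    {"level": "info", "msg": "ok"},
    {"level": "warn", "msg": "error rate rising"},
]

print(count_matches(rows, "error"))  # 2

buf = io.StringIO()
write_ndjson(rows, "error", buf)
print(buf.getvalue(), end="")
```

Using a generator means each line can be written as soon as its row is found, which is the property that lets a line-oriented format like NDJSON stream output instead of materializing the whole result first.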

Backend Support

Works with any dataframe library supported by Narwhals:

| Backend | Support | Notes                  |
|---------|---------|------------------------|
| pandas  | ✅      | Full support           |
| polars  | ✅      | DataFrame and LazyFrame |
| pyarrow | ✅      | Table support          |
| dask    | ✅      | Distributed dataframes |
| daft    | ✅      | Lazy evaluation        |
| cuDF    | ✅      | GPU acceleration       |
| modin   | ✅      | Parallel pandas        |

Same code, any backend. Switch freely without rewriting your filters.

Installation

Basic installation:

uv add nwgrep
# or
pip install nwgrep

With optional extras:

uv add nwgrep               # core library
uv add "nwgrep[cli]"        # CLI for searching parquet/feather files using polars
uv add "nwgrep[notebook]"   # highlighting in notebooks (pandas/polars)
uv add "nwgrep[all]"        # include all features (cli + notebook)

Note: nwgrep is designed to be added to an existing environment with a dataframe library (pandas, polars, etc.) already installed. It does not install these backends by default, except for polars when installing the [cli] extra.

Features

  • 🚀 Backend agnostic: Write once, run on any dataframe library
  • 🔍 Multiple search modes: Literal, regex, case-sensitive/insensitive
  • 📊 Column filtering: Search all columns or specific ones
  • ⚡ Lazy evaluation: Efficient with large datasets (polars/daft)
  • 🎯 Familiar interface: grep-like flags and behavior (-i, -v, -E)
  • 🔧 Type safe: Full type hints with ty type checking
  • 🎨 Flexible API: Function, pipe, or accessor - your choice
  • 🖥️ CLI included: Search binary formats from the command line

Documentation

Full documentation available at erichutchins.github.io/nwgrep

Quick Examples

Find Active Users

users = df.grep("active", columns=["status"])

Email Domain Search

gmail_users = df.grep("@gmail.com", columns=["email"])

Log Analysis

errors = df.grep(["ERROR", "CRITICAL"], columns=["level"])

Data Quality Checks

# Find rows without email addresses
missing_email = df.grep(r"\w+@\w+\.\w+", regex=True, invert=True)
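The pattern `\w+@\w+\.\w+` is deliberately loose, and because a regex search succeeds wherever the pattern appears inside a cell (unanchored, which is consistent with the `^...$` anchors used in the At a Glance example), it is worth checking what it accepts before trusting the inverted result. The stdlib `re.search` check below is illustrative only and assumes search-style matching; the sample values are made up.

```python
import re

EMAIL_ISH = re.compile(r"\w+@\w+\.\w+")

samples = {
    "alice@example.com": True,
    "first.last@example.co.uk": True,  # matches on the substring "last@example.co"
    "contact: bob@corp.io": True,      # unanchored search finds it mid-string
    "user@localhost": False,           # no dot after the domain
    "user-name@my-site.com": False,    # "-" is not \w, so this valid address is missed
    "n/a": False,
}

for value, expected in samples.items():
    found = EMAIL_ISH.search(value) is not None
    print(f"{value!r:30} matched={found} expected={expected}")
```

Under this model, a row whose only address is "user-name@my-site.com" would be returned by the `invert=True` query above. A more permissive pattern such as `[\w.+-]+@[\w-]+\.[\w.-]+` accepts it, and adding `columns=["email"]` keeps other columns from satisfying the match.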

Pipeline Filtering

result = (
    df
    .grep("active", columns=["status"])     # Active users
    .grep("@company.com", columns=["email"]) # Company emails
    .grep("admin", invert=True)              # Exclude admins
)
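Each chained `.grep()` call narrows the previous result, so the pipeline keeps rows that pass every step: status contains "active" AND email contains "@company.com" AND no column contains "admin" (the last step searches all columns). The sketch below writes that same logic as one predicate over plain dicts; `contains` and `keep` are hypothetical helpers, and substring semantics are assumed.

```python
from typing import Any, Dict

Row = Dict[str, Any]


def contains(value: Any, needle: str) -> bool:
    """Plain substring test on the string form of a cell."""
    return needle in str(value)


def keep(row: Row) -> bool:
    """Single-predicate equivalent of the three chained .grep() calls."""
    return (
        contains(row["status"], "active")                         # .grep("active", columns=["status"])
        and contains(row["email"], "@company.com")                # .grep("@company.com", columns=["email"])
        and not any(contains(v, "admin") for v in row.values())   # .grep("admin", invert=True)
    )


rows = [
    {"name": "ann", "status": "active", "email": "ann@company.com", "role": "dev"},
    {"name": "root", "status": "active", "email": "root@company.com", "role": "admin"},
    {"name": "cy", "status": "active", "email": "cy@gmail.com", "role": "dev"},
    {"name": "dee", "status": "inactive", "email": "dee@company.com", "role": "dev"},
]

print([r["name"] for r in rows if keep(r)])  # ['ann', 'dee']
```

Note that "dee" survives because "inactive" contains "active" as a substring; under this model, passing `whole_word=True` to the first step would exclude that row.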

Narwhals Integration

nwgrep is a certified Narwhals plugin, enabling truly backend-agnostic code:

import narwhals as nw
from nwgrep import nwgrep

def process_any_dataframe(df_native):
    """Works with pandas, polars, pyarrow, or any Narwhals-supported backend"""
    df = nw.from_native(df_native)
    result = nwgrep(df, "pattern")
    return nw.to_native(result)

Contributing

Contributions welcome! See CONTRIBUTING.md for development setup and guidelines.

License

MIT License - see LICENSE file for details.


Built with Narwhals



Download files


Source Distribution

nwgrep-0.2.0.tar.gz (12.5 kB)

Uploaded Source

Built Distribution


nwgrep-0.2.0-py3-none-any.whl (13.9 kB)

Uploaded Python 3

File details

Details for the file nwgrep-0.2.0.tar.gz.

File metadata

  • Download URL: nwgrep-0.2.0.tar.gz
  • Upload date:
  • Size: 12.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.9.28 {"installer":{"name":"uv","version":"0.9.28","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for nwgrep-0.2.0.tar.gz
Algorithm Hash digest
SHA256 f550b8be7bc9ee6913d87aedc45871cf0ad7773ca15d7feb005637c36b2a423e
MD5 9541158b8d0e745483e71c20fedd3002
BLAKE2b-256 394e71ff0ef624cbd8d5fa28ef221d9e93e8c937cf9d52afc80ec47eb456a564
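To check a downloaded archive against the SHA256 digest listed above, hash it locally and compare. This is a minimal stdlib sketch, assuming the source archive has been saved as nwgrep-0.2.0.tar.gz in the current directory; `sha256_of` and `verify` are hypothetical helper names.

```python
import hashlib
from pathlib import Path

EXPECTED_SDIST_SHA256 = "f550b8be7bc9ee6913d87aedc45871cf0ad7773ca15d7feb005637c36b2a423e"


def sha256_of(path, chunk_size=1 << 16):
    """Stream the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Compare against a published hex digest, ignoring letter case."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("nwgrep-0.2.0.tar.gz")
    if archive.exists():
        print("OK" if verify(archive, EXPECTED_SDIST_SHA256) else "MISMATCH")
```

pip can also enforce this automatically in hash-checking mode, using `--require-hashes` with `--hash=sha256:...` entries in a requirements file.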


File details

Details for the file nwgrep-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: nwgrep-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 13.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.9.28 {"installer":{"name":"uv","version":"0.9.28","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for nwgrep-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 6e276932d3a3e3907ee8703452fdb855f503473c72279142cbf8ca1087e12b07
MD5 bbb531b4c44b8f595a09339e5ab734f8
BLAKE2b-256 6d83a844072c91e6b56d78c99b34090eb9e9e87a7cae91bb3f82ba29cecbaa6f

