Skip to main content

A flexible query engine for pandas DataFrames with SQL, regex, date, and fuzzy matching.

Project description

Querexfuzz

A flexible query engine for pandas DataFrames. querexfuzz lets you filter and search your data using a unified syntax that combines SQL-like where clauses, regular expressions, natural date ranges, and fuzzy matching — all in a single query string.


Core Features

  • Unified query language: combine where, regex (~ / !), date filters (@), and fuzzy matching (#) in one string.
  • DataFrame native: attaches a .querex() method (and .q() alias) directly to DataFrame instances.
  • Auto-configuration: querexfuzz_from_df inspects column types and sets sensible defaults automatically.
  • Fast fuzzy search: powered by skimmatch; matcher built once per DataFrame and cached for the lifetime of the engine.
  • Configurable: via YAML file, keyword arguments, or both.

Installation

pip install querexfuzz

For development:

git clone https://github.com/mynl/querexfuzz.git
cd querexfuzz
pip install -e .[test]

Quickstart

Auto-configure from a DataFrame

The simplest path — querexfuzz_from_df inspects column types and wires everything up:

import pandas as pd
from querexfuzz import querexfuzz_from_df

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'age': [25, 30, 35, 40, 45],
    'city': ['Amsterdam', 'Berlin', 'Copenhagen', 'Berlin', 'Amsterdam'],
    'registered_date': pd.to_datetime([
        '2025-08-10', '2025-06-15', '2024-01-20', '2025-08-25', '2025-07-30'
    ])
})

querexfuzz_from_df(df)        # attaches .querex() and .q() to df in-place

result = df.querex("where city == 'Berlin'")
result = df.q("top 5 # ams")  # short alias

Explicit configuration

from querexfuzz import Querexfuzz

engine = Querexfuzz(
    base_cols=['name', 'city', 'registered_date', 'age'],
    date_fields=['registered_date'],
    default_date_field='registered_date',
    bang_field='name',
    recent_field='registered_date',
    fuzzy=dict(fields=['name', 'city'], limit=50, score_col_name='score'),
)
engine.attach_to(df)

result = df.querex("recent top 10 where age > 30 # berlin")

From a YAML config file

engine = Querexfuzz(config_path='config.yml')
engine = Querexfuzz(config_path='config.yml', fuzzy={'limit': 200})  # with overrides

Query Syntax

Clauses are order-sensitive and must appear in this sequence (all optional):

[verbose] [recent] [top N | bottom N] [select cols]
[field ~ regex | ! term] [where expr] [order by cols] [@ date_spec] [# fuzzy_term]

An empty query returns all base columns for all rows.

where — SQL-like filter

df.querex("where city == 'Amsterdam' and age > 30")

! / ~ — Regex

df.querex("! ^[AB]")            # regex on bang_field (default regex target)
df.querex("name ~ ^[AB]")       # regex on named column

@ — Date range

Units: c calendar year, y year, q quarter, m month, w week, d day, h hour.

df.querex("@m-3")                        # last 3 months (default date field)
df.querex("@registered_date m-28:6")     # 28 to 6 months ago on named field
df.querex("@y-1")                        # last year

# — Fuzzy matching

Must be the last clause. Results are sorted by score descending.

df.querex("# berlin")
df.querex("where age > 30 # ams")        # filter first, then fuzzy over full data

select — Column projection

Syntax Meaning
(default) base columns
select * base columns
select ** all columns
select a, b named columns
select *, a base columns plus a
select *, -a base columns minus a (- or ! prefix)
select **, -a all columns minus a

top / bottom / recent / order by

df.querex("top 10")
df.querex("bottom 5")
df.querex("recent")                       # sort by recent_field descending
df.querex("order by age")
df.querex("order by -age, name")          # - prefix = descending

Combining clauses

df.querex("top 5 recent where city == 'Berlin' @m-3 select name, age # bob")

Fuzzy matching and caching

The fuzzy matcher (skimmatch) is built once per attached DataFrame and cached on the engine. Repeat fuzzy queries against the same DataFrame pay only the cost of matcher.query().

When where, regex, or date pre-filters are present, the matcher still runs over the full DataFrame and results are intersected with the pre-filtered rows (5× over-fetch to compensate for the narrower valid set).

If the DataFrame's contents change between queries, re-attach with mutable=True:

engine.attach_to(df, mutable=True)   # rebuilds matcher on every fuzzy call

Versions

2.0.3 (current)

Fuzzy caching refactor. Matcher built once per attached DataFrame and cached by id(df) on the engine — repeat queries skip all data preparation. Multiple DataFrames per engine each get an independent cache entry. attach_to(mutable=True) opt-in for DataFrames whose contents change.

2.0.2

Performance pass on execute_query: lazy DataFrame copy (copy only when date-column type coercion is needed, after prior filters have already reduced the frame); initial fuzzy matcher caching.

2.0.1

Code review fixes: method renamed .querex() / .q(); importlib.metadata version; Pydantic config corrections; parser and test suite overhaul; tzlocal dependency added.

2.0.0

Major rewrite. Lark-based grammar parser, Pydantic configuration, skimmatch fuzzy backend (replacing rustfuzz), src/ layout.

0.1.0

Initial release.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

querexfuzz-2.0.4.tar.gz (21.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

querexfuzz-2.0.4-py3-none-any.whl (17.2 kB view details)

Uploaded Python 3

File details

Details for the file querexfuzz-2.0.4.tar.gz.

File metadata

  • Download URL: querexfuzz-2.0.4.tar.gz
  • Upload date:
  • Size: 21.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for querexfuzz-2.0.4.tar.gz
Algorithm Hash digest
SHA256 4dbb5f5275fc7cbb7234fb0e6099dd7c5e325c2daf858aa300f9bb52e6118632
MD5 4103a90f4a279317342db1a4f80d51a1
BLAKE2b-256 2198a805fc20b717e3bc890762ae18ae7781ec320050ed7941de7b184797bb22

See more details on using hashes here.

File details

Details for the file querexfuzz-2.0.4-py3-none-any.whl.

File metadata

  • Download URL: querexfuzz-2.0.4-py3-none-any.whl
  • Upload date:
  • Size: 17.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for querexfuzz-2.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 30dfab9ceaf22e0fc82aa8cfddd6a7aa366904cfc08cc53e86940ec6b58b29db
MD5 17b3725d28c2850c83a82022ecb92b47
BLAKE2b-256 6b6c7400982b27d682b0493f28136fcc62668baf63e460a5e67857fc60a3ba13

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page