Python bindings for eregex, an advanced regular expression engine inspired by mrab-regex
eregex (Python bindings)
Python bindings for eregex — an
advanced regular expression engine for Rust inspired by mrab-regex (the Python
regex module).
This package exposes eregex's full API to Python via PyO3 and ships as a wheel built with maturin. All matching logic runs in compiled Rust; the Python layer is a thin adapter.
Features
- Named groups, duplicate group names, repeated captures
- Greedy / lazy / possessive quantifiers, atomic groups (?>...)
- Variable-length lookbehind / lookahead
- Inline scoped flags (?i), (?i-m:...)
- Backreferences \1, \g<name>, (?P=name)
- Partial / end-anchored matching (find_partial)
- find, match_at_start (Python re.match), fullmatch (re.fullmatch)
- replace, replace_all with $1 / ${name} / $$ templates
- split, escape, and more
Installation
The wheel is built from the Rust core:
cd crates/eregex-python
python -m venv .venv
. .venv/bin/activate # or .venv\Scripts\activate on Windows
pip install maturin
maturin develop --release # editable install into the current venv
# or: maturin build --release && pip install target/wheels/eregex-*.whl
maturin develop builds the extension and installs it into the active virtual
environment, where it is importable as eregex.
Quick start
import eregex
re = eregex.Regex(r"(\w+)\s+(\w+)")
m = re.find("hello world")
m.matched # 'hello world'
m.group(1) # 'hello'
m.group(2) # 'world'
m[1] # 'hello' (Match is sequence-like)
# Flags: pass a bitset of the module-level constants, or parse a string.
eregex.Regex("hello", eregex.IGNORECASE).is_match("HELLO") # True
eregex.Regex("hello", eregex.parse_flags("i")).is_match("HELLO") # True
# Repeated captures (signature mrab-regex feature).
eregex.Regex(r"(\w)+").find("abc").captures(1) # ['a', 'b', 'c']
# Replace with named groups.
eregex.Regex(r"(?P<a>\d)(?P<b>\d)").replace_all("12 34", "${b}${a}") # '21 43'
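The $1 / ${name} / $$ template syntax differs from the \1 / \g<name> syntax used by the standard library's re.sub. As a rough illustration of how the two correspond, the sketch below translates one into the other. The helper to_re_template is hypothetical and not part of eregex, and it assumes only the three template forms named in this README; any other template rules eregex may have are not modelled.

```python
import re

# Matches the three template forms this README mentions: $$, $1, ${name}.
_TOKEN = re.compile(r"\$(?:(\$)|(\d+)|\{(\w+)\})")


def to_re_template(template: str) -> str:
    """Translate a $-style template into a stdlib re.sub template.

    Hypothetical helper for illustration only; not an eregex API.
    """
    out = []
    pos = 0
    for tok in _TOKEN.finditer(template):
        # Literal text between tokens: escape backslashes so re.sub
        # does not interpret them as its own escapes.
        out.append(template[pos:tok.start()].replace("\\", "\\\\"))
        if tok.group(1):
            out.append("$")                      # $$ is a literal dollar sign
        elif tok.group(2):
            out.append(rf"\g<{tok.group(2)}>")   # $1 becomes \g<1>
        else:
            out.append(rf"\g<{tok.group(3)}>")   # ${name} becomes \g<name>
        pos = tok.end()
    out.append(template[pos:].replace("\\", "\\\\"))
    return "".join(out)


swapped = re.sub(r"(?P<a>\d)(?P<b>\d)", to_re_template("${b}${a}"), "12 34")
```

Here swapped reproduces the '21 43' result of the replace_all call above using only the standard library.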
Regex
class Regex:
    def __init__(self, pattern: str, flags: int = 0): ...
    @property
    def pattern(self) -> str: ...
    @property
    def flags(self) -> int: ... # resolved (UNICODE + VERSION1 added)
    @property
    def capture_count(self) -> int: ... # excluding group 0
    def group_names(self) -> list[str]: ...
    def group_index(self, name: str) -> int | None: ...
    def is_match(self, haystack: str) -> bool: ...
    def find(self, haystack: str) -> Match | None: ...
    def find_at(self, haystack: str, start: int) -> Match | None: ...
    def match_at_start(self, haystack: str) -> Match | None: ... # re.match
    def fullmatch(self, haystack: str) -> Match | None: ...
    def findall(self, haystack: str) -> list[Match]: ...
    def find_partial(self, haystack: str) -> PartialMatch | None: ...
    def replace(self, haystack: str, repl: str) -> str: ...
    def replace_all(self, haystack: str, repl: str) -> str: ...
    def split(self, haystack: str) -> list[str]: ...
    def dump(self) -> str: ... # AST debug aid
flags is a bitwise OR of the module-level constants: IGNORECASE,
MULTILINE, DOTALL, UNICODE, ASCII, VERBOSE, FULLCASE, WORD,
LOCALE, VERSION0, VERSION1. parse_flags("ims") parses a flag string
for re-familiar ergonomics.
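To make the "bitwise OR of constants" idea concrete, here is a minimal sketch of folding a flag string into a bitset. It uses the standard library's re constants as stand-ins so that it runs without eregex installed. The function parse_flag_letters and its letter table are assumptions for illustration; eregex's own parse_flags may accept different letters and produce different bit values.

```python
import re

# Hypothetical letter-to-flag table built from stdlib constants (stand-ins,
# not eregex's actual values).
_FLAG_LETTERS = {
    "i": re.IGNORECASE,
    "m": re.MULTILINE,
    "s": re.DOTALL,
    "x": re.VERBOSE,
    "a": re.ASCII,
}


def parse_flag_letters(flag_str: str) -> int:
    """OR together the flag bit for each letter in flag_str."""
    bits = 0
    for letter in flag_str:
        if letter not in _FLAG_LETTERS:
            raise ValueError(f"unknown flag letter: {letter!r}")
        bits |= int(_FLAG_LETTERS[letter])
    return bits


combined = parse_flag_letters("ims")
# Testing for a single flag is a bitwise AND against the bitset.
has_multiline = bool(combined & re.MULTILINE)
```

Passing a single integer keeps the constructor signature small and lets callers combine flags with | exactly as they would with re.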
Match
Match is sequence-like: len(m) is the number of groups (group 0 first),
and m[i] / m["name"] look up by index / name.
class Match:
    @property
    def matched(self) -> str # whole match (group 0)
    @property
    def group0(self) -> str # alias of matched
    @property
    def input(self) -> str
    @property
    def start(self) -> int # byte offset
    @property
    def end(self) -> int
    @property
    def span(self) -> tuple[int, int]
    @property
    def capture_count(self) -> int
    @property
    def groups(self) -> list[str | None]
    @property
    def named_groups(self) -> dict[str, str]
    @property
    def all_captures(self) -> list[list[str | None]]
    @property
    def captures_dict(self) -> dict[str, list[str | None]]
    def group(self, *indices_or_names) -> ... # re.match.group semantics
    def captures(self, index: int) -> list[str | None]
    def captures_by_name(self, name: str) -> list[str | None]
    def span_of(self, index: int = 0) -> tuple[int, int] | None
    def start_of(self, index: int = 0) -> int
    def end_of(self, index: int = 0) -> int
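All offsets are byte offsets into the UTF-8 encoding of the input, matching the
Rust core. This differs from Python's re, which reports code-point indices for
str input, so the two agree only on ASCII text. None is returned for groups
that did not participate.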
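Because the offsets count UTF-8 bytes, they cannot be used directly to slice a Python str once the input contains non-ASCII characters. The sketch below shows one way to bridge the two using only the standard library. The helper names are hypothetical and not part of eregex, and the byte span is written out by hand rather than taken from a real match.

```python
def byte_to_char_offset(text: str, byte_offset: int) -> int:
    """Convert a UTF-8 byte offset into a str (code point) index.

    Raises UnicodeDecodeError if byte_offset falls inside a multi-byte
    character.
    """
    return len(text.encode("utf-8")[:byte_offset].decode("utf-8"))


def slice_by_bytes(text: str, start: int, end: int) -> str:
    """Return the substring covered by a UTF-8 byte span."""
    return text.encode("utf-8")[start:end].decode("utf-8")


text = "naïve café"
byte_span = (7, 12)  # hand-written byte span covering "café"

char_span = (
    byte_to_char_offset(text, byte_span[0]),
    byte_to_char_offset(text, byte_span[1]),
)
by_bytes = slice_by_bytes(text, *byte_span)
by_chars = text[char_span[0]:char_span[1]]
```

Both by_bytes and by_chars hold the same substring; slicing text directly with the byte span would not. Re-encoding on every call is linear in the input length, so for many lookups on one haystack it may be worth encoding once and reusing the bytes.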
Partial matching
find_partial is an end-anchored search: it asks whether the haystack,
taken up to its end, could be the start of a full match. Use it when
validating input as the user types, parsing an incomplete stream, or asking
"could more input turn this into a match?"
It returns one of three outcomes:
| result | meaning |
|---|---|
| PartialMatch (partial) | a valid prefix so far; more input could complete it |
| PartialMatch (full) | the input already fully matches (and consumes it to its end) |
| None | a hard mismatch: no possible continuation could match |
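For as-you-type validation, the three outcomes usually map onto three UI states. The sketch below shows that mapping as a small function. It relies only on the is_full attribute, and it uses SimpleNamespace objects as stand-ins for PartialMatch so that it runs without eregex installed; the function name and the returned labels are illustrative choices.

```python
from types import SimpleNamespace


def classify(result) -> str:
    """Map a find_partial-style result onto a validation state."""
    if result is None:
        return "reject"        # no continuation can ever match
    if result.is_full:
        return "accept"        # input already matches through to its end
    return "incomplete"        # valid so far, keep reading input


# Stand-ins for the objects find_partial would return (hypothetical).
partial_result = SimpleNamespace(is_full=False, is_partial=True)
full_result = SimpleNamespace(is_full=True, is_partial=False)

states = [classify(r) for r in (None, partial_result, full_result)]
```

Checking None first matters: it is the only outcome without attributes, so testing is_full before it would raise AttributeError.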
Each capturing group in a partial match is itself in one of three states,
reported by group_state(i): "matched" (fully matched), "partial" (entered
but not yet completed), or "none" (never participated — group(i) is None).
class PartialMatch:
    @property
    def status(self) -> str # "full" | "partial"
    @property
    def is_full(self) -> bool
    @property
    def is_partial(self) -> bool
    @property
    def matched(self) -> str
    @property
    def start(self) -> int # byte offset where the match starts
    @property
    def end(self) -> int # byte offset of the input end (always len(haystack))
    @property
    def capture_count(self) -> int
    def group(self, index: int = 0) -> str | None
    def named_group(self, name: str) -> str | None
    def group_state(self, index: int = 0) -> str # "matched" | "partial" | "none"
As the user types, the result moves from None (empty input) through partial to full, and back to None once a character rules out any match:
re = eregex.Regex(r"abc")
re.find_partial("") # None (nothing started yet)
re.find_partial("a") # partial .status == "partial"
re.find_partial("ab") # partial
re.find_partial("abc") # full .is_full
re.find_partial("abcd") # None ('d' rules out any continuation)
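The end-anchored rule behind these results can be modelled for the simplest case, a literal pattern, in a few lines of plain Python. This is a toy model written to mirror the five results above, not a description of how eregex implements partial matching; the function literal_partial is hypothetical, and its start value is a str index rather than a byte offset.

```python
def literal_partial(literal: str, haystack: str):
    """Toy end-anchored partial match for a literal pattern.

    Returns ("full" | "partial", start) for the leftmost non-empty tail of
    haystack that is a prefix of literal, or None if there is no such tail.
    """
    for start in range(len(haystack)):
        tail = haystack[start:]
        if literal.startswith(tail):
            status = "full" if tail == literal else "partial"
            return (status, start)
    return None


results = [literal_partial("abc", s) for s in ("", "a", "ab", "abc", "abcd")]
searched = literal_partial("abc", "xx ab")
```

The tail must reach the end of the haystack, which is why "abcd" is rejected even though it contains "abc": the trailing 'd' can never be part of a match.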
Group states as a match fills in. With token=([a-z]+)([0-9]+)([A-Z]+):
re = eregex.Regex(r"token=([a-z]+)([0-9]+)([A-Z]+)")
p = re.find_partial("x token=abc")
p.is_partial # True
p.matched # 'token=abc'
p.start # 2 (byte offset of the match)
p.end # 11 (end of input — always, since end-anchored)
p.capture_count # 3
p.group(1) # 'abc'
p.group_state(1) # 'matched'
p.group(2) # ''
p.group_state(2) # 'partial' (entered, empty so far)
p.group(3) # None
p.group_state(3) # 'none' (never entered)
re.find_partial("token=abc123XYZ") # group 3 -> 'matched', status 'full'
re.find_partial("x token=abc!") # None ('!' rules out any continuation)
Named groups work the same way:
re = eregex.Regex(r"token=(?P<word>[a-z]+)(?P<num>[0-9]+)")
p = re.find_partial("token=ab")
p.named_group("word") # 'ab' (matched)
p.named_group("num") # '' (partial — empty so far)
Module-level helpers
escape(s: str) -> str
escape_special_only(s: str) -> str
escape_literal_spaces(s: str) -> str
is_match(pattern: str, haystack: str) -> bool # compiles pattern once
compile(pattern: str, flags: int = 0) -> Regex
parse_flags(flag_str: str) -> int
Testing
. .venv/bin/activate
maturin develop --release
python -m unittest test_eregex -v
Layout
This is one half of eregex's binding story. The same Rust core (eregex)
also ships Node.js bindings via napi-rs. See the project root for the core
crate and its feature matrix.
License
Apache-2.0, matching the upstream mrab-regex project.