PARSELY-DIP: Deterministic Intent Parser — RegEx and NLP pipeline for intent recognition
PARSELY-DIP
Parsing And RegEx Syntactic Engine with Linguistic Yield — Deterministic Intent Parser
Parsely dip for silicon chips.
A deterministic intent recognition engine that processes natural language through a cascading pipeline — RegEx first, then constituency and dependency parsing via Stanza, then LLM fallback. Each layer only fires if the one above didn't match. The cheapest, fastest layer runs first. The LLM is the last resort, not the default.
Your LLM is expensive, slow, and unpredictable. When a user says "what time is it" or "move the card to done," there is zero ambiguity. A regex handles it in microseconds. An LLM spends tokens guessing what you already know. PARSELY-DIP intercepts deterministic commands before they reach the LLM, executes them directly, and returns the result.
What It Does
from parsely_dip import parse
result = parse("what time is it")
# result = "14:32"
result = parse("what is the weather like")
# result = "It's 36°F and broken clouds in Cleveland."
result = parse("tell me about quantum physics")
# result = None (no match — pass to LLM)
One call. One input. Response string or None.
Install
pip install parsely-dip
From source:
git clone https://github.com/gbutiri/parsely-dip.git
cd parsely-dip
pip install -e .
NLP Layer Setup (Optional)
The RegEx layer works out of the box. The NLP layer requires Stanza and a running parse service.
1. Download the Stanza English model (~526MB):
python -c "import stanza; stanza.download('en')"
2. (Recommended) Download the accurate model with transformer support:
python -c "import stanza; stanza.download('en', package='default_accurate')"
pip install transformers sentencepiece
The default_accurate model uses PEFT fine-tuned transformers (Google Electra Large). The biggest accuracy improvement is in constituency parsing — the core of NLP intent matching. Requires ~1-2GB extra VRAM on a dedicated GPU.
3. (Recommended) Install PyTorch with GPU support:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu130
Without this, Stanza runs on CPU. With a dedicated GPU (RTX 3060+), parsing is near-instant.
4. Start the NLP service:
python -m parsely_dip.engine.stanza_service
The service loads once and stays running. PARSELY-DIP calls it via HTTP on port 5013 for each query that passes the RegEx layer. The service auto-detects the best available model (default_accurate > default) and reports GPU status on startup.
Three-Tier Pipeline
User Input
|
v
[RegEx Layer] — Pattern matching, microseconds, zero dependencies
| match? --> handler executes, returns response
| no match? --> continue
v
[NLP Layer] — Stanza constituency + dependency parsing via HTTP service
| match? --> handler executes, returns response
| no match? --> continue
v
[LLM Fallback] — parse() returns None, caller decides what to do
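The cascade can be sketched in a few lines. The tier functions below are trivial stand-ins — the real ones live in engine/regex.py, engine/nlp.py, and engine/registry.py, and their exact signatures are assumptions:

```python
import re

# Stand-in handler registry; the real one is built by the @intent decorator.
HANDLERS = {"tell_time": lambda: "14:32"}

def check_regex(prompt):
    # Tier 1: flat regex patterns -> intent name (microseconds, no deps)
    if re.fullmatch(r"what\s+time\s+is\s+it\??", prompt, re.IGNORECASE):
        return "tell_time"
    return None

def check_nlp(prompt):
    # Tier 2: would POST to the Stanza service on port 5013;
    # returns None here, as if the service were not running.
    return None

def parse(prompt):
    intent = check_regex(prompt) or check_nlp(prompt)
    if intent is None:
        return None              # Tier 3: caller hands the prompt to an LLM
    return HANDLERS[intent]()    # dispatch to the registered handler
```

Each tier only runs if the one above returned None, which is exactly the "cheapest layer first" ordering in the diagram.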
Layer 1: RegEx
Patterns stored in flat .patterns text files. One pattern per line. No JSON escaping nightmares.
# Format: (regex) => intent_name
# intents/time.py
(what('s|\s+is)\s+the\s+time|what\s+time\s+is\s+it)\?? => tell_time
# intents/weather.py
((what|how)('s|\s+is)\s+the\s+weather(\s+like)?)\?? => tell_weather
# intents/scrum.py
show(\s+me|\s+us)?\s+the\s+(current|active)(\s+scrum)?\s+cards?[.!]? => show_current_card
Pattern convention: \s+ goes BEFORE the word it separates, not after.
CORRECT: (what('s|\s+is)\s+the\s+time)
WRONG: (what('s|is\s+)the\s+time\s+)
The space belongs to the approach of the next word, not trailing from the previous.
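A loader for this flat format can be sketched as follows — the real load_patterns() in engine/regex.py presumably does something similar, but the details here are assumptions:

```python
import re

def load_patterns(text):
    """Parse the flat .patterns format: one '(regex) => intent_name' per line.
    Blank lines and '#' comments are skipped."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the LAST '=>' so the regex itself may contain one.
        regex, _, intent = line.rpartition("=>")
        patterns.append((re.compile(regex.strip(), re.IGNORECASE), intent.strip()))
    return patterns

sample = r"(what('s|\s+is)\s+the\s+time|what\s+time\s+is\s+it)\?? => tell_time"
rx, intent_name = load_patterns(sample)[0]
print(intent_name)                            # tell_time
print(bool(rx.fullmatch("What's the time")))  # True
```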
Layer 2: NLP
Patterns stored in .json files. Each pattern defines a grammatical structure using sentence type, POS tags, dependency relations, and head words. Matches on linguistic features, not exact strings — so "what time is it, please?" and "hey, what's the time right now?" both match without needing separate regex patterns.
[
  {
    "intent": "tell_time",
    "nlp": {
      "sentence_type": ["SBARQ", "SQ", "WHNP"],
      "words": [
        {"word": "what", "pos": "DET", "dep": "det", "required": true},
        {"lemma": "time", "pos": "NOUN", "required": true},
        {"lemma": "be", "pos": "AUX", "dep": "cop", "required": true},
        {"word": "it", "pos": "PRON", "dep": "nsubj", "required": true}
      ]
    }
  }
]
The NLP layer requires the Stanza service running on port 5013. If the service is not running, the NLP layer is silently skipped and the pipeline falls through to LLM.
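The service call and its silent-skip behavior might look like this sketch (stdlib urllib instead of the package's actual client; the request and response shapes are assumptions):

```python
import json
import urllib.error
import urllib.request

def match_nlp_pattern(parsed):
    # Stand-in for the real matcher in engine/nlp.py.
    return None

def check_nlp(prompt, url="http://127.0.0.1:5013/process_syntactic_parsing"):
    """POST the prompt to the local Stanza service; if the service is not
    running, return None so the pipeline falls through to the LLM tier."""
    req = urllib.request.Request(
        url,
        data=json.dumps({"text": prompt}).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=2) as resp:
            parsed = json.load(resp)
    except (urllib.error.URLError, OSError):
        return None  # service down: silently skip this tier
    return match_nlp_pattern(parsed)
```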
Layer 3: LLM Fallback
parse() returns None. The caller decides what to do — send to an LLM, show an error, or ignore. PARSELY-DIP does not call any LLM itself.
Intent Handlers
Self-registering via the @intent decorator. Import the module and the decorator registers the handler. No config files, no setup step.
from parsely_dip.engine.registry import intent

@intent('tell_time')
def tell_time():
    from datetime import datetime
    now = datetime.now()
    return f"{now.hour:02d}:{now.minute:02d}"
Built-in Intents
| Intent | File | What It Does |
|---|---|---|
| tell_time | intents/time.py | Returns current time in 24-hour format |
| tell_weather | intents/weather.py | Returns weather via OpenWeatherMap API (requires WEATHER_API_KEY in .env) |
| show_current_card | intents/scrum.py | Shows active scrum cards from SQLite database |
| read_current_card | intents/scrum.py | Same data as show, but intended for LLM to summarize |
Adding New Intents
- Create a new file in intents/ (e.g., intents/greeting.py)
- Write a handler function with the @intent decorator
- Add regex patterns to patterns/base.patterns
- (Optional) Add NLP patterns to patterns/base_nlp.json
- Import the module in intents/__init__.py
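Walking those steps for a hypothetical greeting intent — the registry decorator is stubbed here so the sketch runs standalone; in the package you would import it from parsely_dip.engine.registry:

```python
# Stand-in for parsely_dip.engine.registry
HANDLERS = {}

def intent(name):
    def register(fn):
        HANDLERS[name] = fn   # self-registration on import, no config step
        return fn
    return register

# intents/greeting.py
@intent('say_hello')
def say_hello():
    return "Hello! PARSELY-DIP here."

# patterns/base.patterns would gain a line like:
#   (hello|hi|hey)(\s+there)?[.!]? => say_hello
print(HANDLERS['say_hello']())  # Hello! PARSELY-DIP here.
```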
Project Structure
parsely-dip/
  pyproject.toml — Package config, dependencies
  README.md — This file
  env_parselydip/ — Virtual environment
  db/ — Database files (if needed by intents)
  logs/ — Log files
  tests/ — Test suite
  src/parsely_dip/
    __init__.py — parse(prompt) single entry point
    engine/
      registry.py — @intent decorator, handler registry, dispatch()
      regex.py — load_patterns(), check_regex()
      nlp.py — load_nlp_patterns(), check_nlp(), match_nlp_pattern()
      splitter.py — Sentence splitting (future expansion)
      stanza_service.py — Stanza NLP Flask service (port 5013)
    intents/
      __init__.py — Auto-imports all intent modules
      time.py — tell_time handler
      weather.py — tell_weather handler (OpenWeatherMap API)
      scrum.py — show_current_card, read_current_card handlers
    patterns/
      base.patterns — RegEx patterns (flat text, one per line)
      base_nlp.json — NLP patterns (structured JSON)
    cli/
      __init__.py — CLI entry point (future)
Hook Integration
PARSELY-DIP is designed to run as a Claude Code UserPromptSubmit hook. The hook intercepts the user's message, runs it through the pipeline, and either handles it deterministically or lets the LLM process it.
Hook Script
#!/bin/bash
PROJECT_DIR="${CLAUDE_PROJECT_DIR:-.}"
VENV_PY="$PROJECT_DIR/env_parselydip/Scripts/python.exe"
[ ! -f "$VENV_PY" ] && exit 0
"$VENV_PY" -c "
import sys, json
from parsely_dip import parse
data = json.load(sys.stdin)
prompt = data.get('prompt', '')
if prompt:
    r = parse(prompt)
    if r:
        print('=== PARSELY-DIP ===')
        print('Relay this to the user EXACTLY as written, nothing else:')
        print(r)
        print('=== END PARSELY-DIP ===')
" 2>/dev/null
exit 0
How It Works
- Hook reads the user's prompt from stdin (JSON with a prompt field)
- Calls parsely_dip.parse(prompt)
- If result: prints it to stdout (shown to the LLM as context; the LLM relays it verbatim)
- If None: no output; the LLM processes the prompt normally
Known Limitation
Claude Code's UserPromptSubmit hooks cannot display text directly to the user without the LLM firing. The documented "block" decision with a reason field blocks the prompt but does not render the reason in the VS Code extension (a confirmed bug). The current approach uses plain-text stdout with exit 0 — the LLM sees the result and relays it.
Stanza NLP Service
The NLP service is a Flask app that wraps Stanford's Stanza NLP library. It runs as a background service on port 5013, loads the model once at startup, and handles parse requests via HTTP.
Starting the Service
python -m parsely_dip.engine.stanza_service
What Happens at Startup
- Tries to load default_accurate (transformer-based, best accuracy)
- If that fails (missing packages), prompts the user to install or continue with the standard model
- Falls back to default (CharLM-based, solid accuracy)
- If no model is found, prints install instructions and exits
- Reports GPU status (GPU name if available, install command if not)
Service Endpoints
| Endpoint | Method | Description |
|---|---|---|
/process_syntactic_parsing |
POST | Parse text, return words with POS/dependency/constituency |
/debug_parse |
POST | Raw parse data for debugging sentence structure |
Interactive Mode
python -m parsely_dip.engine.stanza_service --chat
Opens an interactive prompt where you can type sentences and see their constituency trees and dependency relations. Useful for building new NLP patterns.
Security
- Localhost only (127.0.0.1) — rejects non-local requests
- Optional token auth via the STANZA_API_TOKEN environment variable — enforced if set, skipped if not
NLP Pattern Specification
NLP patterns define grammatical structures that map to intents. Unlike regex (exact string matching), NLP patterns match on linguistic features extracted by Stanza.
Pattern Structure
{
  "intent": "intent_name",
  "nlp": {
    "sentence_type": "SBARQ",
    "words": [
      {
        "word": "exact_word",
        "lemma": "base_form",
        "pos": "NOUN",
        "dep": "nsubj",
        "head_lemma": "parent_word",
        "required": true
      }
    ]
  }
}
Matching Modes
- Exact Word Match — word specified: match that exact word in that grammatical position
- Structural Match (Slot) — word empty: match ANY word with the specified POS + dependency features
- Optional Words — required: false: pattern matches with or without this word
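One way to read these modes in code — a sketch of a single word-spec check; the real match_nlp_pattern() logic is presumably similar, but this is not its actual implementation:

```python
def word_matches(spec, token):
    """A spec matches a parsed token when every feature it sets agrees
    with the token; an absent or empty 'word' makes the spec a slot."""
    for key in ("word", "lemma", "pos", "dep", "head_lemma"):
        if spec.get(key) and spec[key] != token.get(key):
            return False
    return True

token = {"word": "time", "lemma": "time", "pos": "NOUN", "dep": "nsubj"}
print(word_matches({"lemma": "time", "pos": "NOUN", "required": True}, token))  # True
print(word_matches({"word": "weather", "pos": "NOUN"}, token))                  # False
```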
Supported Values
Sentence Types: S, SBARQ, SQ, SINV, FRAG (+ 20 more constituency labels)
POS Tags (17 Universal): NOUN, VERB, AUX, ADJ, ADV, PRON, DET, ADP, NUM, PART, CCONJ, SCONJ, INTJ, PROPN, PUNCT, SYM, X
Dependency Relations (37+): nsubj, obj, root, det, cop, aux, mark, case, advmod, amod, compound, conj, cc, xcomp, ccomp, advcl, acl, nmod, obl, nummod, appos, dep, fixed, flat, list, parataxis, orphan, goeswith, reparandum, punct, clf, discourse, dislocated, expl, iobj, vocative, csubj
Specificity Rule
A loose pattern that matches incorrectly is WORSE than no pattern (LLM fallback).
Every NLP pattern must be maximally specific. Include all words that disambiguate the intent — articles, pronouns, structural words. If removing a word would cause false positives, that word is required.
Configuration
.env
WEATHER_API_KEY=your_openweathermap_key
STANZA_API_TOKEN=optional_security_token
pyproject.toml Dependencies
dependencies = [
"stanza>=1.5",
"requests>=2.28",
"python-dotenv>=1.0",
"flask>=3.0",
]
Optional (for default_accurate model):
pip install transformers sentencepiece
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu130
Requirements
- Python 3.9+
- Stanza 1.5+ (for NLP layer)
- Flask 3.0+ (for NLP service)
- A dedicated GPU is recommended but not required (RTX 3060+ for transformer models)
- The RegEx layer works with zero dependencies beyond the base package
Target Audience
Linguists and NLP researchers who understand constituency trees, dependency relations, and POS tags. You can run commands and follow instructions, but you should not have to debug import errors or port conflicts. PARSELY-DIP tells you what's wrong and how to fix it.
Status
v0.0.1 — Core engine built. RegEx pipeline working with time, weather, and scrum card intents. NLP layer ported from Uni with Stanza service (default_accurate with Electra Large transformer, GPU accelerated). Hook integration tested with Claude Code. CLI available via parsely command.
License
Source-available. Personal and development use permitted.
Author
George Butiri — george@iseestudios.com