
PARSELY-DIP

Parsing And RegEx Syntactic Engine with Linguistic Yield — Deterministic Intent Parser

Parsely dip for silicon chips.

A deterministic intent recognition engine that processes natural language through a cascading pipeline — RegEx first, then constituency and dependency parsing via Stanza, then LLM fallback. Each layer only fires if the one above didn't match. The cheapest, fastest layer runs first. The LLM is the last resort, not the default.

Your LLM is expensive, slow, and unpredictable. When a user says "what time is it" or "move the card to done," there is zero ambiguity. A regex handles it in microseconds. An LLM spends tokens guessing what you already know. PARSELY-DIP intercepts deterministic commands before they reach the LLM, executes them directly, and returns the result.

What It Does

from parsely_dip import parse

result = parse("what time is it")
# result = "14:32"

result = parse("what is the weather like")
# result = "It's 36°F and broken clouds in Cleveland."

result = parse("tell me about quantum physics")
# result = None  (no match — pass to LLM)

One call. One input. Response string or None.
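
The caller-side pattern is a simple None check. The sketch below stubs `parse` so it is self-contained; `send_to_llm` is a placeholder for whatever fallback you use, and in real code you would `from parsely_dip import parse`:

```python
# Illustrative fallback pattern. `send_to_llm` is a placeholder, not part of
# parsely-dip; `parse` is stubbed here so the sketch runs on its own.
def parse(prompt):
    # stand-in for parsely_dip.parse
    return "14:32" if prompt == "what time is it" else None

def send_to_llm(prompt):
    # placeholder for your LLM call
    return f"LLM answer for: {prompt}"

def answer(prompt):
    result = parse(prompt)      # deterministic layers first
    if result is not None:
        return result           # handled without touching the LLM
    return send_to_llm(prompt)  # fall through to the LLM
```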

Install

pip install parsely-dip

From source:

git clone https://github.com/gbutiri/parsely-dip.git
cd parsely-dip
pip install -e .

NLP Layer Setup (Optional)

The RegEx layer works out of the box. The NLP layer requires Stanza and a running parse service.

1. Download the Stanza English model (~526MB):

python -c "import stanza; stanza.download('en')"

2. (Recommended) Download the accurate model with transformer support:

python -c "import stanza; stanza.download('en', package='default_accurate')"
pip install transformers sentencepiece

The default_accurate model uses PEFT fine-tuned transformers (Google Electra Large). The biggest accuracy improvement is in constituency parsing — the core of NLP intent matching. Requires ~1-2GB extra VRAM on a dedicated GPU.

3. (Recommended) Install PyTorch with GPU support:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu130

Without this, Stanza runs on CPU. With a dedicated GPU (RTX 3060+), parsing is near-instant.

4. Start the NLP service:

python -m parsely_dip.engine.stanza_service

The service loads once and stays running. PARSELY-DIP calls it via HTTP on port 5013 for each query that passes the RegEx layer. The service auto-detects the best available model (default_accurate > default) and reports GPU status on startup.


Three-Tier Pipeline

User Input
    |
    v
[RegEx Layer]  — Pattern matching, microseconds, zero dependencies
    |  match? --> handler executes, returns response
    |  no match? --> continue
    v
[NLP Layer]    — Stanza constituency + dependency parsing via HTTP service
    |  match? --> handler executes, returns response
    |  no match? --> continue
    v
[LLM Fallback] — parse() returns None, caller decides what to do

Layer 1: RegEx

Patterns stored in flat .patterns text files. One pattern per line. No JSON escaping nightmares.

# Format: (regex) => intent_name
# handler: intents/time.py
(what('s|\s+is)\s+the\s+time|what\s+time\s+is\s+it)\?? => tell_time

# handler: intents/weather.py
((what|how)('s|\s+is)\s+the\s+weather(\s+like)?)\?? => tell_weather

# handler: intents/scrum.py
show(\s+me|\s+us)?\s+the\s+(current|active)(\s+scrum)?\s+cards?[.!]? => show_current_card

Pattern convention: \s+ goes BEFORE the word it separates, not after.

CORRECT: (what('s|\s+is)\s+the\s+time)
WRONG:   (what('s|is\s+)the\s+time\s+)

The whitespace attaches to the word it precedes, not as trailing space on the word before it.
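
A loader for the `(regex) => intent_name` line format can be sketched as follows; this is a hypothetical reimplementation, and the real `load_patterns()` in `engine/regex.py` may differ:

```python
import re

# Hypothetical loader for one line of a .patterns file:
#   (regex) => intent_name
# Comments (#) and blank lines are skipped.
def load_pattern_line(line):
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    pattern, intent = line.rsplit("=>", 1)   # split on the last arrow
    return re.compile(pattern.strip(), re.IGNORECASE), intent.strip()

compiled, intent = load_pattern_line(
    r"(what('s|\s+is)\s+the\s+time|what\s+time\s+is\s+it)\?? => tell_time"
)
```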

Layer 2: NLP

Patterns stored in .json files. Each pattern defines a grammatical structure using sentence type, POS tags, dependency relations, and head words. Matches on linguistic features, not exact strings — so "what time is it, please?" and "hey, what's the time right now?" both match without needing separate regex patterns.

[
  {
    "intent": "tell_time",
    "nlp": {
      "sentence_type": ["SBARQ", "SQ", "WHNP"],
      "words": [
        {"word": "what", "pos": "DET", "dep": "det", "required": true},
        {"lemma": "time", "pos": "NOUN", "required": true},
        {"lemma": "be", "pos": "AUX", "dep": "cop", "required": true},
        {"word": "it", "pos": "PRON", "dep": "nsubj", "required": true}
      ]
    }
  }
]

The NLP layer requires the Stanza service running on port 5013. If the service is not running, the NLP layer is silently skipped and the pipeline falls through to LLM.
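
A caller can probe the service and degrade gracefully when it is down, mirroring the silent-skip behavior. The request and response field names below are assumptions for illustration, not the documented wire format:

```python
import json
from urllib import request, error

# Sketch of querying the Stanza service on port 5013. The payload shape
# ({"text": ...}) is an assumption; check the service for the real format.
def query_nlp_service(text, url="http://127.0.0.1:5013/process_syntactic_parsing"):
    payload = json.dumps({"text": text}).encode("utf-8")
    req = request.Request(url, data=payload,
                          headers={"Content-Type": "application/json"})
    try:
        with request.urlopen(req, timeout=2) as resp:
            return json.load(resp)   # parsed words with POS/dep features
    except (error.URLError, OSError):
        return None                  # service down: skip NLP, fall to LLM
```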

Layer 3: LLM Fallback

parse() returns None. The caller decides what to do — send to an LLM, show an error, or ignore. PARSELY-DIP does not call any LLM itself.


Intent Handlers

Self-registering via the @intent decorator. Import the module, the decorator registers the handler. No config files, no setup step.

from parsely_dip.engine.registry import intent

@intent('tell_time')
def tell_time():
    from datetime import datetime
    now = datetime.now()
    return f"{now.hour:02d}:{now.minute:02d}"

Built-in Intents

Intent             File                 What It Does
tell_time          intents/time.py      Returns current time in 24-hour format
tell_weather       intents/weather.py   Returns weather via OpenWeatherMap API (requires WEATHER_API_KEY in .env)
show_current_card  intents/scrum.py     Shows active scrum cards from SQLite database
read_current_card  intents/scrum.py     Same data as show, but intended for LLM to summarize

Adding New Intents

  1. Create a new file in intents/ (e.g., intents/greeting.py)
  2. Write a handler function with the @intent decorator
  3. Add regex patterns to patterns/base.patterns
  4. (Optional) Add NLP patterns to patterns/base_nlp.json
  5. Import the module in __init__.py

Project Structure

parsely-dip/
  pyproject.toml           — Package config, dependencies
  README.md                — This file
  env_parselydip/          — Virtual environment
  db/                      — Database files (if needed by intents)
  logs/                    — Log files
  tests/                   — Test suite
  src/parsely_dip/
    __init__.py            — parse(prompt) single entry point
    engine/
      registry.py          — @intent decorator, handler registry, dispatch()
      regex.py             — load_patterns(), check_regex()
      nlp.py               — load_nlp_patterns(), check_nlp(), match_nlp_pattern()
      splitter.py          — Sentence splitting (future expansion)
      stanza_service.py    — Stanza NLP Flask service (port 5013)
    intents/
      __init__.py           — Auto-imports all intent modules
      time.py               — tell_time handler
      weather.py            — tell_weather handler (OpenWeatherMap API)
      scrum.py              — show_current_card, read_current_card handlers
    patterns/
      base.patterns         — RegEx patterns (flat text, one per line)
      base_nlp.json         — NLP patterns (structured JSON)
    cli/
      __init__.py           — CLI entry point (future)

Hook Integration

PARSELY-DIP is designed to run as a Claude Code UserPromptSubmit hook. The hook intercepts the user's message, runs it through the pipeline, and either handles it deterministically or lets the LLM process it.

Hook Script

#!/bin/bash
PROJECT_DIR="${CLAUDE_PROJECT_DIR:-.}"
VENV_PY="$PROJECT_DIR/env_parselydip/Scripts/python.exe"
[ ! -f "$VENV_PY" ] && exit 0

"$VENV_PY" -c "
import sys, json
from parsely_dip import parse
data = json.load(sys.stdin)
prompt = data.get('prompt', '')
if prompt:
    r = parse(prompt)
    if r:
        print('=== PARSELY-DIP ===')
        print('Relay this to the user EXACTLY as written, nothing else:')
        print(r)
        print('=== END PARSELY-DIP ===')
" 2>/dev/null
exit 0

How It Works

  1. Hook reads the user's prompt from stdin (JSON with prompt field)
  2. Calls parsely_dip.parse(prompt)
  3. If result: prints it to stdout (shown to LLM as context, LLM relays verbatim)
  4. If None: no output, LLM processes the prompt normally

Known Limitation

Claude Code's UserPromptSubmit hooks cannot display text directly to the user without the LLM firing. The documented alternative, returning decision: "block" with a reason field, blocks the prompt but does not render the reason in the VS Code extension (a confirmed bug). The current approach prints plain text to stdout and exits 0 — the LLM sees the result and relays it.


Stanza NLP Service

The NLP service is a Flask app that wraps Stanford's Stanza NLP library. It runs as a background service on port 5013, loads the model once at startup, and handles parse requests via HTTP.

Starting the Service

python -m parsely_dip.engine.stanza_service

What Happens at Startup

  1. Tries to load default_accurate (transformer-based, best accuracy)
  2. If that fails (missing packages), prompts the user to install or continue with standard
  3. Falls back to default (CharLM-based, solid accuracy)
  4. If no model found, prints install instructions and exits
  5. Reports GPU status (name of GPU if available, install command if not)
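
The fallback order can be sketched as a cascade over loader callables; the function and names below are illustrative (the loaders are injected so the logic is testable without Stanza installed):

```python
# Illustrative model-selection cascade: try each loader in order and keep
# the first that succeeds, mirroring default_accurate > default.
def load_best_model(loaders):
    """loaders: ordered {model_name: callable}; first success wins."""
    for name, load in loaders.items():
        try:
            return name, load()
        except Exception:
            continue  # missing packages or model: try the next one
    raise SystemExit("No Stanza model found; run stanza.download('en')")
```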

Service Endpoints

Endpoint                    Method  Description
/process_syntactic_parsing  POST    Parse text, return words with POS/dependency/constituency
/debug_parse                POST    Raw parse data for debugging sentence structure

Interactive Mode

python -m parsely_dip.engine.stanza_service --chat

Opens an interactive prompt where you can type sentences and see their constituency trees and dependency relations. Useful for building new NLP patterns.

Security

  • Localhost only (127.0.0.1) — rejects non-local requests
  • Optional token auth via STANZA_API_TOKEN environment variable — enforced if set, skipped if not

NLP Pattern Specification

NLP patterns define grammatical structures that map to intents. Unlike regex (exact string matching), NLP patterns match on linguistic features extracted by Stanza.

Pattern Structure

{
  "intent": "intent_name",
  "nlp": {
    "sentence_type": "SBARQ",
    "words": [
      {
        "word": "exact_word",
        "lemma": "base_form",
        "pos": "NOUN",
        "dep": "nsubj",
        "head_lemma": "parent_word",
        "required": true
      }
    ]
  }
}

Matching Modes

  • Exact Word Match (word specified): match that exact word in that grammatical position
  • Structural Match / Slot (word empty): match ANY word with the specified POS + dependency features
  • Optional Words (required: false): the pattern matches with or without this word
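
A single word spec can be checked against a parsed token roughly like this; `word_matches` is a hypothetical sketch, and the real `match_nlp_pattern()` in `engine/nlp.py` may differ:

```python
# Illustrative matcher for one word spec against one Stanza-style token
# (a dict of features). An absent or empty field acts as a wildcard slot;
# the `required` flag is handled at the pattern level, not here.
def word_matches(spec, token):
    for key in ("word", "lemma", "pos", "dep", "head_lemma"):
        want = spec.get(key)
        if want and token.get(key) != want:
            return False
    return True

token = {"word": "time", "lemma": "time", "pos": "NOUN", "dep": "nsubj"}
word_matches({"lemma": "time", "pos": "NOUN", "required": True}, token)  # exact-word style
word_matches({"pos": "NOUN", "dep": "nsubj"}, token)                     # slot style
```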

Supported Values

Sentence Types: S, SBARQ, SQ, SINV, FRAG (+ 20 more constituency labels)

POS Tags (17 Universal): NOUN, VERB, AUX, ADJ, ADV, PRON, DET, ADP, NUM, PART, CCONJ, SCONJ, INTJ, PROPN, PUNCT, SYM, X

Dependency Relations (37+): nsubj, obj, root, det, cop, aux, mark, case, advmod, amod, compound, conj, cc, xcomp, ccomp, advcl, acl, nmod, obl, nummod, appos, dep, fixed, flat, list, parataxis, orphan, goeswith, reparandum, punct, clf, discourse, dislocated, expl, iobj, vocative, csubj

Specificity Rule

A loose pattern that matches incorrectly is WORSE than no pattern (LLM fallback).

Every NLP pattern must be maximally specific. Include all words that disambiguate the intent — articles, pronouns, structural words. If removing a word would cause false positives, that word is required.


Configuration

.env

WEATHER_API_KEY=your_openweathermap_key
STANZA_API_TOKEN=optional_security_token

pyproject.toml Dependencies

dependencies = [
    "stanza>=1.5",
    "requests>=2.28",
    "python-dotenv>=1.0",
    "flask>=3.0",
]

Optional (for default_accurate model):

pip install transformers sentencepiece
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu130

Requirements

  • Python 3.9+
  • Stanza 1.5+ (for NLP layer)
  • Flask 3.0+ (for NLP service)
  • A dedicated GPU is recommended but not required (RTX 3060+ for transformer models)
  • The RegEx layer works with zero dependencies beyond the base package

Target Audience

Linguists and NLP researchers who understand constituency trees, dependency relations, and POS tags. You can run commands and follow instructions, but you should not have to debug import errors or port conflicts. PARSELY-DIP tells you what's wrong and how to fix it.

Status

v0.0.1 — Core engine built. RegEx pipeline working with time, weather, and scrum card intents. NLP layer ported from Uni with Stanza service (default_accurate with Electra Large transformer, GPU accelerated). Hook integration tested with Claude Code. CLI available via parsely command.

License

Source-available. Personal and development use permitted.

Author

George Butiri — george@iseestudios.com
