
PARSELY-DIP

Parsing And RegEx Syntactic Engine with Linguistic Yield — Deterministic Intent Parser

Parsely dip for silicon chips.

A deterministic intent recognition engine that processes natural language through a cascading pipeline — RegEx first, then constituency and dependency parsing via Stanza, then LLM fallback. Each layer only fires if the one above didn't match. The cheapest, fastest layer runs first. The LLM is the last resort, not the default.

Your LLM is expensive, slow, and unpredictable. When a user says "what time is it" or "move the card to done," there is zero ambiguity. A regex handles it in microseconds. An LLM spends tokens guessing what you already know. PARSELY-DIP intercepts deterministic commands before they reach the LLM, executes them directly, and returns the result.

What It Does

from parsely_dip import parse

result = parse("what time is it")
# result = "14:32"

result = parse("what is the weather like")
# result = "It's 36°F and broken clouds in Cleveland."

result = parse("tell me about quantum physics")
# result = None  (no match — pass to LLM)

One call. One input. Response string or None.
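
The caller-side pattern is a simple None check. The sketch below stubs `parse` so it is self-contained; `send_to_llm` is a placeholder for whatever fallback you use, and in real code you would `from parsely_dip import parse`:

```python
# Illustrative fallback pattern. `send_to_llm` is a placeholder, not part of
# parsely-dip; `parse` is stubbed here so the sketch runs on its own.
def parse(prompt):
    # stand-in for parsely_dip.parse
    return "14:32" if prompt == "what time is it" else None

def send_to_llm(prompt):
    # placeholder for your LLM call
    return f"LLM answer for: {prompt}"

def answer(prompt):
    result = parse(prompt)      # deterministic layers first
    if result is not None:
        return result           # handled without touching the LLM
    return send_to_llm(prompt)  # fall through to the LLM
```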

Install

pip install parsely-dip

From source:

git clone https://github.com/gbutiri/parsely-dip.git
cd parsely-dip
pip install -e .

NLP Layer Setup (Optional)

The RegEx layer works out of the box. The NLP layer requires Stanza and a running parse service.

1. Download the Stanza English model (~526MB):

python -c "import stanza; stanza.download('en')"

2. (Recommended) Download the accurate model with transformer support:

python -c "import stanza; stanza.download('en', package='default_accurate')"
pip install transformers sentencepiece

The default_accurate model uses PEFT fine-tuned transformers (Google Electra Large). The biggest accuracy improvement is in constituency parsing — the core of NLP intent matching. Requires ~1-2GB extra VRAM on a dedicated GPU.

3. (Recommended) Install PyTorch with GPU support:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu130

Without this, Stanza runs on CPU. With a dedicated GPU (RTX 3060+), parsing is near-instant.

4. Start the NLP service:

python -m parsely_dip.engine.stanza_service

The service loads once and stays running. PARSELY-DIP calls it via HTTP on port 5013 for each query that passes the RegEx layer. The service auto-detects the best available model (default_accurate > default) and reports GPU status on startup.


Three-Tier Pipeline

User Input
    |
    v
[RegEx Layer]  — Pattern matching, microseconds, zero dependencies
    |  match? --> handler executes, returns response
    |  no match? --> continue
    v
[NLP Layer]    — Stanza constituency + dependency parsing via HTTP service
    |  match? --> handler executes, returns response
    |  no match? --> continue
    v
[LLM Fallback] — parse() returns None, caller decides what to do

Layer 1: RegEx

Patterns stored in flat .patterns text files. One pattern per line. No JSON escaping nightmares.

# Format: (regex) => intent_name
# handler: intents/time.py
(what('s|\s+is)\s+the\s+time|what\s+time\s+is\s+it)\?? => tell_time

# handler: intents/weather.py
((what|how)('s|\s+is)\s+the\s+weather(\s+like)?)\?? => tell_weather

# handler: intents/scrum.py
show(\s+me|\s+us)?\s+the\s+(current|active)(\s+scrum)?\s+cards?[.!]? => show_current_card

Pattern convention: \s+ goes BEFORE the word it separates, not after.

CORRECT: (what('s|\s+is)\s+the\s+time)
WRONG:   (what('s|is\s+)the\s+time\s+)

The whitespace attaches to the word it precedes, not as trailing space on the word before it.
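
A loader for the `(regex) => intent_name` line format can be sketched as follows; this is a hypothetical reimplementation, and the real `load_patterns()` in `engine/regex.py` may differ:

```python
import re

# Hypothetical loader for one line of a .patterns file:
#   (regex) => intent_name
# Comments (#) and blank lines are skipped.
def load_pattern_line(line):
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    pattern, intent = line.rsplit("=>", 1)   # split on the last arrow
    return re.compile(pattern.strip(), re.IGNORECASE), intent.strip()

compiled, intent = load_pattern_line(
    r"(what('s|\s+is)\s+the\s+time|what\s+time\s+is\s+it)\?? => tell_time"
)
```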

Layer 2: NLP

Patterns stored in .json files. Each pattern defines a grammatical structure using sentence type, POS tags, dependency relations, and head words. Matches on linguistic features, not exact strings — so "what time is it, please?" and "hey, what's the time right now?" both match without needing separate regex patterns.

[
  {
    "intent": "tell_time",
    "nlp": {
      "sentence_type": ["SBARQ", "SQ", "WHNP"],
      "words": [
        {"word": "what", "pos": "DET", "dep": "det", "required": true},
        {"lemma": "time", "pos": "NOUN", "required": true},
        {"lemma": "be", "pos": "AUX", "dep": "cop", "required": true},
        {"word": "it", "pos": "PRON", "dep": "nsubj", "required": true}
      ]
    }
  }
]

The NLP layer requires the Stanza service running on port 5013. If the service is not running, the NLP layer is silently skipped and the pipeline falls through to LLM.
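
A caller can probe the service and degrade gracefully when it is down, mirroring the silent-skip behavior. The request and response field names below are assumptions for illustration, not the documented wire format:

```python
import json
from urllib import request, error

# Sketch of querying the Stanza service on port 5013. The payload shape
# ({"text": ...}) is an assumption; check the service for the real format.
def query_nlp_service(text, url="http://127.0.0.1:5013/process_syntactic_parsing"):
    payload = json.dumps({"text": text}).encode("utf-8")
    req = request.Request(url, data=payload,
                          headers={"Content-Type": "application/json"})
    try:
        with request.urlopen(req, timeout=2) as resp:
            return json.load(resp)   # parsed words with POS/dep features
    except (error.URLError, OSError):
        return None                  # service down: skip NLP, fall to LLM
```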

Layer 3: LLM Fallback

parse() returns None. The caller decides what to do — send to an LLM, show an error, or ignore. PARSELY-DIP does not call any LLM itself.


Intent Handlers

Self-registering via the @intent decorator. Import the module, the decorator registers the handler. No config files, no setup step.

from parsely_dip.engine.registry import intent

@intent('tell_time')
def tell_time():
    from datetime import datetime
    now = datetime.now()
    return f"{now.hour:02d}:{now.minute:02d}"

Built-in Intents

Intent             File                 What It Does
tell_time          intents/time.py      Returns current time in 24-hour format
tell_weather       intents/weather.py   Returns weather via OpenWeatherMap API (requires WEATHER_API_KEY in .env)
show_current_card  intents/scrum.py     Shows active scrum cards from SQLite database
read_current_card  intents/scrum.py     Same data as show, but intended for LLM to summarize

Adding New Intents

  1. Create a new file in intents/ (e.g., intents/greeting.py)
  2. Write a handler function with the @intent decorator
  3. Add regex patterns to patterns/base.patterns
  4. (Optional) Add NLP patterns to patterns/base_nlp.json
  5. Import the module in __init__.py

Project Structure

parsely-dip/
  pyproject.toml           — Package config, dependencies
  README.md                — This file
  env_parselydip/          — Virtual environment
  db/                      — Database files (if needed by intents)
  logs/                    — Log files
  tests/                   — Test suite
  src/parsely_dip/
    __init__.py            — parse(prompt) single entry point
    engine/
      registry.py          — @intent decorator, handler registry, dispatch()
      regex.py             — load_patterns(), check_regex()
      nlp.py               — load_nlp_patterns(), check_nlp(), match_nlp_pattern()
      splitter.py          — Sentence splitting (future expansion)
      stanza_service.py    — Stanza NLP Flask service (port 5013)
    intents/
      __init__.py           — Auto-imports all intent modules
      time.py               — tell_time handler
      weather.py            — tell_weather handler (OpenWeatherMap API)
      scrum.py              — show_current_card, read_current_card handlers
    patterns/
      base.patterns         — RegEx patterns (flat text, one per line)
      base_nlp.json         — NLP patterns (structured JSON)
    cli/
      __init__.py           — CLI entry point (future)

Hook Integration

PARSELY-DIP is designed to run as a Claude Code UserPromptSubmit hook. The hook intercepts the user's message, runs it through the pipeline, and either handles it deterministically or lets the LLM process it.

Hook Script

#!/bin/bash
PROJECT_DIR="${CLAUDE_PROJECT_DIR:-.}"
VENV_PY="$PROJECT_DIR/env_parselydip/Scripts/python.exe"
[ ! -f "$VENV_PY" ] && exit 0

"$VENV_PY" -c "
import sys, json
from parsely_dip import parse
data = json.load(sys.stdin)
prompt = data.get('prompt', '')
if prompt:
    r = parse(prompt)
    if r:
        print('=== PARSELY-DIP ===')
        print('Relay this to the user EXACTLY as written, nothing else:')
        print(r)
        print('=== END PARSELY-DIP ===')
" 2>/dev/null
exit 0

How It Works

  1. Hook reads the user's prompt from stdin (JSON with prompt field)
  2. Calls parsely_dip.parse(prompt)
  3. If result: prints it to stdout (shown to LLM as context, LLM relays verbatim)
  4. If None: no output, LLM processes the prompt normally

Known Limitation

Claude Code's UserPromptSubmit hooks cannot display text directly to the user without the LLM firing. The documented alternative, returning decision: "block" with a reason field, blocks the prompt but does not render the reason in the VS Code extension (a confirmed bug). The current approach prints plain text to stdout and exits 0 — the LLM sees the result and relays it.


Stanza NLP Service

The NLP service is a Flask app that wraps Stanford's Stanza NLP library. It runs as a background service on port 5013, loads the model once at startup, and handles parse requests via HTTP.

Starting the Service

python -m parsely_dip.engine.stanza_service

What Happens at Startup

  1. Tries to load default_accurate (transformer-based, best accuracy)
  2. If that fails (missing packages), prompts the user to install or continue with standard
  3. Falls back to default (CharLM-based, solid accuracy)
  4. If no model found, prints install instructions and exits
  5. Reports GPU status (name of GPU if available, install command if not)
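
The fallback order can be sketched as a cascade over loader callables; the function and names below are illustrative (the loaders are injected so the logic is testable without Stanza installed):

```python
# Illustrative model-selection cascade: try each loader in order and keep
# the first that succeeds, mirroring default_accurate > default.
def load_best_model(loaders):
    """loaders: ordered {model_name: callable}; first success wins."""
    for name, load in loaders.items():
        try:
            return name, load()
        except Exception:
            continue  # missing packages or model: try the next one
    raise SystemExit("No Stanza model found; run stanza.download('en')")
```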

Service Endpoints

Endpoint                    Method  Description
/process_syntactic_parsing  POST    Parse text, return words with POS/dependency/constituency
/debug_parse                POST    Raw parse data for debugging sentence structure

Interactive Mode

python -m parsely_dip.engine.stanza_service --chat

Opens an interactive prompt where you can type sentences and see their constituency trees and dependency relations. Useful for building new NLP patterns.

Security

  • Localhost only (127.0.0.1) — rejects non-local requests
  • Optional token auth via STANZA_API_TOKEN environment variable — enforced if set, skipped if not

NLP Pattern Specification

NLP patterns define grammatical structures that map to intents. Unlike regex (exact string matching), NLP patterns match on linguistic features extracted by Stanza.

Pattern Structure

{
  "intent": "intent_name",
  "nlp": {
    "sentence_type": "SBARQ",
    "words": [
      {
        "word": "exact_word",
        "lemma": "base_form",
        "pos": "NOUN",
        "dep": "nsubj",
        "head_lemma": "parent_word",
        "required": true
      }
    ]
  }
}

Matching Modes

  • Exact Word Match (word specified): match that exact word in that grammatical position
  • Structural Match / Slot (word empty): match ANY word with the specified POS + dependency features
  • Optional Words (required: false): the pattern matches with or without this word
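
A single word spec can be checked against a parsed token roughly like this; `word_matches` is a hypothetical sketch, and the real `match_nlp_pattern()` in `engine/nlp.py` may differ:

```python
# Illustrative matcher for one word spec against one Stanza-style token
# (a dict of features). An absent or empty field acts as a wildcard slot;
# the `required` flag is handled at the pattern level, not here.
def word_matches(spec, token):
    for key in ("word", "lemma", "pos", "dep", "head_lemma"):
        want = spec.get(key)
        if want and token.get(key) != want:
            return False
    return True

token = {"word": "time", "lemma": "time", "pos": "NOUN", "dep": "nsubj"}
word_matches({"lemma": "time", "pos": "NOUN", "required": True}, token)  # exact-word style
word_matches({"pos": "NOUN", "dep": "nsubj"}, token)                     # slot style
```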

Supported Values

Sentence Types: S, SBARQ, SQ, SINV, FRAG (+ 20 more constituency labels)

POS Tags (17 Universal): NOUN, VERB, AUX, ADJ, ADV, PRON, DET, ADP, NUM, PART, CCONJ, SCONJ, INTJ, PROPN, PUNCT, SYM, X

Dependency Relations (37+): nsubj, obj, root, det, cop, aux, mark, case, advmod, amod, compound, conj, cc, xcomp, ccomp, advcl, acl, nmod, obl, nummod, appos, dep, fixed, flat, list, parataxis, orphan, goeswith, reparandum, punct, clf, discourse, dislocated, expl, iobj, vocative, csubj

Specificity Rule

A loose pattern that matches incorrectly is WORSE than no pattern (LLM fallback).

Every NLP pattern must be maximally specific. Include all words that disambiguate the intent — articles, pronouns, structural words. If removing a word would cause false positives, that word is required.


Configuration

.env

WEATHER_API_KEY=your_openweathermap_key
STANZA_API_TOKEN=optional_security_token

pyproject.toml Dependencies

dependencies = [
    "stanza>=1.5",
    "requests>=2.28",
    "python-dotenv>=1.0",
    "flask>=3.0",
]

Optional (for default_accurate model):

pip install transformers sentencepiece
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu130

Requirements

  • Python 3.9+
  • Stanza 1.5+ (for NLP layer)
  • Flask 3.0+ (for NLP service)
  • A dedicated GPU is recommended but not required (RTX 3060+ for transformer models)
  • The RegEx layer works with zero dependencies beyond the base package

Target Audience

Linguists and NLP researchers who understand constituency trees, dependency relations, and POS tags. You can run commands and follow instructions, but you should not have to debug import errors or port conflicts. PARSELY-DIP tells you what's wrong and how to fix it.

Status

v0.0.1 — Core engine built. RegEx pipeline working with time, weather, and scrum card intents. NLP layer ported from Uni with Stanza service (default_accurate with Electra Large transformer, GPU accelerated). Hook integration tested with Claude Code. CLI available via parsely command.

License

Source-available. Personal and development use permitted.

Author

George Butiri — george@iseestudios.com
