Runtime semiformal system: @semiformal and semi() for semantically underspecified logic

semipy

Write Python logic in natural language. semipy generates, validates, and caches real Python functions from your specs — no boilerplate, no hardcoded rules.

from semipy import semiformal, semi, configure

configure(openai_api_key="sk-...")  # or set OPENAI_API_KEY in environment

@semiformal
def parse_log_line(line: str) -> dict:
    #> extract timestamp, level, and message from an Apache log line
    result = ...
    return result

# First call: agent generates a Python function and caches it.
# All later calls: load the cached implementation instantly.
row = parse_log_line('127.0.0.1 - - [10/Oct/2000:13:55:36] "GET /index.html" 200')

Install

pip install semipy

For Jupyter notebook display:

pip install "semipy[jupyter]"

For PDF input materialization:

pip install "semipy[pdf]"

API key

semipy uses the OpenAI API for code generation. Set your key in one of three ways:

Environment variable (recommended for scripts):

export OPENAI_API_KEY=sk-...

.env file at the project root (loaded automatically):

OPENAI_API_KEY=sk-...

In code (overrides env):

from semipy import configure
configure(openai_api_key="sk-...")
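
The precedence described above (a key passed in code overrides the environment variable) can be sketched as follows. `resolve_api_key` is a hypothetical helper written for illustration, not part of semipy, and the sketch leaves out `.env` loading entirely.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Hypothetical helper: an explicit key wins over OPENAI_API_KEY."""
    if env is None:
        env = os.environ
    if explicit:
        return explicit
    return env.get("OPENAI_API_KEY")


# Placeholder values only; never hardcode a real key.
fake_env = {"OPENAI_API_KEY": "sk-from-env"}
print(resolve_api_key(env=fake_env))                  # sk-from-env
print(resolve_api_key("sk-from-code", env=fake_env))  # sk-from-code
```

Passing the environment as a mapping keeps the sketch testable without touching the real process environment.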

Configuration

from semipy import configure

configure(
    openai_api_key="sk-...",       # defaults to OPENAI_API_KEY env var
    openai_model="gpt-4o",         # generation model (default: gpt-5.4)
    verbose=True,                  # rich terminal output during generation (default: True)
    cache_dir=".semiformal",       # where portal JSON and dispatch modules are stored
    max_retries=3,                 # agent retry limit on validation failure
    session_source=None,           # pin portal identity (useful for Jupyter; see below)
)

All fields are optional — call configure() with only what you want to override.

Terminal output

When verbose=True (the default), semipy prints a live Rich panel showing the agent's reasoning, tool calls, and generated code as they stream in. This works in both the terminal and Jupyter.

 Implementing code...
 ─────────────────────────────────────────────────
  Reasoning  The function needs to parse a standard Apache
             Combined Log Format line...
 ─────────────────────────────────────────────────
  Tool  build_and_run_gist  passed
 ─────────────────────────────────────────────────
  Reusing cached implementation; runtime verify passed.
  parse_log_line  GENERATE  a1b2c3d4  examples/logs.py:12

To silence all output: configure(verbose=False).

To see full prompt, decision, and tool-call dumps, set the environment variable:

export SEMIPY_PIPELINE_TRACE=1
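
In a notebook or script where exporting a shell variable is inconvenient, the same flag can be set from Python. This README does not say whether semipy reads the variable at import time or at call time, so this sketch assumes the safer order and sets it before any semipy import.

```python
import os

# Set the flag before importing semipy, in case it is read at import time
# (an assumption; the README does not say when it is read).
os.environ["SEMIPY_PIPELINE_TRACE"] = "1"
```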

Usage patterns

@semiformal with #> spec blocks

from semipy import semiformal

@semiformal
def extract_fields(record: str) -> dict:
    #> extract date, sender, and subject from an email header
    result = ...
    return result

The #> block is the spec. The result = ... is the slot anchor — the agent fills it in.
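
To make the spec/anchor pairing concrete, here is a toy scanner that pairs each `name = ...` anchor with the `#>` lines seen before it. It is an illustration only: `find_slots` and both regular expressions are hypothetical, and nothing here describes how semipy itself locates slots.

```python
import re
from typing import List, Tuple

SPEC_RE = re.compile(r"^\s*#>\s*(.+)$")            # a `#> spec text` line
ANCHOR_RE = re.compile(r"^\s*(\w+)\s*=\s*\.\.\.")  # a `name = ...` slot anchor


def find_slots(source: str) -> List[Tuple[str, str]]:
    """Pair each anchor with the spec lines seen since the previous anchor."""
    slots = []
    pending = []  # spec text collected since the last anchor
    for line in source.splitlines():
        spec = SPEC_RE.match(line)
        anchor = ANCHOR_RE.match(line)
        if spec:
            pending.append(spec.group(1).strip())
        elif anchor:
            slots.append((anchor.group(1), " ".join(pending)))
            pending = []
    return slots


source = '''
def extract_fields(record: str) -> dict:
    #> extract date, sender, and subject from an email header
    result = ...
    return result
'''
print(find_slots(source))
# [('result', 'extract date, sender, and subject from an email header')]
```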

Inline semi() for expressions

from semipy import semiformal, semi

@semiformal
def classify_rows(rows):
    return [r for r in rows if semi(f"is {repr(r['status'])} a client error?")]

Standalone semi() in any function

from semipy import semi

def process(text):
    label = semi(f"classify '{text}' as positive, negative, or neutral")
    return label

Multiple slots in one function

from semipy import semiformal

@semiformal
def analyze(entry: str) -> dict:
    #> extract the IP address from the log entry
    ip = ...

    #> determine if the HTTP status code in the entry indicates an error
    is_error = ...

    return {"ip": ip, "error": is_error}
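
For comparison, this is roughly what the two slots stand in for if written by hand. It is not semipy output. `analyze_by_hand` and its regular expressions are assumptions made for illustration, including the choice to treat 4xx and 5xx status codes as errors.

```python
import re

IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")  # dotted-quad IPv4, no range check
STATUS_RE = re.compile(r'"\s+(\d{3})\b')            # status code after the quoted request


def analyze_by_hand(entry: str) -> dict:
    ip_match = IP_RE.search(entry)
    status_match = STATUS_RE.search(entry)
    ip = ip_match.group(0) if ip_match else None
    # Treat 4xx and 5xx status codes as errors; no status found means no error.
    is_error = bool(status_match) and int(status_match.group(1)) >= 400
    return {"ip": ip, "error": is_error}


print(analyze_by_hand('127.0.0.1 - - [10/Oct/2000:13:55:36] "GET /index.html" 200'))
# {'ip': '127.0.0.1', 'error': False}
```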

Reasoning surface (#< lines)

After each generation, semipy writes a small set of #< comment lines around each slot anchor. These are not part of your spec — they are system-managed traces that describe what the generated implementation decided and why, so you can read and steer it without digging into the cache.

from datetime import datetime
from semipy import semiformal

@semiformal
def infer_datetime_formatter(date_str: str) -> str:
    #< intent: infer strptime pattern from observed date text
    #< by: probing a regex-gated candidate table; because the table covers all observed formats
    #< unless: empty or unmatched input raises ValueError
    #> verified: 'Mar 2025' -> {'input_pattern':'%b %Y'}, return error if no match
    input_pattern = ... #> infer the input date regex/strptime pattern from the observed string format in this session.
    output_pattern = "%b %Y"
    return datetime.strptime(str(date_str), input_pattern).strftime(output_pattern)

Placement

#< lines appear in two zones around the slot anchor:

  • Zone P (provenance, above the anchor): intent, given, by, unless — why this implementation exists, what it assumes, and how it handles failure.
  • Zone E (effect, below the anchor): yields, verified — what the generated code produces and what was observed at runtime.

Keywords

Key      | Zone  | Meaning
---------|-------|--------
intent   | above | One-phrase task summary (emitted only when the spec is long or ambiguous).
given    | above | Input-shape assumptions beyond the signature (multi-param slots only).
by       | above | Strategy/mechanism this implementation uses. Embed the reason inline when the choice is non-obvious: <strategy>; because <reason>. Always present.
unless   | above | Fallback or exceptional path (emitted only when the generated code has a raise/except). May repeat for distinct failure modes.
yields   | below | Output shape beyond the return annotation (skipped for simple builtins like str, int).
verified | below | Sample input → observed output, derived from the validation run (never LLM-generated).
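
If you want to inspect these traces programmatically, the keyword-to-zone mapping above is easy to encode. The reader below is a minimal sketch: `parse_trace_lines` is a hypothetical helper, not a semipy API, and it only recognises whole-line `#< key: text` comments.

```python
from typing import Dict, List

# Zone for each keyword, as listed in the Keywords section.
ZONE_BY_KEY = {
    "intent": "above", "given": "above", "by": "above", "unless": "above",
    "yields": "below", "verified": "below",
}


def parse_trace_lines(source: str) -> List[Dict[str, str]]:
    """Collect `#< key: text` lines with their documented zone."""
    entries = []
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped.startswith("#<"):
            continue
        key, sep, text = stripped[2:].partition(":")
        key = key.strip()
        if not sep or key not in ZONE_BY_KEY:
            continue  # not a recognised `#< key: text` line
        entries.append({"key": key, "zone": ZONE_BY_KEY[key], "text": text.strip()})
    return entries


sample = """
    #< intent: infer strptime pattern from observed date text
    #< by: probing a regex-gated candidate table; because the table covers all observed formats
    #< unless: empty or unmatched input raises ValueError
    input_pattern = ...
"""
print([(e["key"], e["zone"]) for e in parse_trace_lines(sample)])
# [('intent', 'above'), ('by', 'above'), ('unless', 'above')]
```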

Steering

To change what the next generation produces, edit the #< by: line (strategy) or the #< unless: line (exceptional path). On the next run where the implementation needs to change, semipy reads the override and adapts accordingly.

Promoting a constraint

To lock an inference note into the contract permanently, flip #< to #> on the same line. This extends the spec text, which triggers a new ADAPT on the next run and keeps the duplicate #< line from being re-emitted.

# Before: system-managed trace
#< by: probing a regex-gated candidate table; because the table covers all observed formats

# After: promoted to user contract — fixes the strategy for future runs
#> by: regex-gated candidate table
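
The marker flip itself is a one-character edit, which you would normally make by hand in your editor. As a sketch of the same edit in code, with `promote` being a hypothetical helper and not part of semipy:

```python
def promote(line: str) -> str:
    """Flip a leading `#<` marker to `#>`; return other lines unchanged."""
    indent_len = len(line) - len(line.lstrip())
    body = line[indent_len:]
    if body.startswith("#<"):
        return line[:indent_len] + "#>" + body[2:]
    return line


print(promote("    #< by: regex-gated candidate table"))
#     #> by: regex-gated candidate table
```

The sketch only flips the marker. Tightening the wording, as in the before/after pair above, is still a manual step.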

#< lines are stable across runs: semipy only rewrites them when the generated implementation changes. A no-op re-run produces a byte-identical file.

Caching and reuse

Generated functions are stored in .semiformal/ relative to your working directory:

  • .semiformal/<session>.portal.json — versioned DAG of all commits, branches, and decisions.
  • .semiformal/runtime/<module>.semi.py — compiled Python implementations for import.

On subsequent runs, semipy loads the cached implementation without calling the LLM. It re-verifies when it detects new input shapes, and runs ADAPT (re-generation from the prior commit) when verification fails.

To force regeneration, delete .semiformal/ or the relevant portal file.
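
A small sketch of inspecting and clearing the cache from Python, using only the standard library. `list_portals` and `clear_cache` are hypothetical helpers, not semipy functions. They assume the default `.semiformal` directory and the `<session>.portal.json` naming described above.

```python
import shutil
from pathlib import Path
from typing import List


def list_portals(cache_dir: str = ".semiformal") -> List[Path]:
    """Return the portal JSON files found in the cache directory, sorted by name."""
    return sorted(Path(cache_dir).glob("*.portal.json"))


def clear_cache(cache_dir: str = ".semiformal") -> bool:
    """Delete the whole cache directory. Returns True if something was removed."""
    path = Path(cache_dir)
    if not path.is_dir():
        return False
    shutil.rmtree(path)
    return True
```

Calling `clear_cache()` removes every cached implementation, so the next call to each decorated function would go back to the LLM. Deleting a single file returned by `list_portals()` is the narrower option.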

Jupyter

In Jupyter notebooks, semipy detects the ipykernel environment automatically. The portal is keyed to os.getcwd(), so one portal persists across kernel restarts.

Install the optional display extras for inline Rich output:

pip install "semipy[jupyter]"

If multiple notebooks share a working directory and need separate caches:

configure(session_source="/path/to/my_notebook.ipynb")

VS Code extension

The Semipy VS Code extension adds:

  • Syntax highlighting for #> spec lines (teal) and #< reasoning lines (green)
  • Slot history tree in the Explorer panel
  • Split-view to inspect generated .semi.py alongside your source
  • Inlay hints and CodeLens showing commit id and decision

Examples

See the examples/ directory:

  • examples/apache_log_semiformal_stages.py — staged walkthrough from simple extraction to INSTANTIATE and promotion workflow
  • examples/datetime_test.py — datetime format inference with observed samples and steered #< surface
  • examples/fasta_header_metadata.py — FASTA header parsing with multiple slots, nested anchors, and standalone semi() calls
  • examples/use_contract_intelligence.py — contract field extraction from PDF
  • examples/use_sponsorship_canonicalizer.py — entity canonicalization pipeline

License

MIT
