legacylens

Security- and compliance-first CLI for AI-assisted legacy code analysis.

legacylens ingests legacy/mainframe codebases and produces three co-equal outputs:

Dependency graphs — call/data/include relationships across artifacts.
Security & compliance findings — mapped to CWE / OWASP (extensible rule packs; see docs/RULES.md).
Modern documentation — human-readable explanations of legacy logic.

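It is built to run on-prem / air-gapped with bring-your-own LLM: point it at your own self-hosted models or your own LLM API keys. With a self-hosted model, no source code or telemetry leaves your environment; a cloud provider, if you configure one, does receive code (see the note under Choosing your LLM provider).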

Status: v1 feature-complete across batches B0–B7 (see REQUIREMENTS.md): scaffold, BYO-LLM gateway, ingestion/index, COBOL/JCL/PL-I parsing, dependency graph, CWE/OWASP security analysis, documentation, retrieval, and cost controls.

Installation

legacylens installs a single CLI command that works from cmd, PowerShell, or any terminal on Windows, macOS, and Linux.

Recommended — `pipx` (puts `legacylens` on PATH for every shell, all OSes)

python -m pip install --user pipx
python -m pipx ensurepath          # one-time: adds the CLI dir to PATH
pipx install legacylens            # from PyPI  (or:  pipx install .  from a clone)

Or use the bundled helper from a clone:

# Windows (PowerShell):
powershell -ExecutionPolicy Bypass -File scripts\install.ps1
# macOS / Linux:
bash scripts/install.sh

Open a new terminal afterwards (so PATH refreshes), then:

legacylens --help
legacylens doctor          # report Python + dependency status

On first run, legacylens verifies its required libraries are present; if any are missing (e.g. running from a clone without installing), it asks permission and installs them with pip (or set LEGACYLENS_AUTO_INSTALL=1 to consent non-interactively). legacylens doctor shows the same status on demand.
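
The first-run check described above can be pictured roughly as follows. This is a minimal sketch, not legacylens's actual code: the function names and the dependency list are hypothetical, and only the LEGACYLENS_AUTO_INSTALL variable comes from this README.

```python
import importlib.util
import os


def missing_modules(required):
    """Return the top-level module names in `required` that cannot be found."""
    return [name for name in required if importlib.util.find_spec(name) is None]


def may_auto_install(env=None):
    """True when LEGACYLENS_AUTO_INSTALL=1 grants non-interactive consent."""
    env = os.environ if env is None else env
    return env.get("LEGACYLENS_AUTO_INSTALL") == "1"


# Hypothetical dependency list; the real one is defined by the package itself.
required = ["json", "surely_not_installed_pkg_xyz"]
missing = missing_modules(required)
if missing and not may_auto_install():
    print("missing:", ", ".join(missing), "(would ask before running pip)")
```

A real implementation would prompt the user at this point and only then invoke pip; the sketch stops at detection.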

Alternative — `pip`

pip install legacylens             # or:  pip install .   from a clone

This also creates the legacylens command. On Windows, if the shell can't find it, your Python Scripts directory isn't on PATH — either use pipx (above), run py -m legacylens ..., or add ...\PythonXX\Scripts to PATH. pipx avoids this entirely.

Air-gapped / offline install

Build a self-contained wheel bundle on a networked machine, copy it to the air-gapped host, and install with no network:

bash scripts/build_offline_bundle.sh      # Windows: scripts\build_offline_bundle.ps1
# copy dist/wheelhouse/ to the target host, then:
pip install --no-index --find-links wheelhouse legacylens

Development (editable, with tests)

python -m venv .venv
. .venv/Scripts/activate        # Windows
# source .venv/bin/activate     # Linux/macOS
pip install -e ".[dev]"
pytest

Quick start

legacylens init                 # scaffold an audit.yaml in the current dir
# edit audit.yaml — set project.root, languages, and your LLM provider(s)
legacylens index                # discover & index sources (COBOL/JCL/PL-I)
legacylens analyze              # parse + security/compliance analysis
legacylens graph                # emit dependency graph (DOT/Mermaid/GraphML)
legacylens doc                  # generate documentation (Markdown + overview)
legacylens report               # render findings (SARIF/JSON/HTML)
legacylens embed                # build the semantic embedding index (BYO embeddings)
legacylens search "QUERY"       # find the most relevant artifacts

Once an embedding index is built (legacylens embed), the LLM steps are retrieval-augmented: analyze (security) and doc inject the most semantically related artifacts into their prompts for cross-file reasoning (disable with --no-rag).
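
To illustrate what "most semantically related artifacts" means in practice, here is a small sketch of top-k retrieval by cosine similarity. It is an assumption about the general technique, not a description of legacylens internals; the artifact names and three-dimensional vectors are made up for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, index, k=2):
    """Return the k artifact names whose embeddings are closest to query_vec.

    `index` maps artifact name -> embedding vector.
    """
    ranked = sorted(index, key=lambda name: cosine(query_vec, index[name]), reverse=True)
    return ranked[:k]


# Hypothetical embedding index; real vectors come from your embedding model.
index = {
    "PAYROLL.cbl": [1.0, 0.0, 0.0],
    "ACCTUPD.cbl": [0.9, 0.1, 0.0],
    "REPORT.jcl": [0.0, 0.0, 1.0],
}
related = top_k([1.0, 0.0, 0.0], index, k=2)
print(related)  # ['PAYROLL.cbl', 'ACCTUPD.cbl']
```

The names returned here are the kind of context that would be injected into an analyze or doc prompt when retrieval is enabled.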

Run legacylens --help for all commands and legacylens <cmd> --help for details.

Useful flags: --no-llm (on analyze/doc) runs fully deterministically with no model calls; budget.max_tokens in config caps total LLM spend per run.

Parse results are cached in the index (content-addressed), so unchanged files are parsed once and reused across passes, commands, and runs — incremental by default (parser.cache: true). For large estates, parse in parallel with -j <workers> (or parser.workers): cache-miss files are grammar-parsed across a process pool to warm the cache before analysis.
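
The idea behind a content-addressed parse cache can be sketched in a few lines. The class and the stand-in parser below are hypothetical and only show the principle: the cache key is a hash of the file's bytes, so an unchanged file is never parsed twice, whatever its path or timestamp.

```python
import hashlib


class ParseCache:
    """Caches parse results keyed by the SHA-256 of the source bytes."""

    def __init__(self, parse_fn):
        self._parse_fn = parse_fn
        self._store = {}
        self.misses = 0  # number of times the parser actually ran

    def get(self, source: bytes):
        key = hashlib.sha256(source).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._parse_fn(source)
        return self._store[key]


def count_lines(source: bytes) -> int:
    """Stand-in for a real parser: just counts lines."""
    return len(source.splitlines())


cache = ParseCache(count_lines)
program = b"IDENTIFICATION DIVISION.\nPROGRAM-ID. DEMO.\n"
cache.get(program)  # cache miss: parser runs
cache.get(program)  # cache hit: parser is skipped
print(cache.misses)  # 1
```

Parallel warm-up with -j would, under this model, amount to running the parser on the cache-miss files in a worker pool and storing the results under the same keys.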

Configuration

All behavior is driven by a single config file (audit.yaml by default; override with -c/--config). Credentials are never stored in audit.yaml: you name the environment variable that holds each provider's key. See the file generated by legacylens init for the full, commented schema.

A fully-commented reference with every provider (local, OpenAI, Anthropic, Gemini, and any OpenAI-compatible endpoint) is at examples/audit.example.yaml — copy it, keep one provider, point routing.default at it, and export that provider's API key env var.

Key principles:

Air-gapped by default (air_gapped: true): the LLM gateway refuses any endpoint not explicitly listed under llm.providers.
Bring-your-own models: OpenAI-compatible, Anthropic, or local servers (Ollama / vLLM / llama.cpp), with per-task model routing.
Auditable: every run appends a structured trail to the configured audit log.
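
The air-gapped rule above is essentially an endpoint allowlist. A minimal sketch of such a check, assuming providers are compared by scheme, host, and port (the function names and the comparison rule are assumptions, not the gateway's documented behavior):

```python
from urllib.parse import urlsplit


def _origin(url):
    """Reduce a URL to (scheme, host, port) for comparison."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port)


def endpoint_allowed(url, providers, air_gapped=True):
    """Allow a request only if its origin matches a configured provider."""
    if not air_gapped:
        return True
    allowed = {_origin(p["base_url"]) for p in providers}
    return _origin(url) in allowed


# Mirrors a single local provider listed under llm.providers.
providers = [{"name": "local", "base_url": "http://localhost:11434/v1"}]
print(endpoint_allowed("http://localhost:11434/v1/chat/completions", providers))  # True
print(endpoint_allowed("https://api.openai.com/v1", providers))  # False
```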

Choosing your LLM provider

Steps to enable an LLM:

Create llm_config.yaml next to your audit.yaml (copy examples/llm_config.example.yaml), and make sure audit.yaml has no llm: block.
Fill in type, url, model, and key for your provider (table below). Prefer api_key_env: NAME instead of key: to keep the key in an env var.
(Optional) add embedding_model: to enable embed/search + retrieval-augmented docs and security.

Run with the LLM on — i.e. without --no-llm:

legacylens index
legacylens analyze            # adds LLM advisory findings (flagged for review)
legacylens doc                # fills in Purpose / Business-logic prose
legacylens embed              # optional: build the embedding index for RAG

Everything else (parsing, graph, CWE/OWASP + regulatory findings, all output formats) works the same with or without an LLM.

Easiest — a 4-line llm_config.yaml. Create it next to your audit.yaml (and leave the llm: block out of audit.yaml); legacylens auto-detects it:

# llm_config.yaml   (this file is git-ignored — it may hold your key)
type: openai_compatible
url: https://generativelanguage.googleapis.com/v1beta/openai
model: gemini-2.0-flash
key: PASTE_YOUR_KEY_HERE     # or use `api_key_env: GEMINI_API_KEY` to keep it in an env var

That's it — run legacylens analyze / doc. Swap type/url/model for any provider (see examples/llm_config.example.yaml).

Advanced — a full llm: block (multiple providers, per-task routing). Here the API key is never in the config — you export it as an environment variable and name it via api_key_env. Pick one provider block for llm.providers, point routing.default at it, and export the key:

# PowerShell (Windows) — this session, or `setx NAME "value"` to persist:
$env:OPENAI_API_KEY = "sk-..."

# bash / Linux / macOS:
export OPENAI_API_KEY="sk-..."

Provider	`type`	`base_url`	`model` (example)	API key env var
Local (Ollama / vLLM / llama.cpp)	`local`	`http://localhost:11434/v1`	`qwen2.5-coder:32b`	— (offline)
OpenAI	`openai_compatible`	`https://api.openai.com/v1`	`gpt-4o-mini`	`OPENAI_API_KEY`
Anthropic (Claude)	`anthropic`	`https://api.anthropic.com`	`claude-sonnet-4-6`	`ANTHROPIC_API_KEY`
Google Gemini	`openai_compatible`	`https://generativelanguage.googleapis.com/v1beta/openai`	`gemini-2.0-flash`	`GEMINI_API_KEY`
Any OpenAI-compatible (Groq, Together, OpenRouter, LiteLLM…)	`openai_compatible`	your endpoint `/v1`	your model id	your env var

Example — Google Gemini (free key):

llm:
  providers:
    - name: gemini
      type: openai_compatible
      base_url: https://generativelanguage.googleapis.com/v1beta/openai
      model: gemini-2.0-flash
      api_key_env: GEMINI_API_KEY      # export GEMINI_API_KEY=... ; not stored here
  routing:
    default: gemini
  # optional — enables `embed`/`search` + retrieval-augmented docs & security:
  # embeddings: { provider: gemini, model: text-embedding-004 }

Then run without --no-llm (e.g. legacylens analyze, legacylens doc) to get LLM-assisted findings and documentation. The full set of provider blocks is in examples/audit.example.yaml.
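
The api_key_env indirection can be illustrated with a short sketch: the config names an environment variable, and the key is looked up only at run time. The function name and error message are hypothetical; only the api_key_env field and the GEMINI_API_KEY variable come from this README.

```python
import os


def resolve_api_key(provider, env=None):
    """Look up the key named by provider['api_key_env'] in the environment.

    Returns None when the provider names no variable (e.g. a local server).
    """
    env = os.environ if env is None else env
    name = provider.get("api_key_env")
    if name is None:
        return None
    try:
        return env[name]
    except KeyError:
        raise RuntimeError(f"environment variable {name} is not set") from None


gemini = {"name": "gemini", "api_key_env": "GEMINI_API_KEY"}
# A fake environment is passed in so the example never touches a real key.
print(resolve_api_key(gemini, env={"GEMINI_API_KEY": "example-key"}))  # example-key
```

Because the config holds only the variable's name, the file can be committed or shared without exposing the secret.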

Note: a cloud provider means code leaves your environment. Because the cloud host is explicitly listed under llm.providers, it is an allowed endpoint even with air_gapped: true. For a strict on-prem engagement, use a local model.

Development

pytest                          # run the test suite

Findings lifecycle & CI gating

legacylens supports an audit/CI workflow around findings, covering gating, suppression, and baselining:

legacylens analyze --fail-on high      # exit 6 if any non-suppressed finding >= high
legacylens suppress --list             # list findings with their (line-independent) fingerprints
legacylens suppress <fingerprint> --reason "false positive"   # accept / silence one
legacylens baseline                    # accept current findings as the baseline
legacylens diff                        # show findings new vs resolved since the baseline
legacylens analyze --fail-on high --new-only   # gate only on findings new vs the baseline

Suppressions (.legacylens/suppressions.json) mark false positives or accepted LLM-advisory findings; they're excluded from gating and shown struck-through in the HTML report and marked in SARIF (suppressions).
Baseline (.legacylens/baseline.json) lets you adopt legacylens on a large estate without drowning in pre-existing findings — gate only on what's new.
Exit codes: 6 = gate failure (distinct from tool errors), so CI can tell a policy failure apart from a crash. Configure a default via findings.fail_on.
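
How suppressions, the baseline, and the severity threshold combine into a gate decision can be sketched as below. The severity scale and the finding structure are assumptions made for illustration; only exit code 6 and the behavior of --fail-on and --new-only are taken from this README.

```python
SEVERITY_ORDER = ["low", "medium", "high", "critical"]  # assumed scale
GATE_FAILURE = 6  # documented exit code for a policy failure


def gate_exit_code(findings, fail_on, suppressed=frozenset(), baseline=None):
    """Return 6 if any counted finding is at or above `fail_on`, else 0.

    Suppressed findings never count. When `baseline` is given (the
    --new-only case), findings already in the baseline are skipped too.
    """
    threshold = SEVERITY_ORDER.index(fail_on)
    for finding in findings:
        if finding["fingerprint"] in suppressed:
            continue
        if baseline is not None and finding["fingerprint"] in baseline:
            continue
        if SEVERITY_ORDER.index(finding["severity"]) >= threshold:
            return GATE_FAILURE
    return 0


findings = [
    {"fingerprint": "a1", "severity": "high"},
    {"fingerprint": "b2", "severity": "low"},
]
print(gate_exit_code(findings, "high"))                   # 6
print(gate_exit_code(findings, "high", baseline={"a1"}))  # 0
```

A CI job can then treat exit code 6 as a policy failure and any other nonzero code as a tool error.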

Fingerprints are line-independent, so a finding survives edits elsewhere in the file.
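
One way to obtain a line-independent fingerprint is to hash what the finding is about rather than where it sits. The inputs chosen here (rule id, file path, whitespace-normalized snippet) are an assumption; the README does not specify how legacylens computes its fingerprints.

```python
import hashlib


def fingerprint(rule_id, path, snippet):
    """Hash rule, file, and normalized code text; the line number is not used."""
    normalized = " ".join(snippet.split())  # collapse whitespace differences
    payload = "\x1f".join([rule_id, path, normalized])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


# The same statement found at different lines, with different indentation,
# yields the same fingerprint. Rule id and path are hypothetical.
before = fingerprint("CWE-798", "PAYROLL.cbl", "MOVE 'secret' TO WS-PASSWORD")
after = fingerprint("CWE-798", "PAYROLL.cbl", "    MOVE 'secret'  TO WS-PASSWORD")
print(before == after)  # True
```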

Custom & regulatory compliance packs: add your own detection rules via YAML (analysis.compliance.pack_paths) and map findings to regulatory controls with built-in (pci-dss, nist-800-53) or custom frameworks (analysis.compliance.frameworks / framework_paths). Findings carry controls (e.g. PCI-DSS:8.6.2) in every output, and legacylens compliance summarizes by control. See docs/RULES.md.
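
A per-control summary of the kind legacylens compliance produces can be approximated with a simple count. The finding records below are hypothetical, and the second control identifier is an illustrative guess at the format; only PCI-DSS:8.6.2 appears in this README.

```python
from collections import Counter


def summarize_by_control(findings):
    """Count findings per regulatory control; a finding may map to several."""
    counts = Counter()
    for finding in findings:
        counts.update(finding.get("controls", []))
    return dict(counts)


findings = [
    {"rule": "hardcoded-credential", "controls": ["PCI-DSS:8.6.2", "NIST-800-53:IA-5"]},
    {"rule": "hardcoded-credential", "controls": ["PCI-DSS:8.6.2"]},
    {"rule": "unmapped-rule"},  # no controls attached
]
print(summarize_by_control(findings))
# {'PCI-DSS:8.6.2': 2, 'NIST-800-53:IA-5': 1}
```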

COBOL parser backend (client choice)

The COBOL parser backend is selectable in config under parser.backend:

regex (default) — pure-Python, zero-dependency line parser. Works everywhere, installs air-gapped with no extra steps.
antlr (opt-in) — grammar-based parser using ANTLR, for higher fidelity (the lexer understands string literals and tokens natively). It requires:
1. the runtime extra: pip install 'legacylens[antlr]', and
2. a one-time parser generation: python scripts/build_antlr.py (needs Java at build time only; see the script header).

parser:
  backend: antlr          # or: regex
  fallback_to_regex: true # if antlr isn't generated/installed, use regex instead of failing

With fallback_to_regex: true (default), selecting antlr before generating it simply logs a warning and uses the regex backend — runs never break. The ANTLR grammar (src/legacylens/parsing/antlr/Cobol.g4) is a starter covering the structural constructs legacylens needs; clients can extend it or substitute a mature grammar (e.g. ProLeap's).
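
The fallback behavior reads naturally as a small decision function. This sketch is hypothetical (the function name, logger name, and messages are not from legacylens) and only models the rule stated above: warn and use regex when antlr is requested but unavailable, unless fallback is disabled.

```python
import logging

log = logging.getLogger("parser_backend_sketch")


def choose_backend(requested, antlr_ready, fallback_to_regex=True):
    """Pick the parser backend, falling back to regex when permitted."""
    if requested == "regex":
        return "regex"
    if requested != "antlr":
        raise ValueError(f"unknown parser backend: {requested!r}")
    if antlr_ready:
        return "antlr"
    if fallback_to_regex:
        log.warning("antlr backend not generated/installed; using regex")
        return "regex"
    raise RuntimeError("antlr backend requested but not available")


print(choose_backend("antlr", antlr_ready=False))  # regex
```

Setting fallback_to_regex: false turns the silent downgrade into a hard failure, which may be preferable when an audit requires the higher-fidelity parser.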

Validation

Beyond the unit suite, legacylens has been run end-to-end against public COBOL, JCL, and PL/I repositories (AWS CardDemo, IBM Bank-of-Z, and others). See docs/VALIDATION.md for the test matrix, results, and the issues that real-world testing surfaced and fixed.

License

MIT — see LICENSE.

Project details

Version 0.1.0, released Jul 1, 2026.

Source Distribution

legacylens-0.1.0.tar.gz (97.8 kB)

Upload date: Jul 1, 2026
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.12

Algorithm	Hash digest
SHA256	`14de1a4abb96dc977b9c36281980bf56f5d7c6296935d496a81372ef60cdf3e9`
MD5	`d016465b11c5ed49084ab1cd0bb35e7e`
BLAKE2b-256	`336c808a2acc30b47aafe9cedd82c7461d269ee70cfd14a7f58b31e9a1436863`

Built Distribution

legacylens-0.1.0-py3-none-any.whl (92.2 kB)

Upload date: Jul 1, 2026
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.12

Algorithm	Hash digest
SHA256	`23c1377fbc194bc8a49e28939eeaf5177f62b959f7b64b18bc27ec17cca2467d`
MD5	`21deb046ea174eda9a3e6170d962d030`
BLAKE2b-256	`97812f9b7a004a93bf8d7e7bbaf93e304dc1573149cf144b208248561be3d9d1`
