Local-first semantic review assistant that flags likely risky meaning changes in edited text.

SemShift

Catch risky meaning changes Git diff misses.

SemShift is a local-first review assistant for AI-rewritten and human-edited docs, prompts, policies, resumes, and research drafts. It flags likely semantic drift before you merge, publish, or submit text.

Current release line: v0.2.x alpha. The default backend is lexical + heuristic (tfidf). Optional SentenceTransformers embeddings run locally and add semantic similarity; they carry no legal, factual, or scientific authority.

5-Second Demo

Before:

We do not share personal data with third parties.

After:

We may share personal data with trusted partners.

SemShift:

CRITICAL: privacy commitment weakened.
Risk flag: third-party sharing.
Recommendation: hold approval until a human reviews the change.

Install

pip install semshift

Optional local embedding backend:

pip install "semshift[models]"

Development:

pip install -e ".[dev]"

Quick Start

semshift compare examples/old_policy.md examples/new_policy.md --mode policy
semshift compare examples/old_policy.md examples/new_policy.md --mode policy --json
semshift compare examples/old_policy.md examples/new_policy.md --mode policy --report semshift-report.md

Use limits for large or generated files:

semshift compare old.md new.md --max-file-size 5242880 --max-chunks 2000
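
As a rough illustration of what a byte limit like the one above means, the following sketch checks a file's size before it is compared. The helper name `within_size_limit` is hypothetical and not part of SemShift; the only fact taken from the command above is the value 5242880, which is exactly 5 MiB.

```python
from pathlib import Path

# 5242880 bytes is exactly 5 MiB (5 * 1024 * 1024), the value passed to
# --max-file-size in the command above.
DEFAULT_MAX_BYTES = 5 * 1024 * 1024


def within_size_limit(path: str, max_bytes: int = DEFAULT_MAX_BYTES) -> bool:
    """Return True when the file at `path` is no larger than `max_bytes`.

    Hypothetical pre-check for illustration; SemShift's own enforcement of
    --max-file-size may differ.
    """
    return Path(path).stat().st_size <= max_bytes
```

A pre-check like this can be handy in scripts that collect generated files before passing them to the CLI, so oversized inputs are skipped deliberately.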

GitHub Action

name: SemShift Check

on:
  pull_request:
    paths:
      - "**/*.md"
      - "**/*.txt"
      - "**/*.yml"

permissions:
  contents: read
  pull-requests: write

jobs:
  semshift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - uses: VeerajSai/SemShift@v0.2.0
        with:
          mode: policy
          fail_on: high
          pr_comment: "true"
          model: tfidf
          report: semshift-report.md

Inputs include files, mode, fail_on, model, report, base_ref, pr_comment, github_token, max_file_size, and max_chunks.

Note: fail_on defaults to high. With that default, the action exits with code 1 when any file reaches high or critical drift.
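
The threshold behaviour described in the note can be sketched as follows. This is an illustration under assumptions, not the action's actual code: only the labels high and critical appear in this document, so the `low` and `medium` entries, the `SEVERITY_ORDER` list, and the `exit_code` function are all hypothetical.

```python
# Hypothetical severity ordering. Only "high" and "critical" are named in
# this document; "low" and "medium" are assumed for illustration.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def exit_code(file_labels: dict[str, str], fail_on: str = "high") -> int:
    """Return 1 if any file's label meets or exceeds `fail_on`, else 0."""
    threshold = SEVERITY_ORDER.index(fail_on)
    for label in file_labels.values():
        if SEVERITY_ORDER.index(label) >= threshold:
            return 1
    return 0


labels = {"privacy.md": "critical", "README.md": "low"}
print(exit_code(labels))                   # 1: "critical" is at or above "high"
print(exit_code({"README.md": "medium"}))  # 0: nothing reaches "high"
```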

Python API

from semshift import compare_files, compare_text

result = compare_text(
    old="We do not share personal data.",
    new="We may share personal data with partners.",
    mode="policy",
)

print(result.drift_label)
print(result.summary)
print(result.risk_flags)
print(result.to_markdown())

file_result = compare_files("old_policy.md", "new_policy.md", mode="policy")
report = file_result.to_markdown()

Canonical result fields include drift_label, overall_score, drift_score, summary, matched_chunks, chunk_matches, claim_changes, tone_shift, risk_flags, warnings, and metadata. The result also provides the methods to_dict(), to_json(), and to_markdown().
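
One way to consume these fields in a script is to work from the dictionary form. The sketch below assumes that to_dict() returns a mapping keyed by the field names listed above and that risk_flags is a list of strings; neither detail is confirmed here, and `one_line_summary` is a hypothetical helper. The example uses a hand-written stand-in dictionary rather than a real result.

```python
import json


def one_line_summary(result: dict) -> str:
    """Format a short status line from a result mapping."""
    label = result.get("drift_label", "unknown")
    flags = result.get("risk_flags") or []
    flag_text = ", ".join(str(flag) for flag in flags) if flags else "none"
    return f"drift={label}; risk_flags={flag_text}"


# Hand-written stand-in for result.to_dict(); the values are illustrative only.
example = {"drift_label": "critical", "risk_flags": ["third-party sharing"]}
print(one_line_summary(example))
print(json.dumps(example, indent=2))
```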

Modes

Mode	Maturity	Best for	Main signals
`policy`	stable	privacy policies, terms, consent language	sharing, retention, rights, obligations
`prompt`	stable	system prompts and instruction files	safety rules, hidden instructions, scope
`research`	experimental	research drafts and reports	metrics, datasets, baselines, limitations
`resume`	experimental	resumes and bios	titles, metrics, company/project names
`readme`	experimental	README and support docs	install requirements, guarantees, scope
`default`	stable	general text review	drift score, claims, tone, generic risk

How It Works

SemShift combines transparent signals:

Chunk alignment by headings and text structure.
Lexical TF-IDF similarity by default, or optional local SentenceTransformers embeddings.
Claim extraction, tone signals, and mode-specific risk rules.
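
To make the first signal concrete, here is a minimal sketch of heading-based chunk alignment. It is not SemShift's implementation: the functions `split_by_heading` and `align`, the `(preamble)` label, and the choice to match chunks by exact heading text are all assumptions made for illustration.

```python
import re

# Matches Markdown ATX headings such as "# Title" or "### Section".
HEADING = re.compile(r"^#{1,6}\s+(.*)$")


def split_by_heading(text: str) -> dict[str, str]:
    """Map each Markdown heading to the body text beneath it."""
    chunks: dict[str, list[str]] = {}
    current = "(preamble)"  # label for text that precedes the first heading
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            current = match.group(1).strip()
            chunks.setdefault(current, [])
        else:
            chunks.setdefault(current, []).append(line)
    return {title: "\n".join(body).strip() for title, body in chunks.items()}


def align(old: str, new: str) -> list[tuple[str, str, str]]:
    """Pair chunks that share a heading; a missing side becomes an empty string."""
    old_chunks, new_chunks = split_by_heading(old), split_by_heading(new)
    titles = list(old_chunks) + [t for t in new_chunks if t not in old_chunks]
    return [(t, old_chunks.get(t, ""), new_chunks.get(t, "")) for t in titles]
```

Each aligned pair can then be scored separately, so a change in one section is not diluted by unchanged text elsewhere. Sections with repeated heading text are merged in this sketch, which a real aligner would need to handle.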

TF-IDF is a lexical backend, not a true semantic model. Optional embedding models may download weights on first use; document text is processed locally unless you explicitly integrate external services.
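
The following standard-library sketch shows what a lexical TF-IDF comparison measures. It is an illustration only, not SemShift's backend: the tokenizer, the smoothed IDF formula, and the function names are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build smoothed TF-IDF vectors for a small list of documents."""
    tokenized = [Counter(tokenize(doc)) for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for counts in tokenized for term in counts)
    vectors = []
    for counts in tokenized:
        vectors.append({
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, tf in counts.items()
        })
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


old = "We do not share personal data with third parties."
new = "We may share personal data with trusted partners."
old_vec, new_vec = tfidf_vectors([old, new])
print(f"lexical similarity: {cosine(old_vec, new_vec):.2f}")
```

The score reflects shared vocabulary only. Nothing in it encodes that "do not share" became "may share", which is consistent with pairing the lexical signal with claim extraction and risk rules.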

Benchmarks

SemShift includes a starter self-evaluation benchmark for regression tracking. See docs/benchmarks.md.

Do not treat starter benchmark numbers as external validation. Human-labeled external evaluation is still needed.

Compared To

Tool	What it catches	What it misses
Git diff	exact text edits	risk, claims, weakened obligations
diff-match-patch	text similarity	domain-specific meaning changes
LLM judge	broad qualitative review	local determinism, reproducibility, privacy by default
Grammar checker	style and grammar	policy, prompt, research, and factual drift
SemShift	likely risky semantic drift	subtle context, truth verification, legal authority

Limitations

SemShift is:

not legal advice
not a fact-checker
not scientific authority
not a replacement for human review
likely to miss subtle context-dependent changes
likely to false-positive on harmless paraphrases
lexical + heuristic by default

Troubleshooting

semshift: command not found: Confirm the active environment is the one where you installed semshift.

Model import error: Install optional dependencies with pip install "semshift[models]", or use --model tfidf.

Slow first model run: SentenceTransformers may download weights and initialize on first use.

Windows path issues: Quote paths with spaces and prefer PowerShell-compatible quoting.

GitHub Action fork PRs: PR comments may be unavailable on pull requests from forks because of restricted token permissions; the report artifact is still written.

No files matched: Set the files input, use actions/checkout with fetch-depth: 0, or check supported extensions.

Report too long: GitHub comments are truncated and the full report is uploaded as an artifact.
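
Truncation of this kind can be sketched as below. The 65,536-character figure is an assumed comment-size limit, and the notice text and function name are hypothetical; the limit and wording the action actually uses are not stated here.

```python
# Assumed limit on a GitHub comment body, in characters. The value the
# action actually applies is not given in this document.
MAX_COMMENT_CHARS = 65536
NOTICE = "\n\n_Report truncated. See the uploaded artifact for the full text._"


def truncate_for_comment(report: str, limit: int = MAX_COMMENT_CHARS) -> str:
    """Return `report` unchanged if it fits, else a cut copy plus a notice."""
    if len(report) <= limit:
        return report
    # Reserve room for the notice so the result never exceeds `limit`.
    return report[: limit - len(NOTICE)] + NOTICE
```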

Roadmap

stronger external benchmark
NLI-based deep mode for contradiction/entailment checks
VS Code extension
web demo
docs site
more file formats

Author

Built by Veeraj Sai.

Citation

Please cite SemShift using CITATION.cff.

License

MIT. See LICENSE.

Security

Report vulnerabilities through GitHub Security Advisories. SemShift is local-first by default, but optional model downloads and external CI integrations should be reviewed in your environment.

Release history

0.2.0 (this version): May 21, 2026
0.1.0: May 18, 2026

Download files

Source Distribution

semshift-0.2.0.tar.gz (67.2 kB), uploaded May 21, 2026 (Source)

Built Distribution

semshift-0.2.0-py3-none-any.whl (42.6 kB), uploaded May 21, 2026 (Python 3)

File details

Details for the file semshift-0.2.0.tar.gz.

File metadata

Download URL: semshift-0.2.0.tar.gz
Upload date: May 21, 2026
Size: 67.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.1

File hashes

Hashes for semshift-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`10d240348c5729fbda5e25db93180e1b4d8fee5c273af246a2d34b7efd6544e3`
MD5	`666ffe3dbee275101448f984e2542bfb`
BLAKE2b-256	`5a2386f586624bdb8b93aa1604b528774342b8017afaeea0541ed1fe1484e3e4`

File details

Details for the file semshift-0.2.0-py3-none-any.whl.

File metadata

Download URL: semshift-0.2.0-py3-none-any.whl
Upload date: May 21, 2026
Size: 42.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.1

File hashes

Hashes for semshift-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b384011e02575f58c0be645e05de9e149b35be5321d93722c5f4a95fd05ffcd6`
MD5	`7b7357310327ae81145c2b704be30b9f`
BLAKE2b-256	`726a6e6e08d3fab7b8de5e81bbeb82d6cc7d60736cd5203c22c0c37f36f4ea6a`
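
A downloaded file can be checked against the SHA256 values in the tables above with Python's standard hashlib module. This is a generic sketch; the helper names `sha256_of` and `matches_published` are hypothetical, and the file path is whatever location you downloaded the archive to.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published(path: str, expected_hex: str) -> bool:
    """Compare a local file's digest with a published one, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Call `matches_published` with the local file path and the SHA256 value listed for that file; a False result means the download should not be installed.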
