
Local web review interface for nophi / nophi-av PHI redaction


nophi-ui

A local web review interface for the nophi (documents) and nophi-av (audio/video) PII/PHI redaction engines.

It lets you run detection locally from the browser, remove false-positive detections (and, for audio, add a missed segment by hand) before redaction is applied, and view the redacted result.

Run

nophi-ui            # opens http://127.0.0.1:8000
nophi-ui --port 9000 --no-open

You select an input directory and an output directory on the machine running the server (entered as raw filesystem paths), preview the files that will be processed, then start detection.
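The preview step can be pictured with a minimal sketch. It is not nophi-ui's code: the function name preview_documents is hypothetical, the recursive walk is an assumption, and only the document extensions listed in this README are covered, since audio/video extensions are not listed here.

```python
from pathlib import Path

# Document extensions named in this README. Audio/video extensions are not
# listed there, so this sketch deliberately leaves them out.
DOCUMENT_EXTENSIONS = {".txt", ".csv", ".docx", ".xlsx", ".pdf"}


def preview_documents(input_dir: str) -> list[str]:
    """Return sorted relative paths of document files found under input_dir.

    Hypothetical helper: walking subfolders is an assumption, not documented
    nophi-ui behaviour.
    """
    root = Path(input_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"not a directory: {input_dir}")
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in DOCUMENT_EXTENSIONS
    )
```

Matching on the lowercased suffix means a file such as SCAN.PDF is still picked up.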

Prerequisites

  • Python 3.10+ — 3.12 recommended. 3.12 is what the app is tested against; very new releases (e.g. 3.14) may not work.

  • pip install nophi-ui pulls in everything else automatically: the document and audio/video engines (nophi, nophi-av), FastAPI, and the ML stack. No separate installs are needed and no API keys are required.

  • Models are downloaded on first use and then cached.

    To fetch them ahead of time instead of on the first run:

    nophi download-models        # document NLP models
    nophi-av download-models     # audio/video models
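The Python version guidance above can be checked before installing. This is a small sketch, not part of nophi-ui: the function name check_python and its return labels are made up for illustration, and the version bounds are taken from the Prerequisites list.

```python
import sys

MINIMUM = (3, 10)      # lowest version listed as supported
RECOMMENDED = (3, 12)  # version the app is tested against


def check_python(version: tuple[int, int] = sys.version_info[:2]) -> str:
    """Classify a (major, minor) version against the README's guidance."""
    if version < MINIMUM:
        return "unsupported"
    if version == RECOMMENDED:
        return "recommended"
    if version > RECOMMENDED:
        return "untested"  # newer releases may not work
    return "supported"


if __name__ == "__main__":
    print(f"Python {sys.version_info[0]}.{sys.version_info[1]}: {check_python()}")
```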
    

Usage

The interface is a single page with two phases.

1. Setup. Type a path or use Browse… to choose an input and an output folder, pick the options below, then click Preview files to confirm what will be processed and Start detection to run:

  • Audio redaction — how PHI is removed from audio: beep (overlay a tone) or silence (mute the span).

  • Whisper model — the speech-to-text model used to transcribe audio/video:

    • tiny — fastest, least accurate
    • base — middle ground
    • small (default) — most accurate of the three, slowest

    Documents don't use Whisper. Their detection runs automatically with spaCy and biomedical NER, so there is nothing to choose.

2. Review & apply. When detection finishes, open each file to see its detections, remove any false positives (and, for audio, add missed segments), then Apply to write the redacted result. Apply is re-runnable — toggle detections and re-apply until you're satisfied; nothing is final until you stop the server. See What it does below for the per-format specifics. Redacted files and an Excel report are written to your output folder.
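Apply can be re-run because each pass can start again from the original input rather than from the previous output. The sketch below illustrates that idea for plain text. The Detection class, the apply function, and the <ENTITY_TYPE> placeholder format are assumptions made for illustration, not nophi's actual data model or output format.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    start: int            # inclusive character offset in the original text
    end: int              # exclusive character offset
    entity_type: str      # e.g. "PERSON"
    enabled: bool = True  # set to False when unchecked as a false positive


def apply(original: str, detections: list[Detection]) -> str:
    """Redact the enabled detections, always starting from the original text."""
    out, cursor = [], 0
    enabled = sorted((d for d in detections if d.enabled), key=lambda d: d.start)
    for d in enabled:
        if d.start < cursor:
            continue  # skip a span that overlaps one already redacted
        out.append(original[cursor:d.start])
        out.append(f"<{d.entity_type}>")  # placeholder format is an assumption
        cursor = d.end
    out.append(original[cursor:])
    return "".join(out)


text = "Dr. Alice Smith saw May on 3 May."
dets = [
    Detection(4, 15, "PERSON"),      # "Alice Smith"
    Detection(20, 23, "PERSON"),     # "May", which the reviewer may reject
    Detection(27, 32, "DATE_TIME"),  # "3 May"
]
print(apply(text, dets))  # Dr. <PERSON> saw <PERSON> on <DATE_TIME>.
dets[1].enabled = False   # uncheck the detection, then re-apply
print(apply(text, dets))  # Dr. <PERSON> saw May on <DATE_TIME>.
```

Because the original text is never modified, unchecking a detection and re-applying restores the text that an earlier pass had redacted.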

What it does

  • Documents (.txt .csv .docx .xlsx .pdf): detect → review the detection list → uncheck false positives → apply. PDF previews inline; docx/xlsx are download-only.
  • Audio: detect → review (play the original clip per detection) → uncheck false positives and/or add missed start/end segments → apply (re-scrubs from the original; no re-transcription).
  • Video: view-only. Redacted in one shot; detections shown for reference. Video redaction is still in development.
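The audio apply step can be sketched on a plain list of samples. This is not nophi-av's implementation: the scrub function, its parameters, and the 1 kHz tone are hypothetical, and for simplicity the beep mode here replaces the span with a tone rather than mixing it over the original. Kept detections and hand-added segments are both treated as (start, end) spans in seconds.

```python
import math


def scrub(samples, sample_rate, spans, mode="beep", tone_hz=1000.0):
    """Return a copy of samples with each (start_s, end_s) span replaced.

    mode="silence" zeroes the span; mode="beep" writes a sine tone there.
    """
    if mode not in ("beep", "silence"):
        raise ValueError(f"unknown mode: {mode}")
    out = list(samples)  # work on a copy so the original can be re-scrubbed
    for start_s, end_s in spans:
        first = max(0, int(start_s * sample_rate))
        last = min(len(out), int(end_s * sample_rate))
        for i in range(first, last):
            if mode == "silence":
                out[i] = 0.0
            else:
                out[i] = 0.5 * math.sin(2 * math.pi * tone_hz * i / sample_rate)
    return out


original = [0.25] * 8000            # one second of dummy audio at 8 kHz
kept_detections = [(0.25, 0.5)]     # spans left checked during review
added_by_hand = [(0.75, 1.0)]       # a missed segment added by the reviewer
redacted = scrub(original, 8000, kept_detections + added_by_hand, mode="silence")
```

Since scrub never touches the input list, a later apply with a different set of spans starts again from the original audio and needs no new transcription.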

PDF redaction-box labels

In redacted PDFs, each box is stamped with a short code instead of the full entity name, because full names like <ORGANIZATION> don't fit short spans such as "LLC". Every entity type is reduced to a 2-letter code rendered as <XX>:

Code  Entity type    Code  Entity type
PR    PERSON         SS    US_SSN
OR    ORGANIZATION   BK    US_BANK_NUMBER
LO    LOCATION       DL    US_DRIVER_LICENSE
DT    DATE_TIME      PP    US_PASSPORT
PH    PHONE_NUMBER   IT    US_ITIN
EM    EMAIL_ADDRESS  ML    MEDICAL_LICENSE
CC    CREDIT_CARD    IB    IBAN_CODE
IP    IP_ADDRESS     NR    NRP
UR    URL

The review table in the output report always shows the full entity type; the abbreviations appear only inside the PDF boxes.
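For scripts that post-process redacted PDFs, the code table can be written as a lookup. The mapping below copies the table in this section; the names PDF_CODES and box_label are hypothetical and are not part of the nophi API.

```python
# Entity type -> 2-letter code, copied from the table in this section.
PDF_CODES = {
    "PERSON": "PR", "ORGANIZATION": "OR", "LOCATION": "LO",
    "DATE_TIME": "DT", "PHONE_NUMBER": "PH", "EMAIL_ADDRESS": "EM",
    "CREDIT_CARD": "CC", "IP_ADDRESS": "IP", "URL": "UR",
    "US_SSN": "SS", "US_BANK_NUMBER": "BK", "US_DRIVER_LICENSE": "DL",
    "US_PASSPORT": "PP", "US_ITIN": "IT", "MEDICAL_LICENSE": "ML",
    "IBAN_CODE": "IB", "NRP": "NR",
}

# Reverse lookup, useful when reading a box label back into a full type.
ENTITY_BY_CODE = {code: entity for entity, code in PDF_CODES.items()}


def box_label(entity_type: str) -> str:
    """Return the short label stamped on a PDF box, e.g. "<OR>"."""
    return f"<{PDF_CODES[entity_type]}>"


print(box_label("ORGANIZATION"))  # <OR>
print(ENTITY_BY_CODE["DL"])       # US_DRIVER_LICENSE
```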

Security

This tool serves PII/PHI, so by design it:

  • binds locally to 127.0.0.1 only (refuses other hosts),
  • locks CORS to its own origin,
  • requires a per-launch token on every API call,
  • serves files by opaque job/file id (never a client-supplied path),
  • marks PHI responses Cache-Control: no-store and serves only the clipped segment for audio review.
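Three of the measures above (local-only binding, a per-launch token, and no-store responses) can be sketched with the standard library alone. This illustrates the general technique, not nophi-ui's code: every name below is hypothetical, and the README does not say how the token is generated or how it is sent with a request.

```python
import hmac
import secrets

ALLOWED_HOST = "127.0.0.1"
# A fresh random token per process start; the exact scheme is an assumption.
LAUNCH_TOKEN = secrets.token_urlsafe(32)


def check_bind_host(host: str) -> str:
    """Refuse any bind address other than the loopback address."""
    if host != ALLOWED_HOST:
        raise ValueError(f"refusing to bind to {host!r}; only {ALLOWED_HOST} is allowed")
    return host


def is_authorized(presented_token: str) -> bool:
    """Compare tokens in constant time to avoid leaking matches via timing."""
    return hmac.compare_digest(presented_token.encode(), LAUNCH_TOKEN.encode())


def phi_headers() -> dict[str, str]:
    """Headers for responses carrying PHI, so browsers do not cache them."""
    return {"Cache-Control": "no-store"}
```

Generating the token at startup means a token from an earlier launch stops working as soon as the server is restarted.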

State is held in memory for the process lifetime; closing the server clears it (a restart means re-running detection).
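Serving files by opaque id and keeping state in memory fit together naturally, as the sketch below suggests. The FileRegistry class is hypothetical and only illustrates the pattern: the server hands out random ids, clients send back ids rather than paths, and the mapping disappears when the process exits.

```python
import uuid
from pathlib import Path


class FileRegistry:
    """In-memory map from opaque file ids to server-side paths."""

    def __init__(self) -> None:
        self._files: dict[str, Path] = {}  # lost when the process stops

    def register(self, path: str) -> str:
        """Store a path chosen by the server and return a random id for it."""
        file_id = uuid.uuid4().hex
        self._files[file_id] = Path(path)
        return file_id

    def resolve(self, file_id: str) -> Path:
        """Look up an id; anything not issued by register() is rejected."""
        if file_id not in self._files:
            raise LookupError("unknown file id")
        return self._files[file_id]


registry = FileRegistry()
fid = registry.register("/data/in/report.pdf")  # example path, not a real default
print(registry.resolve(fid).name)  # report.pdf
```

A client that sends a path instead of an issued id gets a LookupError, because the path string is never interpreted as a location on disk.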

