PHIdelity

Overview

PHIdelity is a Python package designed to intelligently anonymize Protected Health Information (PHI) in clinical notes while preserving their contextual meaning. Unlike traditional redaction tools that simply obscure data, PHIdelity leverages a local Large Language Model (LLM) via Ollama to detect PHI (e.g., names, dates, medical record numbers) and replace it with meaningful, generalized placeholders (e.g., [Patient Name], [Date of Visit]). This approach ensures that anonymized notes remain valuable for research, analysis, or sharing, all while adhering to privacy standards like HIPAA.

PHIdelity offers a flexible command-line interface (CLI) and importable Python functions, extracting PHI into structured CSV files and saving anonymized notes as text files. It’s an ideal tool for healthcare professionals, researchers, and developers working with sensitive medical data.

Key Features

  • Contextualized Anonymization: Replaces PHI with descriptive placeholders (e.g., "John Doe" → [Patient Name]), maintaining the note’s usability.
  • Advanced PHI Detection: Uses a local LLM (default: qwen3:4B) to identify diverse PHI types, including names, dates, and medical record numbers.
  • Structured Output: Exports PHI details to a CSV file with unique IDs, types, values, and descriptions for tracking and auditing.
  • Anonymized Note Export: Saves the anonymized note as a text file, ready for secure use.
  • Local and Configurable: Operates on a local Ollama server for data privacy, with customizable LLM models and output paths.
  • Open Source: Released under the MIT License, encouraging community contributions.

Why Contextualized Anonymization?

Traditional methods often replace PHI with generic markers (e.g., [REDACTED]) or random strings, stripping notes of their meaning and utility. PHIdelity improves on this by:

  • Preserving Semantics: Placeholders like [Attending Physician Name] or [Medical Record Number] keep the note interpretable.
  • Supporting Use Cases: Enables research, machine learning, and education with privacy intact.
  • Ensuring Compliance: Removes identifiable data while retaining structure, aligning with regulations like HIPAA.

Installation

PHIdelity is available on PyPI and can be installed easily with pip.

Steps

  1. Install PHIdelity:

    pip install phidelity
    
  2. Prerequisites:

    • Python: Version 3.8 or higher.
    • Ollama: A running Ollama server (default: http://localhost:11434/) with the qwen3:4B model installed.
  3. Set Up Ollama:

    • Install Ollama from ollama.ai.
    • Start the server:
      ollama serve
      
    • Pull the default model:
      ollama pull qwen3:4B
      
    • Verify it’s running:
      curl http://localhost:11434/api/generate -d '{"model": "qwen3:4B", "prompt": "Test"}'
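The same check can be scripted from Python. The sketch below assumes Ollama's `/api/tags` route, which lists locally installed models. The helper names `fetch_installed_models` and `model_is_installed` are illustrative and not part of PHIdelity, and the case-insensitive comparison is an assumption made because the server may report the tag in a different case than the one written here.

```python
import json
import urllib.request


def fetch_installed_models(endpoint="http://localhost:11434/"):
    """Return the names of the models installed on the Ollama server.

    Assumes the server exposes GET /api/tags (Ollama's model-listing route).
    Raises OSError (including URLError) if the server cannot be reached.
    """
    url = endpoint.rstrip("/") + "/api/tags"
    with urllib.request.urlopen(url, timeout=5) as resp:
        payload = json.load(resp)
    return [model["name"] for model in payload.get("models", [])]


def model_is_installed(installed, wanted="qwen3:4B"):
    """Case-insensitive check that `wanted` appears in `installed`."""
    wanted = wanted.lower()
    return any(name.lower() == wanted for name in installed)
```

With the server running, `model_is_installed(fetch_installed_models())` should return `True` once the default model has been pulled.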
      

Usage

PHIdelity can be used via its command-line interface (CLI) or as a Python module.

Command-Line Interface (CLI)

The CLI provides a straightforward way to anonymize clinical notes.

Basic Commands

  • Anonymize a File:
    phidelity --input clinical_note.txt --phi-output phi_data.csv --anonymized-output anonymized_note.txt
    
  • Anonymize from stdin:
    echo "Clinical note text" | phidelity
    

Options

  • --input: Path to the input clinical note (default: stdin).
  • --phi-output: Path for the PHI CSV output (default: none).
  • --anonymized-output: Path for the anonymized note (default: stdout).
  • --endpoint: Ollama server URL (default: http://localhost:11434/).
  • --model: LLM model (default: qwen3:4B).

Run phidelity --help for full details.
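To drive the CLI from a script, one option is to build the argument list from the options listed above and hand it to `subprocess.run`. This is a minimal sketch: `build_phidelity_command` and `run_phidelity` are hypothetical helper names, and only the flags documented above are used.

```python
import subprocess


def build_phidelity_command(input_path=None, phi_output=None,
                            anonymized_output=None, endpoint=None, model=None):
    """Build an argv list using only the documented CLI options."""
    cmd = ["phidelity"]
    options = [
        ("--input", input_path),
        ("--phi-output", phi_output),
        ("--anonymized-output", anonymized_output),
        ("--endpoint", endpoint),
        ("--model", model),
    ]
    for flag, value in options:
        if value is not None:  # omitted options fall back to the CLI defaults
            cmd += [flag, str(value)]
    return cmd


def run_phidelity(note_text=None, **options):
    """Run the CLI; pass note_text to feed the note through stdin."""
    cmd = build_phidelity_command(**options)
    return subprocess.run(cmd, input=note_text, capture_output=True,
                          text=True, check=False)
```

When no `anonymized_output` is given, the anonymized note should be available in the returned object's `stdout`, matching the documented default.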

Python Module

Use PHIdelity programmatically in your Python projects.

Example

from phidelity import generate_prompt, query_llm, extract_phi_list, anonymize_note

# Define a clinical note
note = """
Radiation Oncology Clinical Note
Date of Visit: June 11, 2025
Patient Information
Name: John Doe
"""

# Process the note: build the prompt, query the LLM, then parse the PHI list
prompt = generate_prompt(note)
response, error = query_llm(prompt)
if not error:
    phi_list, error = extract_phi_list(response)

# Report a failure from either step; otherwise anonymize and print
if error:
    print(f"Error: {error}")
else:
    anonymized_note = anonymize_note(note, phi_list)
    print(anonymized_note)
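For processing several notes, the same steps can be wrapped in a small helper that stops at the first error. The sketch below assumes the `(value, error)` return convention shown in the example. `run_pipeline` is a hypothetical name, and the four functions are passed in as arguments so the helper can be exercised without a running Ollama server.

```python
def run_pipeline(note, generate_prompt, query_llm, extract_phi_list, anonymize_note):
    """Chain the four steps and return (anonymized_note, phi_list, error).

    On failure the first two items are None and `error` carries the message
    from whichever step failed.
    """
    prompt = generate_prompt(note)
    response, error = query_llm(prompt)
    if error:
        return None, None, error
    phi_list, error = extract_phi_list(response)
    if error:
        return None, None, error
    return anonymize_note(note, phi_list), phi_list, None
```

In real use, pass the functions imported from `phidelity` in the order shown in the signature.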

Output Example

  • Anonymized Note:
    Radiation Oncology Clinical Note
    Date of Visit: [Date of Visit]
    Patient Information
    Name: [Patient Name]
    
  • PHI CSV (if saved):
    ID,type,value,description
    redacted_name_001,Name,John Doe,Patient Name
    redacted_date_001,Date,"June 11, 2025",Date of Visit
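To show how the two outputs relate, the sketch below applies a list of PHI records to a note and serializes the same records with Python's standard `csv` module. The record layout (dicts keyed by the CSV header above) is an assumption about the in-memory form, and the replacement logic is a simple illustration, not PHIdelity's implementation. It also shows why a value containing a comma, such as a date, needs quoting in the CSV.

```python
import csv
import io

FIELDS = ["ID", "type", "value", "description"]


def apply_placeholders(note, phi_records):
    """Replace each PHI value with its bracketed description."""
    # Replace longer values first so a short value cannot clobber part of a longer one.
    for rec in sorted(phi_records, key=lambda r: len(r["value"]), reverse=True):
        note = note.replace(rec["value"], f"[{rec['description']}]")
    return note


def phi_to_csv(phi_records):
    """Serialize records to CSV text; fields containing commas are quoted."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(phi_records)
    return buf.getvalue()


records = [
    {"ID": "redacted_name_001", "type": "Name",
     "value": "John Doe", "description": "Patient Name"},
    {"ID": "redacted_date_001", "type": "Date",
     "value": "June 11, 2025", "description": "Date of Visit"},
]
note = "Date of Visit: June 11, 2025\nName: John Doe\n"

print(apply_placeholders(note, records))
print(phi_to_csv(records))
```

Reading the CSV back with `csv.DictReader` recovers the original values, including the date with its embedded comma.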
    

Configuration

  • Ollama Settings: Customize the endpoint and model via CLI options (--endpoint, --model) or function parameters.
  • Output Paths: Specify paths with --phi-output and --anonymized-output in the CLI. If they are omitted, the anonymized note is written to stdout and no PHI CSV is produced.

Contributing

Contributions are welcome! To get started:

  1. Fork the repository on GitHub.
  2. Create a branch (e.g., feature/better-phi-detection).
  3. Make and test your changes.
  4. Submit a pull request with a clear description.

See Contributing Guidelines and Code of Conduct for more details.

Issues and Support

If you encounter issues:

  • Check the Issues page.
  • Submit a new issue with details like error messages and steps to reproduce.

License

PHIdelity is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

  • Built with Ollama for local LLM inference.
  • Inspired by the need for privacy-preserving healthcare tools that retain data utility.

Contact

For questions or collaboration, use GitHub Issues.


Last updated: June 11, 2025
