The tool extracts PHI into a structured CSV file and saves the anonymized note as a text file, making it ideal for healthcare professionals, researchers, and developers handling sensitive medical data.
PHIdelity
Overview
PHIdelity is a Python package designed to intelligently anonymize Protected Health Information (PHI) in clinical notes while preserving their contextual meaning. Unlike traditional redaction tools that simply obscure data, PHIdelity leverages a local Large Language Model (LLM) via Ollama to detect PHI (e.g., names, dates, medical record numbers) and replace it with meaningful, generalized placeholders (e.g., [Patient Name], [Date of Visit]). This approach ensures that anonymized notes remain valuable for research, analysis, or sharing, all while adhering to privacy standards like HIPAA.
PHIdelity offers a flexible command-line interface (CLI) and importable Python functions, extracting PHI into structured CSV files and saving anonymized notes as text files. It’s an ideal tool for healthcare professionals, researchers, and developers working with sensitive medical data.
Key Features
- Contextualized Anonymization: Replaces PHI with descriptive placeholders (e.g., "John Doe" → `[Patient Name]`), maintaining the note’s usability.
- Advanced PHI Detection: Uses a local LLM (default: `qwen3:4B`) to identify diverse PHI types, including names, dates, and medical record numbers.
- Structured Output: Exports PHI details to a CSV file with unique IDs, types, values, and descriptions for tracking and auditing.
- Anonymized Note Export: Saves the anonymized note as a text file, ready for secure use.
- Local and Configurable: Operates on a local Ollama server for data privacy, with customizable LLM models and output paths.
- Open Source: Released under the MIT License, encouraging community contributions.
Why Contextualized Anonymization?
Traditional methods often replace PHI with generic markers (e.g., [REDACTED]) or random strings, stripping notes of their meaning and utility. PHIdelity improves on this by:
- Preserving Semantics: Placeholders like `[Attending Physician Name]` or `[Medical Record Number]` keep the note interpretable.
- Supporting Use Cases: Enables research, machine learning, and education with privacy intact.
- Ensuring Compliance: Removes identifiable data while retaining structure, aligning with regulations like HIPAA.
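To make the idea concrete, here is a minimal sketch of placeholder substitution. It is not PHIdelity's implementation: the function name `apply_placeholders` and the dictionary keys `value` and `description` are assumptions chosen for illustration, and in the real tool the list of PHI items comes from the LLM rather than being written by hand.

```python
def apply_placeholders(note, phi_items):
    """Replace each PHI value in the note with a bracketed description."""
    # Handle longer values first so "John Doe" is replaced before "John".
    ordered = sorted(phi_items, key=lambda item: len(item["value"]), reverse=True)
    for item in ordered:
        if not item["value"]:
            continue  # an empty value would match everywhere, so skip it
        note = note.replace(item["value"], f"[{item['description']}]")
    return note


note = "Name: John Doe\nDate of Visit: June 11, 2025"
phi_items = [
    {"value": "John Doe", "description": "Patient Name"},
    {"value": "June 11, 2025", "description": "Date of Visit"},
]
anonymized = apply_placeholders(note, phi_items)
print(anonymized)
```

Because each placeholder names the role of the removed value, a reader can still tell that the note contained a patient name and a visit date, which a bare `[REDACTED]` marker would not convey.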
Installation
PHIdelity is available on PyPI and can be installed easily with pip.
Steps
- Install PHIdelity:
    pip install phidelity
- Prerequisites:
    - Python: Version 3.8 or higher.
    - Ollama: A running Ollama server (default: `http://localhost:11434/`) with the `qwen3:4B` model installed.
- Set Up Ollama:
    - Install Ollama from ollama.ai.
    - Start the server:
        ollama serve
    - Pull the default model:
        ollama pull qwen3:4B
    - Verify it’s running:
        curl http://localhost:11434/api/generate -d '{"model": "qwen3:4B", "prompt": "Test"}'
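The same check can be made from Python with only the standard library. The sketch below builds the request that the `curl` command sends; the helper names `build_generate_request` and `check_ollama` are hypothetical and are not part of PHIdelity. It assumes Ollama's `/api/generate` endpoint with `"stream": false`, which returns a single JSON object containing a `response` field.

```python
import json
import urllib.request


def build_generate_request(prompt, model="qwen3:4B",
                           endpoint="http://localhost:11434/"):
    """Build (but do not send) a POST request for Ollama's /api/generate."""
    url = endpoint.rstrip("/") + "/api/generate"
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def check_ollama(prompt="Test"):
    """Send the request and return the generated text.

    Requires a running Ollama server with the model already pulled.
    """
    request = build_generate_request(prompt)
    with urllib.request.urlopen(request, timeout=60) as reply:
        return json.loads(reply.read().decode("utf-8"))["response"]
```

Calling `check_ollama()` raises `urllib.error.URLError` if the server is not reachable, which is a quick way to tell a connection problem apart from a missing model.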
Usage
PHIdelity can be used via its command-line interface (CLI) or as a Python module.
Command-Line Interface (CLI)
The CLI provides a straightforward way to anonymize clinical notes.
Basic Commands
- Anonymize a File:
phidelity --input clinical_note.txt --phi-output phi_data.csv --anonymized-output anonymized_note.txt
- Anonymize from stdin:
echo "Clinical note text" | phidelity
Options
- `--input`: Path to the input clinical note (default: stdin).
- `--phi-output`: Path for the PHI CSV output (default: none).
- `--anonymized-output`: Path for the anonymized note (default: stdout).
- `--endpoint`: Ollama server URL (default: `http://localhost:11434/`).
- `--model`: LLM model (default: `qwen3:4B`).
Run phidelity --help for full details.
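For readers who want to see how these options fit together, the following is a rough `argparse` sketch that mirrors the documented flags and defaults. It is an illustration only, not PHIdelity's actual source; `build_parser` is a hypothetical name, and `None` is used here to stand for "stdin", "stdout", or "no CSV".

```python
import argparse


def build_parser():
    """Mirror the documented CLI options; None means 'use the default stream'."""
    parser = argparse.ArgumentParser(prog="phidelity")
    parser.add_argument("--input", default=None,
                        help="input clinical note (default: stdin)")
    parser.add_argument("--phi-output", default=None,
                        help="PHI CSV output path (default: none)")
    parser.add_argument("--anonymized-output", default=None,
                        help="anonymized note path (default: stdout)")
    parser.add_argument("--endpoint", default="http://localhost:11434/",
                        help="Ollama server URL")
    parser.add_argument("--model", default="qwen3:4B",
                        help="LLM model name")
    return parser


args = build_parser().parse_args(
    ["--input", "clinical_note.txt", "--phi-output", "phi_data.csv"]
)
print(args.input, args.phi_output, args.anonymized_output, args.model)
```

Note that `argparse` converts the dashes in `--phi-output` and `--anonymized-output` to underscores, so the parsed values are read as `args.phi_output` and `args.anonymized_output`.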
Python Module
Use PHIdelity programmatically in your Python projects.
Example
from phidelity import generate_prompt, query_llm, extract_phi_list, anonymize_note

# Define a clinical note
note = """
Radiation Oncology Clinical Note
Date of Visit: June 11, 2025
Patient Information
Name: John Doe
"""

# Process the note: build the prompt, query the LLM, then parse its response
prompt = generate_prompt(note)
response, error = query_llm(prompt)
if not error:
    phi_list, error = extract_phi_list(response)

# Report a failure from either step; otherwise print the anonymized note
if not error:
    anonymized_note = anonymize_note(note, phi_list)
    print(anonymized_note)
else:
    print(f"Error: {error}")
Output Example
- Anonymized Note:
    Radiation Oncology Clinical Note
    Date of Visit: [Date of Visit]
    Patient Information
    Name: [Patient Name]
- PHI CSV (if saved):
    ID,type,value,description
    redacted_name_001,Name,John Doe,Patient Name
    redacted_date_001,Date,"June 11, 2025",Date of Visit
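A value such as June 11, 2025 contains a comma, so it has to be quoted to stay in a single CSV column. The sketch below shows one way to write and read back rows in this layout with Python's `csv` module. The column names come from the example above; the function name `write_phi_csv` and the list-of-dictionaries input are assumptions for illustration, not PHIdelity's API.

```python
import csv
import io

FIELDNAMES = ["ID", "type", "value", "description"]


def write_phi_csv(phi_items, stream):
    """Write PHI rows to a text stream; csv quotes values containing commas."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(phi_items)


phi_items = [
    {"ID": "redacted_name_001", "type": "Name",
     "value": "John Doe", "description": "Patient Name"},
    {"ID": "redacted_date_001", "type": "Date",
     "value": "June 11, 2025", "description": "Date of Visit"},
]

buffer = io.StringIO()
write_phi_csv(phi_items, buffer)

# Read the rows back to confirm the date survives as one field.
rows = list(csv.DictReader(io.StringIO(buffer.getvalue())))
print(rows[1]["value"])
```

When writing to a real file instead of a `StringIO` buffer, open it with `newline=""` as the `csv` module documentation recommends, so that line endings are not translated twice.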
Configuration
- Ollama Settings: Customize the endpoint and model via CLI options (`--endpoint`, `--model`) or function parameters.
- Output Paths: Specify paths with `--phi-output` and `--anonymized-output` in the CLI. If unspecified, the note goes to stdout and no CSV is written.
Contributing
Contributions are welcome! To get started:
- Fork the repository on GitHub.
- Create a branch (e.g., `feature/better-phi-detection`).
- Make and test your changes.
- Submit a pull request with a clear description.
See Contributing Guidelines and Code of Conduct for more details.
Issues and Support
If you encounter issues:
- Check the Issues page.
- Submit a new issue with details like error messages and steps to reproduce.
License
PHIdelity is licensed under the MIT License. See the LICENSE file for details.
Acknowledgments
- Built with Ollama for local LLM inference.
- Inspired by the need for privacy-preserving healthcare tools that retain data utility.
Contact
For questions or collaboration, use GitHub Issues.
Last updated: June 11, 2025
File details
Details for the file phidelity-0.2.2.tar.gz.
File metadata
- Download URL: phidelity-0.2.2.tar.gz
- Upload date:
- Size: 20.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4bf547330767fac708458f924b80d19098ed97d6f2662c332716ecc8f1f65f76 |
| MD5 | b9ed58963b954262f0327423c62fe349 |
| BLAKE2b-256 | c9f955c46c537fbb916593dee584159bba617896a2735f0a51d783e273029375 |
File details
Details for the file phidelity-0.2.2-py3-none-any.whl.
File metadata
- Download URL: phidelity-0.2.2-py3-none-any.whl
- Upload date:
- Size: 22.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9db9ccdd238d10b27f5604eacf28702c3878f05d5cd6d4f022f3aa33d0fcdd12 |
| MD5 | 644d74735787220ae1a6fa1bebd5d8f7 |
| BLAKE2b-256 | a8c603de70618a4732498fa0604d95ed83da5896ea4aac73690cdb2da5ffad50 |