PHIdelity
Overview
PHIdelity is a Python tool that anonymizes Protected Health Information (PHI) in clinical notes while preserving their contextual meaning. Unlike basic redaction methods that simply obscure data, it uses a local Large Language Model (LLM) served through Ollama to identify PHI (e.g., names, dates, medical record numbers) and replace it with meaningful, generalized descriptions (e.g., [Patient Name], [Date of Visit]). This contextualized anonymization keeps the anonymized notes useful for research, analysis, or sharing while supporting compliance with privacy regulations such as HIPAA.
The tool extracts PHI into a structured CSV file and saves the anonymized note as a text file, making it ideal for healthcare professionals, researchers, and developers handling sensitive medical data.
Key Features
- Contextualized Anonymization: Replaces PHI with descriptive placeholders that retain the note's meaning (e.g., "John Doe" becomes [Patient Name]), enhancing usability for downstream applications.
- Advanced PHI Detection: Leverages a local LLM (default: qwen3:4B) to identify a wide range of PHI, including names, dates, medical record numbers, and more.
- Structured Output: Saves PHI to a CSV file with unique IDs, types, values, and descriptions for easy tracking and auditing.
- Anonymized Note Export: Generates a text file with the anonymized clinical note, ready for secure sharing or analysis.
- Configurable and Local: Runs on a local Ollama server, ensuring data privacy and allowing customization of the LLM model and output paths.
- Open Source: Licensed under the MIT License, inviting community contributions and adoption.
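The Contextualized Anonymization feature can be illustrated with a short sketch. It is not PHIdelity's code: the `PHIEntity` record and the `anonymize` function are hypothetical, and the sketch assumes the LLM has already returned each PHI value together with a contextual description.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PHIEntity:
    """Hypothetical record for one PHI item returned by the LLM."""
    type: str         # category, e.g. "Name" or "Date"
    value: str        # exact text as it appears in the note
    description: str  # contextual label, e.g. "Patient Name"


def anonymize(note: str, entities: List[PHIEntity]) -> str:
    """Replace every PHI value with a [description] placeholder in one pass."""
    labels: Dict[str, str] = {e.value: f"[{e.description}]" for e in entities if e.value}
    if not labels:
        return note
    # Longest values first, so the alternation prefers "Dr. Jane Smith" over "Jane Smith".
    alternatives = sorted(labels, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(value) for value in alternatives))
    # A single re.sub pass never rescans inserted placeholders, so a short
    # value such as "Patient" cannot corrupt "[Patient Name]".
    return pattern.sub(lambda match: labels[match.group(0)], note)


note = "Patient: John Doe. Seen by Dr. Jane Smith on June 11, 2025."
entities = [
    PHIEntity("Name", "John Doe", "Patient Name"),
    PHIEntity("Name", "Dr. Jane Smith", "Attending Physician Name"),
    PHIEntity("Date", "June 11, 2025", "Date of Visit"),
]
print(anonymize(note, entities))
# Patient: [Patient Name]. Seen by [Attending Physician Name] on [Date of Visit].
```

Replacing in one pass, longest value first, is the design choice that matters here: sequential `str.replace` calls could rewrite text inside a placeholder that an earlier call had just inserted.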
Why Contextualized Anonymization?
Traditional anonymization methods often replace PHI with generic markers (e.g., [REDACTED]) or random strings, which can obscure the note's meaning and reduce its value for research or clinical review. PHIdelity addresses this by:
- Preserving Semantics: Descriptive placeholders like [Attending Physician Name] or [Medical Record Number] maintain the note's context, making it interpretable for humans and machines.
- Supporting Use Cases: Anonymized notes remain suitable for medical research, machine learning training, or educational purposes without compromising privacy.
- Ensuring Compliance: By removing identifiable information while retaining structure, the tool helps meet strict privacy standards like HIPAA.
Prerequisites
- Python: Version 3.8 or higher.
- Ollama: A running Ollama server (default: http://localhost:11434/) with the qwen3:4B model installed. See Ollama's documentation for setup.
- Dependencies: Python packages listed in requirements.txt.
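The prerequisites can also be checked from Python before running the tool. This is a hypothetical helper, not part of PHIdelity: it queries Ollama's `GET /api/tags` endpoint, which lists locally installed models, and compares names case-insensitively in case the tag is listed with different capitalization. The function names are assumptions for illustration.

```python
import json
import sys
import urllib.request
from typing import Iterable, List

OLLAMA_ENDPOINT = "http://localhost:11434/"  # default from this README
OLLAMA_MODEL = "qwen3:4B"                    # default model from this README


def model_installed(tag_names: Iterable[str], model: str) -> bool:
    """Return True if `model` appears among the installed tag names (case-insensitive)."""
    wanted = model.lower()
    return any(name.lower() == wanted for name in tag_names)


def installed_models(endpoint: str = OLLAMA_ENDPOINT, timeout: float = 5.0) -> List[str]:
    """Ask the Ollama server which models are installed via GET /api/tags."""
    url = endpoint.rstrip("/") + "/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return [entry["name"] for entry in payload.get("models", [])]


def check_prerequisites(endpoint: str = OLLAMA_ENDPOINT, model: str = OLLAMA_MODEL) -> None:
    """Raise RuntimeError if Python is too old or the model has not been pulled."""
    if sys.version_info < (3, 8):
        raise RuntimeError("Python 3.8 or higher is required")
    # urlopen raises urllib.error.URLError if the server is not reachable.
    if not model_installed(installed_models(endpoint), model):
        raise RuntimeError(f"model {model!r} not found; run: ollama pull {model}")
```

With the server stopped, `check_prerequisites()` lets `urllib.error.URLError` propagate, which is itself a useful signal that `ollama serve` is not running.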
Installation
- Clone the Repository:

  ```bash
  git clone https://github.com/your-username/clinical-note-anonymizer.git
  cd clinical-note-anonymizer
  ```
- Set Up a Virtual Environment (recommended):

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```
- Install Dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Install and Configure Ollama:
  - Install Ollama from ollama.ai.
  - Start the Ollama server:

    ```bash
    ollama serve
    ```

  - Pull the required model:

    ```bash
    ollama pull qwen3:4B
    ```
- Verify Setup: Confirm the Ollama server is running at http://localhost:11434/:

  ```bash
  curl http://localhost:11434/api/generate -d '{"model": "qwen3:4B", "prompt": "Test"}'
  ```
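The same smoke test can be run from Python with only the standard library. This sketch posts to Ollama's `/api/generate` endpoint with `"stream": false`, which makes the server return a single JSON object whose `response` field holds the generated text; without that flag, as in the `curl` command above, the reply is streamed as a series of JSON objects. `build_generate_request` and `generate` are hypothetical helper names, not functions exported by PHIdelity.

```python
import json
import urllib.request

OLLAMA_ENDPOINT = "http://localhost:11434/"  # default from this README
OLLAMA_MODEL = "qwen3:4B"                    # default model from this README


def build_generate_request(prompt: str, model: str = OLLAMA_MODEL,
                           endpoint: str = OLLAMA_ENDPOINT) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        endpoint.rstrip("/") + "/api/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = OLLAMA_MODEL,
             endpoint: str = OLLAMA_ENDPOINT, timeout: float = 120.0) -> str:
    """Send the prompt and return the model's full text response."""
    request = build_generate_request(prompt, model, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["response"]
```

With `ollama serve` running and the model pulled, `print(generate("Test"))` should print the model's reply to the prompt.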
Usage
- Prepare a Clinical Note: The script includes a sample clinical note in anonymizer.py. Modify the clinical_note variable or provide your own note as a string.
- Run the Script: Process the clinical note to detect PHI and generate outputs:

  ```bash
  python anonymizer.py
  ```

- Outputs:
  - PHI Data (phi_data.csv): A CSV file with columns ID, type, value, and description.
  - Anonymized Note (anonymized_note.txt): A text file with PHI replaced by contextual placeholders.
  - Console Output: Shows the LLM's JSON output, the anonymized note, and status messages.
- Example Output:

  phi_data.csv:

  ```csv
  ID,type,value,description
  redacted_name_001,Name,John Doe,Patient Name
  redacted_date_001,Date,"June 11, 2025",Date of Visit
  redacted_medical_record_number_001,Medical Record Number,123456,Medical Record Number
  redacted_name_002,Name,Dr. Jane Smith,Attending Physician Name
  ...
  ```

  anonymized_note.txt:

  ```text
  Radiation Oncology Clinical Note
  Date of Visit: [Date of Visit]

  Patient Information
  Name: [Patient Name]
  Age: 65 years old
  Medical Record Number: [Medical Record Number]
  ...
  Physician: [Attending Physician Name]
  ```
- Customize Configuration: Edit anonymizer.py to adjust:
  - OLLAMA_ENDPOINT: Ollama server URL (default: http://localhost:11434/).
  - OLLAMA_MODEL: LLM model (default: qwen3:4B).
  - Output file paths in generate_phi_csv and anonymize_clinical_note.
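The Outputs and Example Output steps above imply a small post-processing pipeline: parse the model's JSON, give each entry an ID, and write phi_data.csv. The sketch below illustrates that under stated assumptions and is not PHIdelity's implementation. It assumes the model was asked for a JSON array of objects with `type`, `value`, and `description` keys; it strips a `<think>...</think>` block because qwen3 models may emit one before the answer; and its ID rule (`redacted_`, the type in snake case, a three-digit counter per type) is inferred from the example CSV rather than documented. `parse_phi_json`, `assign_ids`, `write_phi_csv`, and `phi_csv_string` are hypothetical names.

```python
import csv
import io
import json
import re
from collections import Counter
from typing import Any, Dict, Iterable, List, TextIO

# Reasoning block that qwen3-style models may emit before the answer (assumption).
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)
REQUIRED_KEYS = {"type", "value", "description"}
FIELDNAMES = ["ID", "type", "value", "description"]


def parse_phi_json(raw: str) -> List[Dict[str, Any]]:
    """Extract the PHI list from raw LLM output, keeping only complete entries."""
    text = THINK_BLOCK.sub("", raw)
    # Slice from the first "[" to the last "]" to skip surrounding prose or markdown fences.
    start, end = text.find("["), text.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON array found in model output")
    # json.loads raises json.JSONDecodeError (a ValueError) on malformed JSON.
    entries = json.loads(text[start:end + 1])
    return [e for e in entries if isinstance(e, dict) and REQUIRED_KEYS.issubset(e)]


def assign_ids(entries: Iterable[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Add IDs like redacted_name_001, numbering each PHI type separately."""
    counts: Counter = Counter()
    rows = []
    for entry in entries:
        # "Medical Record Number" -> "medical_record_number"
        slug = re.sub(r"[^a-z0-9]+", "_", str(entry["type"]).lower()).strip("_")
        counts[slug] += 1
        rows.append({
            "ID": f"redacted_{slug}_{counts[slug]:03d}",
            "type": str(entry["type"]),
            "value": str(entry["value"]),
            "description": str(entry["description"]),
        })
    return rows


def write_phi_csv(rows: Iterable[Dict[str, str]], stream: TextIO) -> None:
    """Write a header row plus one row per PHI entry."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


def phi_csv_string(rows: Iterable[Dict[str, str]]) -> str:
    """Render the CSV in memory, e.g. for console output."""
    buffer = io.StringIO()
    write_phi_csv(rows, buffer)
    return buffer.getvalue()


raw_output = (
    "<think>Looking for names and dates.</think>\n"
    '[{"type": "Name", "value": "John Doe", "description": "Patient Name"},'
    ' {"type": "Date", "value": "June 11, 2025", "description": "Date of Visit"}]'
)
print(phi_csv_string(assign_ids(parse_phi_json(raw_output))))
# ID,type,value,description
# redacted_name_001,Name,John Doe,Patient Name
# redacted_date_001,Date,"June 11, 2025",Date of Visit
```

`csv.DictWriter` quotes fields that contain the delimiter, so a value such as `June 11, 2025` stays in one column. To write the real file, open it with `newline=""` as the `csv` module documentation recommends, for example `with open("phi_data.csv", "w", newline="", encoding="utf-8") as f: write_phi_csv(rows, f)`.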
File Structure
```text
clinical-note-anonymizer/
├── anonymizer.py          # Main script for PHI detection and anonymization
├── requirements.txt       # Python dependencies
├── README.md              # Project documentation (this file)
├── phi_data.csv           # Output CSV file (generated)
└── anonymized_note.txt    # Output anonymized note (generated)
```
Contributing
We welcome contributions to enhance PHIdelity, especially improvements to contextualization, LLM integration, or output formats. To contribute:
- Fork the Repository: Create a fork on GitHub.
- Create a Branch: Use a descriptive name (e.g., feature/improve-phi-detection).
- Make Changes: Implement and test your changes.
- Submit a Pull Request: Include a clear description and reference related issues.
Review the Contributing Guidelines and Code of Conduct before submitting.
Issues and Support
Encounter a problem? Please:
- Check the Issues page for similar reports.
- Open a new issue with details, including error messages and reproduction steps.
License
This project is licensed under the MIT License. See the LICENSE file for details.
Acknowledgments
- Powered by Ollama for secure, local LLM inference.
- Motivated by the need for privacy-preserving tools in healthcare that balance compliance and data utility.
Contact
For inquiries or collaboration, please use GitHub Issues.
Last updated: June 11, 2025
Download files
File details
Details for the file phidelity-0.1.1.tar.gz.
File metadata
- Download URL: phidelity-0.1.1.tar.gz
- Upload date:
- Size: 19.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c6b259a4125b63f7dc4d8fa1153d4423af7f3469595cad2154cfcb2935baad5e |
| MD5 | e5d68041cdd3f4c14eba6163cc4fe18f |
| BLAKE2b-256 | 5e6f50955b41cf7434d664da61d0e7a7a489b2812708c5d4af4477e880031a4e |
File details
Details for the file phidelity-0.1.1-py3-none-any.whl.
File metadata
- Download URL: phidelity-0.1.1-py3-none-any.whl
- Upload date:
- Size: 19.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 15172af3d369efda2a35b3e61b91e75d603ec99a6210e5f346bd19ccd44fde6d |
| MD5 | 7bd635cd567ae0143b844b24c26e69cb |
| BLAKE2b-256 | f3dc9fc053148af49ca63bdd3464b56257d61aecccb0e7190d6896a4b45cc8ca |
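The published digests let you confirm that a downloaded archive is intact before installing it. A minimal sketch using `hashlib` follows; the helper names are hypothetical, and the expected values are the SHA256 digests listed in the File hashes tables on this page. pip's built-in `pip hash <file>` command prints the same SHA256 digest if you prefer not to write code.

```python
import hashlib
from pathlib import Path
from typing import Iterable

# SHA256 digests published for PHIdelity 0.1.1 (copied from the File hashes tables).
EXPECTED_SHA256 = {
    "phidelity-0.1.1.tar.gz":
        "c6b259a4125b63f7dc4d8fa1153d4423af7f3469595cad2154cfcb2935baad5e",
    "phidelity-0.1.1-py3-none-any.whl":
        "15172af3d369efda2a35b3e61b91e75d603ec99a6210e5f346bd19ccd44fde6d",
}


def sha256_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hash a stream of byte chunks incrementally."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, chunk_size: int = 1 << 16) -> bool:
    """Return True if the file's SHA256 matches the published digest for its name."""
    name = Path(path).name
    if name not in EXPECTED_SHA256:
        raise KeyError(f"no published digest for {name}")
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files are never loaded into memory at once.
        actual = sha256_of_chunks(iter(lambda: f.read(chunk_size), b""))
    return actual == EXPECTED_SHA256[name]
```

For example, `verify_download("phidelity-0.1.1.tar.gz")` run in the download directory returns `True` only if the archive matches the published SHA256 digest.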