A collection of functionality for data classification, data privacy risk assessment, and mitigation enforcement

🔒 READI - Risk Evaluation and De-Identification

Privacy-preserving AI made simple - A comprehensive toolkit for data privacy risk assessment and de-identification in Python-based ML pipelines.

READI extends the functionality of the IBM Data Privacy Toolkit with capabilities for detecting personal and sensitive information in unstructured documents. It is built for modern compliance frameworks and AI model training workflows.

✨ Features

🎯 Advanced PII Detection - Identify personal and sensitive information across multiple data types
🔄 Seamless Integration - Low-effort integration with existing ML pipelines
📊 Structured & Unstructured Data - Support for both data formats
🌐 REST API - Easy-to-use HTTP interface for remote processing
🧪 Extensible Framework - Modular design for custom privacy requirements
📝 Comprehensive Examples - Jupyter notebooks with real-world use cases

🚀 Quick Start

Prerequisites

Python 3.8 or higher
Git with git-lfs support (for large files >50 MB)
uv (recommended) - A fast Python package installer

Installation

Recommended: Using uv (reported by its maintainers as 10-100x faster than pip)

# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create and activate virtual environment
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install READI
uv pip install git+https://github.com/IBM/READI.git

Standard Installation with pip:

pip install git+https://github.com/IBM/READI.git

Clone Repository:

git clone https://github.com/IBM/READI.git
cd READI

# With uv (recommended)
uv pip install -e .

# Or with pip
pip install -e .
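
After installing, it can be useful to confirm which version Python actually sees. The following is a minimal sketch using only the standard library. It assumes the distribution is registered as `readi-privacy`, which is inferred from the published file names (`readi_privacy-0.1.2...`) and may differ for an install from Git; `installed_version` is a hypothetical helper, not part of READI.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "readi-privacy") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    The default name "readi-privacy" is an assumption based on the
    published file names; adjust it if your install registers another name.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(f"readi-privacy: {version or 'not installed'}")
```

If the function returns None inside an activated virtual environment, the install most likely went to a different interpreter.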

💻 Development Setup

For contributors and developers:

Recommended: Using uv

# Install in editable mode with development dependencies
uv pip install -e .
uv pip install -r requirements-dev.txt

# Set up pre-commit hooks (recommended)
pre-commit install

Alternative: Using pip

# Install in editable mode with development dependencies
pip install -e .
pip install -r requirements-dev.txt

# Set up pre-commit hooks (recommended)
pre-commit install

This installs the project in editable mode along with development tools (pytest, ruff, bandit, etc.).

💡 Tip: Using uv provides significantly faster dependency resolution and installation compared to traditional pip.

🌐 REST API Usage

READI provides a simple REST API for remote processing.

Setup

# Install with REST API support
pip install -e '.[rest]'

# Start the server
uvicorn risk_assessment.entry_points.rest.api:app
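
When the server is started from a script or a CI job, requests sent before uvicorn is listening will fail. A small polling helper such as the sketch below can bridge that gap. `wait_for_port` is a hypothetical name, and the defaults assume the server listens on localhost:8000 as in the example that follows.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 8000,
                  timeout: float = 30.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Not listening yet (or unreachable); retry after a short pause.
            time.sleep(0.25)
    return False
```

This only checks that the port accepts connections, not that the application has finished loading its models, so a first request may still be slow.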

Example Request

curl -H 'Content-Type: application/json' \
     http://localhost:8000/detect_phi \
     --data-raw '{"text":"My text with email: john@gmail.com"}'

The API will be available at http://localhost:8000 with interactive documentation at /docs.
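
The same request can be sent from Python. The sketch below mirrors the curl call using only the standard library. The helper names `build_detect_request` and `detect_phi` are hypothetical, and because the response schema is not documented here, the decoded JSON is returned unchanged.

```python
import json
import urllib.request


def build_detect_request(text: str,
                         base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST request for /detect_phi, mirroring the curl example."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/detect_phi",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def detect_phi(text: str, base_url: str = "http://localhost:8000",
               timeout: float = 30.0):
    """Send text to a running server and return the decoded JSON response as-is."""
    request = build_detect_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires the server from the Setup step to be running.
    print(detect_phi("My text with email: john@gmail.com"))
```

Splitting request construction from sending keeps the payload easy to inspect or unit-test without a live server.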

📚 Examples & Tutorials

Explore our comprehensive Jupyter notebooks in the notebooks/ directory:

Notebook	Description
Unstructured Data Classification	General overview of READI API for free-text processing
Structured Data Classification	Working with tabular and structured datasets

📖 Documentation

Detailed documentation, API references, and advanced usage patterns will be published on a documentation portal (coming soon).

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details on:

Code style and standards
Testing requirements
Pull request process
Development workflow

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

📌 How to Cite

If you use READI in academic work, please cite the most relevant publication from the references below. A general citation entry is:

@software{readi_ibm,
  title        = {READI: Risk Evaluation and De-Identification},
  author       = {Stefano Braghin and Liubov Nedoshivina and Anisa Halimi and Naoise Holohan and Kieran Fraser},
  year         = {2026},
  url          = {https://github.com/IBM/READI}
}

When your usage specifically relates to unstructured document de-identification, prefer citing:

@article{nedoshivina2024pragmatic,
  title   = {Pragmatic De-Identification of Cross-Domain Unstructured Documents: A Utility-Preserving Approach with Relation Extraction Filtering},
  author  = {Liubov Nedoshivina and Anisa Halimi and Joa Bettencourt-Silva and Stefano Braghin},
  journal = {AMIA Summits on Translational Science Proceedings},
  volume  = {2024},
  pages   = {85},
  year    = {2024}
}

📚 Academic References

READI is built on years of privacy research. Key publications:

Nedoshivina, L., Halimi, A., Bettencourt-Silva, J., & Braghin, S. (2024). Pragmatic De-Identification of Cross-Domain Unstructured Documents: A Utility-Preserving Approach with Relation Extraction Filtering. AMIA Summits on Translational Science Proceedings, 2024, 85.
Pachilakis, M., Antonatos, S., Levacher, K., & Braghin, S. (2020). PrivLeAD: Privacy Leakage Detection on the Web. Intelligent Systems and Applications. IntelliSys 2020. Advances in Intelligent Systems and Computing, vol 1250. Springer, Cham. DOI: 10.1007/978-3-030-55180-3_32
Braghin, S., Bettencourt-Silva, J. H., Levacher, K., & Antonatos, S. (2019). An Extensible De-Identification Framework for Privacy Protection of Unstructured Health Information: Creating Sustainable Privacy Infrastructures. MEDINFO 2019: Health and Wellbeing e-Networks for All (pp. 1140-1144). IOS Press. DOI: 10.3233/SHTI190404
Antonatos, S., Braghin, S., Holohan, N., Gkoufas, Y., & Mac Aonghusa, P. (2018). PRIMA: An End-to-End Framework for Privacy at Scale. 2018 IEEE 34th International Conference on Data Engineering (ICDE), pp. 1531-1542. DOI: 10.1109/ICDE.2018.00171
Gkoulalas-Divanis, A., & Braghin, S. (2016). IPV: A system for identifying privacy vulnerabilities in datasets. IBM Journal of Research and Development, vol. 60, no. 4, pp. 14:1-14:10. DOI: 10.1147/JRD.2016.2576818
Gkoulalas-Divanis, A., Braghin, S., & Antonatos, S. (2016). FPVI: A scalable method for discovering privacy vulnerabilities in microdata. 2016 IEEE International Smart Cities Conference (ISC2), pp. 1-8. DOI: 10.1109/ISC2.2016.7580849
Gkoulalas-Divanis, A., & Braghin, S. (2015). Efficient algorithms for identifying privacy vulnerabilities. 2015 IEEE First International Smart Cities Conference (ISC2), pp. 1-8. DOI: 10.1109/ISC2.2015.7366170

🙏 Acknowledgment

This project is partly supported by the Innovative Health Initiative Joint Undertaking (IHI JU) under grant agreement No. 101172997 – SEARCH.

💬 Support & Community

🐛 Issues: GitHub Issues
💡 Discussions: GitHub Discussions
📧 Contact: For enterprise support, please contact the IBM Research team

Built with ❤️ by IBM Research

Project details

Maintainer: stefano81

Release history

Version	Release date
0.1.5	May 16, 2026
0.1.4	May 15, 2026
0.1.3	May 15, 2026
0.1.2 (this version)	May 15, 2026
0.1.1	May 15, 2026

Download files

Source Distribution

readi_privacy-0.1.2.tar.gz (15.8 MB)

Uploaded May 15, 2026 Source

Built Distribution

readi_privacy-0.1.2-py3-none-any.whl (13.1 MB)

Uploaded May 15, 2026 Python 3

File details

Details for the file readi_privacy-0.1.2.tar.gz.

File metadata

Download URL: readi_privacy-0.1.2.tar.gz
Upload date: May 15, 2026
Size: 15.8 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for readi_privacy-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`11393d86de97932ca3b005543e7d21c456cfda3d22fdbb487c3329dc8067dd91`
MD5	`64c7a14957e76d0fa6974eaec5cf4ab9`
BLAKE2b-256	`24496f971c90c8f559e8021ade73c2f0d95dcb33e8cb9c205406ecde878ed0df`
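
A downloaded file can be checked against the SHA256 digest listed above before installing it. The following is a minimal sketch using the standard library; `sha256_of_file` and `matches_published_hash` are hypothetical helper names, and the file path is assumed to point at a local copy of the archive.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    published = "11393d86de97932ca3b005543e7d21c456cfda3d22fdbb487c3329dc8067dd91"
    # Assumes the source distribution was downloaded to the current directory.
    print(matches_published_hash("readi_privacy-0.1.2.tar.gz", published))
```

The same functions apply to the wheel by passing its file name and its own SHA256 digest.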

Provenance

The following attestation bundles were made for readi_privacy-0.1.2.tar.gz:

Publisher: publish.yml on IBM/READI

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: readi_privacy-0.1.2.tar.gz
- Subject digest: 11393d86de97932ca3b005543e7d21c456cfda3d22fdbb487c3329dc8067dd91
- Sigstore transparency entry: 1548273497
- Sigstore integration time: May 15, 2026
Source repository:
- Permalink: IBM/READI@45bede639c22d42f908df970f5ea689b837fcc18
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/IBM
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@45bede639c22d42f908df970f5ea689b837fcc18
- Trigger Event: push

File details

Details for the file readi_privacy-0.1.2-py3-none-any.whl.

File metadata

Download URL: readi_privacy-0.1.2-py3-none-any.whl
Upload date: May 15, 2026
Size: 13.1 MB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for readi_privacy-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`57dee5d1cdc82c6e5fd64a8bf1d403edce0bfa4da0e0e80f3deae283e813c085`
MD5	`1f1c4b115059489bb0d9da6766881598`
BLAKE2b-256	`af3cc7027040ef43275e7c6f36932eeacf58f188eeb1d6b581026c7f85f917c6`

Provenance

The following attestation bundles were made for readi_privacy-0.1.2-py3-none-any.whl:

Publisher: publish.yml on IBM/READI

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: readi_privacy-0.1.2-py3-none-any.whl
- Subject digest: 57dee5d1cdc82c6e5fd64a8bf1d403edce0bfa4da0e0e80f3deae283e813c085
- Sigstore transparency entry: 1548273682
- Sigstore integration time: May 15, 2026
Source repository:
- Permalink: IBM/READI@45bede639c22d42f908df970f5ea689b837fcc18
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/IBM
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@45bede639c22d42f908df970f5ea689b837fcc18
- Trigger Event: push
