Presidio Analyzer package

Presidio analyzer

Description

The Presidio analyzer is a Python-based service for detecting PII entities in text.

During analysis, it runs a set of PII recognizers, each responsible for detecting one or more PII entities using its own mechanism.

Presidio analyzer comes with a set of predefined recognizers, but can easily be extended with other types of custom recognizers. Predefined and custom recognizers leverage regex, Named Entity Recognition and other types of logic to detect PII in unstructured text.
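
To make the recognizer idea concrete, here is a conceptual sketch using only the standard library. It is not Presidio's implementation or API: `Finding`, `RegexRecognizer`, and `run_recognizers` are hypothetical names, and the patterns and scores are illustrative assumptions. It only shows the general shape, where several independent recognizers each report typed spans with a confidence score and the results are merged.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One detected span: entity type, character offsets, confidence."""
    entity_type: str
    start: int
    end: int
    score: float


class RegexRecognizer:
    """Hypothetical stand-in for a recognizer that owns one entity type."""

    def __init__(self, entity_type, pattern, score):
        self.entity_type = entity_type
        self.regex = re.compile(pattern)
        self.score = score

    def analyze(self, text):
        # Report every regex match as a finding with a fixed score.
        return [
            Finding(self.entity_type, m.start(), m.end(), self.score)
            for m in self.regex.finditer(text)
        ]


def run_recognizers(text, recognizers):
    """Run every recognizer over the text and merge findings by position."""
    findings = []
    for recognizer in recognizers:
        findings.extend(recognizer.analyze(text))
    return sorted(findings, key=lambda f: f.start)


# Illustrative patterns and scores, not the ones Presidio ships with.
recognizers = [
    RegexRecognizer("PHONE_NUMBER", r"\b\d{3}-\d{3}-\d{4}\b", 0.6),
    RegexRecognizer("EMAIL_ADDRESS", r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", 0.9),
]
sample = "Call 212-555-5555 or write to jane@example.com"
findings = run_recognizers(sample, recognizers)
for f in findings:
    print(f.entity_type, sample[f.start:f.end], f.score)
```

In a real deployment the recognizers would also include NER-based and other logic, as described above; the sketch covers only the regex case.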

Language Model-based PII/PHI Detection

Presidio analyzer supports PII/PHI detection based on language models (LLMs and SLMs) for flexible entity recognition. The current implementation uses LangExtract with support for multiple providers:

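  • Ollama - Local model deployment for privacy-sensitive environments
  • Azure OpenAI - Cloud-based deployment with enterprise features

Install the package with the langextract extra (the quotes keep shells such as zsh from interpreting the brackets):

pip install "presidio-analyzer[langextract]"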

Quick Usage

Ollama (local models):

from presidio_analyzer.predefined_recognizers import BasicLangExtractRecognizer
recognizer = BasicLangExtractRecognizer()  # Uses default config

Azure OpenAI (cloud models):

from presidio_analyzer.predefined_recognizers import AzureOpenAILangExtractRecognizer

# Simple usage - pass everything as parameters
recognizer = AzureOpenAILangExtractRecognizer(
    model_id="gpt-4",  # Your Azure deployment name
    azure_endpoint="https://your-resource.openai.azure.com/",
    api_key="your-api-key"
)

# Or use environment variables (AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY):
recognizer = AzureOpenAILangExtractRecognizer(
    model_id="gpt-4"  # Your Azure deployment name
)

# Advanced: Customize entities/prompts with config file
recognizer = AzureOpenAILangExtractRecognizer(
    model_id="gpt-4",
    config_path="./custom_config.yaml",  # Optional: for custom entities/prompts
    azure_endpoint="https://your-resource.openai.azure.com/",
    api_key="your-api-key"
)

Note: LangExtract recognizers do not validate connectivity during initialization. Connection errors or missing models will be reported when analyze() is first called.
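
Since nothing is validated at construction time, one option is to at least confirm that the expected environment variables are set before creating the recognizer. The following is a minimal sketch using only the standard library; `missing_env_vars` is a hypothetical helper, not part of Presidio, and it checks configuration only, not connectivity or model availability.

```python
import os

# Variable names taken from the usage comment above.
REQUIRED_VARS = ("AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Set these before creating the recognizer:", ", ".join(missing))
```

Errors from an unreachable endpoint or a missing deployment would still surface only on the first analyze() call.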

See the Language Model-based PII/PHI Detection guide for complete setup and usage instructions.

Deploy Presidio analyzer to Azure

Use the following button to deploy Presidio analyzer to your Azure subscription.

Deploy to Azure

Simple usage example

from presidio_analyzer import AnalyzerEngine

# Set up the engine; this loads the NLP module (spaCy model by default) and the other PII recognizers
analyzer = AnalyzerEngine()

# Call analyzer to get results
results = analyzer.analyze(text="My phone number is 212-555-5555",
                           entities=["PHONE_NUMBER"],
                           language='en')
print(results)
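
Each result describes one detected span through its `entity_type`, `start`, `end`, and `score` fields. The sketch below shows one way to turn such spans back into the matched substrings. To stay runnable without a spaCy model, it uses a stand-in `Span` tuple with the same four fields; `Span` and `matched_text` are hypothetical names, and the score value is illustrative rather than real analyzer output.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Stand-in carrying the same four fields as an analyzer result."""
    entity_type: str
    start: int
    end: int
    score: float


def matched_text(text, spans, min_score=0.0):
    """Return (entity_type, substring) pairs for spans at or above min_score."""
    return [
        (s.entity_type, text[s.start:s.end])
        for s in spans
        if s.score >= min_score
    ]


text = "My phone number is 212-555-5555"
start = text.index("212-555-5555")
# Illustrative span; the score is made up, not real analyzer output.
spans = [Span("PHONE_NUMBER", start, start + len("212-555-5555"), 0.75)]
pairs = matched_text(text, spans, min_score=0.5)
print(pairs)
```

The same function should work on the list returned by analyzer.analyze(), since it relies only on those four attributes.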

GPU Acceleration

For GPU acceleration, install the appropriate dependencies for your hardware:

  • Linux with NVIDIA GPU: cupy-cuda12x (or the version matching your CUDA installation)
  • macOS with Apple Silicon: MPS (Metal Performance Shaders) is currently not supported. The analyzer will use CPU for PyTorch operations.
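
As a quick sanity check after installing, the following sketch tests whether a module can be found in the current environment, using only the standard library. `is_installed` is a hypothetical helper; a positive answer only means the package is present, not that a GPU is visible or that the CUDA versions match.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the named top-level module can be found."""
    return importlib.util.find_spec(module_name) is not None


# The cupy-cuda12x wheel is imported under the module name "cupy".
print("CuPy available:", is_installed("cupy"))
```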

Documentation

Additional documentation on installation, usage, and extending the Analyzer can be found in the Analyzer section of the Presidio documentation.
