
Presidio Analyzer package


Presidio analyzer

Description

The Presidio analyzer is a Python-based service for detecting PII entities in text.

During analysis, it runs a set of different PII Recognizers, each one in charge of detecting one or more PII entities using different mechanisms.

Presidio analyzer comes with a set of predefined recognizers, but can easily be extended with other types of custom recognizers. Predefined and custom recognizers leverage regex, Named Entity Recognition and other types of logic to detect PII in unstructured text.

Language Model-based PII/PHI Detection

Presidio analyzer supports language model-based PII/PHI detection (LLMs, SLMs) for flexible entity recognition. The current implementation uses LangExtract with support for multiple providers:

  • Ollama - Local model deployment for privacy-sensitive environments
  • Azure OpenAI - Cloud-based deployment with enterprise features
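
Install the package with the LangExtract extra (quoted so that shells such as zsh do not expand the brackets):

pip install "presidio-analyzer[langextract]"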

Quick Usage

Ollama (local models):

from presidio_analyzer.predefined_recognizers import BasicLangExtractRecognizer
recognizer = BasicLangExtractRecognizer()  # Uses default config

Azure OpenAI (cloud models):

from presidio_analyzer.predefined_recognizers import AzureOpenAILangExtractRecognizer

# Simple usage - pass everything as parameters
recognizer = AzureOpenAILangExtractRecognizer(
    model_id="gpt-4",  # Your Azure deployment name
    azure_endpoint="https://your-resource.openai.azure.com/",
    api_key="your-api-key"
)

# Or use environment variables (AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY):
recognizer = AzureOpenAILangExtractRecognizer(
    model_id="gpt-4"  # Your Azure deployment name
)

# Advanced: Customize entities/prompts with config file
recognizer = AzureOpenAILangExtractRecognizer(
    model_id="gpt-4",
    config_path="./custom_config.yaml",  # Optional: for custom entities/prompts
    azure_endpoint="https://your-resource.openai.azure.com/",
    api_key="your-api-key"
)

Note: LangExtract recognizers do not validate connectivity during initialization. Connection errors or missing models will be reported when analyze() is first called.
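Because errors surface only on the first analyze() call, a small pre-flight check can catch missing configuration earlier. The sketch below assumes the environment-variable route described above; `missing_azure_settings` is a hypothetical helper, not part of Presidio, and it checks only that the variables are set, not that the endpoint is reachable.

```python
import os

# Variable names taken from the usage notes above.
REQUIRED_VARS = ("AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY")


def missing_azure_settings(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_azure_settings()
if missing:
    print("Missing settings:", ", ".join(missing))
```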

See the Language Model-based PII/PHI Detection guide for complete setup and usage instructions.

Deploy Presidio analyzer to Azure

Use the following button to deploy the Presidio analyzer to your Azure subscription.

Deploy to Azure

Simple usage example

from presidio_analyzer import AnalyzerEngine

# Set up the engine, loads the NLP module (spaCy model by default) and other PII recognizers
analyzer = AnalyzerEngine()

# Call analyzer to get results
results = analyzer.analyze(text="My phone number is 212-555-5555",
                           entities=["PHONE_NUMBER"],
                           language='en')
print(results)

GPU Acceleration

For GPU acceleration, install the appropriate dependencies for your hardware:

  • Linux with NVIDIA GPU: cupy-cuda12x (or the version matching your CUDA installation)
  • macOS with Apple Silicon: MPS (Metal Performance Shaders) is currently not supported. The analyzer will use CPU for PyTorch operations.
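As a quick sanity check after installing the GPU dependency, the standard-library sketch below reports whether the `cupy` module is importable in the current environment. It does not confirm that a GPU is present or that the analyzer will use it, which depends on the installed drivers and the configured NLP engine.

```python
import importlib.util

# find_spec returns None when the module cannot be located.
has_cupy = importlib.util.find_spec("cupy") is not None
print("CuPy available:", has_cupy)
```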

Documentation

Additional documentation on installing, using, and extending the analyzer can be found in the Analyzer section of the Presidio documentation.
