Skip to main content

No project description provided

Project description

GLLM Privacy

Description

A library to protect Personal Identifiable Information (PII) in a Generative AI project.

Installation

Prerequisites

Mandatory:

  1. Python 3.11+ — Install here
  2. pip — Install here
  3. uv — Install here

Extras (required only for Artifact Registry installations):

  1. gcloud CLI (for authentication) — Install here, then log in using:
    gcloud auth login
    

Option 1: Install from Artifact Registry

This option requires authentication via the gcloud CLI.

uv pip install \
  --extra-index-url "https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/" \
  gllm-privacy

Option 2: Install from PyPI

This option requires no authentication. However, it installs the binary wheel version of the package, which is fully usable but does not include source code.

uv pip install gllm-privacy-binary

Local Development Setup

Prerequisites

  1. Python 3.11+ — Install here

  2. pip — Install here

  3. uv — Install here

  4. gcloud CLI — Install here, then log in using:

    gcloud auth login
    
  5. Git — Install here

  6. Access to the GDP Labs SDK GitHub repository


1. Clone Repository

git clone git@github.com:GDP-ADMIN/gl-sdk.git
cd gl-sdk/libs/gllm-privacy

2. Setup Authentication

Set the following environment variables to authenticate with internal package indexes:

export UV_INDEX_GEN_AI_INTERNAL_USERNAME=oauth2accesstoken
export UV_INDEX_GEN_AI_INTERNAL_PASSWORD="$(gcloud auth print-access-token)"
export UV_INDEX_GEN_AI_USERNAME=oauth2accesstoken
export UV_INDEX_GEN_AI_PASSWORD="$(gcloud auth print-access-token)"

3. Quick Setup

Run:

make setup

4. Activate Virtual Environment

source .venv/bin/activate

Local Development Utilities

The following Makefile commands are available for quick operations:

Install uv

make install-uv

Install Pre-Commit

make install-pre-commit

Install Dependencies

make install

Update Dependencies

make update

Run Tests

make test

Usage

from gllm_privacy.pii_detector import TextAnalyzer, TextAnonymizer
from gllm_privacy.pii_detector.constants import Entities
from gllm_privacy.pii_detector.anonymizer import Operation
from asyncio import run

text = """
    contoh nomor ktp 3525011212941001
    repeat nomor ktp 3525011212941001
    contoh email john.doe@example.com
    contoh nomor telepon +628121729819 dan 0812898029384.
    contoh npwp 01.123.456.7-891.234
"""
text_analyzer = TextAnalyzer()
entities = [Entities.EMAIL_ADDRESS, Entities.KTP, Entities.NPWP, Entities.PHONE_NUMBER]

text_anonymizer = TextAnonymizer(text_analyzer)
anonymized_text = run(text_anonymizer.run(text=text, entities=entities))
print(anonymized_text)

deanonymized_text = run(text_anonymizer.run(text=text, entities=entities, operation=Operation.DEANONYMIZE))
print(deanonymized_text)

If you need to detect person, organization, or location entities in text written in Bahasa Indonesia, you can use either TransformersRecognizer or ProsaRemoteRecognizer. To use the TransformersRecognizer, you can use it like this:

from gllm_privacy.pii_detector.recognizer.config import CAHYA_BERT_CONFIGURATION
from gllm_privacy.pii_detector.recognizer.transformers_recognizer import TransformersRecognizer
from gllm_privacy.pii_detector import TextAnalyzer, TextAnonymizer
from gllm_privacy.pii_detector.constants import Entities

# Load the model, if you run it for the first time, it will download the model from the Hugging Face model hub
transformers_recognizer = TransformersRecognizer(
  model_path=CAHYA_BERT_CONFIGURATION.get("DEFAULT_MODEL_PATH"),
  supported_entities=CAHYA_BERT_CONFIGURATION.get("PRESIDIO_SUPPORTED_ENTITIES"),
)
transformers_recognizer.load_transformer(**CAHYA_BERT_CONFIGURATION)
analyzer = TextAnalyzer(additional_recognizers=[transformers_recognizer])

text = "John Doe adalah seorang karyawan PT ABCD yang berlokasi di Jakarta."
text_analyzer = TextAnalyzer(additional_recognizers=[transformers_recognizer])
entities = [Entities.PERSON, Entities.LOCATION]

text_anonymizer = TextAnonymizer(text_analyzer)
anonymized_text = text_anonymizer.anonymize(text=text, entities=entities)
print(anonymized_text)

deanonymized_text = text_anonymizer.deanonymize(text=text)
print(deanonymized_text)

Enhanced TransformersRecognizer with Optimum

The TransformersRecognizer now supports Hugging Face Optimum for improved performance:

  • ONNX Runtime with CUDA: GPU-accelerated inference using ONNX Runtime with CUDA provider
  • ONNX Runtime with CPU: Optimized CPU inference for better performance on laptops/servers
  • Apple Silicon MPS: GPU acceleration on Apple Silicon Macs
  • Auto-detection: Automatically selects the best available backend
  • Fallback compatibility: Works on any hardware with standard transformers

Available Backends:

  • onnx: ONNX Runtime with CPU provider (optimized for NER tasks)
  • cuda: ONNX Runtime with CUDA provider (GPU acceleration)
  • mps: Apple Silicon MPS for GPU acceleration on Mac
  • transformers: Standard transformers as fallback

Configuration Options:

You can configure the backend behavior in your configuration:

config = {
    "USE_OPTIMUM": True,                    # Enable/disable Optimum
    "OPTIMUM_BACKEND": "auto",              # "auto", "onnx", "cuda", "mps", "transformers"
    "OPTIMUM_DEVICE": "auto",               # "auto", "cuda", "cpu", "mps"
    "OPTIMUM_QUANTIZATION": False,          # Enable quantization
    "OPTIMUM_MAX_BATCH_SIZE": 8,           # Max batch size
}

Usage Example:

from gllm_privacy.pii_detector import TextAnalyzer
from gllm_privacy.pii_detector.recognizer.config import CAHYA_BERT_CONFIGURATION
from gllm_privacy.pii_detector.recognizer.transformers_recognizer import TransformersRecognizer

transformers_recognizer = TransformersRecognizer(
    model_path=CAHYA_BERT_CONFIGURATION.get("DEFAULT_MODEL_PATH"),
    supported_entities=CAHYA_BERT_CONFIGURATION.get("PRESIDIO_SUPPORTED_ENTITIES"),
    use_optimum=True
)

transformers_recognizer.load_transformer(**CAHYA_BERT_CONFIGURATION)

pipeline_info = transformers_recognizer.get_pipeline_info()
print(f"Backend: {pipeline_info['backend']}")
print(f"Device: {pipeline_info['device']}")
print(f"Optimizations: {pipeline_info['optimizations']}")

# Use as before
analyzer = TextAnalyzer(additional_recognizers=[transformers_recognizer])

To use the ProsaRemoteRecognizer, you can use it like the following example. Please replace <PROSA_API_URL> and <PROSA_API_KEY> with the valid values.

from gllm_privacy.pii_detector.recognizer.prosa_remote_recognizer import ProsaRemoteRecognizer
from gllm_privacy.pii_detector import TextAnalyzer, TextAnonymizer
from gllm_privacy.pii_detector.constants import Entities

text = "John Doe adalah seorang karyawan PT ABCD yang berlokasi di Jakarta."
prosa_recognizer = ProsaRemoteRecognizer('<PROSA_API_URL>', '<PROSA_API_KEY>')
text_analyzer = TextAnalyzer(additional_recognizers=[prosa_recognizer])
entities = [Entities.PERSON, Entities.LOCATION]

text_anonymizer = TextAnonymizer(text_analyzer)
anonymized_text = text_anonymizer.anonymize(text=text, entities=entities)
print(anonymized_text)

deanonymized_text = text_anonymizer.deanonymize(text=text)
print(deanonymized_text)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

gllm_privacy_binary-0.4.11-cp313-cp313-win_amd64.whl (551.9 kB view details)

Uploaded CPython 3.13Windows x86-64

gllm_privacy_binary-0.4.11-cp313-cp313-macosx_13_0_arm64.whl (544.5 kB view details)

Uploaded CPython 3.13macOS 13.0+ ARM64

gllm_privacy_binary-0.4.11-cp312-cp312-win_amd64.whl (554.5 kB view details)

Uploaded CPython 3.12Windows x86-64

gllm_privacy_binary-0.4.11-cp312-cp312-manylinux_2_31_x86_64.whl (807.1 kB view details)

Uploaded CPython 3.12manylinux: glibc 2.31+ x86-64

gllm_privacy_binary-0.4.11-cp312-cp312-macosx_13_0_arm64.whl (544.0 kB view details)

Uploaded CPython 3.12macOS 13.0+ ARM64

gllm_privacy_binary-0.4.11-cp311-cp311-win_amd64.whl (566.8 kB view details)

Uploaded CPython 3.11Windows x86-64

gllm_privacy_binary-0.4.11-cp311-cp311-manylinux_2_31_x86_64.whl (733.1 kB view details)

Uploaded CPython 3.11manylinux: glibc 2.31+ x86-64

gllm_privacy_binary-0.4.11-cp311-cp311-macosx_13_0_arm64.whl (529.8 kB view details)

Uploaded CPython 3.11macOS 13.0+ ARM64

File details

Details for the file gllm_privacy_binary-0.4.11-cp313-cp313-win_amd64.whl.

File metadata

File hashes

Hashes for gllm_privacy_binary-0.4.11-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 d39dc4144955574d52e7598dd3971e74ac6f987c51e0ba8d3018687db68101a3
MD5 7b98dd4d5679483c835d4a69784a3b8c
BLAKE2b-256 d2b24ca72e48bdb9f9caca12328394920c6954586a0a52cae8cb88c80f3ddf87

See more details on using hashes here.

Provenance

The following attestation bundles were made for gllm_privacy_binary-0.4.11-cp313-cp313-win_amd64.whl:

Publisher: build-binary.yml on GDP-ADMIN/gl-sdk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file gllm_privacy_binary-0.4.11-cp313-cp313-macosx_13_0_arm64.whl.

File metadata

File hashes

Hashes for gllm_privacy_binary-0.4.11-cp313-cp313-macosx_13_0_arm64.whl
Algorithm Hash digest
SHA256 ae07a881da8b26de6abfefa36b935945945cd2b489146c302d70a3bfbb36aa22
MD5 6dfef05de411d2f5be51006428f1e16b
BLAKE2b-256 16d741e33cc75aa1db17f3657aaa6c1500394ba622c099714b090dd67f41663f

See more details on using hashes here.

Provenance

The following attestation bundles were made for gllm_privacy_binary-0.4.11-cp313-cp313-macosx_13_0_arm64.whl:

Publisher: build-binary.yml on GDP-ADMIN/gl-sdk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file gllm_privacy_binary-0.4.11-cp312-cp312-win_amd64.whl.

File metadata

File hashes

Hashes for gllm_privacy_binary-0.4.11-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 2b4771d2e24822e500220cc6e83fcfeadb71bd1df6b16edf1ea48959ea8702d8
MD5 4397e9ee56a85ac87735b97a2c96960e
BLAKE2b-256 857b9532958108b5c5aa91609a2d2001fb4a12752adaeaafe6aaafd443e25cdd

See more details on using hashes here.

Provenance

The following attestation bundles were made for gllm_privacy_binary-0.4.11-cp312-cp312-win_amd64.whl:

Publisher: build-binary.yml on GDP-ADMIN/gl-sdk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file gllm_privacy_binary-0.4.11-cp312-cp312-manylinux_2_31_x86_64.whl.

File metadata

File hashes

Hashes for gllm_privacy_binary-0.4.11-cp312-cp312-manylinux_2_31_x86_64.whl
Algorithm Hash digest
SHA256 e2faf7b65eaf907685c72daae681b85a025e715503f95ff6e4bc92f003550321
MD5 9f34fafc752c0eeffc2998f6104a2983
BLAKE2b-256 7141627b77aa3af227c64ecf23ba13b6192600996800940f9c7f05e2ff281224

See more details on using hashes here.

File details

Details for the file gllm_privacy_binary-0.4.11-cp312-cp312-macosx_13_0_arm64.whl.

File metadata

File hashes

Hashes for gllm_privacy_binary-0.4.11-cp312-cp312-macosx_13_0_arm64.whl
Algorithm Hash digest
SHA256 51cedd19f7935fe23fd145611c20cd7dcfa66eb03166ebcd36fda2a323792789
MD5 f8d7a283be0d272ddeccfa0786274728
BLAKE2b-256 92d40c8cd59c93a9a21b44602eb0e6a8c2e60843a2a644e3a4d772e3a65c93d9

See more details on using hashes here.

Provenance

The following attestation bundles were made for gllm_privacy_binary-0.4.11-cp312-cp312-macosx_13_0_arm64.whl:

Publisher: build-binary.yml on GDP-ADMIN/gl-sdk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file gllm_privacy_binary-0.4.11-cp311-cp311-win_amd64.whl.

File metadata

File hashes

Hashes for gllm_privacy_binary-0.4.11-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 db969f7ec227fd890c7086b1ef0282b5b884a5955623baec66f33c9fe3bc057e
MD5 fe4a62107db91a0722316cf3b5eb2f3d
BLAKE2b-256 642ae6cab0772060723bbbc8e95b197ab80f68fe34b88ff87e108b87f7d58e26

See more details on using hashes here.

Provenance

The following attestation bundles were made for gllm_privacy_binary-0.4.11-cp311-cp311-win_amd64.whl:

Publisher: build-binary.yml on GDP-ADMIN/gl-sdk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file gllm_privacy_binary-0.4.11-cp311-cp311-manylinux_2_31_x86_64.whl.

File metadata

File hashes

Hashes for gllm_privacy_binary-0.4.11-cp311-cp311-manylinux_2_31_x86_64.whl
Algorithm Hash digest
SHA256 5d206e8c4255c0de65b4cb78adc8880e5042658a1f6eb9842dc584e1059850dd
MD5 e3d6f4479e04c29e50f1e36384758146
BLAKE2b-256 1efb6a6551972661fdf92a53586a2539c6c602309432556b4e5faff5ed77b6bf

See more details on using hashes here.

File details

Details for the file gllm_privacy_binary-0.4.11-cp311-cp311-macosx_13_0_arm64.whl.

File metadata

File hashes

Hashes for gllm_privacy_binary-0.4.11-cp311-cp311-macosx_13_0_arm64.whl
Algorithm Hash digest
SHA256 d49521830cdda140475cef68193f5ee68a6af8bfffd68d55167c67c171cce9b7
MD5 2ea164cd32adaa398b5563e971ab4a21
BLAKE2b-256 db855373cac864fd7d30ea3c38c1368db690837c3601f59a316fffb4e99b5d00

See more details on using hashes here.

Provenance

The following attestation bundles were made for gllm_privacy_binary-0.4.11-cp311-cp311-macosx_13_0_arm64.whl:

Publisher: build-binary.yml on GDP-ADMIN/gl-sdk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page