Skip to main content

No project description provided

Project description

GLLM Privacy

Description

A library to protect Personal Identifiable Information (PII) in a Generative AI project.

Installation

Prerequisites

Mandatory:

  1. Python 3.11+ — Install here
  2. pip — Install here
  3. uv — Install here

Extras (required only for Artifact Registry installations):

  1. gcloud CLI (for authentication) — Install here, then log in using:
    gcloud auth login
    

Option 1: Install from Artifact Registry

This option requires authentication via the gcloud CLI.

uv pip install \
  --extra-index-url "https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/" \
  gllm-privacy

Option 2: Install from PyPI

This option requires no authentication. However, it installs the binary wheel version of the package, which is fully usable but does not include source code.

uv pip install gllm-privacy-binary

Local Development Setup

Prerequisites

  1. Python 3.11+ — Install here

  2. pip — Install here

  3. uv — Install here

  4. gcloud CLI — Install here, then log in using:

    gcloud auth login
    
  5. Git — Install here

  6. Access to the GDP Labs SDK GitHub repository


1. Clone Repository

git clone git@github.com:GDP-ADMIN/gl-sdk.git
cd gl-sdk/libs/gllm-privacy

2. Setup Authentication

Set the following environment variables to authenticate with internal package indexes:

export UV_INDEX_GEN_AI_INTERNAL_USERNAME=oauth2accesstoken
export UV_INDEX_GEN_AI_INTERNAL_PASSWORD="$(gcloud auth print-access-token)"
export UV_INDEX_GEN_AI_USERNAME=oauth2accesstoken
export UV_INDEX_GEN_AI_PASSWORD="$(gcloud auth print-access-token)"

3. Quick Setup

Run:

make setup

4. Activate Virtual Environment

source .venv/bin/activate

Local Development Utilities

The following Makefile commands are available for quick operations:

Install uv

make install-uv

Install Pre-Commit

make install-pre-commit

Install Dependencies

make install

Update Dependencies

make update

Run Tests

make test

Usage

from gllm_privacy.pii_detector import TextAnalyzer, TextAnonymizer
from gllm_privacy.pii_detector.constants import Entities
from gllm_privacy.pii_detector.anonymizer import Operation
from asyncio import run

text = """
    contoh nomor ktp 3525011212941001
    repeat nomor ktp 3525011212941001
    contoh email john.doe@example.com
    contoh nomor telepon +628121729819 dan 0812898029384.
    contoh npwp 01.123.456.7-891.234
"""
text_analyzer = TextAnalyzer()
entities = [Entities.EMAIL_ADDRESS, Entities.KTP, Entities.NPWP, Entities.PHONE_NUMBER]

text_anonymizer = TextAnonymizer(text_analyzer)
anonymized_text = run(text_anonymizer.run(text=text, entities=entities))
print(anonymized_text)

deanonymized_text = run(text_anonymizer.run(text=text, entities=entities, operation=Operation.DEANONYMIZE))
print(deanonymized_text)

If you need to detect person, organization, or location entities in text written in Bahasa Indonesia, you can use either TransformersRecognizer or ProsaRemoteRecognizer. To use the TransformersRecognizer, you can use it like this:

from gllm_privacy.pii_detector.recognizer.config import CAHYA_BERT_CONFIGURATION
from gllm_privacy.pii_detector.recognizer.transformers_recognizer import TransformersRecognizer
from gllm_privacy.pii_detector import TextAnalyzer, TextAnonymizer
from gllm_privacy.pii_detector.constants import Entities

# Load the model, if you run it for the first time, it will download the model from the Hugging Face model hub
transformers_recognizer = TransformersRecognizer(
  model_path=CAHYA_BERT_CONFIGURATION.get("DEFAULT_MODEL_PATH"),
  supported_entities=CAHYA_BERT_CONFIGURATION.get("PRESIDIO_SUPPORTED_ENTITIES"),
)
transformers_recognizer.load_transformer(**CAHYA_BERT_CONFIGURATION)
analyzer = TextAnalyzer(additional_recognizers=[transformers_recognizer])

text = "John Doe adalah seorang karyawan PT ABCD yang berlokasi di Jakarta."
text_analyzer = TextAnalyzer(additional_recognizers=[transformers_recognizer])
entities = [Entities.PERSON, Entities.LOCATION]

text_anonymizer = TextAnonymizer(text_analyzer)
anonymized_text = text_anonymizer.anonymize(text=text, entities=entities)
print(anonymized_text)

deanonymized_text = text_anonymizer.deanonymize(text=text)
print(deanonymized_text)

Enhanced TransformersRecognizer with Optimum

The TransformersRecognizer now supports Hugging Face Optimum for improved performance:

  • ONNX Runtime with CUDA: GPU-accelerated inference using ONNX Runtime with CUDA provider
  • ONNX Runtime with CPU: Optimized CPU inference for better performance on laptops/servers
  • Apple Silicon MPS: GPU acceleration on Apple Silicon Macs
  • Auto-detection: Automatically selects the best available backend
  • Fallback compatibility: Works on any hardware with standard transformers

Available Backends:

  • onnx: ONNX Runtime with CPU provider (optimized for NER tasks)
  • cuda: ONNX Runtime with CUDA provider (GPU acceleration)
  • mps: Apple Silicon MPS for GPU acceleration on Mac
  • transformers: Standard transformers as fallback

Configuration Options:

You can configure the backend behavior in your configuration:

config = {
    "USE_OPTIMUM": True,                    # Enable/disable Optimum
    "OPTIMUM_BACKEND": "auto",              # "auto", "onnx", "cuda", "mps", "transformers"
    "OPTIMUM_DEVICE": "auto",               # "auto", "cuda", "cpu", "mps"
    "OPTIMUM_QUANTIZATION": False,          # Enable quantization
    "OPTIMUM_MAX_BATCH_SIZE": 8,           # Max batch size
}

Usage Example:

from gllm_privacy.pii_detector import TextAnalyzer
from gllm_privacy.pii_detector.recognizer.config import CAHYA_BERT_CONFIGURATION
from gllm_privacy.pii_detector.recognizer.transformers_recognizer import TransformersRecognizer

transformers_recognizer = TransformersRecognizer(
    model_path=CAHYA_BERT_CONFIGURATION.get("DEFAULT_MODEL_PATH"),
    supported_entities=CAHYA_BERT_CONFIGURATION.get("PRESIDIO_SUPPORTED_ENTITIES"),
    use_optimum=True
)

transformers_recognizer.load_transformer(**CAHYA_BERT_CONFIGURATION)

pipeline_info = transformers_recognizer.get_pipeline_info()
print(f"Backend: {pipeline_info['backend']}")
print(f"Device: {pipeline_info['device']}")
print(f"Optimizations: {pipeline_info['optimizations']}")

# Use as before
analyzer = TextAnalyzer(additional_recognizers=[transformers_recognizer])

To use the ProsaRemoteRecognizer, you can use it like the following example. Please replace <PROSA_API_URL> and <PROSA_API_KEY> with the valid values.

from gllm_privacy.pii_detector.recognizer.prosa_remote_recognizer import ProsaRemoteRecognizer
from gllm_privacy.pii_detector import TextAnalyzer, TextAnonymizer
from gllm_privacy.pii_detector.constants import Entities

text = "John Doe adalah seorang karyawan PT ABCD yang berlokasi di Jakarta."
prosa_recognizer = ProsaRemoteRecognizer('<PROSA_API_URL>', '<PROSA_API_KEY>')
text_analyzer = TextAnalyzer(additional_recognizers=[prosa_recognizer])
entities = [Entities.PERSON, Entities.LOCATION]

text_anonymizer = TextAnonymizer(text_analyzer)
anonymized_text = text_anonymizer.anonymize(text=text, entities=entities)
print(anonymized_text)

deanonymized_text = text_anonymizer.deanonymize(text=text)
print(deanonymized_text)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

gllm_privacy_binary-0.4.18-cp313-cp313-win_amd64.whl (581.5 kB view details)

Uploaded CPython 3.13Windows x86-64

gllm_privacy_binary-0.4.18-cp313-cp313-macosx_13_0_arm64.whl (582.9 kB view details)

Uploaded CPython 3.13macOS 13.0+ ARM64

gllm_privacy_binary-0.4.18-cp312-cp312-win_amd64.whl (584.1 kB view details)

Uploaded CPython 3.12Windows x86-64

gllm_privacy_binary-0.4.18-cp312-cp312-manylinux_2_31_x86_64.whl (853.6 kB view details)

Uploaded CPython 3.12manylinux: glibc 2.31+ x86-64

gllm_privacy_binary-0.4.18-cp312-cp312-macosx_13_0_arm64.whl (580.6 kB view details)

Uploaded CPython 3.12macOS 13.0+ ARM64

gllm_privacy_binary-0.4.18-cp311-cp311-win_amd64.whl (596.8 kB view details)

Uploaded CPython 3.11Windows x86-64

gllm_privacy_binary-0.4.18-cp311-cp311-manylinux_2_31_x86_64.whl (776.9 kB view details)

Uploaded CPython 3.11manylinux: glibc 2.31+ x86-64

gllm_privacy_binary-0.4.18-cp311-cp311-macosx_13_0_arm64.whl (568.7 kB view details)

Uploaded CPython 3.11macOS 13.0+ ARM64

File details

Details for the file gllm_privacy_binary-0.4.18-cp313-cp313-win_amd64.whl.

File metadata

File hashes

Hashes for gllm_privacy_binary-0.4.18-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 6430fd5a386a5c72ac7cfcbe9eeba532e353cfe97f68e01d2741f0f4045b5d3b
MD5 fc02ab85809e1065b1a85dbe9214204b
BLAKE2b-256 ca4b5af7fa3a85cdb7203083e5b1d7c6befe37663213581d76a108dcb1b841a7

See more details on using hashes here.

Provenance

The following attestation bundles were made for gllm_privacy_binary-0.4.18-cp313-cp313-win_amd64.whl:

Publisher: build-binary.yml on GDP-ADMIN/gl-sdk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file gllm_privacy_binary-0.4.18-cp313-cp313-macosx_13_0_arm64.whl.

File metadata

File hashes

Hashes for gllm_privacy_binary-0.4.18-cp313-cp313-macosx_13_0_arm64.whl
Algorithm Hash digest
SHA256 8cd619900d03ac5db2c7cd0a2b865a06ede73daaf449db543eb8ca0df8d5aac9
MD5 595c461babeb4865bbb5e5575e76fd42
BLAKE2b-256 253507246890756b356fa04e302ceb396e02afb3ed6c2ec6bc98e0569272b696

See more details on using hashes here.

Provenance

The following attestation bundles were made for gllm_privacy_binary-0.4.18-cp313-cp313-macosx_13_0_arm64.whl:

Publisher: build-binary.yml on GDP-ADMIN/gl-sdk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file gllm_privacy_binary-0.4.18-cp312-cp312-win_amd64.whl.

File metadata

File hashes

Hashes for gllm_privacy_binary-0.4.18-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 2a46c2498176727ab3288f831ba32c9057a9d77dcd0c45aa08fabd0c92298035
MD5 02d71ed695e8791eadb011395c1c7aa9
BLAKE2b-256 6da5dffcdbafee8b54556e48082a9ead8375eae82632b39a4e6ad2d9584ff373

See more details on using hashes here.

Provenance

The following attestation bundles were made for gllm_privacy_binary-0.4.18-cp312-cp312-win_amd64.whl:

Publisher: build-binary.yml on GDP-ADMIN/gl-sdk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file gllm_privacy_binary-0.4.18-cp312-cp312-manylinux_2_31_x86_64.whl.

File metadata

File hashes

Hashes for gllm_privacy_binary-0.4.18-cp312-cp312-manylinux_2_31_x86_64.whl
Algorithm Hash digest
SHA256 3321f04f219b95f15f1142f76455eb8cce9e0ab61d65b15ea0807de6415beed3
MD5 47e768d569b8dcc27717e1d939e643fb
BLAKE2b-256 0a3e786e165a4a05e45e93f2d8d954055e62cb4fb94421dcce5189930d558045

See more details on using hashes here.

File details

Details for the file gllm_privacy_binary-0.4.18-cp312-cp312-macosx_13_0_arm64.whl.

File metadata

File hashes

Hashes for gllm_privacy_binary-0.4.18-cp312-cp312-macosx_13_0_arm64.whl
Algorithm Hash digest
SHA256 87ea170b0657811413748ce51988b8cd86a668950e5387921f8131b86e29a37f
MD5 97a5c74ded8376f6ec0963b2e1c1272f
BLAKE2b-256 c5c64af6ea28dcfac400a1617cc41e0140b678a9c63060e2c3d5038b88c64141

See more details on using hashes here.

Provenance

The following attestation bundles were made for gllm_privacy_binary-0.4.18-cp312-cp312-macosx_13_0_arm64.whl:

Publisher: build-binary.yml on GDP-ADMIN/gl-sdk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file gllm_privacy_binary-0.4.18-cp311-cp311-win_amd64.whl.

File metadata

File hashes

Hashes for gllm_privacy_binary-0.4.18-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 55e270b865ce5eafda2da25f36b0517b14beeb7d9e6a840e09162fe9cd4b332e
MD5 af5a626ebdce72f31aa0434e5520096d
BLAKE2b-256 dfc6ff9b7571d81a40fd37baf0d51e7108ebce5b820e58eeaac9b483ae7694da

See more details on using hashes here.

Provenance

The following attestation bundles were made for gllm_privacy_binary-0.4.18-cp311-cp311-win_amd64.whl:

Publisher: build-binary.yml on GDP-ADMIN/gl-sdk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file gllm_privacy_binary-0.4.18-cp311-cp311-manylinux_2_31_x86_64.whl.

File metadata

File hashes

Hashes for gllm_privacy_binary-0.4.18-cp311-cp311-manylinux_2_31_x86_64.whl
Algorithm Hash digest
SHA256 57c3dfc3859b6c95cf75f4d2026487109620bd38764ccc8e0bf836208ed2c8fa
MD5 1adbec3ef99b5ef0fabc72f140b6bd02
BLAKE2b-256 455cacaf537873565ff4e5b5c8004f38f5d0475ca9fe4d2bb0709cc4b747f0e5

See more details on using hashes here.

File details

Details for the file gllm_privacy_binary-0.4.18-cp311-cp311-macosx_13_0_arm64.whl.

File metadata

File hashes

Hashes for gllm_privacy_binary-0.4.18-cp311-cp311-macosx_13_0_arm64.whl
Algorithm Hash digest
SHA256 992b7483f5d3e9bf033b917c5b3012d454ee22d19860230b98ec385de25139c1
MD5 800e9b65ac50a0ea99bbf15cd0a24c80
BLAKE2b-256 8471eb41679d4142be4b4e9f6b2e5be1f99ce509c19ff9fef47728585c0fa5bc

See more details on using hashes here.

Provenance

The following attestation bundles were made for gllm_privacy_binary-0.4.18-cp311-cp311-macosx_13_0_arm64.whl:

Publisher: build-binary.yml on GDP-ADMIN/gl-sdk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page