GLLM Privacy

Description

A library for protecting Personally Identifiable Information (PII) in Generative AI projects.

Installation

Prerequisites

Mandatory:

  1. Python 3.11+ — Install here
  2. pip — Install here
  3. uv — Install here

Extras (required only for Artifact Registry installations):

  1. gcloud CLI (for authentication) — Install here, then log in using:
    gcloud auth login
    

Option 1: Install from Artifact Registry

This option requires authentication via the gcloud CLI.

uv pip install \
  --extra-index-url "https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/" \
  gllm-privacy

Option 2: Install from PyPI

This option requires no authentication. However, it installs the binary wheel version of the package, which is fully usable but does not include source code.

uv pip install gllm-privacy-binary
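
After either installation option, a quick check can confirm that the package is visible to the active interpreter. The sketch below uses only the standard library; it assumes that both distributions expose the import name gllm_privacy (the name used in the Usage examples), and the helper is_installed is hypothetical.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return find_spec(module_name) is not None


if __name__ == "__main__":
    # "gllm_privacy" is the import name used in the Usage section of this README.
    print("gllm_privacy installed:", is_installed("gllm_privacy"))
```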

Local Development Setup

Prerequisites

  1. Python 3.11+ — Install here

  2. pip — Install here

  3. uv — Install here

  4. gcloud CLI — Install here, then log in using:

    gcloud auth login
    
  5. Git — Install here

  6. Access to the GDP Labs SDK GitHub repository


1. Clone Repository

git clone git@github.com:GDP-ADMIN/gl-sdk.git
cd gl-sdk/libs/gllm-privacy

2. Setup Authentication

Set the following environment variables to authenticate with internal package indexes:

export UV_INDEX_GEN_AI_INTERNAL_USERNAME=oauth2accesstoken
export UV_INDEX_GEN_AI_INTERNAL_PASSWORD="$(gcloud auth print-access-token)"
export UV_INDEX_GEN_AI_USERNAME=oauth2accesstoken
export UV_INDEX_GEN_AI_PASSWORD="$(gcloud auth print-access-token)"
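
The same variables can also be assembled from a script, which may help because gcloud access tokens are short-lived (typically about an hour) and need refreshing. This is a minimal sketch, not part of the project tooling: the helpers fetch_access_token and build_uv_index_env are hypothetical, and only the variable names and the gcloud command come from the export lines above.

```python
import subprocess

# Index names taken from the export commands above.
INDEX_NAMES = ("GEN_AI_INTERNAL", "GEN_AI")


def fetch_access_token() -> str:
    """Ask the gcloud CLI for a fresh access token (requires `gcloud auth login`)."""
    result = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def build_uv_index_env(token: str) -> dict[str, str]:
    """Build the UV_INDEX_* username/password variables for each index."""
    env: dict[str, str] = {}
    for name in INDEX_NAMES:
        env[f"UV_INDEX_{name}_USERNAME"] = "oauth2accesstoken"
        env[f"UV_INDEX_{name}_PASSWORD"] = token
    return env
```

The resulting mapping can be merged into os.environ and passed as the env argument of subprocess.run when invoking make setup from a script.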

3. Quick Setup

Run:

make setup

4. Activate Virtual Environment

source .venv/bin/activate

Local Development Utilities

The following Makefile commands are available for quick operations:

Install uv

make install-uv

Install Pre-Commit

make install-pre-commit

Install Dependencies

make install

Update Dependencies

make update

Run Tests

make test

Usage

from gllm_privacy.pii_detector import TextAnalyzer, TextAnonymizer
from gllm_privacy.pii_detector.constants import Entities
from gllm_privacy.pii_detector.anonymizer import Operation
from asyncio import run

text = """
    contoh nomor ktp 3525011212941001
    repeat nomor ktp 3525011212941001
    contoh email john.doe@example.com
    contoh nomor telepon +628121729819 dan 0812898029384.
    contoh npwp 01.123.456.7-891.234
"""
text_analyzer = TextAnalyzer()
entities = [Entities.EMAIL_ADDRESS, Entities.KTP, Entities.NPWP, Entities.PHONE_NUMBER]

text_anonymizer = TextAnonymizer(text_analyzer)
anonymized_text = run(text_anonymizer.run(text=text, entities=entities))
print(anonymized_text)

# Deanonymize the anonymized output (not the original text) to restore the PII
deanonymized_text = run(text_anonymizer.run(text=anonymized_text, entities=entities, operation=Operation.DEANONYMIZE))
print(deanonymized_text)
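
A reversible anonymizer has to remember which placeholder replaced which value so that the original text can be restored later. The toy sketch below illustrates that idea only. It is not the library's implementation: the class ReversibleMasker, its e-mail regex, and the <EMAIL_ADDRESS_n> placeholder format are all hypothetical.

```python
import re

# Deliberately simple e-mail pattern, good enough for illustration only.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class ReversibleMasker:
    """Toy masker: swaps e-mail addresses for placeholders and keeps the mapping."""

    def __init__(self) -> None:
        self._placeholder_by_value: dict[str, str] = {}
        self._value_by_placeholder: dict[str, str] = {}

    def anonymize(self, text: str) -> str:
        def replace(match: re.Match[str]) -> str:
            value = match.group(0)
            # Reuse the same placeholder when a value appears more than once.
            if value not in self._placeholder_by_value:
                placeholder = f"<EMAIL_ADDRESS_{len(self._placeholder_by_value) + 1}>"
                self._placeholder_by_value[value] = placeholder
                self._value_by_placeholder[placeholder] = value
            return self._placeholder_by_value[value]

        return EMAIL_PATTERN.sub(replace, text)

    def deanonymize(self, text: str) -> str:
        # Restore every placeholder that this instance has issued.
        for placeholder, value in self._value_by_placeholder.items():
            text = text.replace(placeholder, value)
        return text
```

Note that deanonymize is given the anonymized text, and that the mapping lives on the instance, so the same object must handle both directions.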

To detect person, organization, or location entities in text written in Bahasa Indonesia, use either TransformersRecognizer or ProsaRemoteRecognizer. The following example uses TransformersRecognizer:

from gllm_privacy.pii_detector.recognizer.config import CAHYA_BERT_CONFIGURATION
from gllm_privacy.pii_detector.recognizer.transformers_recognizer import TransformersRecognizer
from gllm_privacy.pii_detector import TextAnalyzer, TextAnonymizer
from gllm_privacy.pii_detector.constants import Entities

# Load the model. On the first run, it is downloaded from the Hugging Face model hub.
transformers_recognizer = TransformersRecognizer(
  model_path=CAHYA_BERT_CONFIGURATION.get("DEFAULT_MODEL_PATH"),
  supported_entities=CAHYA_BERT_CONFIGURATION.get("PRESIDIO_SUPPORTED_ENTITIES"),
)
transformers_recognizer.load_transformer(**CAHYA_BERT_CONFIGURATION)

text = "John Doe adalah seorang karyawan PT ABCD yang berlokasi di Jakarta."
text_analyzer = TextAnalyzer(additional_recognizers=[transformers_recognizer])
entities = [Entities.PERSON, Entities.LOCATION]

text_anonymizer = TextAnonymizer(text_analyzer)
anonymized_text = text_anonymizer.anonymize(text=text, entities=entities)
print(anonymized_text)

# Deanonymize the anonymized output (not the original text) to restore the PII
deanonymized_text = text_anonymizer.deanonymize(text=anonymized_text)
print(deanonymized_text)

Enhanced TransformersRecognizer with Optimum

The TransformersRecognizer now supports Hugging Face Optimum for improved performance:

  • ONNX Runtime with CUDA: GPU-accelerated inference using ONNX Runtime with CUDA provider
  • ONNX Runtime with CPU: Optimized CPU inference for better performance on laptops/servers
  • Apple Silicon MPS: GPU acceleration on Apple Silicon Macs
  • Auto-detection: Automatically selects the best available backend
  • Fallback compatibility: Works on any hardware with standard transformers

Available Backends:

  • onnx: ONNX Runtime with CPU provider (optimized for NER tasks)
  • cuda: ONNX Runtime with CUDA provider (GPU acceleration)
  • mps: Apple Silicon MPS for GPU acceleration on Mac
  • transformers: Standard transformers as fallback
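
To make the auto-detection and fallback behavior concrete, the sketch below shows one way such a selection could be written. The function pick_backend and the priority order (cuda, then mps, then onnx, then transformers) are assumptions for illustration; the README does not state the order the library actually uses.

```python
# Backend names as listed above.
BACKENDS = ("onnx", "cuda", "mps", "transformers")

# Assumed preference order for "auto"; the real library may differ.
AUTO_PRIORITY = ("cuda", "mps", "onnx")


def pick_backend(requested: str, available: set[str]) -> str:
    """Resolve a requested backend name against the set of usable backends."""
    if requested != "auto":
        if requested not in BACKENDS:
            raise ValueError(f"Unknown backend: {requested!r}")
        # Fall back to standard transformers when the request cannot be met.
        return requested if requested in available else "transformers"
    for candidate in AUTO_PRIORITY:
        if candidate in available:
            return candidate
    return "transformers"
```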

Configuration Options:

The following configuration keys control backend behavior:

config = {
    "USE_OPTIMUM": True,                    # Enable/disable Optimum
    "OPTIMUM_BACKEND": "auto",              # "auto", "onnx", "cuda", "mps", "transformers"
    "OPTIMUM_DEVICE": "auto",               # "auto", "cuda", "cpu", "mps"
    "OPTIMUM_QUANTIZATION": False,          # Enable quantization
    "OPTIMUM_MAX_BATCH_SIZE": 8,            # Max batch size
}
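
Because these options are plain dictionary entries, a typo in a value may go unnoticed until load time. The sketch below checks a config against the values listed in the comments above. The helper validate_optimum_config is hypothetical and not part of gllm_privacy; the type expectations are inferred from the example values.

```python
# Allowed string values, copied from the comments in the config example.
ALLOWED_VALUES = {
    "OPTIMUM_BACKEND": {"auto", "onnx", "cuda", "mps", "transformers"},
    "OPTIMUM_DEVICE": {"auto", "cuda", "cpu", "mps"},
}
BOOLEAN_KEYS = ("USE_OPTIMUM", "OPTIMUM_QUANTIZATION")


def validate_optimum_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems: list[str] = []
    for key, allowed in ALLOWED_VALUES.items():
        if key in config and config[key] not in allowed:
            problems.append(f"{key}={config[key]!r} is not one of {sorted(allowed)}")
    for key in BOOLEAN_KEYS:
        if key in config and not isinstance(config[key], bool):
            problems.append(f"{key} should be a bool")
    size = config.get("OPTIMUM_MAX_BATCH_SIZE")
    # bool is a subclass of int, so it is excluded explicitly.
    if size is not None and (isinstance(size, bool) or not isinstance(size, int) or size < 1):
        problems.append("OPTIMUM_MAX_BATCH_SIZE should be a positive integer")
    return problems
```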

Usage Example:

from gllm_privacy.pii_detector import TextAnalyzer
from gllm_privacy.pii_detector.recognizer.config import CAHYA_BERT_CONFIGURATION
from gllm_privacy.pii_detector.recognizer.transformers_recognizer import TransformersRecognizer

transformers_recognizer = TransformersRecognizer(
    model_path=CAHYA_BERT_CONFIGURATION.get("DEFAULT_MODEL_PATH"),
    supported_entities=CAHYA_BERT_CONFIGURATION.get("PRESIDIO_SUPPORTED_ENTITIES"),
    use_optimum=True
)

transformers_recognizer.load_transformer(**CAHYA_BERT_CONFIGURATION)

pipeline_info = transformers_recognizer.get_pipeline_info()
print(f"Backend: {pipeline_info['backend']}")
print(f"Device: {pipeline_info['device']}")
print(f"Optimizations: {pipeline_info['optimizations']}")

# Use as before
analyzer = TextAnalyzer(additional_recognizers=[transformers_recognizer])

To use ProsaRemoteRecognizer, follow the example below, replacing <PROSA_API_URL> and <PROSA_API_KEY> with valid values.

from gllm_privacy.pii_detector.recognizer.prosa_remote_recognizer import ProsaRemoteRecognizer
from gllm_privacy.pii_detector import TextAnalyzer, TextAnonymizer
from gllm_privacy.pii_detector.constants import Entities

text = "John Doe adalah seorang karyawan PT ABCD yang berlokasi di Jakarta."
prosa_recognizer = ProsaRemoteRecognizer('<PROSA_API_URL>', '<PROSA_API_KEY>')
text_analyzer = TextAnalyzer(additional_recognizers=[prosa_recognizer])
entities = [Entities.PERSON, Entities.LOCATION]

text_anonymizer = TextAnonymizer(text_analyzer)
anonymized_text = text_anonymizer.anonymize(text=text, entities=entities)
print(anonymized_text)

# Deanonymize the anonymized output (not the original text) to restore the PII
deanonymized_text = text_anonymizer.deanonymize(text=anonymized_text)
print(deanonymized_text)
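
Rather than hard-coding the Prosa URL and key in source files, they can be read from the environment. This is a minimal sketch: the helper load_prosa_credentials and the environment variable names PROSA_API_URL and PROSA_API_KEY are assumptions chosen to mirror the placeholders above, not names defined by the library.

```python
import os
from collections.abc import Mapping

# Hypothetical variable names mirroring the placeholders in the example above.
REQUIRED_VARS = ("PROSA_API_URL", "PROSA_API_KEY")


def load_prosa_credentials(env: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Return (api_url, api_key), raising if either variable is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["PROSA_API_URL"], env["PROSA_API_KEY"]
```

The returned pair matches the positional order used above, so it can be unpacked directly: ProsaRemoteRecognizer(*load_prosa_credentials()).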

Built Distributions

  • gllm_privacy_binary-0.4.16-cp313-cp313-win_amd64.whl (542.0 kB): CPython 3.13, Windows x86-64
  • gllm_privacy_binary-0.4.16-cp313-cp313-macosx_13_0_arm64.whl (539.9 kB): CPython 3.13, macOS 13.0+ ARM64
  • gllm_privacy_binary-0.4.16-cp312-cp312-win_amd64.whl (544.3 kB): CPython 3.12, Windows x86-64
  • gllm_privacy_binary-0.4.16-cp312-cp312-manylinux_2_31_x86_64.whl (799.7 kB): CPython 3.12, manylinux (glibc 2.31+) x86-64
  • gllm_privacy_binary-0.4.16-cp312-cp312-macosx_13_0_arm64.whl (538.6 kB): CPython 3.12, macOS 13.0+ ARM64
  • gllm_privacy_binary-0.4.16-cp311-cp311-win_amd64.whl (558.8 kB): CPython 3.11, Windows x86-64
  • gllm_privacy_binary-0.4.16-cp311-cp311-manylinux_2_31_x86_64.whl (728.4 kB): CPython 3.11, manylinux (glibc 2.31+) x86-64
  • gllm_privacy_binary-0.4.16-cp311-cp311-macosx_13_0_arm64.whl (522.6 kB): CPython 3.11, macOS 13.0+ ARM64
