GLLM Privacy

Description

A library for protecting Personally Identifiable Information (PII) in Generative AI projects.

Installation

Prerequisites

Mandatory:

  1. Python 3.11+
  2. pip
  3. uv

Extras (required only for Artifact Registry installations):

  1. gcloud CLI (for authentication). After installing it, log in using:
    gcloud auth login
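
The prerequisites above can be checked before installing. The following is a minimal sketch, not part of gllm-privacy: the function name `check_prerequisites` is hypothetical, and it only confirms that the Python version is recent enough and that the listed tools are on the PATH.

```python
import shutil
import sys


def check_prerequisites(tools=("pip", "uv", "gcloud"), min_python=(3, 11)):
    """Return a list of problems; an empty list means every check passed."""
    problems = []

    # Compare only (major, minor) against the required minimum.
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, found "
            f"{sys.version_info.major}.{sys.version_info.minor}"
        )

    # shutil.which returns None when the executable is not on the PATH.
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")

    return problems
```

Calling `check_prerequisites()` with no arguments checks all mandatory tools plus gcloud. Pass `tools=("pip", "uv")` if you only plan to install from PyPI.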
    

Option 1: Install from Artifact Registry

This option requires authentication via the gcloud CLI.

uv pip install \
  --extra-index-url "https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/" \
  gllm-privacy

Option 2: Install from PyPI

This option requires no authentication. However, it installs the binary wheel version of the package, which is fully usable but does not include source code.

uv pip install gllm-privacy-binary
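
Because the two options install distributions with different names, it can help to confirm which one is present in the active environment. This sketch uses only the standard library; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(candidates=("gllm-privacy", "gllm-privacy-binary")):
    """Return (distribution_name, version) for the first installed candidate, else None."""
    for name in candidates:
        try:
            return name, version(name)
        except PackageNotFoundError:
            # This distribution is not installed; try the next candidate.
            continue
    return None


print(installed_version())
```

A result of `None` means neither distribution is installed in the environment that runs the script.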

Local Development Setup

Prerequisites

  1. Python 3.11+

  2. pip

  3. uv

  4. gcloud CLI. After installing it, log in using:

    gcloud auth login
    
  5. Git

  6. Access to the GDP Labs SDK GitHub repository


1. Clone Repository

git clone git@github.com:GDP-ADMIN/gl-sdk.git
cd gl-sdk/libs/gllm-privacy

2. Setup Authentication

Set the following environment variables to authenticate with internal package indexes:

export UV_INDEX_GEN_AI_INTERNAL_USERNAME=oauth2accesstoken
export UV_INDEX_GEN_AI_INTERNAL_PASSWORD="$(gcloud auth print-access-token)"
export UV_INDEX_GEN_AI_USERNAME=oauth2accesstoken
export UV_INDEX_GEN_AI_PASSWORD="$(gcloud auth print-access-token)"
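
The four exports follow one pattern: each index name gets a username and a password variable. If you drive the setup from a script, the same variables can be built in Python. This is a sketch under stated assumptions: the helper names `build_uv_index_env` and `fetch_gcloud_token` are hypothetical, and only the variable names and the gcloud command come from the instructions above.

```python
import subprocess

# Index names taken from the export commands above.
INDEX_NAMES = ("GEN_AI_INTERNAL", "GEN_AI")


def build_uv_index_env(token):
    """Return the username/password variables for every internal index."""
    env = {}
    for name in INDEX_NAMES:
        env[f"UV_INDEX_{name}_USERNAME"] = "oauth2accesstoken"
        env[f"UV_INDEX_{name}_PASSWORD"] = token
    return env


def fetch_gcloud_token():
    """Run `gcloud auth print-access-token` and return the token without the newline."""
    result = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if gcloud exits with an error
    )
    return result.stdout.strip()
```

To use it, merge `build_uv_index_env(fetch_gcloud_token())` into `os.environ` (or into the `env` argument of a subprocess call) before running `make setup`. Access tokens are short-lived, so rebuild the variables when authentication starts failing.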

3. Quick Setup

Run:

make setup

4. Activate Virtual Environment

source .venv/bin/activate

Local Development Utilities

The following Makefile commands are available for quick operations:

Install uv

make install-uv

Install Pre-Commit

make install-pre-commit

Install Dependencies

make install

Update Dependencies

make update

Run Tests

make test

Usage

from gllm_privacy.pii_detector import TextAnalyzer, TextAnonymizer
from gllm_privacy.pii_detector.constants import Entities
from gllm_privacy.pii_detector.anonymizer import Operation
from asyncio import run

text = """
    contoh nomor ktp 3525011212941001
    repeat nomor ktp 3525011212941001
    contoh email john.doe@example.com
    contoh nomor telepon +628121729819 dan 0812898029384.
    contoh npwp 01.123.456.7-891.234
"""
text_analyzer = TextAnalyzer()
entities = [Entities.EMAIL_ADDRESS, Entities.KTP, Entities.NPWP, Entities.PHONE_NUMBER]

text_anonymizer = TextAnonymizer(text_analyzer)
anonymized_text = run(text_anonymizer.run(text=text, entities=entities))
print(anonymized_text)

# Restore the original values by deanonymizing the anonymized text.
deanonymized_text = run(text_anonymizer.run(text=anonymized_text, entities=entities, operation=Operation.DEANONYMIZE))
print(deanonymized_text)
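
To show the idea behind reversible anonymization, here is a self-contained sketch that does not use gllm-privacy at all. It is not the library's implementation: the regular expressions are deliberately simplified, and the placeholder format (`<KTP_1>`) and the function names are assumptions made for this example. The point it illustrates is that a repeated value maps to the same placeholder, and that the stored mapping is what makes deanonymization possible.

```python
import re

# Simplified patterns for illustration only; real validation is stricter.
PATTERNS = {
    "KTP": re.compile(r"\b\d{16}\b"),
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}


def anonymize(text):
    """Replace detected values with placeholders; return (text, placeholder -> value)."""
    mapping = {}   # placeholder -> original value
    seen = {}      # (entity, value) -> placeholder, so repeats reuse a placeholder
    counters = {}  # entity -> number of distinct values seen so far

    def make_replacer(entity):
        def replace(match):
            value = match.group(0)
            key = (entity, value)
            if key not in seen:
                counters[entity] = counters.get(entity, 0) + 1
                placeholder = f"<{entity}_{counters[entity]}>"
                seen[key] = placeholder
                mapping[placeholder] = value
            return seen[key]
        return replace

    for entity, pattern in PATTERNS.items():
        text = pattern.sub(make_replacer(entity), text)
    return text, mapping


def deanonymize(text, mapping):
    """Substitute every placeholder back with its original value."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text


original = "ktp 3525011212941001, repeat 3525011212941001, email john.doe@example.com"
masked, mapping = anonymize(original)
print(masked)
print(deanonymize(masked, mapping) == original)
```

Without the mapping, the masked text cannot be restored, which is why deanonymization has to run against the output of the anonymization step rather than against the raw input.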

To detect person, organization, or location entities in Bahasa Indonesia text, use either TransformersRecognizer or ProsaRemoteRecognizer. The following example uses TransformersRecognizer:

from gllm_privacy.pii_detector.recognizer.config import CAHYA_BERT_CONFIGURATION
from gllm_privacy.pii_detector.recognizer.transformers_recognizer import TransformersRecognizer
from gllm_privacy.pii_detector import TextAnalyzer, TextAnonymizer
from gllm_privacy.pii_detector.constants import Entities

# Load the model. On the first run, it is downloaded from the Hugging Face model hub.
transformers_recognizer = TransformersRecognizer(
    model_path=CAHYA_BERT_CONFIGURATION.get("DEFAULT_MODEL_PATH"),
    supported_entities=CAHYA_BERT_CONFIGURATION.get("PRESIDIO_SUPPORTED_ENTITIES"),
)
transformers_recognizer.load_transformer(**CAHYA_BERT_CONFIGURATION)

text = "John Doe adalah seorang karyawan PT ABCD yang berlokasi di Jakarta."
text_analyzer = TextAnalyzer(additional_recognizers=[transformers_recognizer])
entities = [Entities.PERSON, Entities.LOCATION]

text_anonymizer = TextAnonymizer(text_analyzer)
anonymized_text = text_anonymizer.anonymize(text=text, entities=entities)
print(anonymized_text)

# Restore the original values by deanonymizing the anonymized text.
deanonymized_text = text_anonymizer.deanonymize(text=anonymized_text)
print(deanonymized_text)

Enhanced TransformersRecognizer with Optimum

The TransformersRecognizer now supports Hugging Face Optimum for improved performance:

  • ONNX Runtime with CUDA: GPU-accelerated inference using ONNX Runtime with CUDA provider
  • ONNX Runtime with CPU: Optimized CPU inference for better performance on laptops/servers
  • Apple Silicon MPS: GPU acceleration on Apple Silicon Macs
  • Auto-detection: Automatically selects the best available backend
  • Fallback compatibility: Works on any hardware with standard transformers

Available Backends:

  • onnx: ONNX Runtime with CPU provider (optimized for NER tasks)
  • cuda: ONNX Runtime with CUDA provider (GPU acceleration)
  • mps: Apple Silicon MPS for GPU acceleration on Mac
  • transformers: Standard transformers as fallback

Configuration Options:

Backend behavior is controlled by the following configuration keys:

config = {
    "USE_OPTIMUM": True,                    # Enable/disable Optimum
    "OPTIMUM_BACKEND": "auto",              # "auto", "onnx", "cuda", "mps", "transformers"
    "OPTIMUM_DEVICE": "auto",               # "auto", "cuda", "cpu", "mps"
    "OPTIMUM_QUANTIZATION": False,          # Enable quantization
    "OPTIMUM_MAX_BATCH_SIZE": 8,            # Max batch size
}
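
The README does not state how the `"auto"` setting picks a backend. The sketch below shows one plausible resolution order, purely as an assumption for illustration: prefer CUDA, then MPS, then ONNX on CPU, then plain transformers. The function `resolve_backend` and its availability flags are hypothetical and are not part of the gllm-privacy API.

```python
VALID_BACKENDS = ("auto", "onnx", "cuda", "mps", "transformers")


def resolve_backend(config, cuda_available=False, mps_available=False, onnx_available=False):
    """Pick a backend name from the config and hardware flags (assumed order, see above)."""
    # With Optimum disabled, only the standard transformers path remains.
    if not config.get("USE_OPTIMUM", True):
        return "transformers"

    requested = config.get("OPTIMUM_BACKEND", "auto")
    if requested not in VALID_BACKENDS:
        raise ValueError(f"Unknown backend: {requested!r}")

    # An explicit choice is returned unchanged.
    if requested != "auto":
        return requested

    # Assumed preference order for "auto"; the real library may differ.
    if cuda_available and onnx_available:
        return "cuda"
    if mps_available:
        return "mps"
    if onnx_available:
        return "onnx"
    return "transformers"


print(resolve_backend({"USE_OPTIMUM": True, "OPTIMUM_BACKEND": "auto"}, onnx_available=True))
```

To see which backend the library actually selected, inspect the dictionary returned by `get_pipeline_info()`.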

Usage Example:

from gllm_privacy.pii_detector import TextAnalyzer
from gllm_privacy.pii_detector.recognizer.config import CAHYA_BERT_CONFIGURATION
from gllm_privacy.pii_detector.recognizer.transformers_recognizer import TransformersRecognizer

transformers_recognizer = TransformersRecognizer(
    model_path=CAHYA_BERT_CONFIGURATION.get("DEFAULT_MODEL_PATH"),
    supported_entities=CAHYA_BERT_CONFIGURATION.get("PRESIDIO_SUPPORTED_ENTITIES"),
    use_optimum=True
)

transformers_recognizer.load_transformer(**CAHYA_BERT_CONFIGURATION)

pipeline_info = transformers_recognizer.get_pipeline_info()
print(f"Backend: {pipeline_info['backend']}")
print(f"Device: {pipeline_info['device']}")
print(f"Optimizations: {pipeline_info['optimizations']}")

# Use as before
analyzer = TextAnalyzer(additional_recognizers=[transformers_recognizer])

The following example uses ProsaRemoteRecognizer. Replace <PROSA_API_URL> and <PROSA_API_KEY> with valid values.

from gllm_privacy.pii_detector.recognizer.prosa_remote_recognizer import ProsaRemoteRecognizer
from gllm_privacy.pii_detector import TextAnalyzer, TextAnonymizer
from gllm_privacy.pii_detector.constants import Entities

text = "John Doe adalah seorang karyawan PT ABCD yang berlokasi di Jakarta."
prosa_recognizer = ProsaRemoteRecognizer('<PROSA_API_URL>', '<PROSA_API_KEY>')
text_analyzer = TextAnalyzer(additional_recognizers=[prosa_recognizer])
entities = [Entities.PERSON, Entities.LOCATION]

text_anonymizer = TextAnonymizer(text_analyzer)
anonymized_text = text_anonymizer.anonymize(text=text, entities=entities)
print(anonymized_text)

# Restore the original values by deanonymizing the anonymized text.
deanonymized_text = text_anonymizer.deanonymize(text=anonymized_text)
print(deanonymized_text)
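
Rather than hard-coding the Prosa URL and API key in source files, you may prefer to read them from the environment. This is a minimal sketch: the environment variable names `PROSA_API_URL` and `PROSA_API_KEY` simply mirror the placeholders above and are an assumption, as is the helper name `load_prosa_credentials`.

```python
import os


def load_prosa_credentials(environ=None):
    """Return (url, api_key) from the environment, or raise if either is missing."""
    environ = os.environ if environ is None else environ
    required = ("PROSA_API_URL", "PROSA_API_KEY")

    # Treat unset and empty values the same way.
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")

    return environ["PROSA_API_URL"], environ["PROSA_API_KEY"]
```

The returned pair can then be passed positionally, as in `ProsaRemoteRecognizer(*load_prosa_credentials())`, which matches the two-argument call shown in the example above.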
