GLLM Privacy

Description

A library for protecting Personally Identifiable Information (PII) in Generative AI projects.

Installation

Prerequisites

Mandatory:

  1. Python 3.11+
  2. pip
  3. uv

Extras (required only for Artifact Registry installations):

  1. gcloud CLI (for authentication). After installing it, log in using:
    gcloud auth login
    

Option 1: Install from Artifact Registry

This option requires authentication via the gcloud CLI.

uv pip install \
  --extra-index-url "https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/" \
  gllm-privacy

Option 2: Install from PyPI

This option requires no authentication. It installs the binary wheel version of the package, which is fully usable but does not include the source code.

uv pip install gllm-privacy-binary
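
To confirm which of the two distributions ended up in the active environment, a check along the following lines may help. This is a minimal sketch that uses only the standard library; the helper name `installed_version` is hypothetical, and the distribution names are the ones used in the install commands above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Distribution names taken from the two install options above.
for name in ("gllm-privacy", "gllm-privacy-binary"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```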

Local Development Setup

Prerequisites

  1. Python 3.11+

  2. pip

  3. uv

  4. gcloud CLI. After installing it, log in using:

    gcloud auth login

  5. Git

  6. Access to the GDP Labs SDK GitHub repository


1. Clone Repository

git clone git@github.com:GDP-ADMIN/gl-sdk.git
cd gl-sdk/libs/gllm-privacy

2. Set Up Authentication

Set the following environment variables to authenticate with internal package indexes:

export UV_INDEX_GEN_AI_INTERNAL_USERNAME=oauth2accesstoken
export UV_INDEX_GEN_AI_INTERNAL_PASSWORD="$(gcloud auth print-access-token)"
export UV_INDEX_GEN_AI_USERNAME=oauth2accesstoken
export UV_INDEX_GEN_AI_PASSWORD="$(gcloud auth print-access-token)"
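
The four exports above follow one pattern: the same user name and the same access token for both indexes. The sketch below builds that environment from a single token so the token is fetched only once. The function names `uv_index_credentials` and `fetch_gcloud_token` are hypothetical helpers, not part of the SDK, and `fetch_gcloud_token` assumes the gcloud CLI is on the PATH and already logged in.

```python
import subprocess
from typing import Dict


def uv_index_credentials(token: str) -> Dict[str, str]:
    """Build the four UV_INDEX_* variables from one access token."""
    env = {}
    for index in ("GEN_AI_INTERNAL", "GEN_AI"):
        env[f"UV_INDEX_{index}_USERNAME"] = "oauth2accesstoken"
        env[f"UV_INDEX_{index}_PASSWORD"] = token
    return env


def fetch_gcloud_token() -> str:
    """Ask the gcloud CLI for an access token (requires `gcloud auth login`)."""
    result = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Show the variable names with a placeholder token instead of a real one.
for name in uv_index_credentials("<token>"):
    print(name)
```

To apply the values to the current process before invoking uv, one option is `os.environ.update(uv_index_credentials(fetch_gcloud_token()))`.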

3. Quick Setup

Run:

make setup

4. Activate Virtual Environment

source .venv/bin/activate

Local Development Utilities

The following Makefile commands are available for quick operations:

Install uv

make install-uv

Install Pre-Commit

make install-pre-commit

Install Dependencies

make install

Update Dependencies

make update

Run Tests

make test

Usage

from asyncio import run

from gllm_privacy.pii_detector import TextAnalyzer, TextAnonymizer
from gllm_privacy.pii_detector.anonymizer import Operation
from gllm_privacy.pii_detector.constants import Entities

# Sample text containing an Indonesian ID number (KTP), an email address,
# phone numbers, and a tax ID (NPWP).
text = """
    contoh nomor ktp 3525011212941001
    repeat nomor ktp 3525011212941001
    contoh email john.doe@example.com
    contoh nomor telepon +628121729819 dan 0812898029384.
    contoh npwp 01.123.456.7-891.234
"""
text_analyzer = TextAnalyzer()
entities = [Entities.EMAIL_ADDRESS, Entities.KTP, Entities.NPWP, Entities.PHONE_NUMBER]

# TextAnonymizer.run is a coroutine, so drive it with asyncio.run.
text_anonymizer = TextAnonymizer(text_analyzer)
anonymized_text = run(text_anonymizer.run(text=text, entities=entities))
print(anonymized_text)

# Deanonymize the anonymized output to restore the original values.
deanonymized_text = run(
    text_anonymizer.run(text=anonymized_text, entities=entities, operation=Operation.DEANONYMIZE)
)
print(deanonymized_text)
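
The round trip above can be pictured as placeholder substitution backed by a mapping. The following is a conceptual sketch only, not how gllm-privacy is implemented: the regular expressions, the `<ENTITY_n>` placeholder format, and the `anonymize`/`deanonymize` functions are all assumptions made for illustration.

```python
import re
from typing import Dict, Tuple

# Simplified, hypothetical patterns for illustration only.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "KTP": re.compile(r"\b\d{16}\b"),
}


def anonymize(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace matches with numbered placeholders and return the mapping."""
    mapping: Dict[str, str] = {}  # placeholder -> original value
    seen: Dict[str, str] = {}     # original value -> placeholder

    for entity, pattern in PATTERNS.items():
        def substitute(match, entity=entity):
            value = match.group(0)
            if value not in seen:
                count = sum(1 for p in mapping if p.startswith(f"<{entity}_"))
                placeholder = f"<{entity}_{count + 1}>"
                seen[value] = placeholder
                mapping[placeholder] = value
            # A repeated value reuses the placeholder it was given first.
            return seen[value]

        text = pattern.sub(substitute, text)
    return text, mapping


def deanonymize(text: str, mapping: Dict[str, str]) -> str:
    """Restore the original values from the placeholder mapping."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text


original = "ktp 3525011212941001, repeat 3525011212941001, email john.doe@example.com"
masked, mapping = anonymize(original)
print(masked)  # ktp <KTP_1>, repeat <KTP_1>, email <EMAIL_ADDRESS_1>
print(deanonymize(masked, mapping) == original)  # True
```

The point of the sketch is that deanonymization needs the anonymized text plus the stored mapping, which is why the same anonymizer instance is reused for both steps in the example above.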

To detect person, organization, or location entities in text written in Bahasa Indonesia, use either TransformersRecognizer or ProsaRemoteRecognizer. The following example uses TransformersRecognizer:

from gllm_privacy.pii_detector import TextAnalyzer, TextAnonymizer
from gllm_privacy.pii_detector.constants import Entities
from gllm_privacy.pii_detector.recognizer.config import CAHYA_BERT_CONFIGURATION
from gllm_privacy.pii_detector.recognizer.transformers_recognizer import TransformersRecognizer

# Load the model. On the first run, it is downloaded from the Hugging Face model hub.
transformers_recognizer = TransformersRecognizer(
    model_path=CAHYA_BERT_CONFIGURATION.get("DEFAULT_MODEL_PATH"),
    supported_entities=CAHYA_BERT_CONFIGURATION.get("PRESIDIO_SUPPORTED_ENTITIES"),
)
transformers_recognizer.load_transformer(**CAHYA_BERT_CONFIGURATION)

text = "John Doe adalah seorang karyawan PT ABCD yang berlokasi di Jakarta."
text_analyzer = TextAnalyzer(additional_recognizers=[transformers_recognizer])
entities = [Entities.PERSON, Entities.LOCATION]

text_anonymizer = TextAnonymizer(text_analyzer)
anonymized_text = text_anonymizer.anonymize(text=text, entities=entities)
print(anonymized_text)

# Deanonymize the anonymized output to restore the original values.
deanonymized_text = text_anonymizer.deanonymize(text=anonymized_text)
print(deanonymized_text)

Enhanced TransformersRecognizer with Optimum

The TransformersRecognizer now supports Hugging Face Optimum for improved performance:

  • ONNX Runtime with CUDA: GPU-accelerated inference using ONNX Runtime with CUDA provider
  • ONNX Runtime with CPU: Optimized CPU inference for better performance on laptops/servers
  • Apple Silicon MPS: GPU acceleration on Apple Silicon Macs
  • Auto-detection: Automatically selects the best available backend
  • Fallback compatibility: Works on any hardware with standard transformers

Available Backends:

  • onnx: ONNX Runtime with CPU provider (optimized for NER tasks)
  • cuda: ONNX Runtime with CUDA provider (GPU acceleration)
  • mps: Apple Silicon MPS for GPU acceleration on Mac
  • transformers: Standard transformers as fallback
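
Auto-detection with a fallback can be pictured as walking a priority list and taking the first usable backend. The sketch below is an assumption about the general shape of such logic, not the library's code: the `pick_backend` function, the `available` mapping, and the priority order (cuda, then mps, then onnx, then transformers) are all hypothetical.

```python
from typing import Mapping

BACKENDS = ("onnx", "cuda", "mps", "transformers")


def pick_backend(requested: str, available: Mapping[str, bool]) -> str:
    """Resolve a backend name, falling back to standard transformers.

    The priority order used for "auto" is an assumption made for this sketch.
    """
    if requested != "auto":
        if requested not in BACKENDS:
            raise ValueError(f"Unknown backend: {requested!r}")
        # Honor an explicit choice only if it is usable on this machine.
        return requested if available.get(requested, False) else "transformers"

    for candidate in ("cuda", "mps", "onnx"):
        if available.get(candidate, False):
            return candidate
    return "transformers"


print(pick_backend("auto", {"cuda": False, "mps": True, "onnx": True}))  # mps
print(pick_backend("cuda", {"cuda": False, "onnx": True}))  # transformers
```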

Configuration Options:

Set these keys in your configuration to control the backend behavior:

config = {
    "USE_OPTIMUM": True,            # Enable/disable Optimum
    "OPTIMUM_BACKEND": "auto",      # "auto", "onnx", "cuda", "mps", "transformers"
    "OPTIMUM_DEVICE": "auto",       # "auto", "cuda", "cpu", "mps"
    "OPTIMUM_QUANTIZATION": False,  # Enable quantization
    "OPTIMUM_MAX_BATCH_SIZE": 8,    # Max batch size
}
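
Because these options are plain dictionary entries, a typo in a value may go unnoticed until load time. A small pre-flight check such as the one below can catch that early. It is a sketch under stated assumptions: the allowed values are copied from the comments above, while the `find_config_problems` helper and its rules (for example, that the batch size must be a positive integer) are hypothetical and not part of the library.

```python
from typing import List

ALLOWED_VALUES = {
    "OPTIMUM_BACKEND": {"auto", "onnx", "cuda", "mps", "transformers"},
    "OPTIMUM_DEVICE": {"auto", "cuda", "cpu", "mps"},
}
BOOLEAN_KEYS = ("USE_OPTIMUM", "OPTIMUM_QUANTIZATION")


def find_config_problems(config: dict) -> List[str]:
    """Return readable problems; an empty list means the config looks valid."""
    problems = []
    for key, allowed in ALLOWED_VALUES.items():
        if key in config and config[key] not in allowed:
            problems.append(f"{key} must be one of {sorted(allowed)}")
    for key in BOOLEAN_KEYS:
        if key in config and not isinstance(config[key], bool):
            problems.append(f"{key} must be a boolean")
    batch = config.get("OPTIMUM_MAX_BATCH_SIZE")
    # bool is a subclass of int, so exclude it explicitly.
    if batch is not None and (
        isinstance(batch, bool) or not isinstance(batch, int) or batch < 1
    ):
        problems.append("OPTIMUM_MAX_BATCH_SIZE must be a positive integer")
    return problems


for problem in find_config_problems({"OPTIMUM_BACKEND": "tpu", "OPTIMUM_MAX_BATCH_SIZE": 0}):
    print(problem)
```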

Usage Example:

from gllm_privacy.pii_detector import TextAnalyzer
from gllm_privacy.pii_detector.recognizer.config import CAHYA_BERT_CONFIGURATION
from gllm_privacy.pii_detector.recognizer.transformers_recognizer import TransformersRecognizer

transformers_recognizer = TransformersRecognizer(
    model_path=CAHYA_BERT_CONFIGURATION.get("DEFAULT_MODEL_PATH"),
    supported_entities=CAHYA_BERT_CONFIGURATION.get("PRESIDIO_SUPPORTED_ENTITIES"),
    use_optimum=True,
)

transformers_recognizer.load_transformer(**CAHYA_BERT_CONFIGURATION)

# Inspect which backend, device, and optimizations were selected.
pipeline_info = transformers_recognizer.get_pipeline_info()
print(f"Backend: {pipeline_info['backend']}")
print(f"Device: {pipeline_info['device']}")
print(f"Optimizations: {pipeline_info['optimizations']}")

# Pass the recognizer to TextAnalyzer as in the previous example.
analyzer = TextAnalyzer(additional_recognizers=[transformers_recognizer])

To use ProsaRemoteRecognizer, follow the example below, replacing <PROSA_API_URL> and <PROSA_API_KEY> with valid values.

from gllm_privacy.pii_detector import TextAnalyzer, TextAnonymizer
from gllm_privacy.pii_detector.constants import Entities
from gllm_privacy.pii_detector.recognizer.prosa_remote_recognizer import ProsaRemoteRecognizer

text = "John Doe adalah seorang karyawan PT ABCD yang berlokasi di Jakarta."
prosa_recognizer = ProsaRemoteRecognizer("<PROSA_API_URL>", "<PROSA_API_KEY>")
text_analyzer = TextAnalyzer(additional_recognizers=[prosa_recognizer])
entities = [Entities.PERSON, Entities.LOCATION]

text_anonymizer = TextAnonymizer(text_analyzer)
anonymized_text = text_anonymizer.anonymize(text=text, entities=entities)
print(anonymized_text)

# Deanonymize the anonymized output to restore the original values.
deanonymized_text = text_anonymizer.deanonymize(text=anonymized_text)
print(deanonymized_text)
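
Rather than pasting the API URL and key into source files, the two values can be read from the environment. The sketch below shows one way to do that. The variable names `PROSA_API_URL` and `PROSA_API_KEY` and the `prosa_settings` helper are assumptions for illustration; the library itself does not prescribe them.

```python
import os
from typing import Mapping, Tuple


def prosa_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Read the API URL and key from hypothetical environment variables."""
    names = ("PROSA_API_URL", "PROSA_API_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return env["PROSA_API_URL"], env["PROSA_API_KEY"]


# Demonstrate with an explicit mapping instead of the real environment.
url, key = prosa_settings({"PROSA_API_URL": "https://example.invalid", "PROSA_API_KEY": "secret"})
print(url)
```

The returned pair is in the same order as the positional arguments in the example above, so it could be passed along as `ProsaRemoteRecognizer(*prosa_settings())`.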
