GLLM Privacy

Description

A library for protecting Personally Identifiable Information (PII) in Generative AI projects.

Installation

Prerequisites

Mandatory:

  1. Python 3.11+ — Install here
  2. pip — Install here
  3. uv — Install here

Extras (required only for Artifact Registry installations):

  1. gcloud CLI (for authentication) — Install here, then log in using:
    gcloud auth login
    

Option 1: Install from Artifact Registry

This option requires authentication via the gcloud CLI.

uv pip install \
  --extra-index-url "https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/" \
  gllm-privacy
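
For scripted setups, the same command can be assembled in Python. This is a minimal sketch: the helper names `fetch_gcloud_token`, `build_index_url`, and `install_command` are hypothetical, and only the index host, the `oauth2accesstoken` user name, and the gcloud command are taken from the instructions above.

```python
import subprocess

# Index location copied from the install command above.
INDEX_HOST = "glsdk.gdplabs.id/gen-ai-internal/simple/"


def fetch_gcloud_token() -> str:
    """Return a fresh access token by calling the gcloud CLI."""
    result = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def build_index_url(token: str) -> str:
    """Embed the token as the password for the oauth2accesstoken user."""
    return f"https://oauth2accesstoken:{token.strip()}@{INDEX_HOST}"


def install_command(token: str) -> list[str]:
    """Build the argument list equivalent to the shell command above."""
    return [
        "uv", "pip", "install",
        "--extra-index-url", build_index_url(token),
        "gllm-privacy",
    ]


if __name__ == "__main__":
    # Requires gcloud and uv on PATH and a completed `gcloud auth login`.
    subprocess.run(install_command(fetch_gcloud_token()), check=True)
```

Passing the command as a list avoids shell quoting issues with the token.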

Option 2: Install from PyPI

This option requires no authentication. However, it installs the binary wheel version of the package, which is fully usable but does not include source code.

uv pip install gllm-privacy-binary
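
Because the two options install differently named distributions, it can help to check which one is present in the active environment. The sketch below uses only the standard library; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names as used in Option 1 and Option 2 above.
    for name in ("gllm-privacy", "gllm-privacy-binary"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```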

Local Development Setup

Prerequisites

  1. Python 3.11+ — Install here

  2. pip — Install here

  3. uv — Install here

  4. gcloud CLI — Install here, then log in using:

    gcloud auth login
    
  5. Git — Install here

  6. Access to the GDP Labs SDK GitHub repository


1. Clone Repository

git clone git@github.com:GDP-ADMIN/gl-sdk.git
cd gl-sdk/libs/gllm-privacy

2. Set Up Authentication

Set the following environment variables to authenticate with internal package indexes:

export UV_INDEX_GEN_AI_INTERNAL_USERNAME=oauth2accesstoken
export UV_INDEX_GEN_AI_INTERNAL_PASSWORD="$(gcloud auth print-access-token)"
export UV_INDEX_GEN_AI_USERNAME=oauth2accesstoken
export UV_INDEX_GEN_AI_PASSWORD="$(gcloud auth print-access-token)"
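
Access tokens printed by gcloud are short-lived, so these variables may need to be set again in later sessions. As a sketch of one way to do that from a script, the hypothetical helper `uv_index_env` below builds the same four variables for a child process; the variable names are the ones listed above, everything else is an assumption.

```python
import os
from typing import Mapping, Optional

# Index names taken from the export commands above.
INDEX_NAMES = ("GEN_AI_INTERNAL", "GEN_AI")


def uv_index_env(token: str, base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with the uv index credentials set."""
    env = dict(os.environ if base_env is None else base_env)
    for name in INDEX_NAMES:
        env[f"UV_INDEX_{name}_USERNAME"] = "oauth2accesstoken"
        env[f"UV_INDEX_{name}_PASSWORD"] = token
    return env
```

The result can be passed as `env=` to `subprocess.run(["make", "setup"], ...)` so that the token never has to be written to a shell profile.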

3. Quick Setup

Run:

make setup

4. Activate Virtual Environment

source .venv/bin/activate

Local Development Utilities

The following Makefile commands are available for quick operations:

Install uv

make install-uv

Install Pre-Commit

make install-pre-commit

Install Dependencies

make install

Update Dependencies

make update

Run Tests

make test

Usage

from asyncio import run

from gllm_privacy.pii_detector import TextAnalyzer, TextAnonymizer
from gllm_privacy.pii_detector.anonymizer import Operation
from gllm_privacy.pii_detector.constants import Entities

# Sample Indonesian text ("contoh" means "example") containing a KTP (national ID number),
# an email address, phone numbers, and an NPWP (taxpayer ID number).
text = """
    contoh nomor ktp 3525011212941001
    repeat nomor ktp 3525011212941001
    contoh email john.doe@example.com
    contoh nomor telepon +628121729819 dan 0812898029384.
    contoh npwp 01.123.456.7-891.234
"""
text_analyzer = TextAnalyzer()
entities = [Entities.EMAIL_ADDRESS, Entities.KTP, Entities.NPWP, Entities.PHONE_NUMBER]

# Anonymize the selected entities.
text_anonymizer = TextAnonymizer(text_analyzer)
anonymized_text = run(text_anonymizer.run(text=text, entities=entities))
print(anonymized_text)

# Deanonymize the anonymized text (not the original) to restore the values.
deanonymized_text = run(
    text_anonymizer.run(text=anonymized_text, entities=entities, operation=Operation.DEANONYMIZE)
)
print(deanonymized_text)
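
To make the anonymize and deanonymize round trip concrete, here is a standard-library sketch of the general idea. It is not the library's implementation: the class `ReversibleAnonymizer`, the two regular expressions, and the `<ENTITY_n>` placeholder format are all assumptions chosen for illustration.

```python
import re

# Illustrative patterns only; the real recognizers are not shown here.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "KTP": re.compile(r"\b\d{16}\b"),
}


class ReversibleAnonymizer:
    """Replace matches with placeholders and remember how to undo it."""

    def __init__(self):
        self.mapping = {}    # placeholder -> original value
        self._reverse = {}   # (entity, original value) -> placeholder
        self._counters = {}  # entity -> number of distinct values seen

    def _placeholder(self, entity, value):
        key = (entity, value)
        if key not in self._reverse:
            count = self._counters.get(entity, 0) + 1
            self._counters[entity] = count
            placeholder = f"<{entity}_{count}>"
            self._reverse[key] = placeholder
            self.mapping[placeholder] = value
        return self._reverse[key]

    def anonymize(self, text):
        for entity, pattern in PATTERNS.items():
            text = pattern.sub(
                lambda m, e=entity: self._placeholder(e, m.group()), text
            )
        return text

    def deanonymize(self, text):
        for placeholder, value in self.mapping.items():
            text = text.replace(placeholder, value)
        return text


if __name__ == "__main__":
    anonymizer = ReversibleAnonymizer()
    masked = anonymizer.anonymize("ktp 3525011212941001, email john.doe@example.com")
    print(masked)
    print(anonymizer.deanonymize(masked))
```

In this sketch a repeated value maps to the same placeholder, and deanonymization only has an effect on text that contains placeholders, which is why it is applied to the anonymized output rather than to the original input.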

To detect person, organization, or location entities in text written in Bahasa Indonesia, add either TransformersRecognizer or ProsaRemoteRecognizer to the analyzer. The following example uses TransformersRecognizer:

from gllm_privacy.pii_detector import TextAnalyzer, TextAnonymizer
from gllm_privacy.pii_detector.constants import Entities
from gllm_privacy.pii_detector.recognizer.config import CAHYA_BERT_CONFIGURATION
from gllm_privacy.pii_detector.recognizer.transformers_recognizer import TransformersRecognizer

# Load the model. On the first run, it is downloaded from the Hugging Face model hub.
transformers_recognizer = TransformersRecognizer(
    model_path=CAHYA_BERT_CONFIGURATION.get("DEFAULT_MODEL_PATH"),
    supported_entities=CAHYA_BERT_CONFIGURATION.get("PRESIDIO_SUPPORTED_ENTITIES"),
)
transformers_recognizer.load_transformer(**CAHYA_BERT_CONFIGURATION)

# "John Doe is an employee of PT ABCD, located in Jakarta."
text = "John Doe adalah seorang karyawan PT ABCD yang berlokasi di Jakarta."
text_analyzer = TextAnalyzer(additional_recognizers=[transformers_recognizer])
entities = [Entities.PERSON, Entities.LOCATION]

text_anonymizer = TextAnonymizer(text_analyzer)
anonymized_text = text_anonymizer.anonymize(text=text, entities=entities)
print(anonymized_text)

# Deanonymize the anonymized text (not the original) to restore the values.
deanonymized_text = text_anonymizer.deanonymize(text=anonymized_text)
print(deanonymized_text)

Enhanced TransformersRecognizer with Optimum

The TransformersRecognizer now supports Hugging Face Optimum for improved performance:

  • ONNX Runtime with CUDA: GPU-accelerated inference using ONNX Runtime with CUDA provider
  • ONNX Runtime with CPU: Optimized CPU inference for better performance on laptops/servers
  • Apple Silicon MPS: GPU acceleration on Apple Silicon Macs
  • Auto-detection: Automatically selects the best available backend
  • Fallback compatibility: Works on any hardware with standard transformers

Available Backends:

  • onnx: ONNX Runtime with CPU provider (optimized for NER tasks)
  • cuda: ONNX Runtime with CUDA provider (GPU acceleration)
  • mps: Apple Silicon MPS for GPU acceleration on Mac
  • transformers: Standard transformers as fallback
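
The auto-detection and fallback behavior described above can be pictured with a small sketch. The function `select_backend` and its priority order (cuda, then mps, then onnx, then transformers) are assumptions for illustration; the library's actual selection logic is not documented here.

```python
BACKENDS = ("cuda", "mps", "onnx", "transformers")


def select_backend(requested="auto", *, cuda=False, mps=False, onnx=False):
    """Pick a backend name given which runtimes are available.

    The flags stand in for real hardware and package checks.
    """
    available = {"cuda": cuda, "mps": mps, "onnx": onnx, "transformers": True}
    if requested != "auto":
        if requested not in available:
            raise ValueError(f"Unknown backend: {requested!r}")
        # Fall back to standard transformers when the request cannot be met.
        return requested if available[requested] else "transformers"
    # Assumed priority: GPU backends first, then optimized CPU.
    for name in ("cuda", "mps", "onnx"):
        if available[name]:
            return name
    return "transformers"


if __name__ == "__main__":
    print(select_backend(onnx=True))             # optimized CPU only
    print(select_backend("cuda", onnx=True))     # CUDA requested but unavailable
```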

Configuration Options:

The following configuration keys control the backend behavior:

config = {
    "USE_OPTIMUM": True,                    # Enable/disable Optimum
    "OPTIMUM_BACKEND": "auto",              # "auto", "onnx", "cuda", "mps", "transformers"
    "OPTIMUM_DEVICE": "auto",               # "auto", "cuda", "cpu", "mps"
    "OPTIMUM_QUANTIZATION": False,          # Enable/disable quantization
    "OPTIMUM_MAX_BATCH_SIZE": 8,            # Maximum batch size
}
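
A typo in one of these keys or values is easy to miss, so a small validation step before handing the dictionary to the recognizer can help. The sketch below is an assumption, not part of the library: `build_optimum_config` is a hypothetical helper, and the base values are copied from the example above rather than from confirmed defaults.

```python
# Values copied from the example configuration above (not confirmed defaults).
BASE_CONFIG = {
    "USE_OPTIMUM": True,
    "OPTIMUM_BACKEND": "auto",
    "OPTIMUM_DEVICE": "auto",
    "OPTIMUM_QUANTIZATION": False,
    "OPTIMUM_MAX_BATCH_SIZE": 8,
}

# Allowed values as listed in the comments of the example configuration.
ALLOWED_VALUES = {
    "OPTIMUM_BACKEND": {"auto", "onnx", "cuda", "mps", "transformers"},
    "OPTIMUM_DEVICE": {"auto", "cuda", "cpu", "mps"},
}


def build_optimum_config(**overrides):
    """Merge overrides into the base config and reject invalid entries."""
    unknown = set(overrides) - set(BASE_CONFIG)
    if unknown:
        raise KeyError(f"Unknown option(s): {sorted(unknown)}")
    config = {**BASE_CONFIG, **overrides}
    for key, allowed in ALLOWED_VALUES.items():
        if config[key] not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}")
    size = config["OPTIMUM_MAX_BATCH_SIZE"]
    if type(size) is not int or size < 1:
        raise ValueError("OPTIMUM_MAX_BATCH_SIZE must be a positive integer")
    return config


if __name__ == "__main__":
    print(build_optimum_config(OPTIMUM_BACKEND="onnx", OPTIMUM_DEVICE="cpu"))
```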

Usage Example:

from gllm_privacy.pii_detector import TextAnalyzer
from gllm_privacy.pii_detector.recognizer.config import CAHYA_BERT_CONFIGURATION
from gllm_privacy.pii_detector.recognizer.transformers_recognizer import TransformersRecognizer

transformers_recognizer = TransformersRecognizer(
    model_path=CAHYA_BERT_CONFIGURATION.get("DEFAULT_MODEL_PATH"),
    supported_entities=CAHYA_BERT_CONFIGURATION.get("PRESIDIO_SUPPORTED_ENTITIES"),
    use_optimum=True
)

transformers_recognizer.load_transformer(**CAHYA_BERT_CONFIGURATION)

# Inspect which backend, device, and optimizations were selected.
pipeline_info = transformers_recognizer.get_pipeline_info()
print(f"Backend: {pipeline_info['backend']}")
print(f"Device: {pipeline_info['device']}")
print(f"Optimizations: {pipeline_info['optimizations']}")

# Pass the recognizer to the analyzer as in the previous example.
analyzer = TextAnalyzer(additional_recognizers=[transformers_recognizer])

The following example uses ProsaRemoteRecognizer instead. Replace <PROSA_API_URL> and <PROSA_API_KEY> with valid values.

from gllm_privacy.pii_detector import TextAnalyzer, TextAnonymizer
from gllm_privacy.pii_detector.constants import Entities
from gllm_privacy.pii_detector.recognizer.prosa_remote_recognizer import ProsaRemoteRecognizer

# "John Doe is an employee of PT ABCD, located in Jakarta."
text = "John Doe adalah seorang karyawan PT ABCD yang berlokasi di Jakarta."
prosa_recognizer = ProsaRemoteRecognizer('<PROSA_API_URL>', '<PROSA_API_KEY>')
text_analyzer = TextAnalyzer(additional_recognizers=[prosa_recognizer])
entities = [Entities.PERSON, Entities.LOCATION]

text_anonymizer = TextAnonymizer(text_analyzer)
anonymized_text = text_anonymizer.anonymize(text=text, entities=entities)
print(anonymized_text)

# Deanonymize the anonymized text (not the original) to restore the values.
deanonymized_text = text_anonymizer.deanonymize(text=anonymized_text)
print(deanonymized_text)
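
Rather than hardcoding the Prosa URL and key, they can be read from the environment. This is a sketch under assumptions: the helper `load_prosa_credentials` is hypothetical, and the environment variable names `PROSA_API_URL` and `PROSA_API_KEY` simply mirror the placeholders above.

```python
import os
from typing import Mapping, Optional, Tuple

# Assumed variable names, mirroring the placeholders in the example above.
REQUIRED_VARS = ("PROSA_API_URL", "PROSA_API_KEY")


def load_prosa_credentials(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (url, key) from the environment, failing early if either is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return env["PROSA_API_URL"], env["PROSA_API_KEY"]
```

The returned pair can then be unpacked into the constructor, for example `ProsaRemoteRecognizer(*load_prosa_credentials())`, which keeps the API key out of source control.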


Built Distributions

  • gllm_privacy_binary-0.4.14-cp313-cp313-win_amd64.whl (571.8 kB): CPython 3.13, Windows x86-64
  • gllm_privacy_binary-0.4.14-cp313-cp313-macosx_13_0_arm64.whl (572.9 kB): CPython 3.13, macOS 13.0+ ARM64
  • gllm_privacy_binary-0.4.14-cp312-cp312-win_amd64.whl (575.3 kB): CPython 3.12, Windows x86-64
  • gllm_privacy_binary-0.4.14-cp312-cp312-manylinux_2_31_x86_64.whl (840.3 kB): CPython 3.12, manylinux (glibc 2.31+) x86-64
  • gllm_privacy_binary-0.4.14-cp312-cp312-macosx_13_0_arm64.whl (571.5 kB): CPython 3.12, macOS 13.0+ ARM64
  • gllm_privacy_binary-0.4.14-cp311-cp311-win_amd64.whl (590.6 kB): CPython 3.11, Windows x86-64
  • gllm_privacy_binary-0.4.14-cp311-cp311-manylinux_2_31_x86_64.whl (764.4 kB): CPython 3.11, manylinux (glibc 2.31+) x86-64
  • gllm_privacy_binary-0.4.14-cp311-cp311-macosx_13_0_arm64.whl (555.2 kB): CPython 3.11, macOS 13.0+ ARM64
