GLLM Privacy
Description
A library for protecting Personally Identifiable Information (PII) in Generative AI projects.
Installation
Prerequisites
Mandatory:
- Python 3.11+
- pip
- uv
Extras (required only for Artifact Registry installations):
- gcloud CLI (for authentication). After installing it, log in using:
gcloud auth login
Option 1: Install from Artifact Registry
This option requires authentication via the gcloud CLI.
uv pip install \
--extra-index-url "https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/" \
gllm-privacy
Option 2: Install from PyPI
This option requires no authentication. However, it installs the binary wheel version of the package, which is fully usable but does not include source code.
uv pip install gllm-privacy-binary
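To confirm which of the two distributions ended up in the active environment, a quick standard-library check can help. This is a minimal sketch: the distribution names come from the install commands above, while the helper name `installed_version` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(candidates=("gllm-privacy", "gllm-privacy-binary")):
    """Return (name, version) for the first installed candidate, else None."""
    for name in candidates:
        try:
            return name, version(name)
        except PackageNotFoundError:
            # This distribution is not installed; try the next candidate.
            continue
    return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "Neither distribution is installed in this environment.")
```

Run it with the same interpreter that `uv pip install` targeted, otherwise the lookup inspects a different environment.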
Local Development Setup
Prerequisites
- Python 3.11+
- pip
- uv
- gcloud CLI. After installing it, log in using:
gcloud auth login
- Git
- Access to the GDP Labs SDK GitHub repository
1. Clone Repository
git clone git@github.com:GDP-ADMIN/gl-sdk.git
cd gl-sdk/libs/gllm-privacy
2. Setup Authentication
Set the following environment variables to authenticate with internal package indexes:
export UV_INDEX_GEN_AI_INTERNAL_USERNAME=oauth2accesstoken
export UV_INDEX_GEN_AI_INTERNAL_PASSWORD="$(gcloud auth print-access-token)"
export UV_INDEX_GEN_AI_USERNAME=oauth2accesstoken
export UV_INDEX_GEN_AI_PASSWORD="$(gcloud auth print-access-token)"
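If `gcloud` is not logged in, the command substitutions above produce empty strings and the exports still succeed silently. The sketch below checks for that before running `make setup`. The variable names are copied from the exports above; the helper name `missing_index_credentials` is hypothetical.

```python
import os

# Variable names copied from the export commands above.
REQUIRED_VARS = (
    "UV_INDEX_GEN_AI_INTERNAL_USERNAME",
    "UV_INDEX_GEN_AI_INTERNAL_PASSWORD",
    "UV_INDEX_GEN_AI_USERNAME",
    "UV_INDEX_GEN_AI_PASSWORD",
)


def missing_index_credentials(env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_index_credentials()
    if missing:
        print("Missing or empty:", ", ".join(missing))
    else:
        print("All index credentials are set.")
```

Access tokens printed by `gcloud` are short-lived, so re-running the exports is a reasonable first step if dependency installation later fails with an authentication error.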
3. Quick Setup
Run:
make setup
4. Activate Virtual Environment
source .venv/bin/activate
Local Development Utilities
The following Makefile commands are available for quick operations:
Install uv
make install-uv
Install Pre-Commit
make install-pre-commit
Install Dependencies
make install
Update Dependencies
make update
Run Tests
make test
Usage
from gllm_privacy.pii_detector import TextAnalyzer, TextAnonymizer
from gllm_privacy.pii_detector.constants import Entities
from gllm_privacy.pii_detector.anonymizer import Operation
from asyncio import run
text = """
contoh nomor ktp 3525011212941001
repeat nomor ktp 3525011212941001
contoh email john.doe@example.com
contoh nomor telepon +628121729819 dan 0812898029384.
contoh npwp 01.123.456.7-891.234
"""
text_analyzer = TextAnalyzer()
entities = [Entities.EMAIL_ADDRESS, Entities.KTP, Entities.NPWP, Entities.PHONE_NUMBER]
text_anonymizer = TextAnonymizer(text_analyzer)
anonymized_text = run(text_anonymizer.run(text=text, entities=entities))
print(anonymized_text)
# Deanonymize the anonymized output to recover the original text.
deanonymized_text = run(text_anonymizer.run(text=anonymized_text, entities=entities, operation=Operation.DEANONYMIZE))
print(deanonymized_text)
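The example above reuses one `TextAnonymizer` instance for both directions, which suggests the mapping between original values and their replacements is kept on that instance. The toy class below illustrates that round-trip idea only. It is not the library's implementation: the regular expressions, the `<ENTITY_n>` placeholder format, and the `ToyAnonymizer` name are all assumptions made for this sketch.

```python
import re

# Illustrative patterns only; the library's real recognizers are not shown in this README.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "KTP": re.compile(r"\b\d{16}\b"),
}


class ToyAnonymizer:
    """Hypothetical stand-in that shows a reversible placeholder mapping."""

    def __init__(self):
        self._by_value = {}        # original value -> placeholder
        self._by_placeholder = {}  # placeholder -> original value

    def _placeholder(self, entity, value):
        # Reuse the same placeholder when a value repeats.
        if value not in self._by_value:
            count = sum(1 for tag in self._by_placeholder if tag.startswith(f"<{entity}_"))
            tag = f"<{entity}_{count + 1}>"
            self._by_value[value] = tag
            self._by_placeholder[tag] = value
        return self._by_value[value]

    def anonymize(self, text):
        for entity, pattern in PATTERNS.items():
            text = pattern.sub(lambda m, e=entity: self._placeholder(e, m.group()), text)
        return text

    def deanonymize(self, text):
        for tag, value in self._by_placeholder.items():
            text = text.replace(tag, value)
        return text


if __name__ == "__main__":
    toy = ToyAnonymizer()
    masked = toy.anonymize("ktp 3525011212941001, again 3525011212941001")
    print(masked)                   # ktp <KTP_1>, again <KTP_1>
    print(toy.deanonymize(masked))  # ktp 3525011212941001, again 3525011212941001
```

The point of the sketch is that a repeated value maps to the same placeholder, and that deanonymization is applied to the anonymized text using the mapping built during anonymization.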
To detect person, organization, or location entities in Bahasa Indonesia text, use either
TransformersRecognizer or ProsaRemoteRecognizer. The following example uses TransformersRecognizer:
from gllm_privacy.pii_detector.recognizer.config import CAHYA_BERT_CONFIGURATION
from gllm_privacy.pii_detector.recognizer.transformers_recognizer import TransformersRecognizer
from gllm_privacy.pii_detector import TextAnalyzer, TextAnonymizer
from gllm_privacy.pii_detector.constants import Entities
# Load the model. On first run, it is downloaded from the Hugging Face model hub.
transformers_recognizer = TransformersRecognizer(
    model_path=CAHYA_BERT_CONFIGURATION.get("DEFAULT_MODEL_PATH"),
    supported_entities=CAHYA_BERT_CONFIGURATION.get("PRESIDIO_SUPPORTED_ENTITIES"),
)
transformers_recognizer.load_transformer(**CAHYA_BERT_CONFIGURATION)
text = "John Doe adalah seorang karyawan PT ABCD yang berlokasi di Jakarta."
text_analyzer = TextAnalyzer(additional_recognizers=[transformers_recognizer])
entities = [Entities.PERSON, Entities.LOCATION]
text_anonymizer = TextAnonymizer(text_analyzer)
anonymized_text = text_anonymizer.anonymize(text=text, entities=entities)
print(anonymized_text)
deanonymized_text = text_anonymizer.deanonymize(text=anonymized_text)
print(deanonymized_text)
Enhanced TransformersRecognizer with Optimum
The TransformersRecognizer now supports Hugging Face Optimum for improved performance:
- ONNX Runtime with CUDA: GPU-accelerated inference using ONNX Runtime with CUDA provider
- ONNX Runtime with CPU: Optimized CPU inference for better performance on laptops/servers
- Apple Silicon MPS: GPU acceleration on Apple Silicon Macs
- Auto-detection: Automatically selects the best available backend
- Fallback compatibility: Works on any hardware with standard transformers
Available Backends:
- onnx: ONNX Runtime with CPU provider (optimized for NER tasks)
- cuda: ONNX Runtime with CUDA provider (GPU acceleration)
- mps: Apple Silicon MPS for GPU acceleration on Mac
- transformers: Standard transformers as fallback
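How auto-detection chooses among these backends is not documented here. The sketch below shows one plausible selection order, purely as an illustration: the priority (cuda, then mps, then onnx, then transformers) and the function names `pick_backend` and `probe_hardware` are assumptions, not the library's logic. The probe relies on `onnxruntime.get_available_providers()` and `torch.backends.mps.is_available()`, and treats a missing package as "not available".

```python
def pick_backend(has_cuda: bool, has_mps: bool, has_onnx: bool) -> str:
    """Pick a backend name; the priority order here is an assumption."""
    if has_cuda:
        return "cuda"          # ONNX Runtime with the CUDA provider
    if has_mps:
        return "mps"           # Apple Silicon GPU
    if has_onnx:
        return "onnx"          # ONNX Runtime on CPU
    return "transformers"      # plain transformers fallback


def probe_hardware() -> dict:
    """Best-effort capability probe; every check degrades to False."""
    caps = {"has_cuda": False, "has_mps": False, "has_onnx": False}
    try:
        import onnxruntime

        providers = onnxruntime.get_available_providers()
        caps["has_onnx"] = "CPUExecutionProvider" in providers
        caps["has_cuda"] = "CUDAExecutionProvider" in providers
    except Exception:
        pass  # onnxruntime missing or unusable
    try:
        import torch

        mps = getattr(torch.backends, "mps", None)
        caps["has_mps"] = bool(mps and mps.is_available())
    except Exception:
        pass  # torch missing or unusable
    return caps


if __name__ == "__main__":
    print(pick_backend(**probe_hardware()))
```

Comparing this output with the backend reported by the recognizer itself is a quick way to see whether the environment offers what you expect.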
Configuration Options:
Set the following keys in your configuration to control backend behavior:
config = {
    "USE_OPTIMUM": True,            # Enable/disable Optimum
    "OPTIMUM_BACKEND": "auto",      # "auto", "onnx", "cuda", "mps", "transformers"
    "OPTIMUM_DEVICE": "auto",       # "auto", "cuda", "cpu", "mps"
    "OPTIMUM_QUANTIZATION": False,  # Enable quantization
    "OPTIMUM_MAX_BATCH_SIZE": 8,    # Max batch size
}
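A typo in one of these values is easy to miss, so it can be worth validating overrides before handing them to the recognizer. The helper below is a sketch under assumptions: the allowed values are taken from the comments in the configuration above, the name `with_optimum_overrides` is hypothetical, and whether `load_transformer` actually reads these keys from its keyword arguments is not confirmed by this README.

```python
ALLOWED_BACKENDS = {"auto", "onnx", "cuda", "mps", "transformers"}
ALLOWED_DEVICES = {"auto", "cuda", "cpu", "mps"}


def with_optimum_overrides(base: dict, **overrides) -> dict:
    """Return a copy of base with overrides applied and basic checks run."""
    merged = {**base, **overrides}
    backend = merged.get("OPTIMUM_BACKEND", "auto")
    device = merged.get("OPTIMUM_DEVICE", "auto")
    if backend not in ALLOWED_BACKENDS:
        raise ValueError(f"Unknown OPTIMUM_BACKEND: {backend!r}")
    if device not in ALLOWED_DEVICES:
        raise ValueError(f"Unknown OPTIMUM_DEVICE: {device!r}")
    batch = merged.get("OPTIMUM_MAX_BATCH_SIZE", 8)
    if not isinstance(batch, int) or batch < 1:
        raise ValueError("OPTIMUM_MAX_BATCH_SIZE must be a positive integer")
    return merged


if __name__ == "__main__":
    # Stand-in base configuration; in practice this could be CAHYA_BERT_CONFIGURATION.
    base = {"USE_OPTIMUM": True, "OPTIMUM_BACKEND": "auto"}
    print(with_optimum_overrides(base, OPTIMUM_BACKEND="onnx", OPTIMUM_DEVICE="cpu"))
```

The merged dictionary could then be unpacked the same way the usage examples unpack `CAHYA_BERT_CONFIGURATION`, again assuming the recognizer honors these keys. The base dictionary is left unmodified.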
Usage Example:
from gllm_privacy.pii_detector import TextAnalyzer
from gllm_privacy.pii_detector.recognizer.config import CAHYA_BERT_CONFIGURATION
from gllm_privacy.pii_detector.recognizer.transformers_recognizer import TransformersRecognizer
transformers_recognizer = TransformersRecognizer(
    model_path=CAHYA_BERT_CONFIGURATION.get("DEFAULT_MODEL_PATH"),
    supported_entities=CAHYA_BERT_CONFIGURATION.get("PRESIDIO_SUPPORTED_ENTITIES"),
    use_optimum=True,
)
transformers_recognizer.load_transformer(**CAHYA_BERT_CONFIGURATION)
pipeline_info = transformers_recognizer.get_pipeline_info()
print(f"Backend: {pipeline_info['backend']}")
print(f"Device: {pipeline_info['device']}")
print(f"Optimizations: {pipeline_info['optimizations']}")
# Use as before
analyzer = TextAnalyzer(additional_recognizers=[transformers_recognizer])
The following example uses ProsaRemoteRecognizer.
Replace <PROSA_API_URL> and <PROSA_API_KEY> with valid values.
from gllm_privacy.pii_detector.recognizer.prosa_remote_recognizer import ProsaRemoteRecognizer
from gllm_privacy.pii_detector import TextAnalyzer, TextAnonymizer
from gllm_privacy.pii_detector.constants import Entities
text = "John Doe adalah seorang karyawan PT ABCD yang berlokasi di Jakarta."
prosa_recognizer = ProsaRemoteRecognizer('<PROSA_API_URL>', '<PROSA_API_KEY>')
text_analyzer = TextAnalyzer(additional_recognizers=[prosa_recognizer])
entities = [Entities.PERSON, Entities.LOCATION]
text_anonymizer = TextAnonymizer(text_analyzer)
anonymized_text = text_anonymizer.anonymize(text=text, entities=entities)
print(anonymized_text)
deanonymized_text = text_anonymizer.deanonymize(text=anonymized_text)
print(deanonymized_text)
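Hardcoding the API key in source is easy to leak, so one option is to read both values from the environment. This is a minimal sketch: the environment variable names `PROSA_API_URL` and `PROSA_API_KEY` and the helper `load_prosa_credentials` are hypothetical choices that mirror the placeholders above, not something the library defines.

```python
import os


def load_prosa_credentials(env=None):
    """Read the Prosa URL and key from the environment (variable names are hypothetical)."""
    env = os.environ if env is None else env
    url = env.get("PROSA_API_URL", "").strip()
    key = env.get("PROSA_API_KEY", "").strip()
    if not url or not key:
        raise RuntimeError("Set PROSA_API_URL and PROSA_API_KEY before creating the recognizer.")
    return url, key


if __name__ == "__main__":
    try:
        url, _ = load_prosa_credentials()
        print("Prosa endpoint:", url)
    except RuntimeError as error:
        print(error)
```

The returned tuple follows the positional order used in the example above (URL first, then key), so it could be unpacked as `ProsaRemoteRecognizer(*load_prosa_credentials())`.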