Pamola Core library for data anonymization, privacy models, metrics, and utilities

PAMOLA.CORE


Privacy Engineering for Python. Finally.

PAMOLA.CORE is the open-source foundation of the PAMOLA platform: a Python library for privacy-preserving data operations with pipeline workflows, reproducibility, and an audit trail.

Developed by Realm Inveo Inc.


The Problem

You need to anonymize sensitive data. You've tried:

  • ARX: Powerful, but Java, GUI-focused, opaque operations
  • Faker + Presidio + custom scripts: Fragmented, no pipeline, no proof
  • DP libraries: Great math, but narrow scope

You're still missing:

  • Direct operations (mask, drop, fake - not just "achieve k-anonymity")
  • Risk testing (can someone actually re-identify records?)
  • Short text handling (job titles, comments - without an LLM)
  • Reproducibility (what exactly was done?)

The Solution

PAMOLA.CORE: operations-first privacy engineering with a full audit trail.

from pamola_core.tasks import TaskRunner
from pamola_core.profiling import ProfileOperation
from pamola_core.anonymization import MaskingOperation, GeneralizationOperation
from pamola_core.noise import LaplaceNoiseOperation
from pamola_core.metrics import PrivacyProxyMetricsOperation
from pamola_core.attacks import AttackSuiteOperation

# Define pipeline with reproducible seed
task = TaskRunner(task_dir="./anonymize_customers", seed=42)

task.run([
    ProfileOperation(params={"analyzers": ["all"]}),
    MaskingOperation(params={"fields": ["name", "email", "phone"], "strategy": "partial"}),
    GeneralizationOperation(params={"field": "age", "bins": [0, 18, 35, 50, 65, 100]}),
    LaplaceNoiseOperation(params={"fields": ["salary"], "epsilon": 1.0, "sensitivity": 1000}),
    PrivacyProxyMetricsOperation(params={"metrics": ["k_anonymity", "l_diversity"]}),
    AttackSuiteOperation(params={"policy": "standard"}),
], input_data="customers.csv")

# Result: task_dir/ with data, metrics, attack results, manifest.json
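
To make the masking and generalization steps concrete, here is a minimal standalone sketch of what a "partial" mask and an age binning could look like. It does not use or reproduce PAMOLA.CORE internals: `partial_mask`, `generalize_age`, and the choice to keep only the first and last character are assumptions made for illustration. Only the bin edges come from the example above.

```python
from bisect import bisect_right


def partial_mask(value: str, keep_start: int = 1, keep_end: int = 1,
                 mask_char: str = "*") -> str:
    """Keep the leading and trailing characters and mask the rest (hypothetical helper)."""
    if len(value) <= keep_start + keep_end:
        # Too short to keep anything without revealing the whole value.
        return mask_char * len(value)
    middle = mask_char * (len(value) - keep_start - keep_end)
    return value[:keep_start] + middle + value[len(value) - keep_end:]


def generalize_age(age: int, bins: list[int]) -> str:
    """Map an age to a half-open interval label such as '[18, 35)' (hypothetical helper)."""
    if age < bins[0] or age >= bins[-1]:
        return "out_of_range"
    idx = bisect_right(bins, age) - 1
    return f"[{bins[idx]}, {bins[idx + 1]})"


BINS = [0, 18, 35, 50, 65, 100]  # same edges as in the pipeline example
masked = partial_mask("alice@example.com")
age_band = generalize_age(42, BINS)
print(masked, age_band)
```

Half-open intervals are one possible convention; the library may label or bound its bins differently.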

Output structure:

anonymize_customers/
├── manifest.json          # Full reproducibility record
├── output/                # Anonymized data (csv/parquet)
│   └── anonymized.csv
├── metrics/               # Privacy & utility metrics (JSON)
│   ├── metrics_summary.json
│   └── metrics_detail.json
├── attacks/               # Attack simulation results
│   └── suite_report.json
├── plots/                 # Generated visualizations
├── dictionaries/          # Extracted mappings
└── logs/                  # Execution logs
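
The idea behind a seeded run plus a manifest can be sketched in a few lines of plain Python. This is an illustration of the concept only, not the actual TaskRunner or the real manifest.json schema: `run_pipeline`, the step functions, and the manifest fields below are all hypothetical.

```python
import hashlib
import json
import random


def run_pipeline(rows, steps, seed):
    """Apply each step in order and record what was done (hypothetical runner)."""
    rng = random.Random(seed)  # one seeded generator shared by all steps
    manifest = {"seed": seed, "steps": []}
    for name, func in steps:
        rows = func(rows, rng)
        # Hash the intermediate result so a rerun can be compared step by step.
        payload = json.dumps(rows, sort_keys=True).encode("utf-8")
        manifest["steps"].append({
            "operation": name,
            "rows": len(rows),
            "sha256": hashlib.sha256(payload).hexdigest(),
        })
    return rows, manifest


def drop_name(rows, rng):
    """Remove the direct identifier column."""
    return [{k: v for k, v in row.items() if k != "name"} for row in rows]


def jitter_age(rows, rng):
    """Shift each age by a small seeded random offset."""
    return [{**row, "age": row["age"] + rng.randint(-2, 2)} for row in rows]


data = [{"name": "Ann", "age": 34}, {"name": "Bob", "age": 51}]
steps = [("drop_name", drop_name), ("jitter_age", jitter_age)]
out1, manifest1 = run_pipeline(data, steps, seed=42)
out2, manifest2 = run_pipeline(data, steps, seed=42)
print(json.dumps(manifest1, indent=2))
```

Running the same steps with the same seed yields identical output and identical hashes, which is the property a reproducibility record relies on.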

PAMOLA Ecosystem

PAMOLA.CORE is part of a comprehensive privacy engineering stack:

| Component | Description | Availability |
|---|---|---|
| PAMOLA.CORE | Operations, metrics, attacks, pipeline runtime | Open Source |
| PAMOLA.STUDIO | Visual environment for data transformation and privacy management | Commercial |
| PAMOLA.SYNT | Synthetic data generation with DP guarantees (CTGAN, TVAE) | Commercial |
| PAMOLA.TEXT | Long text and document anonymization (NLP/LLM-based) | Commercial |
| PAMOLA.INSIGHT | Agent modules for LLM integration | Commercial |

What's In CORE

| Category | Operations |
|---|---|
| Profiling | ProfileOperation, CorrelationOperation, ShortTextProfileOperation |
| Anonymization | MaskingOperation, GeneralizationOperation, SuppressionOperation, PseudonymizationOperation |
| Transformation | CleaningOperation, MergeOperation, SplitOperation, AggregateOperation |
| Noise (DP-semantics) | LaplaceNoiseOperation, GaussianNoiseOperation, DateTimeJitterOperation, RandomizedResponseOperation |
| Short Text | ShortTextProfileOperation, ShortTextCategorizerOperation, ShortTextMaskOperation, ShortTextNEROperation |
| Fake Data | FakeNameOperation, FakeEmailOperation, FakePhoneOperation, FakeOrgOperation |
| Metrics | QualityMetricsOperation, PrivacyProxyMetricsOperation, AttackBasedMetricsOperation, CompositeScoreOperation |
| Attacks | CVPLAttackOperation, LinkageAttackOperation, SinglingOutOperation, AttributeInferenceOperation, AttackSuiteOperation |

Attack Simulation (experimental)

⚠️ Experimental: The attacks module is functional but under active development. The API may change in future releases.

PAMOLA.CORE tests practical re-identification risk on your data:

| Attack | Question |
|---|---|
| CVPL | How much information leaks between releases? (PAMOLA signature) |
| Fellegi-Sunter Linkage | Can records be matched to external data? |
| Singling-out | Are any records uniquely identifiable? |
| Attribute inference | Can sensitive attributes be guessed from QI? |

from pamola_core.tasks import TaskRunner
from pamola_core.attacks import AttackSuiteOperation

task = TaskRunner(task_dir="./risk_assessment", seed=42)
task.run([
    AttackSuiteOperation(params={
        "policy": "standard",  # or "minimal", "comprehensive"
        "quasi_identifiers": ["age", "gender", "zipcode"],
        "sensitive_columns": ["diagnosis"]
    })
], input_data="anonymized.csv")

# Result: attacks/suite_report.json with risk scores and verdicts
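
The singling-out question ("are any records uniquely identifiable?") can be illustrated without the library by counting how many records share each quasi-identifier combination. The sketch below is a simplified stand-in, not the SinglingOutOperation algorithm: the function names and the toy records are assumptions. The smallest group size is also the dataset's k in the k-anonymity sense.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each quasi-identifier combination."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest equivalence class (0 for empty input)."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def singled_out(records, quasi_identifiers):
    """Return the records whose quasi-identifier combination is unique."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return [r for r in records
            if sizes[tuple(r[q] for q in quasi_identifiers)] == 1]


# Toy data, invented for this example.
records = [
    {"age": "[18, 35)", "gender": "F", "zipcode": "100**", "diagnosis": "flu"},
    {"age": "[18, 35)", "gender": "F", "zipcode": "100**", "diagnosis": "asthma"},
    {"age": "[35, 50)", "gender": "M", "zipcode": "200**", "diagnosis": "flu"},
]
qi = ["age", "gender", "zipcode"]
k = k_anonymity(records, qi)
unique_records = singled_out(records, qi)
print(k, len(unique_records))
```

Here the third record is alone in its group, so the toy dataset is only 1-anonymous and that record would be flagged.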

Note: Model-centric attacks (membership inference attacks, MIA, on generators) belong to PAMOLA.SYNT.


Metrics with Verdicts

Metrics produce actionable signals, not just numbers:

from pamola_core.metrics import CompositeScoreOperation

# Aggregate metrics with weighted scoring
CompositeScoreOperation(params={
    "weights": {
        "quality": 0.3,
        "privacy_proxy": 0.2,
        "privacy_attack": 0.4,  # Attack-based metrics weighted higher
        "utility": 0.1
    }
})
# Output: metrics_summary.json with verdict (PASS/WARN/FAIL)

Output example (metrics_summary.json):

{
  "overall": {
    "quality_score": 0.85,
    "privacy_score": 0.78,
    "composite_score": 0.84,
    "verdict": "PASS"
  },
  "metrics": {
    "k_anonymity": {"value": 5, "verdict": "PASS"},
    "linkage_rate": {"value": 0.02, "verdict": "PASS"}
  }
}
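
As a rough illustration of weighted scoring with a verdict, the sketch below combines per-category scores using the weights from the example above. The input scores, the `composite_score` and `verdict` helpers, and the PASS/WARN thresholds (0.8 and 0.6) are assumptions for this example; the library's actual aggregation and thresholds may differ.

```python
WEIGHTS = {
    "quality": 0.3,
    "privacy_proxy": 0.2,
    "privacy_attack": 0.4,
    "utility": 0.1,
}


def composite_score(scores, weights):
    """Weighted average of category scores, normalized by the weight total."""
    total = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total


def verdict(score, pass_at=0.8, warn_at=0.6):
    """Turn a score in [0, 1] into PASS/WARN/FAIL (thresholds are hypothetical)."""
    if score >= pass_at:
        return "PASS"
    if score >= warn_at:
        return "WARN"
    return "FAIL"


# Made-up category scores, each in [0, 1] with higher meaning better.
scores = {"quality": 0.85, "privacy_proxy": 0.70,
          "privacy_attack": 0.90, "utility": 0.80}
score = composite_score(scores, WEIGHTS)
print(round(score, 3), verdict(score))
```

Normalizing by the weight total keeps the result in the same range as the inputs even if the weights do not sum to exactly 1.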

DP-Semantics Noise

Add calibrated noise with differential privacy semantics:

from pamola_core.noise import LaplaceNoiseOperation

LaplaceNoiseOperation(params={
    "fields": ["salary", "age"],
    "epsilon": 1.0,
    "sensitivity": {"salary": 1000, "age": 1},
    "seed": 42,
    "clip_bounds": {"salary": [0, None]}  # No negative values
})
# Output includes noise_report.json documenting exactly what was applied

Note: This adds DP-style noise but does NOT provide formal DP guarantees without an external accountant. For formal guarantees, use pamola-core[dp] with the OpenDP adapter.
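
For readers unfamiliar with how epsilon and sensitivity interact, the standard Laplace mechanism draws noise with scale b = sensitivity / epsilon. The following standalone sketch shows that calibration with a seed and lower/upper clipping. It is not the LaplaceNoiseOperation implementation; `laplace_sample` and `add_laplace_noise` are hypothetical names, and, like the operation described above, it does no privacy budget accounting.

```python
import random


def laplace_sample(rng, scale):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def add_laplace_noise(values, epsilon, sensitivity, seed, lower=None, upper=None):
    """Add Laplace noise with scale sensitivity / epsilon, then clip."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = sensitivity / epsilon
    rng = random.Random(seed)  # seeded so the run can be reproduced
    noisy = []
    for value in values:
        x = value + laplace_sample(rng, scale)
        if lower is not None:
            x = max(lower, x)
        if upper is not None:
            x = min(upper, x)
        noisy.append(x)
    return noisy


salaries = [52000.0, 61000.0, 0.0]
noisy_a = add_laplace_noise(salaries, epsilon=1.0, sensitivity=1000.0,
                            seed=42, lower=0.0)
noisy_b = add_laplace_noise(salaries, epsilon=1.0, sensitivity=1000.0,
                            seed=42, lower=0.0)
print(noisy_a)
```

A smaller epsilon or a larger sensitivity widens the noise. Clipping happens after the noise is added, so it only restricts the output range.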


What's NOT in CORE

| Feature | Package | Why separate |
|---|---|---|
| Long text + LLM anonymization | pamola-core[text] | Heavy deps (torch, transformers) |
| Formal DP with accountant | pamola-core[dp] | Use OpenDP/diffprivlib adapters |
| Synthetic data generation | pamola-synt | Different concern, ML models |
| Model-centric attacks (MIA) | pamola-synt | Requires trained model access |

Installation

From source (current):

git clone https://github.com/DGT-Network/PAMOLA.git
cd PAMOLA
pip install -e .

With optional extras:

pip install -e ".[fast]"       # + Polars, ConnectorX, DuckDB
pip install -e ".[profiling]"  # + YData-profiling, Presidio
pip install -e ".[ner]"        # + spaCy for short text NER
pip install -e ".[dp]"         # + OpenDP for formal DP guarantees
pip install -e ".[dev]"        # + pytest, coverage, black, ruff

From PyPI:

pip install pamola-core

Supported Python Versions

PAMOLA.CORE supports Python 3.10, 3.11, and 3.12 (requires-python = ">=3.10,<3.13").

| Python Version | Supported |
|---|---|
| 3.10 | ✅ |
| 3.11 | ✅ |
| 3.12 | ✅ |
| 3.9 and below | ❌ |
| 3.13 and above | ❌ (blocked by third-party dependencies) |
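
If a script needs to fail early on an unsupported interpreter, the requires-python range above can be mirrored in a small check. This is a convenience sketch, not part of the package; `is_supported` is a hypothetical helper.

```python
import sys

MIN_VERSION = (3, 10)    # inclusive lower bound from requires-python
MAX_EXCLUSIVE = (3, 13)  # exclusive upper bound from requires-python


def is_supported(version_info=None):
    """Return True if the (major, minor) version falls in >=3.10,<3.13."""
    major_minor = tuple((version_info or sys.version_info)[:2])
    return MIN_VERSION <= major_minor < MAX_EXCLUSIVE


print(is_supported())  # checks the running interpreter
```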

Core Dependencies

These packages are declared in pyproject.toml under the dependencies key of the [project] table and are installed automatically with pip install pamola-core.

| Package | Version Range | Purpose |
|---|---|---|
| numpy | ==1.26.4 | Numerical computation and array operations used throughout privacy metrics, attack simulations, and statistical analysis |
| pandas | ==2.2.2 | Tabular data structures and DataFrame processing; the primary data container for all PAMOLA operations |
| scikit-learn | ==1.7.2 | Machine learning utilities used by core operations including nearest-neighbor attacks, classification metrics, and model-based privacy risk assessment |

Versioning

PAMOLA.CORE follows Semantic Versioning and PEP 440.

import pamola_core
print(pamola_core.__version__)  # e.g. "1.0.0.dev1"

| Phase | Version | Branch | Tag | Install |
|---|---|---|---|---|
| Dev | 1.0.0.dev1 | develop | v1.0.0.dev1 | pip install pamola-core==1.0.0.dev1 |
| Stable | 1.0.0 | main | v1.0.0 | pip install pamola-core |

  • Source of truth: pyproject.toml (version field)
  • Changelog: CHANGELOG.md
  • CI/CD: GitHub Actions — lint, test (3.10/3.11/3.12), build, PyPI publish on tag v*
  • Release rules: Dev tags (v*dev*) must be on develop, stable tags on main
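
The release rule in the last bullet can be expressed as a small predicate. The sketch below is one possible reading of it, not the project's actual CI configuration: `tag_allowed_on_branch` is a hypothetical helper, and it assumes shell-style matching for the v* and v*dev* patterns.

```python
from fnmatch import fnmatchcase


def tag_allowed_on_branch(tag: str, branch: str) -> bool:
    """Dev tags (v*dev*) must be on develop; other v* tags must be on main."""
    if not fnmatchcase(tag, "v*"):
        return False  # not a release tag at all
    if fnmatchcase(tag, "v*dev*"):
        return branch == "develop"
    return branch == "main"


for tag, branch in [("v1.0.0.dev1", "develop"), ("v1.0.0.dev1", "main"),
                    ("v1.0.0", "main")]:
    print(tag, branch, tag_allowed_on_branch(tag, branch))
```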

CLI

pamola-core --version
pamola-core list-ops
pamola-core run --task task.json --output ./task_dir
pamola-core run --op MaskingOperation --config config.json --input data.csv
pamola-core schema MaskingOperation

Sample Data

Note: Synthetic test datasets are available in data/raw/ for development and testing purposes only.

No real personal data (PII/PHI) is included. All records are artificially generated.


Philosophy

  • Operations-first: Direct transforms, not constraint optimization
  • Measure everything: Quality, privacy, utility - with verdicts
  • Test before release: Practical risk via data-release attacks
  • Noise with transparency: DP-semantics + detailed reports
  • Reproducibility by default: manifest.json tracks everything

API Documentation

The project uses Sphinx to generate API reference documentation from Python docstrings.

Build the documentation locally:

cd docs
make html

The generated documentation will be available at:

docs/_build/html/index.html

Documentation

| Resource | Link |
|---|---|
| PET Knowledge Base | realmdata.io/kb |
| Technical Documentation | docs/en/index.md |
| Glossary | realmdata.io/glossary |
| Examples | examples/ |

Use Cases

  • Data Engineering: Prepare privacy-safe datasets for ML training
  • Healthcare: HIPAA-oriented de-identification workflows (Safe Harbor support)
  • Finance: Privacy engineering aligned with PCI/GDPR considerations
  • Compliance: Audit-ready evidence with manifest.json and attack reports
  • Data Sharing: Risk-assessed data exchange between organizations

Regulatory Context

PAMOLA.CORE provides technical building blocks for privacy compliance programs:

| Regulation | Relevant Capabilities |
|---|---|
| GDPR | Pseudonymization, data minimization (Art. 25, 32) |
| HIPAA | Safe Harbor de-identification support |
| CCPA/CPRA | Data suppression, anonymization workflows |
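
Pseudonymization, listed for GDPR in the table above, is commonly done with a keyed hash so that the same input always maps to the same token while the mapping cannot be recomputed without the key. The sketch below shows that general technique with Python's standard hmac module. It is not the PseudonymizationOperation implementation; the `pseudonymize` helper, the key values, and the 16-character truncation are assumptions.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes, length: int = 16) -> str:
    """Return a stable keyed token for a value using HMAC-SHA256 (hypothetical helper)."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


key_a = b"example-key-a"  # placeholder; a real key belongs in a secret store
key_b = b"example-key-b"
token_1 = pseudonymize("alice@example.com", key_a)
token_2 = pseudonymize("alice@example.com", key_a)
token_other_key = pseudonymize("alice@example.com", key_b)
print(token_1)
```

Because whoever holds the key can link tokens back to the original values, the key has to be stored separately from the pseudonymized data.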

Important: PAMOLA.CORE provides technical capabilities only. Legal compliance requires organizational policies, procedures, and legal guidance beyond software tools.


Contributing

git clone https://github.com/DGT-Network/PAMOLA.git
cd PAMOLA
pip install -e ".[dev]"
pytest tests/ -v

See CONTRIBUTING.md for guidelines.


Ownership & Licensing

PAMOLA.CORE is developed and owned exclusively by Realm Inveo Inc.

This repository is hosted under the DGT-Network GitHub organization, which provides shared development infrastructure for Realm Inveo projects. DGT-Network does not claim ownership of this intellectual property. All IP rights belong exclusively to Realm Inveo Inc.

License: Apache 2.0 - see LICENSE


Contact

| Purpose | Contact |
|---|---|
| General inquiries | contact@realmdata.io |
| Commercial / Sales | sales@realmdata.io |
| Due diligence / Legal | legal@realmdata.io |
| Website | realmdata.io |

Built by Realm Inveo Inc.
