DPDP-compliant data erasure for Indian apps: legal-hold-aware deletion, PII anonymization, and tamper-evident Certificates of Erasure. Zero-egress - runs inside your app.

dpdpstack

DPDP-compliant data erasure for Indian apps - handled in your code.

Indian developers keep hitting the same wall: DPDP says erase the user's data on withdrawal, but RBI (KYC, 5 yrs), PMLA, CERT-In (logs, 180 days) and the Companies Act say keep it. So teams hand-delete data table by table, can't prove they did it, and find that enterprise tools "cost more than a month's revenue."

dpdpstack is a small, zero-egress library that handles the hard part:

  • Legal-hold-aware erasure - delete now, or defer under RBI/PMLA/CERT-In holds (with the basis recorded), then erase when the hold lapses.
  • PII anonymization - irreversibly null/hash PII while keeping the ledger row (referential integrity), the way teams actually solve this.
  • Certificate of Erasure - a verifiable, tamper-evident proof you erased (or are lawfully holding) a user's data.
  • Zero-egress - you perform the mutation in your own DB; the library only decides and records. Personal data never leaves your systems.

Not a cookie banner. Not a consultant. A deletion/retention engine for developers.

Documentation · Source · Hosted platform

Install

pip install dpdpstack-python-sdk                # core, no dependencies
pip install "dpdpstack-python-sdk[django]"      # + Django adapter
pip install "dpdpstack-python-sdk[sqlalchemy]"  # + SQLAlchemy adapter (FastAPI/Flask/…)
pip install "dpdpstack-python-sdk[crypto]"      # + signed certs & crypto-shred (PyJWT + cryptography)

Quickstart (framework-agnostic)

from dpdpstack import ErasureEngine, AuditLog, RetentionPolicy, Action, rbi_kyc, issue_certificate

engine = ErasureEngine(AuditLog())

# Normal purpose: hard-delete on withdrawal. Your delete runs in `executor`.
engine.request_erasure(
    subject="user_42",
    policy=RetentionPolicy(purpose="marketing", action=Action.DELETE),
    reason="consent_withdrawn",
    executor=lambda action: my_delete_user(42),
)

# KYC: RBI mandates 5y retention -> erasure is DEFERRED, not refused.
res = engine.request_erasure(subject="user_42", policy=rbi_kyc("kyc"), reason="consent_withdrawn")
print(res.status, res.legal_basis, res.erase_after)   # deferred  RBI KYC...  2031-...

cert = issue_certificate(engine.audit, "user_42", "marketing")  # verifiable proof

Django (zero-egress, runs against your models)

# settings.py
INSTALLED_APPS += ["dpdpstack.contrib.django"]
# python manage.py migrate

from django.db import models
from dpdpstack import RetentionPolicy, Action, null, redact, rbi_kyc
from dpdpstack.contrib.django.service import erase_instance, pii

# Declare a model's PII fields once with @pii - no pii_fields= on every call.
@pii(name=null, email=null, phone=redact(keep_last=4))
class User(models.Model):
    ...

# Hard delete + audit
erase_instance(user, policy=RetentionPolicy(purpose="marketing", action=Action.DELETE),
               subject=user.external_ref)

# Anonymize PII, keep the (regulated) row - uses the @pii declaration above
erase_instance(user, policy=RetentionPolicy(purpose="profile", action=Action.ANONYMIZE),
               subject=user.external_ref)

# KYC withdrawal -> deferred under RBI hold, nothing deleted, basis recorded
erase_instance(user, policy=rbi_kyc("kyc"), subject=user.external_ref)

FastAPI / Flask / any SQLAlchemy app ([sqlalchemy])

The same engine + DB-backed audit chain, against a SQLAlchemy Session. You map the audit entry once (you own the Base); the @pii declaration is shared with Django.

from sqlalchemy.orm import DeclarativeBase
from dpdpstack import RetentionPolicy, Action, null, redact, rbi_kyc
from dpdpstack.contrib.sqlalchemy.models import DpdpAuditEntryMixin
from dpdpstack.contrib.sqlalchemy.service import erase_instance, pii

class Base(DeclarativeBase): ...

class DpdpAuditEntry(Base, DpdpAuditEntryMixin):   # the hash-chained audit store
    __tablename__ = "dpdp_audit_entries"

@pii(name=null, email=null, phone=redact(keep_last=4))
class User(Base):
    __tablename__ = "users"
    ...

# Anonymize PII, keep the (regulated) row; your session, your transaction.
erase_instance(session, user, audit_model=DpdpAuditEntry, subject=user.external_ref,
               policy=RetentionPolicy(purpose="profile", action=Action.ANONYMIZE))

# KYC withdrawal -> deferred under RBI hold, nothing deleted, basis recorded
erase_instance(session, user, audit_model=DpdpAuditEntry, subject=user.external_ref,
               policy=rbi_kyc("kyc"))
session.commit()

Find your PII fields (scan)

You declare PII once with @pii(...) - but which fields are PII? scan finds them for you. It reads field names and types only (never a single row), matches them against an India-first catalog (Aadhaar, PAN, GST, UPI, phone, email, special-category…), and suggests an anonymize strategy for each. Output is advisory - you review it, then paste. Zero-egress and zero-dependency.

Django - scan your models and get pasteable @pii(...) blocks:

python manage.py dpdp_scan --format python        # or: text (default) | json
# or, without a manage.py:
dpdpstack scan --django --settings myproject.settings --app accounts --format python
# accounts.User
@pii(
    name=null,
    email=null,
    phone=redact(keep_last=4),
    aadhaar_number=hashed(),
)
class User(models.Model):
    ...

Re-running the scan tags each field as new (looks like PII, not declared), covered (already declared), or drift (declared, but no longer looks like PII), so it doubles as an ongoing audit.
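The new / covered / drift tagging boils down to comparing two sets of field names. A minimal sketch, assuming you already have the names the scanner flags as PII and the names declared in @pii(...); `tag_fields` is a hypothetical helper, not part of dpdpstack:

```python
from typing import Dict, Set


def tag_fields(detected: Set[str], declared: Set[str]) -> Dict[str, str]:
    """Classify fields the way a re-run scan reports them (hypothetical helper).

    detected: field names that currently look like PII
    declared: field names already listed in @pii(...)
    """
    tags: Dict[str, str] = {}
    for name in detected - declared:
        tags[name] = "new"       # looks like PII, not yet declared
    for name in detected & declared:
        tags[name] = "covered"   # looks like PII and is declared
    for name in declared - detected:
        tags[name] = "drift"     # declared, but no longer looks like PII
    return dict(sorted(tags.items()))


print(tag_fields(detected={"email", "phone", "pan"}, declared={"email", "phone", "nickname"}))
# {'email': 'covered', 'nickname': 'drift', 'pan': 'new', 'phone': 'covered'}
```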

Anything else - a sample dict, an API payload, a column list:

from dpdpstack import anonymize_fields
from dpdpstack.detect import scan_mapping, suggest_strategies

suggest_strategies(["email", "phone", "pan", "ledger_balance"])
# {'email': <null>, 'phone': <redact>, 'pan': <hashed>}   # 'ledger_balance' ignored

record = {"email": "a@b.com", "phone": "9876543210", "ledger_balance": 500}
clean = anonymize_fields(record, suggest_strategies(record.keys()))

Or from the shell:

dpdpstack scan --keys email,phone,pan --format python      # comma-separated names
dpdpstack scan --dict sample.json --format python          # keys of a JSON object ('-' = stdin)
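To show what null / redact / hashed strategies do to a record while the row itself is kept, here is a standalone sketch. It is not dpdpstack's implementation: the functions below are hypothetical stand-ins, `hashed` here takes a secret (the library's `hashed()` may not), and using a keyed HMAC-SHA256 is an assumption. A keyed hash matters for fields like phone numbers, whose small value space makes a plain unsalted SHA-256 easy to reverse by brute force.

```python
import hashlib
import hmac
from typing import Any, Callable, Dict

# Hypothetical, standalone strategies; not dpdpstack's own null/redact/hashed.
Strategy = Callable[[Any], Any]


def null(_value: Any) -> None:
    """Drop the value entirely."""
    return None


def redact(keep_last: int = 4) -> Strategy:
    """Mask every character except the last `keep_last`."""
    def apply(value: Any) -> str:
        s = str(value)
        if keep_last <= 0:
            return "*" * len(s)
        return "*" * max(len(s) - keep_last, 0) + s[-keep_last:]
    return apply


def hashed(secret: bytes) -> Strategy:
    """Replace the value with a keyed one-way digest (HMAC-SHA256).

    With the same secret, equal inputs give equal tokens, so joins on the field still line up.
    """
    def apply(value: Any) -> str:
        return hmac.new(secret, str(value).encode(), hashlib.sha256).hexdigest()
    return apply


def anonymize(record: Dict[str, Any], strategies: Dict[str, Strategy]) -> Dict[str, Any]:
    """Return a new dict; listed fields are transformed, all others (e.g. balances) are kept."""
    return {k: strategies[k](v) if k in strategies else v for k, v in record.items()}


record = {"email": "a@b.com", "phone": "9876543210", "pan": "ABCDE1234F", "ledger_balance": 500}
clean = anonymize(record, {
    "email": null,
    "phone": redact(keep_last=4),
    "pan": hashed(b"load-this-from-a-secret-store"),
})
print(clean["email"], clean["phone"], clean["ledger_balance"])  # None ******3210 500
```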

Bring your own catalog by passing a JSON file with the same shape as the built-in one to load_catalog(path=...).

Detect PII in values (and classify a breach)

The scanner above reads field names; detect_values reads values and free text, which is useful to confirm that a column really holds PII or to fill in a breach report's nature field. Aadhaar numbers are checked with the Verhoeff checksum and card numbers with Luhn, so random 12- or 16-digit strings are far less likely to be flagged (a checksum alone still passes about 1 in 10 of them). Local, zero-dependency.

from dpdpstack import detect_values, classify_breach_nature

detect_values("PAN ABCDE1234F, card 4111 1111 1111 1111")
# [ValueMatch(type='PAN', ...), ValueMatch(type='Payment Card', ...)]

classify_breach_nature("leaked rows: asha@bank.in, Aadhaar 2341 2341 2346, plus medical records")
# ['Email Address', 'Aadhaar Number', 'Health Data']   # for a Rule 7 breach report
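For reference, here is a standalone sketch of the two checksums, written from the published Verhoeff and Luhn algorithms rather than taken from dpdpstack (the function names are hypothetical). Both strip non-digits first; a real detector would also check length and surrounding context.

```python
from typing import List

# Verhoeff multiplication table (dihedral group D5) and position permutation table.
_D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
_P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]


def _digits(text: str) -> List[int]:
    """Keep ASCII digits only, so '2341 2341 2346' and '234123412346' are treated alike."""
    return [int(ch) for ch in text if "0" <= ch <= "9"]


def verhoeff_valid(number: str) -> bool:
    """True if the digits (check digit last) pass the Verhoeff checksum."""
    digits = _digits(number)
    if not digits:
        return False
    c = 0
    for i, digit in enumerate(reversed(digits)):
        c = _D[c][_P[i % 8][digit]]
    return c == 0


def luhn_valid(number: str) -> bool:
    """True if the digits pass the Luhn (mod 10) checksum used for card numbers."""
    digits = _digits(number)
    if not digits:
        return False
    total = 0
    for i, digit in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(verhoeff_valid("2341 2341 2346"), verhoeff_valid("2341 2341 2347"))  # True False
print(luhn_valid("4111 1111 1111 1111"), luhn_valid("4111 1111 1111 1112"))  # True False
```

Verhoeff catches every single-digit substitution and adjacent transposition, which is why a one-digit change to the Aadhaar-format example fails.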

Lint your retention policies (DPDP)

lint statically checks a RetentionPolicy for compliance smells - a legal hold with no recorded basis, a hold that will hard-delete a regulated row, a basis cited without a hold period, retention far past what's justified - each tied to a DPDP citation. Offline and advisory.

from dpdpstack import RetentionPolicy, Action, lint_policy

lint_policy(RetentionPolicy(purpose="kyc", legal_hold_days=1825, action=Action.DELETE))
# [ERROR E001: ... no legal_basis recorded ...  [DPDP Rules, 2025 - Rule 8],
#  WARNING W001: ... action=delete will hard-delete ... consider action=anonymize ...]

From the shell (exit code is non-zero if any error is found, so it drops into CI):

dpdpstack lint --presets                                   # the built-in presets are clean
dpdpstack lint --purpose kyc --legal-hold-days 1825 --action delete   # E001 + W001
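The two findings in that example follow from simple predicates over a policy's fields. A minimal sketch of that kind of check, assuming a simplified `Policy` with string actions; the class, `lint`, `exit_code`, and the message text are hypothetical and not dpdpstack's rule engine (only the E001/W001 meanings come from the example above):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Policy:
    """Hypothetical stand-in for RetentionPolicy, reduced to the fields the checks need."""
    purpose: str
    legal_hold_days: int = 0
    legal_basis: Optional[str] = None
    action: str = "delete"  # or "anonymize"


@dataclass(frozen=True)
class Finding:
    severity: str  # "ERROR" or "WARNING"
    code: str
    message: str


def lint(policy: Policy) -> List[Finding]:
    findings: List[Finding] = []
    if policy.legal_hold_days > 0 and not policy.legal_basis:
        findings.append(Finding("ERROR", "E001",
                                f"{policy.purpose}: legal hold with no legal_basis recorded"))
    if policy.legal_hold_days > 0 and policy.action == "delete":
        findings.append(Finding("WARNING", "W001",
                                f"{policy.purpose}: action=delete will hard-delete a regulated row; "
                                "consider action=anonymize"))
    return findings


def exit_code(findings: List[Finding]) -> int:
    """Non-zero only when at least one error is found, so warnings alone don't fail CI."""
    return 1 if any(f.severity == "ERROR" for f in findings) else 0


found = lint(Policy(purpose="kyc", legal_hold_days=1825, action="delete"))
print([f.code for f in found], exit_code(found))  # ['E001', 'W001'] 1
```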

dpdpstack.rules also exposes DPDP_RULES and STATUTORY_HOLDS (RBI, PMLA, CERT-In, Companies Act) as a citable reference.

score_policies(...) rolls the findings into a graded readiness report - a deterministic 0-100 score, letter grade, and tier across all your policies (great for a dashboard or an onboarding report):

from dpdpstack import score_policies, rbi_kyc, pmla

score_policies([rbi_kyc(), pmla()]).summary
# '100/100 (A+, exemplary) across 2 policies: 2 clean, 0 errors, 0 warnings.'

From the shell:

dpdpstack lint --presets --score      # ... Readiness: 100/100 (A+, exemplary) across 5 policies …

Retention-safe audit + offline verification

The audit log is hash-chained, so any change breaks verify(). But a retention log must be prunable - and a pruned chain no longer starts at sequence 1, which would break verification. Checkpoints fix that: snapshot a run of entries into an immutable, self-chaining Checkpoint, then prune; verification anchors to the checkpoint instead of the genesis.

log = AuditLog(JsonlAuditStore("audit.jsonl"))
# ... record events ...
cp = log.checkpoint(through_sequence=1000)   # immutable snapshot (persist it)
log.prune_through(1000)                       # drop the archived entries

log.verify_report([cp])      # VerifyResult(ok=True, checked=…, anchored_at=1000)
log.verify_report()          # ok=False: no anchor for the pruned chain (first_error_sequence also pinpoints tampering)
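To make the checkpoint idea concrete, here is a self-contained sketch of a SHA-256 hash chain whose verification can start from a checkpoint's (sequence, hash) pair instead of the genesis value. The entry layout, `GENESIS`, and the helper names are assumptions for illustration, not dpdpstack's storage format.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List, Optional, Tuple

GENESIS = "0" * 64  # assumed "previous hash" of the very first entry


def entry_hash(prev_hash: str, sequence: int, event: dict) -> str:
    """SHA-256 over the previous hash, the sequence number and the event (canonical JSON)."""
    payload = json.dumps({"prev": prev_hash, "seq": sequence, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class Entry:
    sequence: int
    event: dict
    prev_hash: str
    digest: str


def append(chain: List[Entry], event: dict) -> Entry:
    prev = chain[-1].digest if chain else GENESIS
    seq = chain[-1].sequence + 1 if chain else 1
    entry = Entry(seq, event, prev, entry_hash(prev, seq, event))
    chain.append(entry)
    return entry


def first_error(chain: List[Entry], anchor: Optional[Tuple[int, str]] = None) -> Optional[int]:
    """Return None if the chain verifies, else the sequence number of the first bad entry.

    `anchor` is (sequence, digest) of the last pruned entry, i.e. what a checkpoint keeps.
    Without it, verification has to start from the genesis value.
    """
    seq, prev = anchor if anchor is not None else (0, GENESIS)
    for e in chain:
        if (e.sequence != seq + 1 or e.prev_hash != prev
                or e.digest != entry_hash(e.prev_hash, e.sequence, e.event)):
            return e.sequence
        seq, prev = e.sequence, e.digest
    return None


chain: List[Entry] = []
for i in range(1, 6):
    append(chain, {"type": "erasure", "subject": f"ref_{i}"})

checkpoint = (chain[2].sequence, chain[2].digest)  # snapshot through #3 and persist it
pruned = chain[3:]                                 # then drop entries #1-#3
print(first_error(pruned))              # 4: no anchor, so #4 can't be linked to genesis
print(first_error(pruned, checkpoint))  # None: verifies from the checkpoint
pruned[1].event["subject"] = "edited"   # tamper with entry #5
print(first_error(pruned, checkpoint))  # 5
```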

An auditor can verify a chain straight from storage - no backend, no API to trust:

dpdpstack verify-chain audit.jsonl --checkpoints cp.jsonl
# OK - verified 2400 entries (anchored at #1000).
#  (exits non-zero and names the broken entry if the chain was tampered with)

Crypto-shred PII in the audit log (optional, [crypto])

The chain normally holds no PII (subject is an opaque ref). When you must record PII inside an entry, seal it: the PII is encrypted into an opaque token that the entry hash covers. Verification runs on the ciphertext, so you can later destroy the key (right-to-erasure) - the payload becomes unreadable while the chain still verifies.

from dpdpstack.sealing import generate_seal_key

key = generate_seal_key()                       # keep secret; deleting it shreds the data
e = log.record("evidence", subject="user_42",
               private={"aadhaar": "2341 2341 2346"}, seal_key=key)
AuditLog.open_sealed(e, key)                     # -> {"aadhaar": "…"}  (with the key)
log.verify()                                     # True, even after the key is destroyed

Key rotation (zero-downtime): pass a list of keys, newest first. New entries are sealed with the first key; unsealing tries each key in turn, so entries sealed with an older key still open. Because the ciphertext is part of the entry hash, chain entries are never re-encrypted: keep an old key for as long as you need to read the entries it sealed, and retire it once they've been pruned or shredded.

new = generate_seal_key()
log.record("evidence", subject="user_43", private={}, seal_key=[new, key])  # seals with `new`
AuditLog.open_sealed(e, [new, key])              # still opens the old-key entry

Push evidence to the hosted vault (optional)

Keep everything local, or push your tamper-evident chain to a vault (e.g. getdpdp.net) for an independent, server-timestamped, counter-signed copy. The push carries evidence only - opaque refs, event types, and hashes (plus any sealed ciphertext) - never PII, so it stays zero-egress. It's zero-dependency (stdlib), idempotent at the vault (re-pushing is a no-op), and the fire-and-forget variant never blocks or raises in your request path.

from dpdpstack import EvidenceClient

vault = EvidenceClient("https://getdpdp.net/api/v1", api_key="dpdp_sk_…", source="api")

vault.push(log)                # synchronous: -> {"stored": N, "chain_verified": True, …}
vault.push_background(log)     # fire-and-forget: returns immediately, errors swallowed
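The two delivery properties described above, a fire-and-forget push that can't fail the request and a vault that ignores re-pushed entries, can be sketched with the standard library alone. `fire_and_forget`, `IdempotentReceiver`, and keying entries by hash are illustrative assumptions, not EvidenceClient internals.

```python
import threading
from typing import Callable, Iterable, List, Set


def fire_and_forget(send: Callable[[List[str]], object],
                    entry_hashes: Iterable[str]) -> threading.Thread:
    """Push on a daemon thread and return at once; any exception is swallowed."""
    batch = list(entry_hashes)

    def run() -> None:
        try:
            send(batch)
        except Exception:
            pass  # never surface in the caller's request path (real code would log this)

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker


class IdempotentReceiver:
    """Vault side: entries are keyed by hash, so re-pushing the same entries stores nothing new."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()
        self._lock = threading.Lock()

    def receive(self, entry_hashes: List[str]) -> int:
        with self._lock:
            new = [h for h in dict.fromkeys(entry_hashes) if h not in self._seen]
            self._seen.update(new)
            return len(new)


receiver = IdempotentReceiver()
print(receiver.receive(["h1", "h2"]))  # 2
print(receiver.receive(["h1", "h2"]))  # 0, a re-push is a no-op


def unreachable(batch: List[str]) -> None:
    raise ConnectionError("vault unreachable")


fire_and_forget(unreachable, ["h3"]).join()  # join() only for the demo; the error never reaches us
```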

Signed certificates (optional, [crypto])

The hash-chained Certificate of Erasure is tamper-evident on its own; add an RS256 signature so anyone can verify it with your public key, and no one without your private key can forge one:

from dpdpstack import issue_certificate
from dpdpstack.signing import generate_keypair, issue_signed_certificate, verify_certificate

private_pem, public_pem = generate_keypair()      # keep private secret; publish public
cert = issue_certificate(engine.audit, "user_42", "marketing")
token = issue_signed_certificate(cert, private_pem)   # compact JWT
verify_certificate(token, public_pem)                 # -> {"valid": True, ...}

This is the basis for the hosted, counter-signed certificate at getdpdp.net - a regulator/auditor verifies it independently, and the issuer cannot fake it.

CLI (verify a certificate offline)

With the [crypto] extra installed, an auditor can verify a Certificate of Erasure from the shell - no code, just the cert and your public key:

dpdpstack keygen --out-dir ./keys                       # one-time: make a signing keypair
dpdpstack verify cert.jwt --public-key ./keys/cert_public.pem
# VALID - signature verified.
#   subject: user_42 · status: erased (delete) · chain ok: True

(python -m dpdpstack verify ... works too.)

Presets for the common conflicts

rbi_kyc() (5-yr hold, anonymize) · pmla() · cert_in_logs() (180-day log hold) · companies_act() (8-yr books of account) · third_schedule() (DPDP specified period). Or build your own RetentionPolicy(retention_days=…, legal_hold_days=…, legal_basis="…", action=…).
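How a hold turns an erasure request into "erase now" or "defer until X" can be sketched in a few lines. This is a simplified illustration, not ErasureEngine's logic: `Hold`, `Decision`, and `resolve` are hypothetical, and the date a hold is measured from depends on the regulation (for example, the end of a business relationship), so here it is just the `hold_starts` parameter. Note that 1825 days is 5 × 365, one day short of five calendar years when a leap year falls inside the window.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Hold:
    """Hypothetical, simplified policy: how long a statutory hold lasts and why."""
    legal_hold_days: int = 0
    legal_basis: Optional[str] = None
    action: str = "delete"  # or "anonymize"


@dataclass(frozen=True)
class Decision:
    status: str                  # "erased" or "deferred"
    erase_after: Optional[date]  # set only when deferred
    legal_basis: Optional[str]


def resolve(policy: Hold, hold_starts: date, today: date,
            executor: Callable[[str], None]) -> Decision:
    """Erase now if no hold applies or it has lapsed; otherwise defer and record the basis.

    Only `executor` (your code) touches data; this function just decides.
    """
    hold_ends = hold_starts + timedelta(days=policy.legal_hold_days)
    if policy.legal_hold_days > 0 and today < hold_ends:
        return Decision("deferred", hold_ends, policy.legal_basis)
    executor(policy.action)
    return Decision("erased", None, policy.legal_basis)


performed: List[str] = []
kyc = Hold(legal_hold_days=1825, legal_basis="RBI KYC", action="anonymize")

early = resolve(kyc, hold_starts=date(2026, 1, 1), today=date(2026, 6, 1), executor=performed.append)
print(early.status, early.erase_after, performed)  # deferred 2030-12-31 []

later = resolve(kyc, hold_starts=date(2026, 1, 1), today=date(2031, 1, 1), executor=performed.append)
print(later.status, performed)  # erased ['anonymize']
```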

What's in the box

| Module | What |
|---|---|
| policies | RetentionPolicy + RBI/PMLA/CERT-In/Companies-Act/Third-Schedule presets |
| anonymize | null / hashed / redact / constant field strategies |
| audit | hash-chained log + checkpoints/pruning + verify_report; stores (in-memory, JSONL, Django, SQLAlchemy) |
| erasure | ErasureEngine - legal-hold-aware resolve + your executor |
| certificate | issue_certificate() → verifiable Certificate of Erasure |
| detect | PII discovery - schema (scan_mapping) + values (detect_values, classify_breach_nature) |
| rules | DPDP knowledge pack + lint_policy() / dpdpstack lint |
| vault | EvidenceClient - push the chain to a hosted vault (evidence only, fire-and-forget) |
| sealing (extra) | crypto-shred PII in the chain - seal / unseal / AuditLog.open_sealed |
| signing (extra) | RS256-sign/verify a certificate - pip install "dpdpstack-python-sdk[crypto]" |
| contrib.django | model-backed audit store + erase_instance() + @pii(...) + dpdp_scan |
| contrib.sqlalchemy (extra) | the same for any SQLAlchemy app (FastAPI/Flask/…) |

CLI: dpdpstack scan · lint · verify-chain · verify · keygen (python -m dpdpstack …).

Status & scope

Alpha (0.6). The core is dependency-free and framework-agnostic; Django and SQLAlchemy adapters ship today. Hosted/managed version (dashboard, cross-system fan-out, certificate vault): getdpdp.net.

dpdpstack is tooling, not legal advice; you remain the Data Fiduciary. MIT licensed.

Download files

Source Distribution

dpdpstack_python_sdk-0.6.0.tar.gz (58.1 kB)

Uploaded Source

Built Distribution

dpdpstack_python_sdk-0.6.0-py3-none-any.whl (53.1 kB)

Uploaded Python 3

File details

Details for the file dpdpstack_python_sdk-0.6.0.tar.gz.

File metadata

  • Download URL: dpdpstack_python_sdk-0.6.0.tar.gz
  • Size: 58.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dpdpstack_python_sdk-0.6.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 05862748ec26e469b6c67bf9a4072d72aae05d97d3d5cdf5272571be92907998 |
| MD5 | 63eb5bed708f0a2cdfc7180bbf572a01 |
| BLAKE2b-256 | fd2372f042e32f1fdd82cc5f13cc82e6e2a9a5c3f8cc32e717dd360772e78915 |

Provenance

The following attestation bundles were made for dpdpstack_python_sdk-0.6.0.tar.gz:

Publisher: publish.yml on getdpdp/dpdpstack-python-sdk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file dpdpstack_python_sdk-0.6.0-py3-none-any.whl.

File hashes

Hashes for dpdpstack_python_sdk-0.6.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | ad9f25a15c914e333c1faab7fa7ecfc28ebcd83a39a3b1712c05991e0fbe72e7 |
| MD5 | 238e578143fb56a94e30b4bec7534526 |
| BLAKE2b-256 | 265e032ade7a86a56ad548c2f931f0e14684fe1f0589328572b98aefaa00f2af |

Provenance

The following attestation bundles were made for dpdpstack_python_sdk-0.6.0-py3-none-any.whl:

Publisher: publish.yml on getdpdp/dpdpstack-python-sdk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
