DPDP-compliant data erasure for Indian apps: legal-hold-aware deletion, PII anonymization, and tamper-evident Certificates of Erasure. Zero-egress - runs inside your app.
dpdpstack
DPDP-compliant data erasure for Indian apps - handled in your code.
Indian developers keep hitting the same wall: DPDP says erase the user's data on withdrawal, but RBI (KYC, 5 yrs), PMLA, CERT-In (logs, 180 days) and the Companies Act say keep it. So teams hand-delete data across tables, can't prove it, and enterprise tools "cost more than a month's revenue."
dpdpstack is a small, zero-egress library that handles the hard part:
- Legal-hold-aware erasure - delete now, or defer under RBI/PMLA/CERT-In holds (with the basis recorded), then erase when the hold lapses.
- PII anonymization - irreversibly null/hash PII while keeping the ledger row (referential integrity), the way teams actually solve this.
- Certificate of Erasure - a verifiable, tamper-evident proof you erased (or are lawfully holding) a user's data.
- Zero-egress - you perform the mutation in your own DB; the library only decides and records. Personal data never leaves your systems.
Not a cookie banner. Not a consultant. A deletion/retention engine for developers.
Install
pip install dpdpstack-python-sdk # core, no dependencies
pip install "dpdpstack-python-sdk[django]" # + Django adapter
pip install "dpdpstack-python-sdk[sqlalchemy]" # + SQLAlchemy adapter (FastAPI/Flask/…)
pip install "dpdpstack-python-sdk[crypto]" # + signed certs & crypto-shred (PyJWT + cryptography)
Quickstart (framework-agnostic)
from dpdpstack import ErasureEngine, AuditLog, RetentionPolicy, Action, rbi_kyc, issue_certificate
engine = ErasureEngine(AuditLog())
# Normal purpose: hard-delete on withdrawal. Your delete runs in `executor`.
engine.request_erasure(
    subject="user_42",
    policy=RetentionPolicy(purpose="marketing", action=Action.DELETE),
    reason="consent_withdrawn",
    executor=lambda action: my_delete_user(42),
)
# KYC: RBI mandates 5y retention -> erasure is DEFERRED, not refused.
res = engine.request_erasure(subject="user_42", policy=rbi_kyc("kyc"), reason="consent_withdrawn")
print(res.status, res.legal_basis, res.erase_after) # deferred RBI KYC... 2031-...
cert = issue_certificate(engine.audit, "user_42", "marketing") # verifiable proof
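To make the deferral above concrete, here is a minimal standard-library sketch of the decision a legal-hold-aware engine has to make. It is not dpdpstack's implementation: `Policy`, `Decision`, `decide`, and the `collected_on` anchor date are hypothetical names for illustration, and starting the hold clock at the collection date is an assumption (the statutory clock may start elsewhere, for example when the account relationship ends).

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class Policy:
    """Hypothetical stand-in for a retention policy."""
    purpose: str
    action: str = "delete"             # "delete" or "anonymize"
    legal_hold_days: int = 0           # 0 means no statutory hold
    legal_basis: Optional[str] = None  # e.g. "RBI KYC"


@dataclass(frozen=True)
class Decision:
    status: str                        # "erased" or "deferred"
    legal_basis: Optional[str]
    erase_after: Optional[date]


def decide(policy: Policy, collected_on: date, today: date) -> Decision:
    """Defer while a hold is running; otherwise allow the erasure now."""
    hold_ends = collected_on + timedelta(days=policy.legal_hold_days)
    if policy.legal_hold_days > 0 and today < hold_ends:
        # Nothing is deleted yet; the basis and the lapse date are recorded.
        return Decision("deferred", policy.legal_basis, hold_ends)
    return Decision("erased", None, None)
```

Calling `decide(Policy("kyc", "anonymize", 1825, "RBI KYC"), date(2026, 1, 1), date(2026, 6, 1))` returns a deferred decision whose `erase_after` is 1825 days after the collection date; the same call with a `today` past that date returns an erased decision.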
Django (zero-egress, runs against your models)
# settings.py
INSTALLED_APPS += ["dpdpstack.contrib.django"]
# python manage.py migrate
from django.db import models
from dpdpstack import RetentionPolicy, Action, null, redact, rbi_kyc
from dpdpstack.contrib.django.service import erase_instance, pii
# Declare a model's PII fields once with @pii - no pii_fields= on every call.
@pii(name=null, email=null, phone=redact(keep_last=4))
class User(models.Model):
    ...
# Hard delete + audit
erase_instance(user, policy=RetentionPolicy(purpose="marketing", action=Action.DELETE),
               subject=user.external_ref)
# Anonymize PII, keep the (regulated) row - uses the @pii declaration above
erase_instance(user, policy=RetentionPolicy(purpose="profile", action=Action.ANONYMIZE),
               subject=user.external_ref)
# KYC withdrawal -> deferred under RBI hold, nothing deleted, basis recorded
erase_instance(user, policy=rbi_kyc("kyc"), subject=user.external_ref)
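For intuition about what field strategies such as null, redact(keep_last=4) and hashed do to a value, here is a rough standard-library sketch. The helper names (`null_out`, `redact_keep_last`, `keyed_hash`, `anonymize`) are hypothetical stand-ins, not the library's functions, and the use of a keyed HMAC is an assumption made here because a plain hash of a low-entropy value such as a phone number can be reversed by brute force.

```python
import hashlib
import hmac
from typing import Any, Callable, Dict, Mapping

Strategy = Callable[[Any], Any]


def null_out(value: Any) -> None:
    """Drop the value entirely."""
    return None


def redact_keep_last(keep_last: int = 0, mask: str = "*") -> Strategy:
    """Mask everything except the last `keep_last` characters."""
    def apply(value: Any) -> str:
        text = str(value)
        visible = text[-keep_last:] if keep_last > 0 else ""
        return mask * (len(text) - len(visible)) + visible
    return apply


def keyed_hash(key: bytes) -> Strategy:
    """Replace the value with an HMAC-SHA256 digest under a secret key."""
    def apply(value: Any) -> str:
        return hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
    return apply


def anonymize(record: Mapping[str, Any], strategies: Mapping[str, Strategy]) -> Dict[str, Any]:
    """Apply a strategy to each declared field; leave other fields untouched."""
    return {
        field: strategies[field](value) if field in strategies and value is not None else value
        for field, value in record.items()
    }
```

With `{"name": null_out, "phone": redact_keep_last(4)}`, the record `{"name": "Asha", "phone": "9876543210", "ledger_balance": 500}` keeps its balance, loses the name, and retains only the last four digits of the phone number, so the row survives for referential integrity.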
FastAPI / Flask / any SQLAlchemy app ([sqlalchemy])
The same engine + DB-backed audit chain, against a SQLAlchemy Session. You map the
audit entry once (you own the Base); the @pii declaration is shared with Django.
from sqlalchemy.orm import DeclarativeBase
from dpdpstack import RetentionPolicy, Action, null, redact, rbi_kyc
from dpdpstack.contrib.sqlalchemy.models import DpdpAuditEntryMixin
from dpdpstack.contrib.sqlalchemy.service import erase_instance, pii
class Base(DeclarativeBase): ...
class DpdpAuditEntry(Base, DpdpAuditEntryMixin): # the hash-chained audit store
    __tablename__ = "dpdp_audit_entries"
@pii(name=null, email=null, phone=redact(keep_last=4))
class User(Base):
    __tablename__ = "users"
    ...
# Anonymize PII, keep the (regulated) row; your session, your transaction.
erase_instance(session, user, audit_model=DpdpAuditEntry, subject=user.external_ref,
               policy=RetentionPolicy(purpose="profile", action=Action.ANONYMIZE))
# KYC withdrawal -> deferred under RBI hold, nothing deleted, basis recorded
erase_instance(session, user, audit_model=DpdpAuditEntry, subject=user.external_ref,
               policy=rbi_kyc("kyc"))
session.commit()
Find your PII fields (scan)
You declare PII once with @pii(...) - but which fields are PII? scan finds them
for you. It reads field names and types only (never a single row), matches them
against an India-first catalog (Aadhaar, PAN, GST, UPI, phone, email, special-category…),
and suggests an anonymize strategy for each. Output is advisory - you review it, then
paste. Zero-egress and zero-dependency.
Django - scan your models and get pasteable @pii(...) blocks:
python manage.py dpdp_scan --format python # or: text (default) | json
# or, without a manage.py:
dpdpstack scan --django --settings myproject.settings --app accounts --format python
# accounts.User
@pii(
    name=null,
    email=null,
    phone=redact(keep_last=4),
    aadhaar_number=hashed(),
)
class User(models.Model):
    ...
Re-running tags each field new (PII, not declared), covered (already declared), or
drift (declared, but no longer looks like PII) - so it doubles as an ongoing audit.
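The new / covered / drift tagging described above comes down to comparing two sets: fields that look like PII and fields that are declared. A minimal sketch follows, assuming a toy keyword list in place of the real catalog; `PII_HINTS`, `looks_like_pii` and `tag_fields` are hypothetical names, not part of dpdpstack.

```python
from typing import Dict, Iterable, Set

# Illustrative hints only; the library's catalog is richer than this.
PII_HINTS = ("name", "email", "phone", "aadhaar", "pan", "upi")


def looks_like_pii(field: str) -> bool:
    """Match whole snake_case tokens so 'company' does not match 'pan'."""
    tokens = field.lower().split("_")
    return any(hint in tokens for hint in PII_HINTS)


def tag_fields(fields: Iterable[str], declared: Set[str]) -> Dict[str, str]:
    """Tag each field as new, covered, or drift; untagged fields are ignored."""
    tags: Dict[str, str] = {}
    for field in fields:
        detected = looks_like_pii(field)
        if detected and field not in declared:
            tags[field] = "new"        # PII, not declared
        elif detected:
            tags[field] = "covered"    # PII, already declared
        elif field in declared:
            tags[field] = "drift"      # declared, but no longer looks like PII
    return tags
```

Only field names are inspected here, which mirrors the point that a schema scan never needs to read a row.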
Anything else - a sample dict, an API payload, a column list:
from dpdpstack import anonymize_fields
from dpdpstack.detect import scan_mapping, suggest_strategies
suggest_strategies(["email", "phone", "pan", "ledger_balance"])
# {'email': <null>, 'phone': <redact>, 'pan': <hashed>} # 'ledger_balance' ignored
record = {"email": "a@b.com", "phone": "9876543210", "ledger_balance": 500}
clean = anonymize_fields(record, suggest_strategies(record.keys()))
dpdpstack scan --keys email,phone,pan --format python # comma-separated names
dpdpstack scan --dict sample.json --format python # keys of a JSON object ('-' = stdin)
Bring your own catalog by passing a JSON file of the same shape to load_catalog(path=...).
Detect PII in values (and classify a breach)
The scanner above reads field names; detect_values reads values / free text -
useful to confirm a column really holds PII, or to fill a breach report's nature
field. Aadhaar is checked with the Verhoeff checksum and cards with Luhn, so
random 12-/16-digit numbers don't false-positive. Local, zero-dependency.
from dpdpstack import detect_values, classify_breach_nature
detect_values("PAN ABCDE1234F, card 4111 1111 1111 1111")
# [ValueMatch(type='PAN', ...), ValueMatch(type='Payment Card', ...)]
classify_breach_nature("leaked rows: asha@bank.in, Aadhaar 2341 2341 2346, plus medical records")
# ['Email Address', 'Aadhaar Number', 'Health Data'] # for a Rule 7 breach report
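The two checksums named above are standard algorithms, and a compact reference sketch shows why a random digit string rarely passes. This is an independent illustration, not the library's code; `verhoeff_valid` and `luhn_valid` are names chosen here, and a real detector would also check length and format before applying a checksum.

```python
# Multiplication table of the dihedral group D5, used by Verhoeff.
_D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
_P1 = [1, 5, 7, 6, 2, 8, 3, 0, 9, 4]

# Position-dependent permutations: row i is the base permutation applied i times.
_P = [list(range(10))]
for _ in range(7):
    _P.append([_P1[v] for v in _P[-1]])


def verhoeff_valid(number: str) -> bool:
    """True if the digits (spaces ignored) carry a valid Verhoeff check digit."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    c = 0
    for i, digit in enumerate(reversed(digits)):
        c = _D[c][_P[i % 8][digit]]
    return c == 0


def luhn_valid(number: str) -> bool:
    """True if the digits (spaces ignored) pass the Luhn mod-10 check."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    for i, digit in enumerate(reversed(digits)):
        if i % 2 == 1:           # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0
```

Both sample values used in this section pass their checks: `verhoeff_valid("2341 2341 2346")` and `luhn_valid("4111 1111 1111 1111")` return `True`, while changing the last digit of either makes the check fail.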
Lint your retention policies (DPDP)
lint statically checks a RetentionPolicy for compliance smells - a legal hold with
no recorded basis, a hold that will hard-delete a regulated row, a basis cited without a
hold period, retention far past what's justified - each tied to a DPDP citation. Offline
and advisory.
from dpdpstack import RetentionPolicy, Action, lint_policy
lint_policy(RetentionPolicy(purpose="kyc", legal_hold_days=1825, action=Action.DELETE))
# [ERROR E001: ... no legal_basis recorded ... [DPDP Rules, 2025 - Rule 8],
# WARNING W001: ... action=delete will hard-delete ... consider action=anonymize ...]
From the shell (exit code is non-zero if any error is found, so it drops into CI):
dpdpstack lint --presets # the built-in presets are clean
dpdpstack lint --purpose kyc --legal-hold-days 1825 --action delete # E001 + W001
dpdpstack.rules also exposes DPDP_RULES and STATUTORY_HOLDS (RBI/PMLA/CERT-In/
Companies Act) as a citable reference.
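The two findings in the lint example above (E001 and W001) can be pictured as simple predicates over a policy. The sketch below is an assumption-laden illustration, not the library's rule set: `Policy`, `Finding`, `lint` and `exit_code` are hypothetical, only the two codes shown in this README are modelled, and the messages are paraphrased.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Policy:
    """Hypothetical stand-in for a retention policy."""
    purpose: str
    action: str = "delete"
    legal_hold_days: int = 0
    legal_basis: Optional[str] = None


@dataclass(frozen=True)
class Finding:
    code: str
    severity: str   # "error" or "warning"
    message: str


def lint(policy: Policy) -> List[Finding]:
    findings: List[Finding] = []
    on_hold = policy.legal_hold_days > 0
    if on_hold and not policy.legal_basis:
        findings.append(Finding("E001", "error", "legal hold has no legal_basis recorded"))
    if on_hold and policy.action == "delete":
        findings.append(Finding("W001", "warning",
                                "action=delete will hard-delete a held row; consider anonymize"))
    return findings


def exit_code(findings: List[Finding]) -> int:
    """Non-zero when any error is present, so a CI step fails."""
    return 1 if any(f.severity == "error" for f in findings) else 0
```

Under these assumptions a policy with `legal_hold_days=1825`, `action="delete"` and no basis yields both findings and a non-zero exit code, matching the shape of the output shown above.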
score_policies(...) rolls the findings into a graded readiness report - a deterministic
0-100 score, letter grade, and tier across all your policies (great for a dashboard or an
onboarding report):
from dpdpstack import score_policies, rbi_kyc, pmla
score_policies([rbi_kyc(), pmla()]).summary
# '100/100 (A+, exemplary) across 2 policies: 2 clean, 0 errors, 0 warnings.'
dpdpstack lint --presets --score # ... Readiness: 100/100 (A+, exemplary) across 5 policies …
Retention-safe audit + offline verification
The audit log is hash-chained, so any change breaks verify(). But a retention log
must be prunable - and a pruned chain no longer starts at sequence 1, which would break
verification. Checkpoints fix that: snapshot a run of entries into an immutable,
self-chaining Checkpoint, then prune; verification anchors to the checkpoint instead
of the genesis.
log = AuditLog(JsonlAuditStore("audit.jsonl"))
# ... record events ...
cp = log.checkpoint(through_sequence=1000) # immutable snapshot (persist it)
log.prune_through(1000) # drop the archived entries
log.verify_report([cp]) # VerifyResult(ok=True, checked=…, anchored_at=1000)
log.verify_report() # ok=False, first_error_sequence pinpoints any tampering
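To see why pruning breaks a naive chain and how a checkpoint repairs it, consider this standard-library sketch. It is a simplified model, not dpdpstack's storage format: the entry layout, the `GENESIS` constant and the helper names are assumptions, and the checkpoint here is a plain record rather than the self-chaining one described above.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional, Tuple

GENESIS = "0" * 64  # assumed previous-hash value for the very first entry


def _digest(seq: int, prev: str, payload: Dict[str, Any]) -> str:
    body = json.dumps({"seq": seq, "prev": prev, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(chain: List[Dict[str, Any]], payload: Dict[str, Any]) -> Dict[str, Any]:
    """Append an entry whose hash covers its sequence, payload and predecessor."""
    seq = chain[-1]["seq"] + 1 if chain else 1
    prev = chain[-1]["hash"] if chain else GENESIS
    entry = {"seq": seq, "prev": prev, "payload": payload, "hash": _digest(seq, prev, payload)}
    chain.append(entry)
    return entry


def checkpoint(chain: List[Dict[str, Any]], through: int) -> Dict[str, Any]:
    """Snapshot the hash of entry `through` so later entries can anchor to it."""
    last = next(e for e in chain if e["seq"] == through)
    return {"through": through, "hash": last["hash"]}


def prune_through(chain: List[Dict[str, Any]], through: int) -> List[Dict[str, Any]]:
    return [e for e in chain if e["seq"] > through]


def verify(chain: List[Dict[str, Any]],
           anchor: Optional[Dict[str, Any]] = None) -> Tuple[bool, Optional[int]]:
    """Return (ok, first_error_sequence), anchored at genesis or at a checkpoint."""
    expected_seq = anchor["through"] + 1 if anchor else 1
    expected_prev = anchor["hash"] if anchor else GENESIS
    for e in chain:
        if (e["seq"] != expected_seq or e["prev"] != expected_prev
                or e["hash"] != _digest(e["seq"], e["prev"], e["payload"])):
            return False, e["seq"]
        expected_seq, expected_prev = expected_seq + 1, e["hash"]
    return True, None
```

After pruning, `verify(pruned)` fails at the first surviving entry because it no longer links to genesis, whereas `verify(pruned, cp)` succeeds; editing any surviving payload makes the anchored check fail at that entry's sequence number.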
An auditor can verify a chain straight from storage - no backend, no API to trust:
dpdpstack verify-chain audit.jsonl --checkpoints cp.jsonl
# OK - verified 2400 entries (anchored at #1000).
# (exits non-zero and names the broken entry if the chain was tampered with)
Crypto-shred PII in the audit log (optional, [crypto])
The chain normally holds no PII (subject is an opaque ref). When you must record PII
inside an entry, seal it: the PII is encrypted into an opaque token that the entry
hash covers. Verification runs on the ciphertext, so you can later destroy the key
(right-to-erasure) - the payload becomes unreadable while the chain still verifies.
from dpdpstack.sealing import generate_seal_key
key = generate_seal_key() # keep secret; deleting it shreds the data
e = log.record("evidence", subject="user_42",
               private={"aadhaar": "2341 2341 2346"}, seal_key=key)
AuditLog.open_sealed(e, key) # -> {"aadhaar": "…"} (with the key)
log.verify() # True - even after the key is destroyed
Key rotation (zero-downtime): pass a list of keys, newest first. New entries seal with the first key; unsealing tries all, so older-key entries still open. The ciphertext is part of the entry hash, so chain entries are never re-encrypted - keep an old key around to read old entries, and retire it once they've been pruned or shredded.
new = generate_seal_key()
log.record("evidence", subject="user_43", private={…}, seal_key=[new, key]) # seals with `new`
AuditLog.open_sealed(e, [new, key]) # still opens the old-key entry
Push evidence to the hosted vault (optional)
Keep everything local, or push your tamper-evident chain to a vault (e.g. getdpdp.net) for an independent, server-timestamped, counter-signed copy. The push carries evidence only - opaque refs, event types, and hashes (plus any sealed ciphertext) - never PII, so personal data still never leaves your systems. It's zero-dependency (stdlib), idempotent at the vault (re-pushing is a no-op), and the fire-and-forget variant never blocks or raises in your request path.
from dpdpstack import EvidenceClient
vault = EvidenceClient("https://getdpdp.net/api/v1", api_key="dpdp_sk_…", source="api")
vault.push(log) # synchronous: -> {"stored": N, "chain_verified": True, …}
vault.push_background(log) # fire-and-forget: returns immediately, errors swallowed
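The two properties claimed for the push, idempotency at the vault and a fire-and-forget variant that never raises, can be illustrated without any network code. The sketch below is an assumption about how such behaviour is commonly built, not a description of EvidenceClient: `InMemoryVault` and this `push_background` helper are hypothetical, and deduplicating by entry hash is one plausible way to make re-pushes a no-op.

```python
import threading
from typing import Any, Callable, Dict, Iterable, List


class InMemoryVault:
    """Stand-in for a remote vault that stores each entry hash at most once."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, Dict[str, Any]] = {}

    def store(self, entries: Iterable[Dict[str, Any]]) -> int:
        """Return how many entries were newly stored; repeats are ignored."""
        stored = 0
        for entry in entries:
            if entry["hash"] not in self._by_hash:
                self._by_hash[entry["hash"]] = entry
                stored += 1
        return stored


def push_background(push: Callable[[List[Dict[str, Any]]], Any],
                    entries: List[Dict[str, Any]]) -> threading.Thread:
    """Run `push` on a daemon thread and swallow any exception it raises."""
    def _run() -> None:
        try:
            push(entries)
        except Exception:
            pass  # deliberately swallowed: the request path must not fail

    thread = threading.Thread(target=_run, daemon=True)
    thread.start()
    return thread
```

Swallowing errors keeps the request path safe but also hides failures, so a design along these lines usually pairs the background push with a periodic synchronous push or a retry queue.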
Signed certificates (optional, [crypto])
The hash-chained Certificate of Erasure is tamper-evident on its own; add an RS256 signature so anyone can verify it with your public key, and nobody without your private key can forge it:
from dpdpstack import issue_certificate
from dpdpstack.signing import generate_keypair, issue_signed_certificate, verify_certificate
private_pem, public_pem = generate_keypair() # keep private secret; publish public
cert = issue_certificate(engine.audit, "user_42", "marketing")
token = issue_signed_certificate(cert, private_pem) # compact JWT
verify_certificate(token, public_pem) # -> {"valid": True, ...}
This is the basis for the hosted, counter-signed certificate at getdpdp.net - a regulator/auditor verifies it independently, and the issuer cannot fake it.
CLI (verify a certificate offline)
With the [crypto] extra installed, an auditor can verify a Certificate of Erasure
from the shell - no code, just the cert and your public key:
dpdpstack keygen --out-dir ./keys # one-time: make a signing keypair
dpdpstack verify cert.jwt --public-key ./keys/cert_public.pem
# VALID - signature verified.
# subject: user_42 · status: erased (delete) · chain ok: True
(python -m dpdpstack verify ... works too.)
Presets for the common conflicts
rbi_kyc() (5-yr hold, anonymize) · pmla() · cert_in_logs() (180-day log hold) · companies_act() (8-yr books of account) · third_schedule() (DPDP specified period). Or build your own RetentionPolicy(retention_days=…, legal_hold_days=…, legal_basis="…", action=…).
What's in the box
| Module | What |
|---|---|
| policies | RetentionPolicy + RBI/PMLA/CERT-In/Companies-Act/Third-Schedule presets |
| anonymize | null / hashed / redact / constant field strategies |
| audit | hash-chained log + checkpoints/pruning + verify_report; stores (in-memory, JSONL, Django, SQLAlchemy) |
| erasure | ErasureEngine - legal-hold-aware resolve + your executor |
| certificate | issue_certificate() → verifiable Certificate of Erasure |
| detect | PII discovery - schema (scan_mapping) + values (detect_values, classify_breach_nature) |
| rules | DPDP knowledge pack + lint_policy() / dpdpstack lint |
| vault | EvidenceClient - push the chain to a hosted vault (evidence only, fire-and-forget) |
| sealing (extra) | crypto-shred PII in the chain - seal / unseal / AuditLog.open_sealed |
| signing (extra) | RS256-sign/verify a certificate - pip install "dpdpstack-python-sdk[crypto]" |
| contrib.django | model-backed audit store + erase_instance() + @pii(...) + dpdp_scan |
| contrib.sqlalchemy (extra) | the same for any SQLAlchemy app (FastAPI/Flask/…) |
CLI: dpdpstack scan · lint · verify-chain · verify · keygen (python -m dpdpstack …).
Status & scope
Alpha (0.6). The core is dependency-free and framework-agnostic; Django and SQLAlchemy adapters ship today. Hosted/managed version (dashboard, cross-system fan-out, certificate vault): getdpdp.net.
dpdpstack is tooling, not legal advice; you remain the Data Fiduciary. MIT licensed.