The identity workflow framework. African-first, globally pluggable. Compose detection, resolution, linking, verification, and governance into production identity pipelines. By unpatterned.org.
arche-core
African PII detection that cites the law it enforces.
arche-core detects PII for African jurisdictions (government IDs, names, phone numbers, addresses) and grounds every detection in the data protection statute that governs it: NDPA, POPIA, Kenya DPA, Ghana DPA. Six closed policy actions. Composes with Presidio, GLiNER, and Splink.
Presidio detects PII. GLiNER does multilingual NER. Splink links records. None of them know that a BVN is sensitive under NDPA §30, or that "Adeyẹmí" and "Adeyemi" are the same Yoruba name with and without tonal marks, or that "behind Total filling station, Madina Junction" is a parseable Ghanaian address.
arche-core does that one job.
```python
from arche import Pipeline

pipeline = Pipeline(jurisdiction="NG")  # auto-loads NDPA-2023
result = pipeline.process(
    "Fatima Abdullahi, NIN 12345678901, BVN 22100987654."
)

for d in result.detections:
    print(f"{d.category:11} tier={d.sensitivity_tier.value:9} {d.regulatory_citation}")
# PII-2-BVN   tier=high      NDPA-2023 s.30, CBN BVN policy 2014
# PII-2-NIN   tier=high      NDPA-2023 s.30, NIMC Act s.27
# PII-1-NAME  tier=moderate  NDPA-2023 s.30 (×2: given + family name)

print(result.redacted_text)
# NAME_... NAME_..., NIN [NIN], BVN [BVN].
```
Same code works for jurisdiction="ZA" (POPIA), "KE" (Kenya DPA), "GH" (Ghana DPA). Four launch jurisdictions, four DPA-grounded statute YAML files, one composable framework.
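In the quick-start example, `redacted_text` swaps each detected span for a placeholder. The sketch below shows that idea with plain `re`, assuming keyword-anchored patterns (`NIN` or `BVN` followed by 11 digits, with BVNs carrying the `22` prefix listed in the coverage table). `PATTERNS`, `detect`, and `redact` are hypothetical names, not arche-core's API, and the sketch does not detect names, which need NER or a lexicon.

```python
import re

# Hypothetical keyword-anchored detectors; arche-core's real recognizers may differ.
PATTERNS = {
    "NIN": re.compile(r"\bNIN\s+(\d{11})\b"),
    "BVN": re.compile(r"\bBVN\s+(22\d{9})\b"),  # 11 digits with the 22 prefix
}


def detect(text: str) -> list[tuple[int, int, str]]:
    """Return sorted (start, end, label) spans covering only the ID digits."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(1), match.end(1), label))
    return sorted(spans)


def redact(text: str, spans: list[tuple[int, int, str]]) -> str:
    """Replace each span with a [LABEL] placeholder."""
    # Work right-to-left so replacing one span never shifts the offsets of earlier ones.
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


sample = "Fatima Abdullahi, NIN 12345678901, BVN 22100987654."
print(redact(sample, detect(sample)))
# Fatima Abdullahi, NIN [NIN], BVN [BVN].
```

Returning spans separately from the redacted text keeps the character offsets available for policy decisions and logging.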
Install
```bash
pip install arche-core            # ~310 kB base: pure-Python detectors + statute policy
pip install "arche-core[all]"     # everything (GLiNER + Presidio + Splink + docling + LLM)
```

(Or `uv add arche-core` / `uv add "arche-core[all]"`.) Heavy capabilities are opt-in extras:
| Extra | Adds |
|---|---|
| `arche-core[detect]` | GLiNER2-PII via ONNX runtime (multilingual neural soft-PII) |
| `arche-core[presidio]` | Microsoft Presidio recognizer plugin |
| `arche-core[resolve]` | Splink + DuckDB for large-scale entity resolution |
| `arche-core[doc]` | docling for PDF / DOCX / PPTX / XLSX ingestion |
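Because the heavy backends are optional, code that depends on them usually checks availability before importing. A minimal sketch using `importlib.util.find_spec`, which locates a top-level module without importing it. The extra-to-module mapping in `EXTRA_MODULES` is an assumption for illustration; the package metadata is the authoritative dependency list, and `available_extras` / `require` are hypothetical helpers.

```python
from importlib.util import find_spec

# Assumed mapping from each extra to modules it installs (illustrative, not authoritative).
EXTRA_MODULES = {
    "detect": ["onnxruntime"],
    "presidio": ["presidio_analyzer"],
    "resolve": ["splink", "duckdb"],
    "doc": ["docling"],
}


def available_extras() -> dict[str, bool]:
    """Map each extra to True if all of its modules can be found on sys.path."""
    return {
        extra: all(find_spec(module) is not None for module in modules)
        for extra, modules in EXTRA_MODULES.items()
    }


def require(extra: str) -> None:
    """Raise a helpful ImportError if an optional extra is missing."""
    if not available_extras()[extra]:
        raise ImportError(f'This feature needs: pip install "arche-core[{extra}]"')


print(available_extras())
```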
Coverage
Detection coverage per launch jurisdiction. Every detector validates check digits where the underlying spec defines them.
| Jurisdiction | Statute | Detectors |
|---|---|---|
| Nigeria (NG) | NDPA-2023 | NIN (11 digits), BVN (11 digits, 22-prefix), TIN, RC, voter PVC, driver's licence |
| Kenya (KE) | Kenya DPA 2019 | National ID, KRA PIN, NHIF |
| South Africa (ZA) | POPIA | SA ID (13-digit Luhn + DOB/gender/citizenship decode), tax reference, passport |
| Ghana (GH) | Ghana DPA 2012 | Ghana Card, SSNIT, TIN |
| + 11 more African patterns | — | Egypt, Uganda, Rwanda, Tanzania, Cameroon, Senegal, ... |
Plus libphonenumber-backed normalization for 30+ African phone networks, landmark-anchored address parsing for NG and ZA, and currency detection (Naira, Cedi, Rand, CFA).
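The SA ID row combines a Luhn check digit with fields decoded from the number itself. A minimal sketch of that validation, assuming the commonly documented layout: YYMMDD birth date, four gender-sequence digits (5000 and above indicates male), one citizenship digit, one legacy digit, and a final Luhn check digit. `luhn_valid` and `decode_sa_id` are hypothetical names, not arche-core's detector API, and `8001015009087` is a synthetic number built only to pass the check.

```python
from datetime import date
from typing import Optional


def luhn_valid(number: str) -> bool:
    """Standard Luhn mod-10 check; the last digit is the check digit."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        digit = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the check digit
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def decode_sa_id(id_number: str, today: Optional[date] = None) -> dict:
    """Validate a 13-digit SA ID and decode birth date, gender, and citizenship."""
    if len(id_number) != 13 or not (id_number.isascii() and id_number.isdigit()):
        raise ValueError("SA ID must be exactly 13 ASCII digits")
    if not luhn_valid(id_number):
        raise ValueError("Luhn check digit does not match")
    today = today or date.today()
    yy, mm, dd = int(id_number[0:2]), int(id_number[2:4]), int(id_number[4:6])
    # Two-digit year: prefer 20YY unless that date is still in the future.
    birth = date(2000 + yy, mm, dd)  # raises ValueError for an impossible date
    if birth > today:
        birth = date(1900 + yy, mm, dd)
    citizenship = {"0": "citizen", "1": "permanent resident"}.get(id_number[10], "unknown")
    return {
        "date_of_birth": birth,
        "gender": "male" if int(id_number[6:10]) >= 5000 else "female",
        "citizenship": citizenship,
    }


print(decode_sa_id("8001015009087", today=date(2024, 1, 1)))
# {'date_of_birth': datetime.date(1980, 1, 1), 'gender': 'male', 'citizenship': 'citizen'}
```

Passing `today` explicitly keeps the century decision deterministic in tests.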
The statute layer
Every detection emits a category, a sensitivity tier (high / moderate / low), and the specific statute section that classifies it. The Pipeline maps each detection to one of six closed actions (mask, tokenize, drop, generalize, audit, retain) according to the configured jurisdiction's statute YAML.
```python
for o in result.policy_outcomes:
    print(o.category, o.action, o.statute_reference)
# PII-2-BVN mask NDPA-2023 s.30, CBN BVN policy 2014
# PII-2-NIN mask NDPA-2023 s.30, NIMC Act s.27
# PII-1-NAME tokenize NDPA-2023 s.30
```
Statute YAMLs live at `arche/policy/_data/<STATUTE-ID>.yaml` and are human-readable. Statute amendments are policy-file changes, not code changes.
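A closed action set can be enforced when a statute file is loaded rather than when a detection is processed. A minimal sketch, assuming a parsed statute YAML reduces to a `category -> {action, reference}` mapping; `Action`, `Rule`, `load_policy`, `decide`, and that layout are hypothetical, and the real file format may differ.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """The six closed policy actions; anything else is rejected."""
    MASK = "mask"
    TOKENIZE = "tokenize"
    DROP = "drop"
    GENERALIZE = "generalize"
    AUDIT = "audit"
    RETAIN = "retain"


@dataclass(frozen=True)
class Rule:
    action: Action
    statute_reference: str


# Stand-in for a parsed statute YAML (hypothetical layout), using the NDPA rows above.
RAW_NDPA = {
    "PII-2-BVN": {"action": "mask", "reference": "NDPA-2023 s.30, CBN BVN policy 2014"},
    "PII-2-NIN": {"action": "mask", "reference": "NDPA-2023 s.30, NIMC Act s.27"},
    "PII-1-NAME": {"action": "tokenize", "reference": "NDPA-2023 s.30"},
}


def load_policy(raw: dict) -> dict[str, Rule]:
    """Validate every rule up front: Action(...) raises ValueError for unknown actions."""
    return {cat: Rule(Action(r["action"]), r["reference"]) for cat, r in raw.items()}


def decide(category: str, policy: dict[str, Rule]) -> tuple[str, str, str]:
    """Look up the action and citation; an unmapped category raises KeyError."""
    rule = policy[category]
    return category, rule.action.value, rule.statute_reference


policy = load_policy(RAW_NDPA)
for category in ("PII-2-BVN", "PII-2-NIN", "PII-1-NAME"):
    print(*decide(category, policy))
# PII-2-BVN mask NDPA-2023 s.30, CBN BVN policy 2014
# PII-2-NIN mask NDPA-2023 s.30, NIMC Act s.27
# PII-1-NAME tokenize NDPA-2023 s.30
```

Failing at load time means a typo in a statute file (say, `encrypt`) surfaces once at startup instead of on the first matching document.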
Cultural naming intelligence
arche-core ships a 114-group African name equivalence lexicon covering 454 name forms across 50+ ethnic traditions:
- Mohammed = Muhammad = Mamadou = Muhammadu (Pan-Islamic)
- Diallo = Jallow = Jalloh (Fulani cross-ethnic orthography)
- Fatou = Fatoumata (West African diminutive)
- Adeyemi = Adeyẹmi = Adeyẹmí (Yoruba tonal marks)
- Pierre = Peter = Pedro (colonial-era cross-linguistic)
- Irorere, Aibuedfe (Benin/Edo names with semantic meaning)
Growing via Wikidata + community curation. See datasets/ for the full dataset and contribution guide.
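Two mechanisms cover most of the examples above: Unicode folding for diacritics (the Yoruba tonal marks) and a group lookup for orthographic and cross-linguistic variants. A minimal sketch using the standard `unicodedata` module; `GROUPS` is a tiny hypothetical excerpt built from the list above, not the shipped lexicon or its file format, and `fold` / `same_name` are illustrative names.

```python
import unicodedata


def fold(name: str) -> str:
    """Casefold, decompose (NFD), and drop combining marks such as tone accents and dot-below."""
    decomposed = unicodedata.normalize("NFD", name.casefold())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# Hypothetical excerpt of equivalence groups, taken from the examples above.
GROUPS = [
    {"Mohammed", "Muhammad", "Mamadou", "Muhammadu"},
    {"Diallo", "Jallow", "Jalloh"},
    {"Fatou", "Fatoumata"},
    {"Pierre", "Peter", "Pedro"},
]

# Index every folded form to its group number.
INDEX = {fold(form): gid for gid, group in enumerate(GROUPS) for form in group}


def same_name(a: str, b: str) -> bool:
    """True if two names fold to the same string or share an equivalence group."""
    fa, fb = fold(a), fold(b)
    if fa == fb:
        return True
    ga, gb = INDEX.get(fa), INDEX.get(fb)
    return ga is not None and ga == gb


print(same_name("Adeyẹmí", "Adeyemi"), same_name("Jalloh", "Diallo"))
```

Folding is lossy: in Yoruba, tonal marks distinguish words, so a matcher may want to treat a folded match as weaker evidence than an exact one.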
Composing with Presidio, GLiNER, and Splink
arche-core is designed to compose with the incumbent tools, not replace them. The three integration patterns:
```bash
# Presidio's English recognizers + arche's African recognizers
pip install "arche-core[presidio]"
# arche.detect.presidio surfaces both as one recognizer set.

# GLiNER's multilingual NER + arche's statute classification
pip install "arche-core[detect]"
# Pipeline(jurisdiction="NG", backend="gliner") routes soft-PII through GLiNER.

# Splink's record linkage + arche's jurisdiction-aware comparators
pip install "arche-core[resolve]"
# Statute-tagged detections feed Splink as clean inputs.
```
Audit log
`arche.graph.audit` ships an SQLite-backed, append-only log that records every detection, every policy decision, and every action taken, and that compliance officers and regulators can query. PII values are never stored; the log keeps only categories, span offsets, and document hashes. A Markdown compliance report generator produces regulator-ready exports.
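The key property is that the log can show what happened without holding the data. A minimal sketch with the standard `sqlite3` module, assuming a single table; the schema, trigger names, and `record` helper are hypothetical and not the actual layout of `arche.graph.audit`. Triggers reject `UPDATE` and `DELETE`, and the document itself is stored only as a SHA-256 digest.

```python
import hashlib
import sqlite3

SCHEMA = """
CREATE TABLE audit_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    doc_sha256 TEXT NOT NULL,
    category TEXT NOT NULL,
    span_start INTEGER NOT NULL,
    span_end INTEGER NOT NULL,
    action TEXT NOT NULL,
    statute_reference TEXT NOT NULL
);
CREATE TRIGGER audit_no_update BEFORE UPDATE ON audit_log
BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;
CREATE TRIGGER audit_no_delete BEFORE DELETE ON audit_log
BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;
"""


def record(conn: sqlite3.Connection, document: str,
           rows: list[tuple[str, int, int, str, str]]) -> str:
    """Append (category, start, end, action, reference) rows; only a hash of the document is kept."""
    doc_hash = hashlib.sha256(document.encode("utf-8")).hexdigest()
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO audit_log (doc_sha256, category, span_start, span_end, action, "
            "statute_reference) VALUES (?, ?, ?, ?, ?, ?)",
            [(doc_hash, *row) for row in rows],
        )
    return doc_hash


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
text = "Fatima Abdullahi, NIN 12345678901, BVN 22100987654."
record(conn, text, [
    ("PII-2-NIN", 22, 33, "mask", "NDPA-2023 s.30, NIMC Act s.27"),
    ("PII-2-BVN", 39, 50, "mask", "NDPA-2023 s.30, CBN BVN policy 2014"),
])
print(conn.execute(
    "SELECT category, span_start, span_end, action FROM audit_log ORDER BY id"
).fetchall())
# [('PII-2-NIN', 22, 33, 'mask'), ('PII-2-BVN', 39, 50, 'mask')]
```

Given the original document, an auditor can recompute the hash and slice the recorded offsets to verify each entry; without it, the log reveals only categories and positions.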
Power-user features
These ship in the package but are not part of the headline pitch; they support specific identity workflows on top of the detection layer:
- `arche.sign`: Ed25519 + JWS + did:key signing for `Pipeline.Result` envelopes. SD-JWT-VC issue / verify via `arche.credentials.sd_jwt`. See `examples/02_sign_share_extract.py` and `examples/04_sd_jwt_credential.py`.
- `arche.workflow.dsar`: citizen-side DSAR draft generation with per-jurisdiction statute citations. See `examples/03_dsar_workflow.py`.
- `arche.resolve`: lightweight Fellegi-Sunter matcher with jurisdiction-specific priors. Use `from arche import match` for two-record comparison and `from arche import link` for cross-source resolution.
- `arche.workflow._review`: MPI review queue for human-in-the-loop match decisions. Not on the public surface; import it from its canonical path.
- `arche.resolve_places` / `arche.list_places`: jurisdictional place lookup with verifiable audit receipts.
These are real tools we depend on internally. They are not the lead pitch.
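`arche.resolve` is described as a Fellegi-Sunter matcher. In the textbook model, each compared field adds log2(m/u) to a pair's score when values agree and log2((1 - m)/(1 - u)) when they disagree, where m and u are the agreement probabilities among true matches and non-matches; a prior match rate then converts the score into a posterior probability. A minimal sketch of that model with exact-equality comparisons; `FieldParams`, `match_weight`, `match_probability`, and the parameter values are hypothetical and are not the signatures of `arche.match` or `arche.link`.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldParams:
    m: float  # P(values agree | the records are the same person)
    u: float  # P(values agree | the records are different people)


def match_weight(a: dict, b: dict, params: dict[str, FieldParams]) -> float:
    """Sum per-field log2 likelihood ratios (the Fellegi-Sunter match weight)."""
    total = 0.0
    for field, p in params.items():
        va, vb = a.get(field), b.get(field)
        if va is None or vb is None:
            continue  # a missing value contributes no evidence either way
        if va == vb:
            total += math.log2(p.m / p.u)
        else:
            total += math.log2((1 - p.m) / (1 - p.u))
    return total


def match_probability(weight: float, prior: float) -> float:
    """Convert a match weight into a posterior probability given a prior match rate."""
    odds = prior / (1 - prior) * 2 ** weight
    return odds / (1 + odds)


params = {"surname": FieldParams(m=0.8, u=0.1), "dob": FieldParams(m=0.96, u=0.12)}
left = {"surname": "Diallo", "dob": "1980-01-01"}
right = {"surname": "Diallo", "dob": "1980-01-01"}
w = match_weight(left, right, params)
print(w, match_probability(w, prior=1 / 65))
```

In practice the equality test would give way to graded comparators (for example, diacritic- or lexicon-aware name comparison), each level with its own m and u.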
License
Apache 2.0. By Unpatterned Labs.
Download files
File details
Details for the file arche_core-0.2.0a2.tar.gz.
File metadata
- Download URL: arche_core-0.2.0a2.tar.gz
- Size: 319.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.22
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 337cde46f9c7a32495db5e3aed55bd488309d94bfdaf9bce8de6c8a73941efac |
| MD5 | c32eb468eb4da9f60b95c9561173c434 |
| BLAKE2b-256 | 84e275c723fe4d72586f4062c38e05b8653e12dcf402e279c88264799cc2d6d8 |
File details
Details for the file arche_core-0.2.0a2-py3-none-any.whl.
File metadata
- Download URL: arche_core-0.2.0a2-py3-none-any.whl
- Size: 308.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.22
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1e96bff6101dad50b921a19cc0bc9c3a9ba5ccbcd64851391cd124995785a9ab |
| MD5 | 25ceec1e8648de6e4f5f1fd1b36a0640 |
| BLAKE2b-256 | 88500f873475f41a5784e44f8a0318f686bb7e8c7752575eb3516e850924ce35 |