The identity workflow framework. African-first, globally pluggable. Compose detection, resolution, linking, verification, and governance into production identity pipelines. By unpatterned.org.

arche-core

African PII detection that cites the law it enforces.

arche-core detects PII for African jurisdictions (government IDs, names, phone numbers, addresses) and grounds every detection in the data protection statute that governs it: NDPA, POPIA, Kenya DPA, Ghana DPA. Every detection maps to one of six closed policy actions. Composes with Presidio, GLiNER, and Splink.

Presidio detects PII. GLiNER does multilingual NER. Splink links records. None of them know that a BVN is sensitive under NDPA §30, or that "Adeyẹmí" and "Adeyemi" are the same Yoruba name with and without tonal marks, or that "behind Total filling station, Madina Junction" is a parseable Ghanaian address. arche-core does that one job.

from arche import Pipeline

pipeline = Pipeline(jurisdiction="NG")        # auto-loads NDPA-2023
result = pipeline.process(
    "Fatima Abdullahi, NIN 12345678901, BVN 22100987654."
)

for d in result.detections:
    print(f"{d.category:11} tier={d.sensitivity_tier.value:9} {d.regulatory_citation}")
# PII-2-BVN   tier=high      NDPA-2023 s.30, CBN BVN policy 2014
# PII-2-NIN   tier=high      NDPA-2023 s.30, NIMC Act s.27
# PII-1-NAME  tier=moderate  NDPA-2023 s.30            (×2 — given + family name)

print(result.redacted_text)
# NAME_... NAME_..., NIN [NIN], BVN [BVN].

Same code works for jurisdiction="ZA" (POPIA), "KE" (Kenya DPA), "GH" (Ghana DPA). Four launch jurisdictions, four DPA-grounded statute YAML files, one composable framework.
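
To make the detect-then-redact flow above concrete, here is a minimal standalone sketch using only the standard library. It is not arche-core's implementation: the two patterns are assumptions based on the formats listed under Coverage (BVN as 11 digits starting with 22, NIN as any other 11-digit run), the helper names `detect` and `redact` are hypothetical, and names are not handled at all.

```python
import re

# Assumed formats (from the Coverage table): BVN = 11 digits starting with 22,
# NIN = any other 11-digit run. These are illustrative, not arche-core's detectors.
ID_PATTERN = re.compile(r"\b(?:(?P<BVN>22\d{9})|(?P<NIN>\d{11}))\b")


def detect(text):
    """Return (category, start, end) for every ID-shaped span in text."""
    return [(m.lastgroup, m.start(), m.end()) for m in ID_PATTERN.finditer(text)]


def redact(text):
    """Replace each detected span with a bracketed category placeholder."""
    return ID_PATTERN.sub(lambda m: f"[{m.lastgroup}]", text)


sample = "Fatima Abdullahi, NIN 12345678901, BVN 22100987654."
print(detect(sample))
print(redact(sample))
# Fatima Abdullahi, NIN [NIN], BVN [BVN].
```

A pattern-only approach like this would label an NIN that happens to start with 22 as a BVN, which is one reason a real detector also needs context and check-digit validation.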

Install

pip install arche-core          # ~310KB base — pure-Python detectors, statute policy
pip install "arche-core[all]"   # everything (GLiNER + Presidio + Splink + docling + LLM); quoted so shells like zsh don't glob the brackets

(Or uv add arche-core / uv add arche-core[all].) Heavy capabilities are opt-in extras:

| Extra | Adds |
| --- | --- |
| arche-core[detect] | GLiNER2-PII via ONNX runtime (multilingual neural soft-PII) |
| arche-core[presidio] | Microsoft Presidio recognizer plugin |
| arche-core[resolve] | Splink + DuckDB for large-scale entity resolution |
| arche-core[doc] | docling for PDF / DOCX / PPTX / XLSX ingestion |

Coverage

Per-launch-jurisdiction detection coverage. Every detector validates check-digits where the underlying spec supports it.

| Jurisdiction | Statute | Detectors |
| --- | --- | --- |
| Nigeria (NG) | NDPA-2023 | NIN (11 digits), BVN (11 digits, 22-prefix), TIN, RC, voter PVC, driver's licence |
| Kenya (KE) | Kenya DPA 2019 | National ID, KRA PIN, NHIF |
| South Africa (ZA) | POPIA | SA ID (13-digit Luhn + DOB/gender/citizenship decode), tax reference, passport |
| Ghana (GH) | Ghana DPA 2012 | Ghana Card, SSNIT, TIN |

+ 11 more African patterns: Egypt, Uganda, Rwanda, Tanzania, Cameroon, Senegal, ...

Plus libphonenumber-backed normalization for 30+ African phone networks, landmark-anchored address parsing for NG and ZA, and currency detection (Naira, Cedi, Rand, CFA).
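
As an illustration of what check-digit validation plus field decoding involves, here is a sketch for the South African ID number. It assumes the commonly documented layout (YYMMDD birth date, a four-digit sequence where 5000 and above indicates male, a citizenship digit where 0 means citizen, and a trailing Luhn check digit). The function names are hypothetical and this is not arche-core's detector.

```python
import re


def luhn_ok(digits: str) -> bool:
    """Standard Luhn check: double every second digit counting from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def decode_sa_id(id_number: str):
    """Return decoded fields for a well-formed SA ID number, or None if invalid."""
    if not re.fullmatch(r"[0-9]{13}", id_number) or not luhn_ok(id_number):
        return None
    return {
        # Century is ambiguous in a two-digit year, so it is left undecoded here.
        "birth_yymmdd": id_number[:6],
        "gender": "male" if int(id_number[6:10]) >= 5000 else "female",
        "citizen": id_number[10] == "0",
    }


print(decode_sa_id("8001015009087"))
print(decode_sa_id("8001015009086"))  # last digit altered, so the checksum fails
```

Rejecting candidates that fail the checksum is what keeps an arbitrary 13-digit number (an invoice number, say) from being reported as an identity document.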

The statute layer

Every detection emits a category, a sensitivity tier (high / moderate / low), and the specific statute section that classifies it. The Pipeline maps each to one of six closed actions — mask, tokenize, drop, generalize, audit, retain — per the configured jurisdiction's statute YAML.

for o in result.policy_outcomes:
    print(o.category, o.action, o.statute_reference)
# PII-2-BVN    mask       NDPA-2023 s.30, CBN BVN policy 2014
# PII-2-NIN    mask       NDPA-2023 s.30, NIMC Act s.27
# PII-1-NAME   tokenize   NDPA-2023 s.30

Statute YAMLs live at arche/policy/_data/<STATUTE-ID>.yaml and are human-readable. Statute amendments are policy-file changes, not code changes.
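
The idea of a closed action set driven by a policy file can be sketched as follows. The dictionary stands in for a parsed statute YAML; its keys and layout are assumptions for illustration, since the real schema is not shown here, and `resolve_action` is a hypothetical helper rather than part of the package.

```python
from enum import Enum


class Action(Enum):
    """The six closed policy actions named above."""
    MASK = "mask"
    TOKENIZE = "tokenize"
    DROP = "drop"
    GENERALIZE = "generalize"
    AUDIT = "audit"
    RETAIN = "retain"


# Hypothetical stand-in for a parsed statute YAML; the real schema may differ.
NDPA_2023 = {
    "PII-2-BVN": {"action": "mask", "statute_reference": "NDPA-2023 s.30, CBN BVN policy 2014"},
    "PII-2-NIN": {"action": "mask", "statute_reference": "NDPA-2023 s.30, NIMC Act s.27"},
    "PII-1-NAME": {"action": "tokenize", "statute_reference": "NDPA-2023 s.30"},
}


def resolve_action(policy, category):
    """Look up the action and citation for a category.

    Action(...) raises ValueError for anything outside the six values,
    which is what makes the set closed.
    """
    rule = policy[category]
    return Action(rule["action"]), rule["statute_reference"]


for category in NDPA_2023:
    action, citation = resolve_action(NDPA_2023, category)
    print(category, action.value, citation)
```

Because the mapping is data, a statute amendment that changes how a category must be handled becomes an edit to one entry rather than a code change.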

Cultural naming intelligence

arche-core ships a 114-group African name equivalence lexicon covering 454 name forms across 50+ ethnic traditions:

  • Mohammed = Muhammad = Mamadou = Muhammadu (Pan-Islamic)
  • Diallo = Jallow = Jalloh (Fulani cross-ethnic orthography)
  • Fatou = Fatoumata (West African diminutive)
  • Adeyemi = Adeyẹmi = Adeyẹmí (Yoruba tonal marks)
  • Pierre = Peter = Pedro (colonial-era cross-linguistic)
  • Irorere, Aibuedfe (Benin/Edo names with semantic meaning)

Growing via Wikidata + community curation. See datasets/ for the full dataset and contribution guide.
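
A rough sketch of how such equivalences can be checked: fold away tonal marks with Unicode decomposition, then look both names up in an equivalence-group index. The groups below are a small subset copied from the list above; `fold`, `same_name`, and the index structure are hypothetical and say nothing about how the shipped lexicon is stored.

```python
import unicodedata

# Illustrative subset of equivalence groups taken from the list above.
GROUPS = [
    {"Mohammed", "Muhammad", "Mamadou", "Muhammadu"},
    {"Diallo", "Jallow", "Jalloh"},
    {"Fatou", "Fatoumata"},
    {"Pierre", "Peter", "Pedro"},
]


def fold(name: str) -> str:
    """Strip combining marks (tone marks, underdots) and case-fold."""
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    return stripped.casefold()


# Map every folded name form to the index of its group.
INDEX = {fold(form): i for i, group in enumerate(GROUPS) for form in group}


def same_name(a: str, b: str) -> bool:
    """True if the names are identical after folding or share a group."""
    fa, fb = fold(a), fold(b)
    if fa == fb:
        return True
    ga, gb = INDEX.get(fa), INDEX.get(fb)
    return ga is not None and ga == gb


print(same_name("Adeyẹmí", "Adeyemi"))
print(same_name("Mamadou", "Muhammad"))
print(same_name("Diallo", "Fatou"))
```

Folding is lossy by design: it suits matching, but the original spelling should be kept for display, since tonal marks can distinguish different names.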

Composing with Presidio, GLiNER, and Splink

arche-core is designed to compose with the incumbent tools, not replace them. The three integration patterns:

# Presidio's English recognizers + arche's African recognizers
pip install "arche-core[presidio]"
# arche.detect.presidio surfaces both as one recognizer set.

# GLiNER's multilingual NER + arche's statute classification
pip install "arche-core[detect]"
# Pipeline(jurisdiction="NG", backend="gliner") routes soft-PII through GLiNER.

# Splink's record linkage + arche's jurisdiction-aware comparators
pip install "arche-core[resolve]"
# Statute-tagged detections feed Splink as clean inputs.

Audit log

arche.graph.audit ships an SQLite-backed append-only log that records every detection, every policy decision, and every action taken, queryable by compliance officers and regulators. PII values are never stored; only categories, span offsets, and document hashes. It also includes a Markdown compliance report generator for regulator-ready exports.
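
One way to picture an append-only log that never stores PII values is the sketch below, built on the standard-library `sqlite3` module. The table layout, trigger names, and the `open_log` / `record` helpers are all assumptions made for illustration; they are not the schema or API of arche.graph.audit.

```python
import hashlib
import sqlite3

# Hypothetical schema: only the document hash, category, offsets, and action
# are stored. Triggers reject UPDATE and DELETE, making the table append-only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS audit (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    doc_hash TEXT NOT NULL,
    category TEXT NOT NULL,
    span_start INTEGER NOT NULL,
    span_end INTEGER NOT NULL,
    action TEXT NOT NULL
);
CREATE TRIGGER IF NOT EXISTS audit_no_update BEFORE UPDATE ON audit
BEGIN SELECT RAISE(ABORT, 'audit log is append-only'); END;
CREATE TRIGGER IF NOT EXISTS audit_no_delete BEFORE DELETE ON audit
BEGIN SELECT RAISE(ABORT, 'audit log is append-only'); END;
"""


def open_log(path=":memory:"):
    """Open (or create) the log database and install the schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record(conn, text, category, start, end, action):
    """Append one entry; the text is hashed and the matched value is never stored."""
    doc_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO audit (doc_hash, category, span_start, span_end, action) "
            "VALUES (?, ?, ?, ?, ?)",
            (doc_hash, category, start, end, action),
        )


conn = open_log()
doc = "Fatima Abdullahi, NIN 12345678901, BVN 22100987654."
record(conn, doc, "PII-2-NIN", 22, 33, "mask")
print(conn.execute("SELECT category, span_start, span_end, action FROM audit").fetchall())
```

Storing offsets plus a document hash lets an auditor confirm what was found and where in a document they already hold, without the log itself becoming a second copy of the sensitive data.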

Power-user features

These ship in the package but are not in the headline pitch — they support specific identity workflows on top of the detection layer:

  • arche.sign — Ed25519 + JWS + did:key signing for Pipeline.Result envelopes. SD-JWT-VC issue / verify via arche.credentials.sd_jwt. See examples/02_sign_share_extract.py and examples/04_sd_jwt_credential.py.
  • arche.workflow.dsar — citizen-side DSAR draft generation with per-jurisdiction statute citations. See examples/03_dsar_workflow.py.
  • arche.resolve — lightweight Fellegi-Sunter matcher with jurisdiction-specific priors. from arche import match for two-record comparison; from arche import link for cross-source resolution.
  • arche.workflow._review — MPI review queue for human-in-the-loop match decisions. Not on the public surface; import from the canonical path.
  • arche.resolve_places / arche.list_places — jurisdictional place lookup with verifiable audit receipts.

These are real tools we depend on internally. They are not the lead pitch.
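
For readers unfamiliar with Fellegi-Sunter matching, the core scoring step can be sketched in a few lines. The m and u probabilities below are invented for illustration, the comparison is plain equality, and `match_weight` is a hypothetical name; this does not reproduce arche.resolve or its jurisdiction-specific priors.

```python
import math

# Hypothetical per-field probabilities:
#   m = P(field agrees | the two records are the same person)
#   u = P(field agrees | the two records are different people)
PARAMS = {
    "family_name": (0.95, 0.01),
    "given_name": (0.90, 0.05),
    "birth_year": (0.98, 0.02),
}


def match_weight(rec_a, rec_b, params=PARAMS):
    """Sum log2 likelihood ratios over fields (the Fellegi-Sunter match weight)."""
    total = 0.0
    for field, (m, u) in params.items():
        if rec_a.get(field) == rec_b.get(field):
            total += math.log2(m / u)              # agreement adds evidence
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts it
    return total


a = {"family_name": "Abdullahi", "given_name": "Fatima", "birth_year": 1990}
b = {"family_name": "Abdullahi", "given_name": "Fatima", "birth_year": 1990}
c = {"family_name": "Okafor", "given_name": "Chidi", "birth_year": 1985}

print(match_weight(a, b))  # positive: evidence for a match
print(match_weight(a, c))  # negative: evidence against
```

In practice the total is compared against upper and lower thresholds, with pairs that fall in between sent to human review, which is the role a review queue plays.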

License

Apache 2.0. By Unpatterned Labs.

Download files

Source Distribution

arche_core-0.2.0a2.tar.gz (319.1 kB)

Uploaded Source

Built Distribution

arche_core-0.2.0a2-py3-none-any.whl (308.6 kB)

Uploaded Python 3

File details

Details for the file arche_core-0.2.0a2.tar.gz.

File metadata

  • Download URL: arche_core-0.2.0a2.tar.gz
  • Upload date:
  • Size: 319.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.22

File hashes

Hashes for arche_core-0.2.0a2.tar.gz
Algorithm Hash digest
SHA256 337cde46f9c7a32495db5e3aed55bd488309d94bfdaf9bce8de6c8a73941efac
MD5 c32eb468eb4da9f60b95c9561173c434
BLAKE2b-256 84e275c723fe4d72586f4062c38e05b8653e12dcf402e279c88264799cc2d6d8

File details

Details for the file arche_core-0.2.0a2-py3-none-any.whl.

File metadata

  • Download URL: arche_core-0.2.0a2-py3-none-any.whl
  • Upload date:
  • Size: 308.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.22

File hashes

Hashes for arche_core-0.2.0a2-py3-none-any.whl
Algorithm Hash digest
SHA256 1e96bff6101dad50b921a19cc0bc9c3a9ba5ccbcd64851391cd124995785a9ab
MD5 25ceec1e8648de6e4f5f1fd1b36a0640
BLAKE2b-256 88500f873475f41a5784e44f8a0318f686bb7e8c7752575eb3516e850924ce35
