Pure Python library for deterministic identity canonicalization and matching in the rettX ecosystem

rettxidentity

Overview

rettxidentity is a reusable identity canonicalization and matching engine for the rettX ecosystem. It enables deterministic comparison of a caregiver-entered Draft Identity against a Verified Identity extracted from medical reports, producing an explicit Match Decision (PASS, BORDERLINE, FAIL) with confidence scores and explainable reason codes.

The library is intentionally designed with:

  • ✅ No database dependencies
  • ✅ No network calls
  • ✅ No secrets or configuration files
  • ✅ Deterministic outputs (same inputs → same outputs)
  • ✅ Thread-safe (all dataclasses frozen)
  • ✅ Cross-platform (Linux, Windows, macOS)
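The thread-safety point above rests on immutability: a frozen dataclass rejects attribute assignment after construction, so instances can be shared across threads without locks. The following is a minimal standard-library sketch of that behaviour. `ExampleName` and `try_mutate` are hypothetical stand-ins for illustration, not the library's own classes.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class ExampleName:
    """Hypothetical immutable value object (not the library's PersonName)."""

    given: str
    surname: str


def try_mutate(name: ExampleName) -> bool:
    """Return True if assignment succeeded, False if the instance rejected it."""
    try:
        name.given = "changed"  # type: ignore[misc]
    except FrozenInstanceError:
        return False
    return True


original = ExampleName(given="Maria", surname="Garcia")
mutated = try_mutate(original)

# "Changing" a frozen instance means building a new one; the original is untouched.
updated = replace(original, surname="Garcia Lopez")
```

Because frozen dataclasses with the default `eq=True` are also hashable, equal instances can be used safely as dictionary keys or set members.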

Key Features

  • Identity Comparison: Compare draft vs verified identities with explicit match decisions
  • Cross-Script Matching: Handle Greek, Georgian, and Cyrillic names seamlessly
  • Name Normalization: Unicode-aware name normalization with diacritics handling
  • Canonicalization: Versioned, deterministic identity representation
  • Explainability: Every decision includes reason codes explaining the logic
  • Performance: 1000+ comparisons/second on standard hardware
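To make the name normalization feature concrete, here is one common way to do diacritics-insensitive comparison with the standard library. This is only a sketch under stated assumptions: `fold_name` is a hypothetical helper, and the library's actual normalization rules may differ (for example in how it treats non-Latin scripts).

```python
import unicodedata


def fold_name(text: str) -> str:
    """Illustrative normalizer: strip combining marks, case-fold, collapse spaces."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base characters.
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # casefold() gives caseless matching; split/join collapses whitespace runs.
    return " ".join(without_marks.casefold().split())


same_given_name = fold_name("María") == fold_name("Maria")
folded_surname = fold_name("  García   López ")
```

With this folding, "María" and "Maria" compare equal, and the padded surname reduces to a single-spaced lowercase form.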

Installation

pip install rettxidentity

Quick Start

from rettxidentity import (
    compare_identities, 
    Identity, 
    PersonName, 
    MatchDecision,
    MutationStatus  # New in v0.2.0
)

# Draft identity (from caregiver input with clinical diagnosis)
draft = Identity(
    name=PersonName(given="Maria", surname="Garcia"),
    date_of_birth="1985-03-15",
    country_of_birth="ES",
    mutation_status=MutationStatus.CLINICAL_ONLY,  # Clinical diagnosis
    mutation_key=None
)

# Verified identity (from medical report with genetic test)
verified = Identity(
    name=PersonName(given="María", surname="García López"),
    date_of_birth="1985-03-15",
    country_of_birth="Spain",
    mutation_status=MutationStatus.CONFIRMED,  # Lab-confirmed
    mutation_key="MECP2"
)

# Compare them
result = compare_identities(draft, verified)

# Check the decision
if result.decision == MatchDecision.PASS:
    print("✓ Identities match! Safe to proceed.")
    print(f"Confidence: {result.confidence:.2%}")
    print(f"Reason codes: {[rc.value for rc in result.reason_codes]}")
elif result.decision == MatchDecision.BORDERLINE:
    print("⚠ Requires admin review")
    print(f"Reason codes: {[rc.value for rc in result.reason_codes]}")
else:  # FAIL
    print("✗ Identities do not match. Request correction.")
    print(f"Reason codes: {[rc.value for rc in result.reason_codes]}")

Mutation Status Classification (v0.2.0)

The library supports structured mutation classification:

from rettxidentity import Identity, MutationStatus, PersonName

# Lab-confirmed mutation with genetic coordinates
identity_genetic = Identity(
    name=PersonName(given="Sarah", surname="Thompson"),
    mutation_status=MutationStatus.CONFIRMED,
    mutation_key="NM_004992.3:c.808C>T"
)

# Clinical diagnosis without genetic testing
identity_clinical = Identity(
    name=PersonName(given="Emma", surname="Johnson"),
    mutation_status=MutationStatus.CLINICAL_ONLY,
    mutation_key=None  # No genetic coordinates
)

# No mutation information
identity_no_mutation = Identity(
    name=PersonName(given="John", surname="Doe"),
    mutation_status=MutationStatus.UNKNOWN,  # Default
    mutation_key=None
)

# The mutation hard gate applies only when BOTH identities have CONFIRMED status,
# so a CONFIRMED identity can still match a CLINICAL_ONLY one (identity locking policy).
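The gate rule stated in the comments above can be sketched as a small predicate. This is an illustration, not the library's implementation: `ExampleStatus` and `mutation_gate_blocks` are hypothetical names, the key values are placeholders, and the sketch assumes that keys are compared by plain string equality.

```python
from enum import Enum
from typing import Optional


class ExampleStatus(Enum):
    """Hypothetical mirror of the three statuses shown above."""

    CONFIRMED = "confirmed"
    CLINICAL_ONLY = "clinical_only"
    UNKNOWN = "unknown"


def mutation_gate_blocks(
    status_a: ExampleStatus,
    key_a: Optional[str],
    status_b: ExampleStatus,
    key_b: Optional[str],
) -> bool:
    """Return True when the mutation hard gate would reject the pair.

    Assumption: keys are compared by simple equality, and only when
    both sides are CONFIRMED.
    """
    both_confirmed = (
        status_a is ExampleStatus.CONFIRMED
        and status_b is ExampleStatus.CONFIRMED
    )
    return both_confirmed and key_a != key_b


# Two confirmed identities with different (placeholder) keys: the gate applies.
blocked = mutation_gate_blocks(
    ExampleStatus.CONFIRMED, "KEY_A", ExampleStatus.CONFIRMED, "KEY_B"
)

# Confirmed vs clinical-only: the gate is skipped, so matching can continue.
skipped = mutation_gate_blocks(
    ExampleStatus.CONFIRMED, "KEY_A", ExampleStatus.CLINICAL_ONLY, None
)
```

When the gate is skipped, the pair is not automatically accepted; the remaining comparison steps still decide the outcome.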

Development

Setup

# Clone repository
git clone https://github.com/rettx/rettxidentity.git
cd rettxidentity

# Install with dev dependencies
pip install -e ".[dev]"

Running Tests

# Run all tests with coverage (minimum 80% required)
pytest

# Run specific test categories
pytest -m unit              # Unit tests only
pytest -m contract          # Contract tests only
pytest -m integration       # Integration tests only

# Generate detailed coverage report
pytest --cov-report=html

Code Quality

# Format code
ruff format src tests

# Lint with auto-fix
ruff check --fix src tests

# Type check
mypy src/rettxidentity --strict

CI/CD

The project uses GitHub Actions for continuous integration and deployment:

  • CI Workflow: Runs on every push and PR (linting, type checking, tests)
  • Publish Workflow: Automatically publishes to PyPI when a release is created

See .github/workflows/README.md for detailed CI/CD documentation.

Design Principles

This library follows the rettX Identity Constitution:

  1. Identity ≠ Identifier - Never generates rettxid
  2. Verified Identity Only - Canonical output from verified identity only
  3. Determinism - Same inputs → same outputs (versioned)
  4. Explainability - Structured reason codes in all outputs
  5. Script-Agnostic - Native scripts first-class; transliteration for comparison
  6. DOB + Mutation Anchors - Hard gates with clear rules
  7. No Lock - Declares eligibility, never locks
  8. Borderline First-Class - Preferred over FAIL in ambiguity
  9. Purity & Statelessness - No DB, network, filesystem
  10. Minimal Surface Area - Focused API
  11. Versioned Canonicalization - All outputs include version
  12. Privacy by Construction - No logging by default
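Principles 3 and 11 (determinism and versioned canonicalization) can be illustrated with a short sketch: serialize the fields in a fixed key order and embed a version tag, so the same inputs always yield the same string. The names `CANONICAL_VERSION` and `canonical_json`, and the JSON layout itself, are assumptions made for this example and do not describe the library's actual canonical format.

```python
import json

CANONICAL_VERSION = "example-1"  # hypothetical version tag


def canonical_json(fields: dict[str, str]) -> str:
    """Serialize fields with a version tag; input key order does not matter."""
    payload = {"version": CANONICAL_VERSION, "fields": fields}
    # sort_keys fixes key order at every nesting level; compact separators
    # remove whitespace differences between runs and platforms.
    return json.dumps(
        payload, sort_keys=True, ensure_ascii=False, separators=(",", ":")
    )


first = canonical_json({"given": "maria", "surname": "garcia"})
second = canonical_json({"surname": "garcia", "given": "maria"})
```

Both calls produce the same string even though the dictionaries were built in different orders, and the embedded version lets a consumer detect output produced under older rules.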

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please read our contributing guidelines and code of conduct.

Support

For questions and support, please open an issue on GitHub.
