
MCard: Local-First Content Addressable Storage


MCard is a powerful Python library implementing an algebraically closed data structure for content-addressable storage. It provides a robust system where every piece of content is uniquely identified by its cryptographic hash and temporally ordered, enabling content verification, deduplication, and versioning.

The system features a modular architecture with support for multiple content types and a flexible database backend (SQLite).

Executive Summary

  • Implements content-addressable storage with guaranteed integrity and temporal ordering
  • Built for performance: binary-first storage with smart text views when needed
  • Developer-friendly: rich Python API, examples, tests, and BMAD-driven TDD workflow
  • Enterprise-grade: comprehensive logging, CI/CD pipeline, security auditing, and quality gates
  • Production-ready: 99.4% test success rate, zero breaking changes, modern tooling (ruff, uv)
  • Quick start and test in minutes with uv and provided scripts

🔮 Future Vision: Towards Verifiable Execution with PTR and Audited Collections

MCard is evolving beyond content-addressable storage to become the foundational layer for a Polynomial Type Runtime (PTR) and Audited MCard Collections. This strategic direction integrates advanced concepts from the Cubical Logic Model (CLM) and Purely Functional Software Deployment (PFSD) to deliver a system with mathematically verifiable execution, immutable audit trails, and content-driven programming capabilities.

Key Capabilities in Development:

  • PTR-Verified Execution: All critical operations will be mediated by PTR, ensuring that only formally verified PCards (programs defined by CLM's Abstract, Concrete, and Balanced dimensions) are executed. This guarantees correctness and prevents unverified code from running. The runtime may evolve from Python to formal verification systems like LEAN for mathematical proof of correctness (see PRD Section 12.4).
  • Immutable Audit Fabric: Every significant system event, policy decision, and execution trace will be captured as an Audited MCard. These collections form a tamper-evident, content-addressed evidence ledger, enabling provable accountability and compliance.
  • Unified Observability: PTR and audit pipelines will emit OpenTelemetry (OTel)-compatible traces, metrics, and logs, enabling vendor-neutral integration with existing observability stacks (Prometheus, Jaeger, Grafana, commercial APMs) as described in the PRD's observability requirements.
  • Content-Driven Programming: Leveraging CLM, programs themselves will be represented as graphs of hash-addressed MCards and PCards, moving away from traditional file-based code towards a verifiable, compositional programming model.
  • Polyglot Execution: The PTR now supports multi-language execution (Python, JavaScript, Rust, C, Wasm), allowing developers to write PCards in their preferred language while maintaining a unified verification and execution interface.
  • Evolutionary Architecture: MCard now implements an evolutionary model for PCards, distinguishing between persistent PCard Identities (upgradable contracts) and immutable PCard Snapshots (versioned logic), enabling safe upgrades and forking of logic models.
  • Reproducible Deployment: Adhering to PFSD principles, the entire software lifecycle (build, verify, deploy) will be driven by content-addressed artifacts and pure functions, ensuring full reproducibility and auditability of all system states.

High-Level Observability Architecture

graph LR
  PTR[PTR Runtime] --> OTel[OpenTelemetry Collector]
  OTel --> Prom[Prometheus / Metrics]
  OTel --> Jaeger[Jaeger / Traces]
  OTel --> Graf[Grafana / APM / Dashboards]

This evolution transforms MCard into a self-hosting execution kernel, where correctness, security, and auditability are mathematical properties of the system, not just operational conventions. For detailed product requirements, see the MCard PRD.

🌌 The Prologue of Spacetime: A Monadic Narrative

We are currently operationalizing the Meta-Narrative Framework through the Prologue of Spacetime, a twelve-chapter execution plan that builds the universe of the project using the Cubical Logic Model (CLM).

The Monadic Template

To support this, we have introduced the Narrative Monad (mcard/ptr/core/clm_template.py), a functional programming template that composes:

  • Reader Monad: Carries the Cultural Context (Tri Hita Karana) and Configuration.
  • State Monad: Carries the evolving World State (Village Prosperity, Network Topology).
  • Writer Monad: Accumulates the Log of the journey (The Story Text, The Audit Trail).
  • IO Monad: Handles the Effects at the boundaries (User Interaction, System Deployment).

This template allows us to write each chapter as a pure function: Chapter :: Context -> State -> (Result, NewState, Log).
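
Below is a minimal Python sketch of that signature; the class and field names are illustrative stand-ins, not the actual clm_template.py API.

from dataclasses import dataclass

@dataclass(frozen=True)
class Context:      # Reader: cultural context and configuration
    config: dict

@dataclass(frozen=True)
class WorldState:   # State: the evolving world (e.g., village prosperity)
    prosperity: int

def chapter(ctx: Context, state: WorldState) -> tuple[str, WorldState, list[str]]:
    """A pure chapter: reads ctx, transforms state, accumulates a log (Writer)."""
    new_state = WorldState(prosperity=state.prosperity + 1)
    log = [f"prosperity rose to {new_state.prosperity}"]
    return "ok", new_state, log

result, next_state, log = chapter(Context(config={}), WorldState(prosperity=0))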

Chapter 0 (Prologue): The Value of Counting

The prologue chapter, "The Value of Counting," establishes the MVP Card: The Counter. It uses the Water Clock HyperCard prototype to demonstrate how the act of observation (IO) creates discrete identity (State) and fairness (Writer).

  • Specification: chapters/chapter_00_prologue/mvp_counter.yaml
  • Prototype: chapters/chapter_00_prologue/water_clock.py

Chapter 1: Resource-Aware Computation (Arithmetic)

This chapter demonstrates polyglot consensus across multiple runtimes (Python, JavaScript, Rust, C, WebAssembly, Lean). It implements the Cubical Logic Model through arithmetic operations, proving that truth is invariant across representations.

  • CLM Specifications: chapters/chapter_01_arithmetic/*.yaml
  • Runtimes: Python, JavaScript, Rust, C, WASM, Lean 4

Chapter 2: Content Addressing (Handle)

This chapter introduces the Content Handle system, enabling human-readable names to reference content-addressed MCards. It demonstrates the duality between immutable hash-based retrieval and mutable handle-based resolution.

  • CLM Specifications: chapters/chapter_02_handle/*.clm
  • Test Data: chapters/chapter_02_handle/test_data/*.yaml
  • Key Features:
    • UTF-8 Handles: International characters supported (文檔, مستند, ドキュメント, документ)
    • Handle validation uses Unicode categories (letters, digits, _, -)
    • NFC normalization + casefold for consistent lookup
    • Version history tracking
    • Dual retrieval (hash vs handle)
  • Monadic API: get_by_handle_m(), resolve_handle_m() return Maybe monad for functional composition
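
As a standard-library illustration of the normalization and validation rules listed above (the function names here are ours, not the library's API):

import unicodedata

def normalize_handle(handle: str) -> str:
    # NFC normalization + casefold, so lookups are consistent across scripts
    return unicodedata.normalize("NFC", handle).casefold()

def is_valid_handle(handle: str) -> bool:
    # Unicode-category validation: letters, digits, '_' and '-' only
    return bool(handle) and all(
        ch in "_-" or unicodedata.category(ch)[0] in ("L", "N")
        for ch in handle
    )

assert normalize_handle("Документ") == normalize_handle("докуменТ")
assert is_valid_handle("文檔-v2")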

Chapter 3: LLM Integration

This chapter introduces the LLM Runtime, enabling Large Language Model execution as a first-class CLM runtime. It demonstrates monadic LLM interactions with local models via Ollama.

  • CLM Specifications: chapters/chapter_03_llm/*.clm
  • Providers: Ollama (default), LMStudio, OpenAI (extensible)
  • Default Models: gemma3:latest, llama3:latest, qwen3:latest
  • Key Features:
    • Monadic Interface: prompt_monad(), chat_monad() return IO[Either] for composition
    • System Prompts: Full chat completion support with system/user/assistant roles
    • Structured Output: JSON extraction and entity recognition
    • File Summarization: Summarize .md and .py files with configurable styles
  • Demo Scripts:
    • scripts/demo_llm_runtime.py - Interactive demos
    • chapters/chapter_03_llm/file_summarizer_logic.py - File summarization

Chapter 4: High-Performance Data Loading

This chapter enables bulk data ingestion and performance benchmarking, dealing with both text and binary data (images, audio). It demonstrates recursive loading and mixed-content handling.

  • CLM Specifications: chapters/chapter_04_load_dir/*.clm
    • binary_loader.clm: Benchmarks loading binary datasets (Images, Audio).
    • recursive_options.clm: Demonstrates control over recursive vs. flat directory scanning.
    • hub_loader.clm: Benchmarks loading text datasets.
  • Logic Implementation: chapters/chapter_04_load_dir/loader_logic.py
    • Features: Recursive scanning, binary detection, and skipping of vector embedding for non-text content.
    • Metrics: Throughput (Files/sec, MB/sec), Retrieval Latency.

Chapter 5: Reflection & Metacognition

This chapter introduces Systemic Self-Awareness. It uses CLMs to analyze other CLMs, creating a closed loop where the system understands its own structure.

  • CLM Specifications: chapters/chapter_05_reflection/*.clm
    • clm_inventory.clm: Scans and catalogs all 40+ CLMs in the project.
    • runtime_audit.clm: Analyzes the distribution of polyglot runtimes (Python, Lean, Rust, etc.).
    • Narrative Weaver: narrative_weaver.clm reconstructs the "Prologue of Spacetime" story from disjoint chapters.
  • Concept: The system can now "read" itself, treating its own code as data (MCard), fulfilling the monadic vision of reflection.

Advanced Capabilities: GraphRAG & Multimodal Perception

Beyond the core CLM chapters, MCard includes advanced prototypes for Graph Retrieval-Augmented Generation (GraphRAG) and Multimodal Perception. These features transform MCard into a system that can "see" images and "reason" about relationships.

  • Logic Implementation: mcard/rag/graph/ (Entities, Relationships, Community Detection)
  • GraphRAG Engine:
    • Entity Extraction: Automatically extracts entities and relationships from text using local LLMs.
    • Knowledge Graph: Stores entities and relationships in SQLite with graph traversal capabilities.
    • Community Detection: Implements Label Propagation Algorithm (LPA) to detect clusters.
    • Hybrid Search: Combines sqlite-vec vector similarity with graph traversal.
  • Multimodal Vision:
    • Vision Embeddings: Integrates llama3.2-vision to "see" images.
    • Describe-then-Embed: Generates rich text descriptions of images for cross-modal search.
  • Runnable Demos:
    • scripts/demo_graphrag.py: Full GraphRAG pipeline (Extraction -> Graph -> Query).
    • scripts/demo_graph_communities.py: Community detection and summarization.
    • scripts/demo_vision_rag.py: Multimodal vision embedding and search.

☯️ Architectural Philosophy: The Monad-Polynomial Duality

MCard's design is grounded in the complementary relationship between Monadic Control and Polynomial Data. This duality allows us to build a system that is both rigorously safe and infinitely flexible.

1. Monads: The Invariant Container

We employ Monadic design patterns (IO, Reader, Writer, State) to establish the invariant laws of execution. Monads manage the context: how computation happens. They handle:

  • Purity & Safety: Encapsulating side effects (IO) and error handling (Either).
  • Observability: Accumulating audit logs and traces (Writer).
  • Context: Passing configuration and security policies (Reader).

2. Polynomials: The Variant Content

Complementing this, Polynomial Functors ($P(X) = \sum_i A_i \times X^{B_i}$) inject variability into the system. They represent the content: what is being computed.

  • Structure: The PCard is a reified polynomial, defining a sum of choices (operations) and products of data (inputs).
  • Flexibility: By treating logic as data (Polynomials), we can swap implementations (Polyglot Runtimes) without changing the execution container.

The Synthesis

The PTR (Polynomial Type Runtime) acts as the bridge. It interprets the Polynomial (the variable logic of a PCard) using the Monad (the invariant rules of the Runtime). This ensures that while the logic can be infinite and varied, the execution remains safe, auditable, and formally verifiable.
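
A small Python sketch of this duality, assuming nothing about the real PCard types: the polynomial term is plain data (a choice plus inputs), while a fixed interpreter supplies the invariant execution context.

from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Summand:
    label: str        # A: which operation was chosen (the sum of choices)
    positions: tuple  # X^B: the inputs of that operation (the product of data)

def interpret(term: Summand, ops: dict) -> float:
    # The invariant container: one interpreter over variable polynomial data
    return ops[term.label](*term.positions)

ops: dict[str, Callable[..., float]] = {"add": lambda a, b: a + b, "neg": lambda a: -a}
print(interpret(Summand("add", (2.0, 40.0)), ops))  # 42.0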


📦 Data Model

MCard is built around a simple but powerful data model:

  • Card: The fundamental unit of content with a unique hash
  • Hash: Cryptographic identifier for content (SHA-256 by default; see the sketch after this list)
  • Content: Optimized BLOB storage
    • Binary format ensures maximum performance and exact content preservation
    • Efficient storage for both text and binary data
    • MCard's browsing interface provides human-readable views when needed
  • G-Time: Global time value for temporal ordering of content claims
  • Temporal Ordering: Built-in support for temporal ordering of content claims
  • Modular Architecture: Extensible design with pluggable components
  • Type Hints: Built with Python type hints
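
Content addressing in one snippet: identical bytes yield identical SHA-256 identities, which is what enables the verification and deduplication described above (standard-library sketch):

import hashlib

a = hashlib.sha256(b"Hello, MCard!").hexdigest()
b = hashlib.sha256(b"Hello, MCard!").hexdigest()
assert a == b                                             # same content, same identity (dedup)
assert a != hashlib.sha256(b"Hello, MCard?").hexdigest()  # any change yields a new hash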

✨ Features

  • Content-Addressable Storage: Store and retrieve content using cryptographic hashes (SHA-256 by default)
  • Optimized Storage: BLOB format ensures maximum performance while MCard handles all text conversions
  • Content Type Detection: Automatic detection of various file formats (JSON, XML, CSV, Markdown, Python, etc.)
  • Robust Binary Signatures: Accurate detection of PNG, JPEG, GIF, PDF, ZIP/OpenXML, and RIFF (WAV/AVI) using raw-byte signatures (no lossy text preprocessing)
  • Smarter YAML Heuristics: Reduced false positives (e.g., Python dict strings are no longer misclassified as YAML)
  • Temporal Ordering: Built-in support for temporal ordering of content claims
  • Modular Architecture: Extensible design with pluggable components
  • Type Safety: Built with Python type hints and Pydantic models
  • Async Support: Asynchronous API for improved performance

🚀 Getting Started

Database Inspection

MCard uses BLOB storage for optimal performance and data integrity. The binary format allows for efficient storage and retrieval while MCard handles all necessary text conversions. To inspect the database:

# Open the database in SQLite CLI
sqlite3 mcard.db

# View the schema
.schema

# View binary content (first 20 bytes as hex)
SELECT hash, hex(substr(content, 1, 20)) as preview, g_time FROM card LIMIT 5;

# MCard's API provides easy access to content in various formats:
# - get_content() - Returns raw bytes for maximum performance
# - get_content(as_text=True) - Returns decoded text when needed
# - to_dict() - Automatically converts content to appropriate formats
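
For completeness, a short usage sketch of those accessors, using the default_collection API shown in the Quick Start section further below:

from mcard import MCard, default_collection

h = default_collection.add(MCard("inspect me"))
card = default_collection.get(h)

raw = card.get_content()               # raw bytes, the fast path
text = card.get_content(as_text=True)  # decoded text view when needed
print(raw[:20].hex(), text)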

🚀 Developers Start Here: Using MCard in an Internal Service

If you are building an internal service and want to use MCard as your content‑addressable storage, this is the recommended starting pattern.

1. Install and set up the environment

git clone https://github.com/xlp0/MCard_TDD.git
cd MCard_TDD
./activate_venv.sh

Zero Trust AuthN/Z

We adopt a Zero Trust Architecture (ZTA), "Never Trust, Always Check", for admitting content identities into MCard collections. All network-facing operations evaluate identity, policy, and context continuously, not once.

Design goals:
- Continuous verification before, during, and after content admission
- Policy-as-code for authorization decisions
- Deterministic, testable contracts using Pocketflow's prep → exec → post specification

Pocketflow-style contract for content admission:

- prep (preconditions)
  - Caller identity established via network-aware auth (e.g., mTLS, OIDC, DiD/JWT)
  - Policy context resolved (tenant, collection, role, risk, device posture, IP/geofence)
  - Content metadata validated (size, type, provenance hints)
  - Rate limit/quota check passes

- exec (action)
  - Compute content hash deterministically (e.g., SHA-256)
  - Run validation pipeline; branch per MIME (binary/text validators)
  - Consult authorization engine with (subject, action=admit, resource=hash, context)
  - Persist only if decision = allow and validation = pass

- post (postconditions)
  - Emit audit event with decision rationale, policy version, and evaluator inputs
  - Update temporal index (g_time) and secondary indices (FTS) if applicable
  - Return VCard admission receipt (hash, g_time, policy_decision, signatures)

Currying relation across cards:
- MCard: base content-addressable identity (content, hash, g_time) - the functorial fixed point
- PCard: polynomial representation/interface protocol for MCard - effectively the same as MCard but with computational interpretation $F(X) = \sum_i (A_i \times X^{B_i})$
- VCard: curried specialization of MCard/PCard + (verification rules, validation rules, security checks)

In effect, PCard provides the polynomial functional interface to MCard's fixed point substrate, and VCard completes the currying by applying security/verification rules to the MCard/PCard foundation. See [MVP Cards for PKC](docs/MVP%20Cards%20for%20PKC.md) and [PCard Architecture](docs/PCard%20Architecture.md) for detailed mathematical foundations.

Testing notes:
- Model prep/exec/post as explicit test phases; use property tests for invariants (idempotent hash; monotonic g_time)
- Include negative suites (policy deny, malformed identity, validator fail)
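
A minimal pytest sketch of those phases and invariants; the admit() helper is an illustrative stand-in, not the project's real admission pipeline.

import hashlib
import time

def admit(content: bytes) -> tuple[str, float]:
    # exec: deterministic content hash plus a claim time (stand-in for g_time)
    return hashlib.sha256(content).hexdigest(), time.monotonic()

def test_hash_is_idempotent():
    content = b"same bytes"
    assert admit(content)[0] == admit(content)[0]

def test_g_time_is_monotonic():
    (_, t1), (_, t2) = admit(b"first"), admit(b"second")
    assert t1 <= t2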

Prerequisites

- Python 3.9 or higher
- [uv](https://github.com/astral-sh/uv) - A fast Python package installer and resolver

Basic Installation

1. Clone the repository:

   git clone https://github.com/xlp0/MCard_TDD.git
   cd MCard_TDD

2. Set up the Python environment using the provided script:

   ./activate_venv.sh

3. For development, install additional development dependencies:

   uv pip install -e ".[dev]"

🏃 Executing Logic (CLM)

The primary way to run logic in this system is via the Polynomial Type Runtime (PTR) CLI. This unifies loading, assembly, and polyglot execution.

# Run a specific CLM Chapter
uv run python -m mcard.ptr.cli run chapters/chapter_01_arithmetic/advanced_comparison.yaml

See mcard/ptr/README.md for full documentation on the runtime engine and CLMRunner API.

Optional Dependencies

MCard supports optional features that can be installed as extras:

uv pip install -e ".[xml]"

🌐 Polyglot Runtime Support

MCard's Polynomial Type Runtime (PTR) supports polyglot execution, enabling you to write PCards in multiple programming languages while maintaining a unified verification and execution interface. Each runtime provides different strengths for specific use cases.

Why Multiple Runtimes?

The polyglot architecture enables:

  • Language-specific optimization: Use the best tool for each task (Rust for performance, Python for rapid prototyping, Lean for formal verification)
  • Formal verification: Lean 4 provides mathematical proof of correctness for critical operations
  • Cross-platform deployment: WASM enables browser and edge deployment
  • Legacy integration: C and JavaScript support existing codebases
  • Type safety: Multiple type systems provide defense in depth

Required Runtime Installations

To run all polyglot tests and examples, you'll need to install the following runtimes:

1. Python (Required)

Purpose: Core runtime, rapid prototyping, API implementation

Installation: Already required (Python 3.9+)

python3 --version  # Should be 3.9 or higher

2. JavaScript/Node.js (Optional but Recommended)

Purpose: JavaScript PCard execution, frontend integration, WASM tooling

Installation:

# macOS (using Homebrew)
brew install node

# Ubuntu/Debian
sudo apt install nodejs npm

# Verify installation
node --version  # Should be v14 or higher
npm --version

Why needed: Executes JavaScript-based PCards, enables browser-side verification, and supports the WASM compilation toolchain.

3. Rust (Optional but Recommended)

Purpose: High-performance PCard execution, WASM compilation, systems programming

Installation:

# Install Rust using rustup (all platforms)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Add WASM target for WebAssembly compilation
rustup target add wasm32-wasi

# Verify installation
rustc --version
cargo --version

Why needed: Executes Rust-based PCards with near-native performance, compiles to WASM for universal deployment, and provides memory safety guarantees.

4. WASM Runtime (wasmtime) (Optional)

Purpose: Execute WebAssembly modules, universal deployment target

Installation:

# macOS (using Homebrew)
brew install wasmtime

# Linux
curl https://wasmtime.dev/install.sh -sSf | bash

# Or install Python bindings
uv pip install wasmtime

# Verify installation
wasmtime --version

Why needed: Runs compiled WASM modules from Rust/C, enables sandboxed execution, and provides a universal runtime for edge deployment.

5. Lean 4 (Optional but Important)

Purpose: Formal verification, mathematical proof of correctness, theorem proving

Installation:

# Install elan (Lean version manager)
curl https://raw.githubusercontent.com/leanprover/elan/master/elan-init.sh -sSf | sh

# Install Lean 4
elan install leanprover/lean4:stable

# Verify installation
lean --version  # Should show Lean 4.x

Why needed: Provides mathematically verified computation for critical operations. Lean 4's type system ensures correctness by construction, making it ideal for security-critical and financial applications.

6. C Compiler (gcc/clang) (Optional)

Purpose: Low-level systems programming, legacy integration, bare-metal execution

Installation:

# macOS (Xcode Command Line Tools)
xcode-select --install

# Ubuntu/Debian
sudo apt install build-essential

# Verify installation
gcc --version
# or
clang --version

Why needed: Compiles C-based PCards for maximum performance and minimal runtime overhead, integrates with existing C libraries, and enables bare-metal deployment.

🚀 Using the PTR CLI

MCard includes a powerful CLI tool ptr for managing and executing PCards.

1. Check Runtime Status

uv run ./ptr --status

Output:

=== Polyglot Runtime Status ===
✓ Python 3.9.22
✓ Javascript
✓ Rust
✓ C
✓ Wasm
✓ Lean
===============================
Available: 6/6 runtimes

2. List Available PCards

uv run ./ptr --list

3. Execute a PCard

Run a PCard by file path or hash:

# Run runtime check PCard
uv run ./ptr runtime_status_check.clm

# Run arithmetic PCard with input
uv run ./ptr chapters/samples/python_arithmetic.clm 21
# Output: 42.0

# Run by hash (if previously loaded)
uv run ./ptr acdec653... 21

4. Programmatic API (Advanced)

For integration into your own applications:

from mcard.ptr.core.runtime import RuntimeFactory


# Check if system can execute PCards
if not RuntimeFactory.at_least_one_available():
    raise RuntimeError("No runtimes available - cannot execute PCards!")

# Get detailed status for all runtimes
status = RuntimeFactory.get_detailed_status()
print(f"Python version: {status['python']['version']}")

5. CI/CD Validation

uv run pytest tests/ptr/test_runtime.py::TestRuntimeFactory::test_list_available_runtimes -v

Minimal vs Full Installation

Minimal (Python only): Run the core MCard system and Python-based PCards

# Just activate the virtual environment
./activate_venv.sh

Full Polyglot (All runtimes): Run all PCard types and formal verification

# Install all runtimes as described above
./activate_venv.sh
# Then install: Node.js, Rust, WASM, Lean 4, C compiler

🔧 Recent Bug Fixes and Improvements

Lean 4 Float Parsing Fix (December 2025)

Problem: Lean 4's standard library doesn't provide a String.toFloat? function, causing compilation errors in arithmetic PCards that needed to parse floating-point numbers from JSON strings.

Solution: Implemented a custom parseFloat function in chapters/arithmetic/logic_advanced.lean that:

  • Parses integers and converts them to floats
  • Handles decimal numbers by splitting on '.' and computing fractional parts
  • Supports scientific notation (e.g., 1.5e-10) by parsing mantissa and exponent separately
  • Gracefully handles negative numbers and invalid input (returns 0.0 as default)

Files Modified:

  • chapters/arithmetic/logic_advanced.lean: Added custom float parser with scientific notation support

Impact: All Lean-based arithmetic tests now pass, enabling formal verification of floating-point operations.
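
For readers who don't write Lean, here is the same parsing strategy sketched in Python (an illustration of the algorithm, not a transcription of logic_advanced.lean):

import math

def parse_float(s: str) -> float:
    try:
        sign = -1.0 if s.startswith("-") else 1.0
        s = s.lstrip("+-")
        mantissa, _, exp = s.lower().partition("e")  # scientific-notation split
        whole, _, frac = mantissa.partition(".")     # decimal split
        value = float(int(whole or "0"))
        if frac:
            value += int(frac) / (10 ** len(frac))   # fractional part
        if exp:
            value *= 10.0 ** int(exp)                # exponent
        return sign * value
    except ValueError:
        return 0.0  # like the Lean version, invalid input defaults to 0.0

assert math.isclose(parse_float("1.5e-10"), 1.5e-10)
assert parse_float("junk") == 0.0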

Test Suite Fixes (December 2025)

1. YAMLTemplateLoader Parameter Fix

  • Problem: Test was using incorrect parameter name (template_dir instead of templates_dir) and calling non-existent method get_template()
  • Solution: Updated tests to use correct templates_dir parameter and load_template() method
  • Files: tests/ptr/test_clm.py

2. RustRuntime Environment Validation Fix

  • Problem: Mock didn't prevent wasmtime import, causing validation test to return True instead of expected False
  • Solution: Added patch.dict('sys.modules', {'wasmtime': None}) to properly mock the import
  • Files: tests/ptr/test_runtime.py

3. RustRuntime WASM Execution Test Fix

  • Problem: Test expected "not yet implemented" but actual implementation returns file error
  • Solution: Updated assertion to match actual error message format
  • Files: tests/ptr/test_runtime.py

Test Results: All 388 tests now pass (1 skipped) ✅

Lean 4 Polyglot Runtime Fixes (December 2025)

1. Boolean Comparison Case Sensitivity

  • Problem: Lean 4 outputs lowercase true/false, but Python YAML parser loads booleans as True/False. Comparison str("true") != str(True) caused all boolean test cases to fail.
  • Solution: Added case-insensitive boolean comparison in CLMChapterLoader._compare_results() that normalizes both values to lowercase before comparing.
  • Files: mcard/ptr/clm/loader.py
  • Impact: All 26 standalone Lean CLM test cases now pass (primality, propositional logic, etc.)

2. Lean Polyglot Context Passing

  • Problem: In polyglot consensus tests, LeanRuntime.execute() was passing target.get_content() (which was "dummy") instead of the actual operation context containing op, a, b values.
  • Solution: Updated LeanRuntime.execute() to detect polyglot mode (when context has op, a, n keys) and pass the context as JSON, while still supporting standalone CLM mode that uses target content.
  • Files: mcard/ptr/core/runtime.py
  • Impact: All 8 runtimes (Python, JavaScript, Rust, C, WASM, Lean, R, Julia) now achieve consensus in polyglot tests.

Test Results: All 461 tests now pass (9 skipped) ✅

Code Quality Improvements

  • Zero lint errors across all Lean, Python, and test files
  • 100% polyglot test coverage for Python, JavaScript, Rust, C, WASM, Lean, R, and Julia runtimes
  • Improved error handling in runtime validation and WASM execution
  • Better test mocking for platform-dependent runtime checks

🧭 BMAD Method: How We Work

We use BMAD to drive a tight Test-Driven Development loop for MCard.

  • RED: write a failing test that captures the next smallest behavior
  • GREEN: implement the minimal code to pass only that test
  • REFACTOR: improve design while keeping tests green

BMAD helper script and config in this repo:

  • bmad_workflow.py – CLI to guide the RED/GREEN/REFACTOR loop
  • bmad_config.yaml – test categories, coverage goals, environment
  • BMAD_GUIDE.md – step-by-step usage and tips

Quick usage:

# Start a new TDD cycle for a behavior
./bmad_workflow.py start "create card from bytes"

# After writing the failing test
./bmad_workflow.py mark-written

# After making the test pass
./bmad_workflow.py mark-passing

# After refactoring
./bmad_workflow.py complete-refactor

# View current status
./bmad_workflow.py status

🏗️ Project Structure

MCard_TDD/
├── mcard/                         # Core Python package
│   ├── cli.py                     # CLI implementation
│   ├── config/                    # Configuration management
│   ├── engine/                    # Database engine implementations
│   │   ├── base.py                # Base engine interface
│   │   └── sqlite_engine.py       # SQLite implementation
│   ├── model/                     # Data models and content handling
│   │   ├── card.py                # Core MCard implementation
│   │   ├── card_collection.py     # Collections of MCards
│   │   ├── ptr/                   # Polynomial Type Runtime (PTR)
│   │   │   ├── core/              # Core PTR engine
│   │   │   ├── clm/               # CLM framework
│   │   │   └── mcard_integration/ # MCard integration
│   │   ├── detectors/             # Content type detectors
│   │   └── hash/                  # Hashing implementations
│   │       └── algorithms/        # Hash algorithm implementations
│   └── ptr/                       # PTR package (alias/wrapper)
├── data/                          # Data storage directories
│   ├── db/                        # Database files
│   ├── loaded_content/            # Processed content storage
│   └── test_content/              # Test content files
├── docs/                          # Documentation
│   ├── reports/                   # Generated reports and summaries
│   └── to-do-plan/                # Project planning documents
├── examples/                      # Example scripts
│   └── demos/                     # Demo scripts
├── scripts/                       # Utility scripts and CLI wrappers
├── tests/                         # Test suite
│   ├── data/                      # Test data
│   └── test_data/                 # Additional test data
├── pyproject.toml                 # Project configuration
└── README.md                      # This file

🚦 Quick Start

Using the Python API (synchronous)

from mcard import MCard, default_collection

# Create a new card (text)
card = MCard("Hello, MCard!")
hash_value = default_collection.add(card)

# Retrieve the card by hash
retrieved = default_collection.get(hash_value)
print(retrieved.get_content().decode("utf-8"))  # Hello, MCard!

# Search for cards containing a substring
results = default_collection.search_by_string("Hello")
for c in results.items:
    try:
        print(c.get_content().decode("utf-8"))
    except UnicodeDecodeError:
        print(f"[binary] {c.hash}")

🧪 Running Tests

Run the test suite (with uv):

uv run pytest -q

For test coverage report:

uv run pytest --cov=mcard --cov-report=term-missing

Verifying CLM Specifications

To run the full suite of Cubical Logic Models (CLMs) across all chapters:

uv run python scripts/verify_all_clms.py

This script recursively scans chapters/ for .clm and .yaml files, executes them using the PTR runtime, and reports a comprehensive pass/fail summary.

🔍 Content Type Detection and Validation

  • Binary-first strategy: BinaryFirstStrategy runs signature detection directly on raw bytes via BinarySignatureDetector.detect_from_bytes() to avoid corruption (illustrated in the sketch after this list).
  • Text detection: Falls back to text detectors only when no binary signature is recognized.
  • Validation registry: ValidationRegistry dispatches to BinaryValidator or TextValidator depending on the detected MIME.
  • YAML detection: TextFormatDetector._is_yaml() was refined to avoid misclassifying Python-like content as YAML.
  • Problematic bytes guard: Optional env flag MCARD_INTERPRETER_GUARD_PROBLEMATIC=1 treats certain pathological byte patterns as binary to prevent hangs.
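
To make the binary-first idea concrete, here is an illustrative signature table of the kind such a detector consults; the magic numbers are the standard ones, but the function is a sketch, not the BinarySignatureDetector source.

from typing import Optional

SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
    b"%PDF": "application/pdf",
    b"PK\x03\x04": "application/zip",  # also OpenXML containers (docx, xlsx)
    b"RIFF": "application/x-riff",     # WAV/AVI wrappers
}

def detect_from_bytes(data: bytes) -> Optional[str]:
    # Check leading raw bytes; None means "fall back to text detection"
    for magic, mime in SIGNATURES.items():
        if data.startswith(magic):
            return mime
    return None

assert detect_from_bytes(b"\x89PNG\r\n\x1a\n rest") == "image/png"
assert detect_from_bytes(b"plain text") is None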

🤝 Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📚 Documentation

For more detailed documentation, please see the docs/ directory.

📝 Changelog

See CHANGELOG.md for a list of notable changes.

Core Concepts

MCard implements an algebraically closed system where:

  1. Every MCard is uniquely identified by its content hash (consistently using SHA-256 by default, with other algorithms configurable).
  2. Every MCard has an associated claim time (timezone-aware timestamp with microsecond precision).
  3. The database maintains these invariants automatically.
  4. Content integrity is guaranteed through immutable hashes.
  5. Temporal ordering is preserved at microsecond precision.

This design provides several key guarantees:

  • Content Integrity: The content hash serves as both identifier and verification mechanism.
  • Temporal Signature: All cards are associated with a timestamp: g_time.
  • Precedence Verification: The claim time enables determination of content presentation order.
  • Algebraic Closure: Any operation on MCards produces results that maintain these properties.
  • Type Safety: Built on Pydantic with strict validation and type checking.

Required Attributes for Each MCard

Each MCard must have the following three required attributes:

1. content: The actual data being stored (string or bytes).

2. hash: A cryptographic hash of the content, using SHA-256 by default (configurable to other algorithms).

3. g_time: A timezone-aware timestamp with microsecond precision, representing the global time when the card was claimed.
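
A standard-library sketch of a record carrying exactly these three attributes (CardRecord is an illustrative stand-in for MCard):

import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class CardRecord:
    content: bytes
    hash: str
    g_time: datetime

def make_card(content: bytes) -> CardRecord:
    return CardRecord(
        content=content,
        hash=hashlib.sha256(content).hexdigest(),  # SHA-256 by default
        g_time=datetime.now(timezone.utc),         # tz-aware, microsecond precision
    )

card = make_card(b"hello")
assert card.g_time.tzinfo is not None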

VCard as Curried MCard/PCard

We model VCard as a curried specialization of MCard/PCard using functional composition:

  • MCard = functorial fixed point: $F(content) = content$ (immutable data substrate)
  • PCard = polynomial interface protocol for MCard: $F(X) = \sum_i (A_i \times X^{B_i})$ (computational interpretation)
  • VCard = MCard/PCard + (verification rules, validation rules, security checks) (boundary enforcement)

Benefits:

  • PCard provides the polynomial functional interface to MCard's fixed point substrate
  • VCard completes the currying by adding security/verification boundaries
  • Separation of concerns: data substrate (MCard) → computational interface (PCard) → security boundaries (VCard)
  • Composability: different security rule sets can be applied to the same MCard/PCard foundation
  • Testability: unit-test each layer independently

See MVP Cards for PKC for the complete mathematical foundation of this currying architecture.

Directory Structure

  • mcard/: Contains the main application code.
    • algorithms/: Hash algorithm implementations (renamed from hash_algorithms)
    • engine/: Database engines (SQLite, DuckDB)
    • model/: Core data models
    • api.py: FastAPI endpoints
    • logging_config.py: Logging configuration
  • examples/: Example scripts demonstrating how to use the MCard system.
  • tests/: Contains test files for the application.
    • persistence/: Database persistence tests
    • unit/: Unit tests
  • logs/: Contains log files generated by the application.
  • data/db/: Directory for storing database files used by the application.
  • data/files/: Directory reserved for storing general files used by the application.
  • data/test_content/: Test files of various types for content detection and validation.
  • data/loaded_content/: Output directory for loaded and processed content (now gitignored).
  • docs/: Project documentation.

Database Storage and Indexing

MCard uses SQLite with BLOB storage for content. A virtual FTS5 table documents is maintained via triggers for text search. On first initialization, the engine creates the card table (BLOB content), the FTS table, and the triggers that keep them in sync.
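
The following self-contained sqlite3 sketch shows the shape of such a schema; the exact DDL the engine generates may differ, and it assumes an SQLite build with FTS5 (included in most Python distributions).

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE card (
    hash    TEXT PRIMARY KEY,
    content BLOB NOT NULL,
    g_time  TEXT NOT NULL
);
CREATE VIRTUAL TABLE documents USING fts5(hash, text_content);
-- triggers keep the FTS index in sync with the card table
CREATE TRIGGER card_ai AFTER INSERT ON card BEGIN
    INSERT INTO documents(hash, text_content)
    VALUES (new.hash, CAST(new.content AS TEXT));
END;
CREATE TRIGGER card_ad AFTER DELETE ON card BEGIN
    DELETE FROM documents WHERE hash = old.hash;
END;
""")
conn.execute("INSERT INTO card VALUES (?, ?, ?)", ("abc123", b"hello fts", "2025-01-01T00:00:00Z"))
assert conn.execute("SELECT hash FROM documents WHERE documents MATCH 'hello'").fetchall() == [("abc123",)]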

Examples

Default MCard API Example: examples/MCard_Demo.py

This script demonstrates the simplest way to use the MCard API through the default_utility interface. It covers:

  • Adding new cards (with plain text or dictionaries, which are auto-converted to JSON)
  • Retrieving cards by hash
  • Searching for cards by content
  • Counting the total number of cards in the collection

How to Run the Demo

python examples/MCard_Demo.py

Key Features

  • Minimal Setup: Uses from mcard import default_utility for immediate access to core functionality.
  • Add and Retrieve: Shows how to add cards and retrieve them by hash.
  • Search: Demonstrates searching for cards containing a specific substring.
  • Summary Output: Prints the total number of cards and search results.

Modular Content Loader Example: examples/Content_Loader.py

This script demonstrates how to use the MCard system's content detection and storage features in a modular, easy-to-understand way. It:

  • Loads files from data/test_content/ (supports both text and binary types)
  • Uses the ContentTypeInterpreter to detect file types and validate content
  • Creates MCards for each file, handling text and binary content appropriately
  • Saves processed files to data/loaded_content/ with unique, type-appropriate filenames
  • Prints summaries of processed files and cleans up temporary files

How to Run the Example

python examples/Content_Loader.py

Key Features of the Example

  • Modular Functions: The script is organized into clear, single-purpose functions (e.g., load_test_files, create_mcard_for_file, save_card_to_file, etc.) for maintainability and extensibility.
  • Automatic Content Type Detection: Uses file signatures and content validation to determine file type and extension.
  • Binary and Text Handling: Handles binary files (e.g., images) and text files differently, ensuring correct storage and retrieval.
  • Output Directory: All processed content is saved to data/loaded_content/ (which is now gitignored).
  • Temporary File Cleanup: Removes temporary binary files after processing.

See the script and its docstrings for further details and customization options.

Handling Problematic Files (very large/single-line)

Some files can be pathological (e.g., extremely large single-line text or unstructured binaries). The loader now safely handles these via streamed text normalization with adaptive soft wrapping and strict byte/time caps.

  • Defaults remain safe: problematic files are skipped unless include_problematic=True.
  • When included, problematic files are processed as normalized text with UTF-8 replacement and soft wraps on-the-fly.
  • If streaming fails unexpectedly, the system falls back to a capped binary BLOB read.
  • Metadata captured for normalized files includes original_size and original_sha256_prefix.

Example using load_file_to_collection() from mcard.file_utility:

from pathlib import Path
from mcard.model.card_collection import CardCollection
from mcard.file_utility import load_file_to_collection

collection = CardCollection()

# Load a single file with safe streamed normalization (and optional metadata-only mode)
results = load_file_to_collection(
    Path("tests/test_data/OneMoreLongStringFile.js"),
    collection,
    include_problematic=True,             # opt-in to include problematic files
    max_bytes_on_problem=2 * 1024 * 1024, # cap for streaming/fallback paths
    metadata_only=False                   # set True to store only metadata for problematic files
)

# Or load a directory recursively with the same options
results = load_file_to_collection(
    Path("tests/test_data"),
    collection,
    recursive=True,
    include_problematic=True,
    max_bytes_on_problem=2 * 1024 * 1024,
    metadata_only=False
)

Notes:

  • Normalized text is stored with mime_type='text/plain' and includes normalized=True and wrap_width in the file info.
  • When fallback occurs, MIME is application/octet-stream and only capped bytes are stored.
  • Adaptive wrap width is chosen by extension via env-configured values.

Environment variables to tune behavior:

  • MCARD_WRAP_WIDTH_DEFAULT (default 1000)
  • MCARD_WRAP_WIDTH_KNOWN (default 1200)
  • MCARD_MAX_PROBLEM_TEXT_BYTES (default 2MB)
  • MCARD_READ_TIMEOUT_SECS (default 30)

.gitignore Notes

  • The data/loaded_content/ directory is now included in .gitignore and will not be tracked by git. This ensures that output/generated files do not pollute the repository.

PyTest Configuration

  • The project uses PyTest for testing.
  • Tests are located in the tests directory.
  • The configuration file pytest.ini specifies test paths and naming conventions.

🏢 Enterprise Logging System

MCard features a comprehensive, enterprise-grade logging system with structured multi-file logging, performance monitoring, and security auditing capabilities.

Key Features

  • Structured Logging: JSON-formatted logs with consistent schema across all components
  • Multi-File Strategy: Separate log files for different concerns (application, security, performance)
  • Performance Monitoring: Built-in PerformanceTimer for operation timing and bottleneck identification
  • Security Auditing: Dedicated SecurityAuditLogger for compliance and security event tracking
  • Colorized Console Output: Enhanced developer experience with color-coded log levels
  • Backward Compatibility: Dual logging system maintains compatibility with existing code

Logging Architecture

logs/
├── mcard.log              # Main application logs (rotating, 10MB, 5 backups)
├── mcard_security.log     # Security audit trail
├── mcard_performance.log  # Performance metrics and timing
└── mcard_structured.log   # Structured JSON logs for analysis

Usage Examples

Basic Application Logging

from mcard.config.improved_logging import setup_improved_logging, get_logger

def main():
    setup_improved_logging()  # Initialize enterprise logging
    logger = get_logger(__name__)
    logger.info("Application started", extra={"component": "main", "version": "0.1.23"})

Performance Monitoring

from mcard.config.improved_logging import PerformanceTimer

with PerformanceTimer("database_operation") as timer:
    # Your database operation here
    result = collection.add(card)
    timer.add_metadata({"records_processed": 1, "operation": "add"})

Security Auditing

from mcard.config.improved_logging import SecurityAuditLogger

audit_logger = SecurityAuditLogger()
audit_logger.log_access_attempt("user123", "read", "card_collection", success=True)
audit_logger.log_data_modification("user123", "create", "card", {"hash": "abc123"})

Configuration

Environment variables for logging control:

  • MCARD_SERVICE_LOG_LEVEL: Controls log level (DEBUG, INFO, WARNING, ERROR)
  • MCARD_LOG_FORMAT: Choose between 'json' or 'standard' formatting
  • MCARD_ENABLE_PERFORMANCE_LOGGING: Enable/disable performance monitoring
  • MCARD_ENABLE_SECURITY_LOGGING: Enable/disable security audit logging

Migration from Legacy Logging

The system maintains full backward compatibility. Existing code using the old logging system continues to work unchanged, while new code can leverage the enhanced features. A migration script is provided:

python migrate_logging.py

🚀 CI/CD Pipeline

MCard implements a comprehensive CI/CD pipeline with multi-platform testing, security checks, and automated deployment.

Pipeline Features

  • Multi-Platform Testing: Ubuntu, macOS, and Windows support
  • Multi-Python Version: Python 3.9, 3.10, 3.11, and 3.12 compatibility
  • Comprehensive Testing: Unit tests, integration tests, and coverage reporting
  • Security Scanning: Automated security checks with bandit and safety
  • Code Quality Gates: Linting, formatting, and type checking
  • Automated Deployment: PyPI publishing on release

Workflow Structure

# .github/workflows/ci.yml
jobs:
  test:        # Multi-platform, multi-Python testing
  security:    # Security vulnerability scanning  
  build:       # Package building and PyPI deployment

Quality Metrics

  • Test Coverage: 99.4% success rate across all test suites
  • Security Score: Zero critical vulnerabilities detected
  • Code Quality: 100% compliance with ruff linting rules
  • Performance: All tests complete in under 5 minutes

Running CI Locally

# Run the full test suite locally
make test-all

# Run security checks
make security-check

# Run linting and formatting
make lint
make format

🔧 Code Quality & Linting

MCard maintains enterprise-grade code quality through modern tooling and automated checks.

Tooling Stack

  • Ruff: Lightning-fast Python linter and formatter (replaces black, isort, flake8)
  • MyPy: Static type checking for type safety
  • Pre-commit: Automated code quality checks on commit
  • Pytest: Comprehensive testing framework with coverage reporting

Recent Quality Improvements

Lint Error Cleanup (Commit: b6a70a9)

  • 1000+ lint errors resolved across 76 files
  • Exception handling improved with proper 'from e' clauses
  • Modern typing - replaced deprecated typing.List with list[T]
  • Import organization - consistent import sorting and grouping
  • Code readability - improved formatting and structure
  • Zero regressions - all 17 tests continue to pass

Development Tooling Enhancements (Commit: 4ee6a7d)

  • Enhanced Makefile with 15+ development commands
  • Pre-commit hooks for automated quality checks
  • GitHub Actions with multi-OS/Python version support
  • Security scanning integrated into CI pipeline
  • Project health score improved from 7.5/10 to 8.5/10

Code Quality Commands

# Lint checking
uv run ruff check mcard/

# Auto-formatting  
uv run ruff format mcard/

# Type checking
uv run mypy mcard/ --ignore-missing-imports

# Run all quality checks
make quality-check

Quality Standards

  • Line Length: 88 characters (Black-compatible)
  • Import Sorting: Automatic with ruff
  • Type Hints: Required for all public APIs
  • Test Coverage: Minimum 75% coverage required
  • Documentation: Comprehensive docstrings for all modules

Advanced Topics

Hegel's Dialectic in Testing and CI/CD

Hegel's dialectic is a philosophical framework that describes the process of development and change through a triadic structure: thesis, antithesis, and synthesis. Here's how it relates to software testing and Continuous Integration/Continuous Deployment (CI/CD):

  1. Thesis (Initial Code): Represents the initial code or feature implementation, the starting point where a developer writes code to fulfill a specific requirement or feature.

  2. Antithesis (Testing and Bugs): Arises during the testing phase, where tests are executed. If tests fail or bugs are discovered, they represent a challenge to the initial implementation, highlighting discrepancies between intended functionality and actual behavior.

  3. Synthesis (Refinement and Improvement): Occurs when developers address the issues identified during testing, leading to a refined version of the code that resolves conflicts between the initial implementation and testing outcomes.

CI/CD Integration

In a CI/CD pipeline, this dialectical process is continuous:

  • Continuous Integration: Developers frequently integrate code changes into a shared repository. Each integration triggers automated tests, allowing for rapid identification of issues against the current codebase.

  • Continuous Deployment: Once the code passes testing, it can be automatically deployed, representing the synthesis where refined code is made available to users.

This iterative process fosters continuous improvement, where each round of testing and deployment leads to better software quality and functionality. By applying Hegel's dialectic, teams can embrace the idea that conflict (in the form of bugs and failures) is a natural and necessary part of the development process, ultimately leading to a more robust and effective product.

Handling Duplicate Events

When a duplicate card is detected, a duplicate_event_card is created and assigned a new timestamp value. Because the event card's content incorporates that timestamp, its hash is unique even though the duplicated content itself is identical to the original card's. This mechanism allows for robust handling of duplicate content while maintaining the integrity of the system.
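
A toy sketch of that mechanism; the event payload shape is illustrative, not the project's actual event schema.

import hashlib
import json
from datetime import datetime, timezone

def duplicate_event(original_hash: str) -> str:
    payload = json.dumps({
        "event": "duplicate_detected",
        "original": original_hash,
        # fresh microsecond-precision timestamp makes consecutive events distinct
        "g_time": datetime.now(timezone.utc).isoformat(),
    }).encode()
    return hashlib.sha256(payload).hexdigest()

# Two events for the same duplicate carry different timestamps, hence different hashes
assert duplicate_event("abc123") != duplicate_event("abc123")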

MD5 Collision Testing

The test suite includes verification of MD5 collision detection using known collision pairs from the FastColl attack. These pairs produce identical MD5 hashes despite having different content:

MD5 Collision Pair

Input 1:
4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518afbfa200a8284bf36e8e4b55b35f427593d849676da0d1555d8360fb5f07fea2
                                                                     ^^^                                    ^^^

Input 2:
4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518afbfa202a8284bf36e8e4b55b35f427593d849676da0d1d55d8360fb5f07fea2
                                                                     ^^^                                    ^^^

Key differences:

  1. 200 vs 202
  2. d15 vs d1d

Both inputs produce the same MD5 hash value, demonstrating MD5's vulnerability to collision attacks. This is why MCard defaults to using more secure hash functions like SHA-256.
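
You can verify the pair above with the standard library; both inputs collide under MD5 while SHA-256 distinguishes them:

import hashlib

m1 = bytes.fromhex(
    "4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518"
    "afbfa200a8284bf36e8e4b55b35f427593d849676da0d1555d8360fb5f07fea2"
)
m2 = bytes.fromhex(
    "4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518"
    "afbfa202a8284bf36e8e4b55b35f427593d849676da0d1d55d8360fb5f07fea2"
)
assert m1 != m2
assert hashlib.md5(m1).hexdigest() == hashlib.md5(m2).hexdigest()
assert hashlib.sha256(m1).hexdigest() != hashlib.sha256(m2).hexdigest()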

Modular PCards with Function Entry Points

MCard supports modular programming by allowing PCards to reference code stored in other MCards via code_hash. This enables code reuse and separation of concerns.

Additionally, PCards can specify a function entry_point within the referenced code. The runtime automatically handles input type conversion based on the PCard's input definition.

Example PCard YAML:

concrete:
  runtime: "python"
  operation: "custom"
  entry_point: "custom_cos"  # Function to call
  implementation:
    inputs:
      angle: "float"         # Input type for automatic conversion
  code_hash: "..."           # Hash of the MCard containing the Python code
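
For context, the MCard referenced by code_hash might contain a module like the following; the body is illustrative, with only the entry_point name and the float input taken from the YAML above.

import math

def custom_cos(angle: float) -> float:
    # Entry point named by the PCard; the runtime converts the input to float first
    return math.cos(angle)

assert custom_cos(0.0) == 1.0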

Testing Behavior

The current tests, particularly test_sqlite_persistence.py, clear the database after each test function runs. This means that test_mcard.db will only contain the data from the last test executed. If the clear() function in the fixture is uncommented, it will remove the content of the last test as well.

Core Dependencies

  • SQLAlchemy>=1.4.47: SQL toolkit and ORM
  • aiosqlite>=0.17.0: SQLite async driver (project code uses synchronous APIs but retains this dependency for compatibility)
  • python-dateutil>=2.9.0.post0: Date/time utilities
  • python-dotenv>=1.1.0: Environment management

Description

MCard is a project designed to facilitate card management with a focus on validation and logging features.

Installation

Using uv

You can install the MCard package from PyPI (once published):

uv pip install mcard

Installing from source

To install MCard directly from the source code:

# Clone the repository
git clone https://github.com/xlp0/MCard_TDD.git
cd MCard_TDD

# Install in development mode with uv
uv pip install -e .

# Install with development dependencies
uv pip install -e ".[dev]"

Development Environment Setup

MCard uses modern Python tooling with uv for fast dependency management and virtual environment handling.

Quick Setup (Recommended)

# Clone and setup in one command
git clone https://github.com/xlp0/MCard_TDD.git
cd MCard_TDD
./activate_venv.sh

The activate_venv.sh script automatically:

  • ✅ Disables conda (if present) to avoid conflicts
  • ✅ Creates a virtual environment using uv venv .venv
  • ✅ Activates the virtual environment
  • ✅ Installs all dependencies with uv sync --all-extras --dev
  • ✅ Ensures you're using the project's preferred Python environment

Manual Setup

# Create and activate virtual environment with uv
uv venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies with development extras
uv sync --all-extras --dev

# Verify installation
uv run pytest tests/ -v

Development Commands

# Run tests with coverage
uv run pytest --cov=mcard --cov-report=term-missing

# Lint and format code
uv run ruff check mcard/
uv run ruff format mcard/

# Type checking
uv run mypy mcard/ --ignore-missing-imports

# Run all quality checks
make quality-check

Usage

After installation, you can use MCard in your Python code (synchronous API as shown above), or via the CLI.

CLI usage

The package installs a console script mcard with the following subcommands:

# Initialize the database (creates schema if needed)
mcard init --db data/cli_demo.db

# Add a card from text or a file
mcard add --text "hello world" --db data/cli_demo.db
mcard add --file README.md --db data/cli_demo.db

# Retrieve a card by hash
mcard get --hash <hash> --db data/cli_demo.db

# Search by a text fragment
mcard search --query hello --db data/cli_demo.db

# Count cards
mcard count --db data/cli_demo.db

🎯 Recent Major Improvements

Enterprise Logging System Implementation (Commits: e0b18ce, 8d657bf, f7c7446)

  • Comprehensive logging overhaul with structured multi-file logging strategy
  • Performance monitoring with PerformanceTimer for operation timing
  • Security auditing with dedicated SecurityAuditLogger for compliance
  • Backward compatibility maintained - zero breaking changes to existing code
  • 11/11 tests pass for logging system with comprehensive test coverage
  • Automated migration script for upgrading existing codebases
  • Production-ready with colorlog dependency and structured JSON output

Code Quality & Lint Cleanup (Commit: b6a70a9)

  • 1000+ lint errors resolved across 76 files with zero regressions
  • Exception handling modernized with proper 'from e' clauses
  • Type system upgraded - replaced deprecated typing.List with modern list[T]
  • Import organization - consistent sorting and grouping across codebase
  • Code readability improved with better formatting and structure
  • All 17 tests continue to pass - no functionality broken

Development Tooling & CI/CD Pipeline (Commit: 4ee6a7d)

  • Enhanced development environment with modern tooling (ruff, pre-commit)
  • Comprehensive CI/CD pipeline with GitHub Actions supporting multi-OS/Python versions
  • Automated security checks integrated into pipeline
  • Improved Makefile with 15+ development commands for common tasks
  • Project health score improved from 7.5/10 to 8.5/10
  • Production-ready development practices implemented

Repository Optimization & Cleanup

  • Enhanced .gitignore with 272 comprehensive patterns (Commit: 11cddd7)
  • Repository size optimization - removed 187MB of unnecessary files
  • Git history cleanup using git filter-repo for cleaner repository
  • Cross-platform support for macOS, Windows, and Linux development
  • AI IDE framework compatibility with BMAD-METHOD integration

BMAD-METHOD Framework Integration (Commit: 81f1647)

  • 11 specialized agent roles for comprehensive development workflow
  • Zero Trust Architecture with enterprise-grade security patterns
  • Polynomial functor mathematics for advanced MCard operations
  • Cross-platform agent definitions for multiple AI IDEs
  • 207 files committed with major architectural enhancements

Configuration Management Refactoring

  • Renamed EnvConfig to EnvParameters for better clarity and consistency
  • Moved configuration management from env_config.py to env_parameters.py
  • Updated all references to use the new class name across the codebase
  • Enhanced test coverage for configuration parameters
  • Maintained singleton pattern for configuration management
  • Ensured backward compatibility with existing environment variable handling

Database & Performance Enhancements

  • Implemented get_all() method in SQLiteEngine for efficient pagination
  • Added support for page size and page number parameters
  • Enhanced error handling for invalid pagination parameters
  • Improved performance by optimizing SQL queries
  • Added comprehensive test coverage for pagination functionality

Recent Changes

Directory Structure Updates

  • The hash_algorithms directory has been renamed to algorithms for simplicity and clarity.
  • The hash_validator.py file has been renamed to validator.py to simplify the naming convention.

Updated Imports

  • All relevant import statements across the codebase have been updated to reflect the new structure and naming.

Engine Refactor

  • Removed the abstract search_by_content method from SQLiteEngine and DuckDBEngine.
  • Integrated search functionality into the search_by_string method, allowing searches across content, hash, and g_time fields.

Event Generation

Logging

  • Integrated logging into test cases for better traceability and debugging.

MCard Class Update

  • The MCard constructor now accepts a hash_function parameter, providing more flexibility in hash generation.

Tests

  • Adjusted tests to verify the new event generation logic and ensure search functionality works as intended.

Centralized Configuration Management

Overview

MCard has adopted a centralized configuration management approach to improve maintainability, scalability, and readability. This involves consolidating all configuration constants into a single location, making it easier to manage and update configuration values across the application.

Configuration Constants

All configuration constants are now defined in config_constants.py. This file contains named constants for various configuration values, including:

  • Database schema and paths
  • Hash algorithm constants and hierarchy
  • Environment variable names
  • API configuration
  • HTTP status codes
  • Error messages
  • Event types and structure

Benefits

Centralized configuration management provides several benefits, including:

  • Single Source of Truth: All configuration constants are managed in one location.
  • Type Safety: Constants are properly typed and documented.
  • Maintainability: Changes to configuration values only need to be made in one place.
  • Code Completion: IDE support for constant names improves developer productivity.
  • Documentation: Each constant group is documented with its purpose and usage.
  • Testing: Test files use the same constants as production code, ensuring consistency.

Implementation

The config_constants.py file uses an enum-based approach for hash algorithms, ensuring type safety and readability. The file is organized into logical groups, making it easier to find and update specific configuration values.

Example Usage

To use a configuration constant, simply import the config_constants module and access the desired constant. For example:

from config_constants import HASH_ALGORITHM_SHA256

# Use the SHA-256 hash algorithm
hash_algorithm = HASH_ALGORITHM_SHA256

By adopting a centralized configuration management approach, MCard has improved its maintainability, scalability, and readability, making it easier to manage and update configuration values across the application.

Using MCardFromData for Stored Values

When retrieving stored MCard data from the database, always use the subclass MCardFromData. This approach allows you to bypass unnecessary and unwanted algorithms, significantly speeding up the MCard instantiation process.

Project Structure

MCard_TDD/
├── mcard/
│   ├── algorithms/          # Hash algorithm implementations
│   ├── engine/             # Database engines (SQLite, DuckDB)
│   ├── model/              # Core data models
│   ├── api.py             # FastAPI endpoints
│   └── logging_config.py   # Logging configuration
├── tests/
│   ├── persistence/       # Database persistence tests
│   └── unit/             # Unit tests
├── docs/                  # Project documentation
├── data/
│   ├── db/               # Database files
│   └── files/            # General files
└── logs/                 # Application logs

Configuration

Environment Setup

Create a .env file with the following variables:

MCARD_DB_PATH=data/db/mcard_demo.db
TEST_DB_PATH=data/db/test_mcard.db
MCARD_SERVICE_LOG_LEVEL=DEBUG

Development Guidelines

Using MCardFromData

When retrieving stored data, use MCardFromData instead of the base MCard class:

from mcard.model.card import MCardFromData

stored_card = MCardFromData(content=content, hash=hash, g_time=g_time)

Hash Algorithm Configuration

The default hash algorithm is SHA-256, but it's configurable:

from mcard.algorithms import HASH_ALGORITHM_SHA256

Installation

To set up the project, follow these steps:

  1. Create a virtual environment:

    python -m venv .venv
    
  2. Activate the virtual environment:

    • On macOS and Linux:
      source .venv/bin/activate
      
    • On Windows:
      .venv\Scripts\activate
      
  3. Configure your environment:

    • Copy .env.example to create your own .env file.
    • The default configuration uses:
      • Database path: data/db/mcard_demo.db.
      • Hash algorithm: SHA-256.
      • Connection pool size: 5.
      • Connection timeout: 30 seconds.

Directory Structure

  • mcard/
    • engine/: Contains the database engine implementations, currently only SQLite.
    • model/: Contains the core data models, including MCard.
    • tests/: Contains all test cases for the MCard library, ensuring functionality and correctness.

SQLite Persistence Testing

  • tests/persistence/sqlite_test.py: Contains test cases for SQLite persistence, ensuring data integrity and consistency.

The tests in test_sqlite_persistence.py are designed to clear the database after each test function is run. This means that the test_mcard.db file will only contain the data from the last test executed. If the clear() function in the fixture is uncommented, it will remove the content of the last test as well. This behavior is intended to ensure that each test starts with a clean database, allowing for more accurate and reliable testing results.
