MCard: Memory Card with TDD approach

MCard Core

A Python library implementing an algebraically closed data structure for content-addressable storage. MCard ensures that every piece of content in the system is uniquely identified by its hash and temporally ordered by its claim time, enabling robust content verification and precedence ordering. It allows for a monadic approach to namespace management and content deduplication for any type of data.

Documentation

Core Concepts

MCard implements an algebraically closed system where:

  1. Every MCard is uniquely identified by its content hash (configurable, defaulting to SHA-256).
  2. Every MCard has an associated claim time (timezone-aware timestamp with microsecond precision).
  3. The database maintains these invariants automatically.
  4. Content integrity is guaranteed through immutable hashes.
  5. Temporal ordering is preserved at microsecond precision.

This design provides several key guarantees:

  • Content Integrity: The content hash serves as both identifier and verification mechanism.
  • Temporal Signature: All cards are associated with a timestamp: g_time.
  • Precedence Verification: The claim time enables determination of content presentation order.
  • Algebraic Closure: Any operation on MCards produces results that maintain these properties.
  • Type Safety: Built on Pydantic with strict validation and type checking.
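
As a rough illustration of the first three guarantees, the sketch below uses only the standard library rather than MCard itself. The verify helper, the claimant labels, and the tuple layout are assumptions made for this example.

```python
import hashlib
from datetime import datetime, timedelta, timezone


def verify(content: bytes, expected_hash: str) -> bool:
    """Content integrity: recompute the hash and compare it to the identifier."""
    return hashlib.sha256(content).hexdigest() == expected_hash


base = datetime(2024, 1, 1, tzinfo=timezone.utc)
content_hash = hashlib.sha256(b"same content").hexdigest()

# Two hypothetical claims on the same content, one microsecond apart.
claims = [
    ("claim-b", content_hash, base + timedelta(microseconds=2)),
    ("claim-a", content_hash, base + timedelta(microseconds=1)),
]

# Precedence: the claim with the earliest timestamp comes first.
first = min(claims, key=lambda claim: claim[2])
print(verify(b"same content", content_hash), first[0])
```

Because the timestamps are timezone-aware and carry microseconds, two claims made within the same millisecond can still be ordered.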

Required Attributes for Each MCard

Each MCard must have the following three required attributes:

1. content: The actual data being stored (string or bytes).

2. hash: A cryptographic hash of the content, using SHA-256 by default (configurable to other algorithms).

3. g_time: A timezone-aware timestamp with microsecond precision, representing the global time when the card was claimed.
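
The three attributes can be pictured with a small standalone sketch. SketchCard is a hypothetical stand-in written with a plain dataclass; the real MCard class is built on Pydantic and may differ in its details.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Union


def _now_utc() -> datetime:
    # Timezone-aware timestamp; datetime carries microsecond precision.
    return datetime.now(timezone.utc)


@dataclass(frozen=True)
class SketchCard:
    """Hypothetical stand-in for MCard, for illustration only."""

    content: Union[str, bytes]
    hash: str = field(init=False)
    g_time: datetime = field(default_factory=_now_utc)

    def __post_init__(self) -> None:
        raw = self.content.encode("utf-8") if isinstance(self.content, str) else self.content
        # frozen=True blocks normal assignment, so set the derived field explicitly.
        object.__setattr__(self, "hash", hashlib.sha256(raw).hexdigest())


card = SketchCard(content="Hello, MCard!")
print(card.hash, card.g_time.isoformat())
```

Freezing the dataclass mirrors the immutability guarantee described above: once created, neither the content nor its hash can be reassigned.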

Directory Structure

  • mcard/: Contains the main application code.
  • examples/: Example scripts demonstrating how to use the MCard system (see below for details on Content_Loader.py).
  • tests/: Contains test files for the application.
  • logs/: Contains log files generated by the application.
  • data/db/: Directory for storing database files used by the application.
  • data/files/: Directory reserved for storing general files used by the application.
  • data/test_content/: Test files of various types for content detection and validation.
  • data/loaded_content/: Output directory for loaded and processed content (now gitignored).

Database Technologies

We will be using embedded database technologies, such as SQLite, DuckDB, and LanceDB initially, to provide efficient and reliable data storage solutions for MCard. These technologies are well-suited for handling the requirements of content-addressable storage and will allow for easy integration and management of data within the application.

Examples

Default MCard API Example: examples/MCard_Demo.py

This script demonstrates the simplest way to use the MCard API through the default_utility interface. It covers:

  • Adding new cards (with plain text or dictionaries, which are auto-converted to JSON)
  • Retrieving cards by hash
  • Searching for cards by content
  • Counting the total number of cards in the collection

How to Run the Demo

python examples/MCard_Demo.py

Key Features

  • Minimal Setup: Uses from mcard import default_utility for immediate access to core functionality.
  • Add and Retrieve: Shows how to add cards and retrieve them by hash.
  • Search: Demonstrates searching for cards containing a specific substring.
  • Summary Output: Prints the total number of cards and search results.

Modular Content Loader Example: examples/Content_Loader.py

This script demonstrates how to use the MCard system's content detection and storage features in a modular, easy-to-understand way. It:

  • Loads files from data/test_content/ (supports both text and binary types)
  • Uses the ContentTypeInterpreter to detect file types and validate content
  • Creates MCards for each file, handling text and binary content appropriately
  • Saves processed files to data/loaded_content/ with unique, type-appropriate filenames
  • Prints summaries of processed files and cleans up temporary files

How to Run the Example

python examples/Content_Loader.py

Key Features of the Example

  • Modular Functions: The script is organized into clear, single-purpose functions (e.g., load_test_files, create_mcard_for_file, save_card_to_file, etc.) for maintainability and extensibility.
  • Automatic Content Type Detection: Uses file signatures and content validation to determine file type and extension.
  • Binary and Text Handling: Handles binary files (e.g., images) and text files differently, ensuring correct storage and retrieval.
  • Output Directory: All processed content is saved to data/loaded_content/ (which is now gitignored).
  • Temporary File Cleanup: Removes temporary binary files after processing.

See the script and its docstrings for further details and customization options.
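
Signature-based detection of the kind described above can be sketched as follows. This is not the ContentTypeInterpreter implementation; detect_type and the SIGNATURES table are hypothetical, and only a handful of well-known magic numbers are listed.

```python
from typing import Tuple

# A few well-known magic numbers; a real detector would cover many more types.
SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "image/png", ".png"),
    (b"\xff\xd8\xff", "image/jpeg", ".jpg"),
    (b"%PDF-", "application/pdf", ".pdf"),
    (b"GIF87a", "image/gif", ".gif"),
    (b"GIF89a", "image/gif", ".gif"),
)


def detect_type(data: bytes) -> Tuple[str, str]:
    """Return (mime_type, extension) for a byte string (hypothetical helper)."""
    for magic, mime, ext in SIGNATURES:
        if data.startswith(magic):
            return mime, ext
    try:
        # No known signature: treat the data as text if it decodes as UTF-8.
        data.decode("utf-8")
        return "text/plain", ".txt"
    except UnicodeDecodeError:
        return "application/octet-stream", ".bin"


print(detect_type(b"%PDF-1.7\n"))
```

Checking binary signatures before attempting a text decode keeps binary files from being misread as text when their first bytes happen to be printable.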

API Endpoint

MCard can serve as an API endpoint for serving data content. By using FastAPI with Uvicorn, you can easily create and manage API routes for accessing and manipulating MCard data. FastAPI provides automatic interactive API documentation and is designed for high performance, making it an excellent choice for this project.

FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.6+ based on standard Python type hints. Uvicorn is an ASGI server that allows you to serve the FastAPI application. To run the API server, use the following command:

uvicorn mcard.api:app --reload

This command will start the Uvicorn server with the FastAPI application defined in mcard/api.py, allowing you to access the MCard API at http://localhost:8000. You can also view the interactive API documentation at http://localhost:8000/docs.

.gitignore Notes

  • The data/loaded_content/ directory is now included in .gitignore and will not be tracked by git. This ensures that output/generated files do not pollute the repository.

PyTest Configuration

  • The project uses PyTest for testing.
  • Tests are located in the tests directory.
  • The configuration file pytest.ini specifies test paths and naming conventions.

Logging Configuration

  • The project uses Python's built-in logging module for logging.
  • Logs are written to logs/mcard.log with a maximum size of 10MB and up to 5 backup files.
  • The logging format includes timestamps, log levels, and detailed information about the source of the log messages.
  • The logging level is set to DEBUG for console output and INFO for file output.
  • To initialize logging, call setup_logging() from mcard.logging_config before running tests or application code.
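
A configuration matching the bullets above could look roughly like this. It is a sketch, not the contents of mcard.logging_config: the function name setup_logging_sketch, the logger name, and the exact format string are assumptions.

```python
import logging
import sys
from logging.handlers import RotatingFileHandler
from pathlib import Path


def setup_logging_sketch(log_path: str = "logs/mcard.log") -> logging.Logger:
    """Hypothetical equivalent of setup_logging(); names and format are assumptions."""
    Path(log_path).parent.mkdir(parents=True, exist_ok=True)
    fmt = logging.Formatter(
        "%(asctime)s %(levelname)s %(name)s %(filename)s:%(lineno)d %(message)s"
    )

    # File output: INFO and above, rotated at 10 MB with 5 backups.
    file_handler = RotatingFileHandler(log_path, maxBytes=10 * 1024 * 1024, backupCount=5)
    file_handler.setLevel(logging.INFO)
    file_handler.setFormatter(fmt)

    # Console output: DEBUG and above.
    console = logging.StreamHandler(sys.stderr)
    console.setLevel(logging.DEBUG)
    console.setFormatter(fmt)

    logger = logging.getLogger("mcard_sketch")
    logger.setLevel(logging.DEBUG)  # let the handlers do the filtering
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    logger.addHandler(file_handler)
    logger.addHandler(console)
    return logger
```

Setting the logger itself to DEBUG and filtering per handler is what allows the console and the file to receive different levels.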

Running Tests

To run tests:

pytest

To run tests with coverage:

pytest --cov=mcard

Hegel's Dialectic in Testing and CI/CD

Hegel's dialectic is a philosophical framework that describes the process of development and change through a triadic structure: thesis, antithesis, and synthesis. Here's how it relates to software testing and Continuous Integration/Continuous Deployment (CI/CD):

  1. Thesis (Initial Code): Represents the initial code or feature implementation, the starting point where a developer writes code to fulfill a specific requirement or feature.

  2. Antithesis (Testing and Bugs): Arises during the testing phase, where tests are executed. If tests fail or bugs are discovered, they represent a challenge to the initial implementation, highlighting discrepancies between intended functionality and actual behavior.

  3. Synthesis (Refinement and Improvement): Occurs when developers address the issues identified during testing, leading to a refined version of the code that resolves conflicts between the initial implementation and testing outcomes.

CI/CD Integration

In a CI/CD pipeline, this dialectical process is continuous:

  • Continuous Integration: Developers frequently integrate code changes into a shared repository. Each integration triggers automated tests, allowing for rapid identification of issues against the current codebase.

  • Continuous Deployment: Once the code passes testing, it can be automatically deployed, representing the synthesis where refined code is made available to users.

This iterative process fosters continuous improvement, where each round of testing and deployment leads to better software quality and functionality. By applying Hegel's dialectic, teams can embrace the idea that conflict (in the form of bugs and failures) is a natural and necessary part of the development process, ultimately leading to a more robust and effective product.

Handling Duplicate Events

When a duplicate card is detected, the duplicate_event_card is assigned a new timestamp value. This ensures that even though the content is identical to the original card, the hash value will be unique due to the different timestamp. This mechanism allows for robust handling of duplicate content while maintaining the integrity of the system.
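
The effect of the new timestamp on the event's hash can be sketched as below. The event field names and the make_duplication_event helper are hypothetical; only the idea (identical original content, distinct timestamps, distinct event hashes) comes from the description above.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def make_duplication_event(original_hash: str, when: datetime) -> str:
    """Build a JSON event body (hypothetical field names)."""
    return json.dumps(
        {"type": "duplicate", "hash": original_hash, "g_time": when.isoformat()},
        sort_keys=True,
    )


original_hash = sha256_hex("Hello, MCard!")
# Two duplicate detections for the same content, one microsecond apart.
event_a = make_duplication_event(
    original_hash, datetime(2024, 1, 1, 0, 0, 0, 1, tzinfo=timezone.utc)
)
event_b = make_duplication_event(
    original_hash, datetime(2024, 1, 1, 0, 0, 0, 2, tzinfo=timezone.utc)
)

# Same original content, different timestamps, so the event hashes differ.
print(sha256_hex(event_a) != sha256_hex(event_b))
```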

MD5 Collision Testing

The test suite includes verification of MD5 collision detection using known collision pairs from the FastColl attack. These pairs produce identical MD5 hashes despite having different content:

MD5 Collision Pair

Input 1:
4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518afbfa200a8284bf36e8e4b55b35f427593d849676da0d1555d8360fb5f07fea2
                                                                     ^^^                                    ^^^

Input 2:
4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518afbfa202a8284bf36e8e4b55b35f427593d849676da0d1d55d8360fb5f07fea2
                                                                     ^^^                                    ^^^

Key differences:

  1. 200 vs 202
  2. d15 vs d1d

Both inputs produce the same MD5 hash value, demonstrating MD5's vulnerability to collision attacks. This is why MCard defaults to using more secure hash functions like SHA-256.
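
The collision can be checked with the standard library alone. The snippet below is a standalone check, not the project's test code; it decodes the two hex strings shown above and compares their MD5 and SHA-256 digests.

```python
import hashlib

# Hex strings copied from the collision pair above, split into 16-byte rows.
INPUT_1 = bytes.fromhex(
    "4dc968ff0ee35c209572d4777b721587"
    "d36fa7b21bdc56b74a3dc0783e7b9518"
    "afbfa200a8284bf36e8e4b55b35f4275"
    "93d849676da0d1555d8360fb5f07fea2"
)
INPUT_2 = bytes.fromhex(
    "4dc968ff0ee35c209572d4777b721587"
    "d36fa7b21bdc56b74a3dc0783e7b9518"
    "afbfa202a8284bf36e8e4b55b35f4275"
    "93d849676da0d1d55d8360fb5f07fea2"
)

md5_equal = hashlib.md5(INPUT_1).hexdigest() == hashlib.md5(INPUT_2).hexdigest()
sha256_equal = hashlib.sha256(INPUT_1).hexdigest() == hashlib.sha256(INPUT_2).hexdigest()

print("inputs differ:", INPUT_1 != INPUT_2)
print("MD5 digests equal:", md5_equal)
print("SHA-256 digests equal:", sha256_equal)
```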

Testing Behavior

The current tests, particularly those in test_sqlite_persistence.py, will always clear the database after one of the test functions is run. This means that test_mcard.db will only contain the data from the last test executed. If the clear() function in the fixture is uncommented, it will remove the content of the last test as well.

Core Dependencies

  • SQLAlchemy==1.4.47: SQL toolkit and ORM
  • aiosqlite==0.17.0: Async SQLite database driver
  • python-dateutil==2.8.2: Date/time utilities
  • python-dotenv==1.0.0: Environment management

Description

MCard is a project designed to facilitate card management with a focus on validation and logging features.

Installation

Using uv

You can install the MCard package from PyPI (once published):

uv pip install mcard

Installing from source

To install MCard directly from the source code:

# Clone the repository
git clone https://github.com/yourusername/MCard_TDD.git
cd MCard_TDD

# Install in development mode with uv
uv pip install -e .

# Install with development dependencies
uv pip install -e ".[dev]"

Development Environment Setup

  1. Set up a virtual environment using uv:
# Simply run the activate script which handles uv setup
source activate_venv.sh

This script will:

  • Ensure conda is disabled (if present)
  • Create a virtual environment using uv if it doesn't exist
  • Activate the virtual environment
  • Install dependencies from pyproject.toml using uv

Alternatively, you can manually set up the environment:

# Create and activate virtual environment with uv
uv venv .venv
source .venv/bin/activate

# Install dependencies with uv
uv pip sync pyproject.toml

Usage

After installation, you can use MCard in your Python code:

from mcard.model.card import MCard
from mcard.model.card_collection import CardCollection

# Create a new card
card = MCard(content="Hello, MCard!")

# Create a card collection
collection = CardCollection()

# Add the card to the collection
collection.add(card)

# Retrieve the card by its hash
retrieved_card = collection.get_by_hash(card.hash)
print(retrieved_card.content)  # Outputs: Hello, MCard!

Or use the installed command-line entry point:

mcard


## Recent Updates

### MCard Detail View Component
- Created a new component `mcard_detail_view.html` to display detailed information about MCards, including:
  - Full hash string
  - g_time string
  - Content type
  - Appropriate content display for images, videos, PDFs, and plain text.

### Dynamic Content Loading
- Implemented functionality to dynamically load and display card details when a card entry is clicked.
- Added JavaScript functions to handle click events and fetch card details from the server.

### Error Handling and Logging
- Enhanced error handling in the Flask backend to log errors and provide better feedback.
- Added detailed logging in the JavaScript to track the fetching and rendering process.

### Template Updates
- Updated existing templates to integrate the new detail view component and ensure proper rendering.

### User Experience Improvements
- Improved visual feedback for selected cards.
- Ensured that the focused area updates correctly without becoming blank.

### Configuration Management Refactoring (2024-12-18)
- Renamed `EnvConfig` to `EnvParameters` for better clarity and consistency
- Moved configuration management from `env_config.py` to `env_parameters.py`
- Updated all references to use the new class name across the codebase
- Enhanced test coverage for configuration parameters
- Maintained singleton pattern for configuration management
- Ensured backward compatibility with existing environment variable handling

### Database Enhancements
- Implemented `get_all()` method in SQLiteEngine for efficient pagination
- Added support for page size and page number parameters
- Enhanced error handling for invalid pagination parameters
- Improved performance by optimizing SQL queries
- Added comprehensive test coverage for pagination functionality
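
Page-number and page-size pagination over SQLite can be sketched with `LIMIT` and `OFFSET`. This is not the `SQLiteEngine.get_all()` implementation; the `get_page` helper, the `card` table, and its columns are assumptions made for the example.

```python
import sqlite3
from typing import List, Tuple


def get_page(conn: sqlite3.Connection, page_number: int, page_size: int) -> List[Tuple[str, str]]:
    """Return one page of (hash, g_time) rows; table and column names are hypothetical."""
    if page_number < 1 or page_size < 1:
        raise ValueError("page_number and page_size must be >= 1")
    offset = (page_number - 1) * page_size
    return conn.execute(
        "SELECT hash, g_time FROM card ORDER BY g_time LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE card (hash TEXT PRIMARY KEY, content BLOB, g_time TEXT)")
conn.executemany(
    "INSERT INTO card VALUES (?, ?, ?)",
    [(f"h{i}", b"x", f"2024-01-01T00:00:0{i}+00:00") for i in range(5)],
)
print(get_page(conn, page_number=2, page_size=2))
```

Ordering by `g_time` before applying the limit keeps pages stable between calls, and rejecting non-positive parameters up front avoids a negative offset.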

## Recent Changes

### Directory Structure Updates
- The `hash_algorithms` directory has been renamed to `algorithms` for simplicity and clarity.
- The `hash_validator.py` file has been renamed to `validator.py` to simplify the naming convention.

### Updated Imports
- All relevant import statements across the codebase have been updated to reflect the new structure and naming.

### Engine Refactor
- Removed the abstract `search_by_content` method from `SQLiteEngine` and `DuckDBEngine`.
- Integrated search functionality into the `search_by_string` method, allowing searches across content, hash, and g_time fields.
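
Searching one string across the three fields can be pictured with an in-memory sketch. The `search_cards_sketch` function and the dictionary layout are hypothetical and do not reflect how the engines run the query.

```python
from typing import Dict, List

SEARCH_FIELDS = ("content", "hash", "g_time")


def search_cards_sketch(cards: List[Dict[str, str]], needle: str) -> List[Dict[str, str]]:
    """Return cards where any of content, hash, or g_time contains needle."""
    return [
        card for card in cards
        if any(needle in card[field_name] for field_name in SEARCH_FIELDS)
    ]


cards = [
    {"content": "Hello, MCard!", "hash": "ab12", "g_time": "2024-01-01T00:00:00+00:00"},
    {"content": "Other text", "hash": "cd34", "g_time": "2025-06-01T00:00:00+00:00"},
]
print([card["hash"] for card in search_cards_sketch(cards, "2025")])
```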

### Event Generation
- Updated `generate_duplication_event` and `generate_collision_event` to return JSON strings.
- Enhanced event structure to include upgraded hash functions and content size.

### Logging
- Integrated logging into test cases for better traceability and debugging.

### MCard Class Update
- The `MCard` constructor now accepts a `hash_function` parameter, providing more flexibility in hash generation.

### Tests
- Adjusted tests to verify the new event generation logic and ensure search functionality works as intended.

## Centralized Configuration Management

### Overview
MCard has adopted a centralized configuration management approach to improve maintainability, scalability, and readability. This involves consolidating all configuration constants into a single location, making it easier to manage and update configuration values across the application.

### Configuration Constants
All configuration constants are now defined in `config_constants.py`. This file contains named constants for various configuration values, including:

- Database schema and paths
- Hash algorithm constants and hierarchy
- Environment variable names
- API configuration
- HTTP status codes
- Error messages
- Event types and structure

### Benefits
Centralized configuration management provides several benefits, including:

- **Single Source of Truth**: All configuration constants are managed in one location.
- **Type Safety**: Constants are properly typed and documented.
- **Maintainability**: Changes to configuration values only need to be made in one place.
- **Code Completion**: IDE support for constant names improves developer productivity.
- **Documentation**: Each constant group is documented with its purpose and usage.
- **Testing**: Test files use the same constants as production code, ensuring consistency.

### Implementation
The `config_constants.py` file uses an enum-based approach for hash algorithms, ensuring type safety and readability. The file is organized into logical groups, making it easier to find and update specific configuration values.
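
An enum-based definition of hash algorithms might look like the sketch below. The `HashAlgorithm` class, its members, and `compute_hash` are assumptions for illustration; the names used in `config_constants.py` may differ.

```python
import hashlib
from enum import Enum


class HashAlgorithm(str, Enum):
    """Hypothetical enum; the member names in config_constants.py may differ."""

    MD5 = "md5"
    SHA1 = "sha1"
    SHA256 = "sha256"
    SHA512 = "sha512"


def compute_hash(content: bytes, algorithm: HashAlgorithm = HashAlgorithm.SHA256) -> str:
    # hashlib.new accepts the algorithm name as a string.
    return hashlib.new(algorithm.value, content).hexdigest()


print(compute_hash(b"Hello, MCard!"))
print(HashAlgorithm("sha256") is HashAlgorithm.SHA256)
```

With an enum, an unsupported algorithm name fails at lookup time with a `ValueError` instead of travelling through the code as an unchecked string.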

### Example Usage
To use a configuration constant, simply import the `config_constants` module and access the desired constant. For example:
```python
from config_constants import HASH_ALGORITHM_SHA256

# Use the SHA-256 hash algorithm
hash_algorithm = HASH_ALGORITHM_SHA256
```

By adopting a centralized configuration management approach, MCard has improved its maintainability, scalability, and readability, making it easier to manage and update configuration values across the application.

Using MCardFromData for Stored Values

When retrieving stored MCard data from the database, always use the subclass MCardFromData. This approach allows you to bypass unnecessary and unwanted algorithms, significantly speeding up the MCard instantiation process.
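
The idea behind the subclass can be sketched as follows. DemoCard and DemoCardFromData are hypothetical classes written for this example; they only illustrate how a subclass can accept stored values without re-running the hashing and timestamping done by the base constructor.

```python
import hashlib
from datetime import datetime, timezone


class DemoCard:
    """Hypothetical base class: hashes content and stamps the time on creation."""

    def __init__(self, content: bytes) -> None:
        self.content = content
        self.hash = hashlib.sha256(content).hexdigest()
        self.g_time = datetime.now(timezone.utc).isoformat()


class DemoCardFromData(DemoCard):
    """Hypothetical subclass: trusts values already stored in the database."""

    def __init__(self, content: bytes, hash: str, g_time: str) -> None:
        # Deliberately skip DemoCard.__init__ so nothing is re-hashed or re-stamped.
        self.content = content
        self.hash = hash
        self.g_time = g_time


fresh = DemoCard(b"Hello, MCard!")
restored = DemoCardFromData(fresh.content, fresh.hash, fresh.g_time)
print(restored.hash == fresh.hash, restored.g_time == fresh.g_time)
```

The trade-off is that the stored hash is trusted rather than recomputed, so this path suits data that was validated when it was first written.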

Project Structure

MCard_TDD/
├── mcard/
│   ├── algorithms/          # Hash algorithm implementations
│   ├── engine/             # Database engines (SQLite, DuckDB)
│   ├── model/              # Core data models
│   ├── api.py             # FastAPI endpoints
│   └── logging_config.py   # Logging configuration
├── tests/
│   ├── persistence/       # Database persistence tests
│   └── unit/             # Unit tests
├── docs/                  # Project documentation
├── data/
│   ├── db/               # Database files
│   └── files/            # General files
└── logs/                 # Application logs

Configuration

Environment Setup

Create a .env file with the following variables:

MCARD_DB_PATH=data/db/mcard_demo.db
TEST_DB_PATH=data/db/test_mcard.db
MCARD_SERVICE_LOG_LEVEL=DEBUG
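
Reading these variables at runtime can be sketched with os.environ. The load_settings helper and the fallback to the example values are assumptions; the project lists python-dotenv as a dependency, which would normally populate os.environ from the .env file first.

```python
import os

# Defaults mirror the example values above; the fallback behavior is an assumption.
DEFAULTS = {
    "MCARD_DB_PATH": "data/db/mcard_demo.db",
    "TEST_DB_PATH": "data/db/test_mcard.db",
    "MCARD_SERVICE_LOG_LEVEL": "DEBUG",
}


def load_settings(env=None) -> dict:
    """Read the three variables from a mapping (os.environ by default)."""
    source = os.environ if env is None else env
    return {name: source.get(name, default) for name, default in DEFAULTS.items()}


print(load_settings({"MCARD_SERVICE_LOG_LEVEL": "INFO"}))
```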

Development Guidelines

Using MCardFromData

When retrieving stored data, use MCardFromData instead of the base MCard class:

from mcard.model.card import MCardFromData

stored_card = MCardFromData(content=content, hash=hash, g_time=g_time)

Hash Algorithm Configuration

The default hash algorithm is SHA-256, but it's configurable:

from mcard.algorithms import HASH_ALGORITHM_SHA256

Installation

To set up the project, follow these steps:

  1. Create a virtual environment:

    python -m venv .venv
    
  2. Activate the virtual environment:

    • On macOS and Linux:
      source .venv/bin/activate
      
    • On Windows:
      .venv\Scripts\activate
      
  3. Configure your environment:

    • Copy .env.example to create your own .env file.
    • The default configuration uses:
      • Database path: data/db/mcard_demo.db.
      • Hash algorithm: SHA-256.
      • Connection pool size: 5.
      • Connection timeout: 30 seconds.

Directory Structure

  • mcard/
    • engine/: Contains the database engine implementations, including SQLite and DuckDB.
    • model/: Contains the core data models, including MCard.
    • tests/: Contains all test cases for the MCard library, ensuring functionality and correctness.

SQLite Persistence Testing

  • tests/persistence/sqlite_test.py: Contains test cases for SQLite persistence, ensuring data integrity and consistency.

The tests in test_sqlite_persistence.py are designed to clear the database after each test function is run. This means that the test_mcard.db file will only contain the data from the last test executed. If the clear() function in the fixture is uncommented, it will remove the content of the last test as well. This behavior is intended to ensure that each test starts with a clean database, allowing for more accurate and reliable testing results.
