MCard: Memory Card with TDD approach

MCard Core

A Python library implementing an algebraically closed data structure for content-addressable storage. MCard ensures that every piece of content in the system is uniquely identified by its hash and temporally ordered by its claim time, enabling robust content verification and precedence ordering. It supports a monadic approach to namespace management and content deduplication for any type of data.

Documentation

Core Concepts

MCard implements an algebraically closed system where:

  1. Every MCard is uniquely identified by its content hash (configurable, defaulting to SHA-256).
  2. Every MCard has an associated claim time (timezone-aware timestamp with microsecond precision).
  3. The database maintains these invariants automatically.
  4. Content integrity is guaranteed through immutable hashes.
  5. Temporal ordering is preserved at microsecond precision.

This design provides several key guarantees:

  • Content Integrity: The content hash serves as both identifier and verification mechanism.
  • Temporal Signature: All cards are associated with a timestamp: g_time.
  • Precedence Verification: The claim time enables determination of content presentation order.
  • Algebraic Closure: Any operation on MCards produces results that maintain these properties.
  • Type Safety: Built on Pydantic with strict validation and type checking.

Required Attributes for Each MCard

Each MCard must have the following three required attributes:

1. content: The actual data being stored (string or bytes).

2. hash: A cryptographic hash of the content, using SHA-256 by default (configurable to other algorithms).

3. g_time: A timezone-aware timestamp with microsecond precision, representing the global time when the card was claimed.
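
As a rough illustration of how the three attributes relate, the sketch below derives a SHA-256 hash from the content and stamps a timezone-aware claim time. `CardSketch` is a hypothetical stand-in written with the standard library only; it is not the library's `MCard` class, which the description above says is built on Pydantic.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _now_utc() -> datetime:
    # Timezone-aware timestamp; datetime objects carry microsecond precision.
    return datetime.now(timezone.utc)


@dataclass(frozen=True)
class CardSketch:
    """Hypothetical stand-in for an MCard; not the library's class."""

    content: bytes
    hash: str = field(init=False)
    g_time: datetime = field(default_factory=_now_utc)

    def __post_init__(self) -> None:
        # frozen=True blocks normal assignment, so set the derived hash directly.
        digest = hashlib.sha256(self.content).hexdigest()
        object.__setattr__(self, "hash", digest)


card = CardSketch(b"Hello, MCard!")
print(card.hash, card.g_time.isoformat())
```

Freezing the dataclass mirrors the idea that a card's hash and claim time should not change after creation.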

Directory Structure

  • mcard/: Contains the main application code.
  • tests/: Contains test files for the application.
  • logs/: Contains log files generated by the application.
  • data/db/: Directory for storing database files used by the application.
  • data/files/: Directory reserved for storing general files used by the application.

Database Technologies

MCard will initially use embedded database technologies such as SQLite, DuckDB, and LanceDB for data storage. These engines are well suited to content-addressable storage and are easy to integrate and manage within the application.

API Endpoint

MCard can serve as an API endpoint for serving data content. By using FastAPI with Uvicorn, you can easily create and manage API routes for accessing and manipulating MCard data. FastAPI provides automatic interactive API documentation and is designed for high performance, making it an excellent choice for this project.

FastAPI is a modern, high-performance web framework for building APIs with Python, based on standard Python type hints. Uvicorn is an ASGI server that serves the FastAPI application. To run the API server, use the following command:

uvicorn mcard.api:app --reload

This command will start the Uvicorn server with the FastAPI application defined in mcard/api.py, allowing you to access the MCard API at http://localhost:8000. You can also view the interactive API documentation at http://localhost:8000/docs.

PyTest Configuration

  • The project uses PyTest for testing.
  • Tests are located in the tests directory.
  • The configuration file pytest.ini specifies test paths and naming conventions.

Logging Configuration

  • The project uses Python's built-in logging module for logging.
  • Logs are written to logs/mcard.log with a maximum size of 10MB and up to 5 backup files.
  • The logging format includes timestamps, log levels, and detailed information about the source of the log messages.
  • The logging level is set to DEBUG for console output and INFO for file output.
  • To initialize logging, call setup_logging() from mcard.logging_config before running tests or application code.
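
The settings listed above can be approximated with the standard library alone. The following is a minimal sketch, not the contents of mcard.logging_config: the function name `setup_logging_sketch`, the logger name, and the exact format string are assumptions made for illustration.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def setup_logging_sketch(log_file: str = "logs/mcard.log") -> logging.Logger:
    """Illustrative only; the real setup_logging() may differ."""
    Path(log_file).parent.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("mcard_sketch")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls

    # Timestamp, level, and source location of each message.
    fmt = logging.Formatter(
        "%(asctime)s %(levelname)s %(name)s %(filename)s:%(lineno)d %(message)s"
    )

    console = logging.StreamHandler()
    console.setLevel(logging.DEBUG)  # DEBUG and above on the console
    console.setFormatter(fmt)

    # Rotate at 10 MB and keep up to 5 backup files.
    file_handler = RotatingFileHandler(
        log_file, maxBytes=10 * 1024 * 1024, backupCount=5
    )
    file_handler.setLevel(logging.INFO)  # INFO and above in the file
    file_handler.setFormatter(fmt)

    logger.addHandler(console)
    logger.addHandler(file_handler)
    return logger
```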

Running Tests

To run tests:

pytest

To run tests with coverage:

pytest --cov=mcard

Hegel's Dialectic in Testing and CI/CD

Hegel's dialectic is a philosophical framework that describes the process of development and change through a triadic structure: thesis, antithesis, and synthesis. Here's how it relates to software testing and Continuous Integration/Continuous Deployment (CI/CD):

  1. Thesis (Initial Code): Represents the initial code or feature implementation, the starting point where a developer writes code to fulfill a specific requirement or feature.

  2. Antithesis (Testing and Bugs): Arises during the testing phase, where tests are executed. If tests fail or bugs are discovered, they represent a challenge to the initial implementation, highlighting discrepancies between intended functionality and actual behavior.

  3. Synthesis (Refinement and Improvement): Occurs when developers address the issues identified during testing, leading to a refined version of the code that resolves conflicts between the initial implementation and testing outcomes.

CI/CD Integration

In a CI/CD pipeline, this dialectical process is continuous:

  • Continuous Integration: Developers frequently integrate code changes into a shared repository. Each integration triggers automated tests, allowing for rapid identification of issues against the current codebase.

  • Continuous Deployment: Once the code passes testing, it can be automatically deployed, representing the synthesis where refined code is made available to users.

This iterative process fosters continuous improvement, where each round of testing and deployment leads to better software quality and functionality. By applying Hegel's dialectic, teams can embrace the idea that conflict (in the form of bugs and failures) is a natural and necessary part of the development process, ultimately leading to a more robust and effective product.

Handling Duplicate Events

When a duplicate card is detected, the duplicate_event_card is assigned a new timestamp value. This ensures that even though the content is identical to the original card, the hash value will be unique due to the different timestamp. This mechanism allows for robust handling of duplicate content while maintaining the integrity of the system.
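
One way to read this mechanism is that the event card's own content embeds the new timestamp, so hashing the event yields a value distinct from the original card's hash. The sketch below illustrates that reading only; `duplication_event_sketch` and its field names are hypothetical and are not taken from the library.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_hex(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def duplication_event_sketch(content: str, when: datetime) -> str:
    """Hypothetical event payload; field names are illustrative."""
    return json.dumps(
        {
            "type": "duplicate",
            "hash": sha256_hex(content),  # hash of the original content
            "g_time": when.isoformat(),   # fresh timestamp for the event
        },
        sort_keys=True,
    )


original = "Hello, MCard!"
event = duplication_event_sketch(original, datetime.now(timezone.utc))

# The event embeds a timestamp, so its own hash differs from the original's.
print(sha256_hex(original) != sha256_hex(event))
```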

MD5 Collision Testing

The test suite includes verification of MD5 collision detection using known collision pairs from the FastColl attack. These pairs produce identical MD5 hashes despite having different content:

MD5 Collision Pair

Input 1:
4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518afbfa200a8284bf36e8e4b55b35f427593d849676da0d1555d8360fb5f07fea2
                                                                     ^^^                                    ^^^

Input 2:
4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518afbfa202a8284bf36e8e4b55b35f427593d849676da0d1d55d8360fb5f07fea2
                                                                     ^^^                                    ^^^

Key differences:

  1. 200 vs 202
  2. d15 vs d1d

Both inputs produce the same MD5 hash value, demonstrating MD5's vulnerability to collision attacks. This is why MCard defaults to using more secure hash functions like SHA-256.
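
The collision can be checked directly with Python's hashlib. This short sketch is independent of the MCard test suite; it compares the two inputs above under MD5 and under SHA-256.

```python
import hashlib

# The two 64-byte inputs listed above, decoded from hex.
INPUT_1 = bytes.fromhex(
    "4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518"
    "afbfa200a8284bf36e8e4b55b35f427593d849676da0d1555d8360fb5f07fea2"
)
INPUT_2 = bytes.fromhex(
    "4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518"
    "afbfa202a8284bf36e8e4b55b35f427593d849676da0d1d55d8360fb5f07fea2"
)

md5_match = hashlib.md5(INPUT_1).digest() == hashlib.md5(INPUT_2).digest()
sha_match = hashlib.sha256(INPUT_1).digest() == hashlib.sha256(INPUT_2).digest()

print("Inputs differ:", INPUT_1 != INPUT_2)
print("MD5 equal:", md5_match)
print("SHA-256 equal:", sha_match)
```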

Testing Behavior

The current tests, particularly those in test_sqlite_persistence.py, clear the database each time a test function runs, so test_mcard.db only contains the data from the last test executed. If the clear() call in the fixture is uncommented, the data from the last test is removed as well.

Core Dependencies

  • SQLAlchemy==1.4.47: SQL toolkit and ORM
  • aiosqlite==0.17.0: Async SQLite database driver
  • python-dateutil==2.8.2: Date/time utilities
  • python-dotenv==1.0.0: Environment management

Description

MCard is a project designed to facilitate card management with a focus on validation and logging features.

Installation

Using uv

You can install the MCard package from PyPI (once published):

uv pip install mcard

Installing from source

To install MCard directly from the source code:

# Clone the repository
git clone https://github.com/yourusername/MCard_TDD.git
cd MCard_TDD

# Install in development mode with uv
uv pip install -e .

# Install with development dependencies
uv pip install -e ".[dev]"

Development Environment Setup

  1. Set up a virtual environment using uv:
# Simply run the activate script which handles uv setup
source activate_venv.sh

This script will:

  • Ensure conda is disabled (if present)
  • Create a virtual environment using uv if it doesn't exist
  • Activate the virtual environment
  • Install dependencies from pyproject.toml using uv

Alternatively, you can manually set up the environment:

# Create and activate virtual environment with uv
uv venv .venv
source .venv/bin/activate

# Install dependencies with uv
uv pip sync pyproject.toml

Usage

After installation, you can use MCard in your Python code:

from mcard.model.card import MCard
from mcard.model.card_collection import CardCollection

# Create a new card
card = MCard(content="Hello, MCard!")

# Create a card collection
collection = CardCollection()

# Add the card to the collection
collection.add(card)

# Retrieve the card by its hash
retrieved_card = collection.get_by_hash(card.hash)
print(retrieved_card.content)  # Outputs: Hello, MCard!

Or use the installed command-line entry point:

mcard


## Recent Updates

### MCard Detail View Component
- Created a new component `mcard_detail_view.html` to display detailed information about MCards, including:
  - Full hash string
  - g_time string
  - Content type
  - Appropriate content display for images, videos, PDFs, and plain text.

### Dynamic Content Loading
- Implemented functionality to dynamically load and display card details when a card entry is clicked.
- Added JavaScript functions to handle click events and fetch card details from the server.

### Error Handling and Logging
- Enhanced error handling in the Flask backend to log errors and provide better feedback.
- Added detailed logging in the JavaScript to track the fetching and rendering process.

### Template Updates
- Updated existing templates to integrate the new detail view component and ensure proper rendering.

### User Experience Improvements
- Improved visual feedback for selected cards.
- Ensured that the focused area updates correctly without becoming blank.

### Configuration Management Refactoring (2024-12-18)
- Renamed `EnvConfig` to `EnvParameters` for better clarity and consistency
- Moved configuration management from `env_config.py` to `env_parameters.py`
- Updated all references to use the new class name across the codebase
- Enhanced test coverage for configuration parameters
- Maintained singleton pattern for configuration management
- Ensured backward compatibility with existing environment variable handling

### Database Enhancements
- Implemented `get_all()` method in SQLiteEngine for efficient pagination
- Added support for page size and page number parameters
- Enhanced error handling for invalid pagination parameters
- Improved performance by optimizing SQL queries
- Added comprehensive test coverage for pagination functionality
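
The pagination described above can be sketched with plain `sqlite3` and `LIMIT`/`OFFSET`. This is an assumption-laden illustration, not the `SQLiteEngine.get_all()` implementation: the `card` table, its columns, and the function name `get_all_sketch` are hypothetical.

```python
import sqlite3


def get_all_sketch(conn, page_number=1, page_size=10):
    """Return one page of (hash, g_time) rows, ordered by claim time."""
    if page_number < 1 or page_size < 1:
        raise ValueError("page_number and page_size must be >= 1")
    offset = (page_number - 1) * page_size
    cur = conn.execute(
        "SELECT hash, g_time FROM card ORDER BY g_time LIMIT ? OFFSET ?",
        (page_size, offset),
    )
    return cur.fetchall()


# Hypothetical schema and sample rows for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE card (hash TEXT PRIMARY KEY, content BLOB, g_time TEXT)")
conn.executemany(
    "INSERT INTO card VALUES (?, ?, ?)",
    [(f"h{i}", b"x", f"2024-01-01T00:00:{i:02d}+00:00") for i in range(25)],
)
print(len(get_all_sketch(conn, page_number=3, page_size=10)))  # 5 rows on the last page
```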

## Recent Changes

### Directory Structure Updates
- The `hash_algorithms` directory has been renamed to `algorithms` for simplicity and clarity.
- The `hash_validator.py` file has been renamed to `validator.py` to simplify the naming convention.

### Updated Imports
- All relevant import statements across the codebase have been updated to reflect the new structure and naming.

### Engine Refactor
- Removed the abstract `search_by_content` method from `SQLiteEngine` and `DuckDBEngine`.
- Integrated search functionality into the `search_by_string` method, allowing searches across content, hash, and g_time fields.
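
A search across the three fields might look like the following sketch, which uses `sqlite3` and `LIKE`. The table layout and the function name `search_by_string_sketch` are assumptions for illustration; the actual `search_by_string` method may work differently.

```python
import sqlite3


def search_by_string_sketch(conn, needle):
    """Match a substring against content, hash, or g_time."""
    # Note: % and _ inside needle act as LIKE wildcards in this sketch.
    pattern = f"%{needle}%"
    cur = conn.execute(
        "SELECT hash FROM card "
        "WHERE content LIKE ? OR hash LIKE ? OR g_time LIKE ? "
        "ORDER BY g_time",
        (pattern, pattern, pattern),
    )
    return [row[0] for row in cur.fetchall()]


# Hypothetical schema and sample rows for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE card (hash TEXT PRIMARY KEY, content TEXT, g_time TEXT)")
conn.executemany(
    "INSERT INTO card VALUES (?, ?, ?)",
    [
        ("abc123", "hello world", "2024-12-18T10:00:00+00:00"),
        ("def456", "goodbye", "2025-01-01T09:30:00+00:00"),
    ],
)
print(search_by_string_sketch(conn, "hello"))    # ['abc123']
print(search_by_string_sketch(conn, "2025-01"))  # ['def456']
```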

### Event Generation
- Updated `generate_duplication_event` and `generate_collision_event` to return JSON strings.
- Enhanced event structure to include upgraded hash functions and content size.

### Logging
- Integrated logging into test cases for better traceability and debugging.

### MCard Class Update
- The `MCard` constructor now accepts a `hash_function` parameter, providing more flexibility in hash generation.

### Tests
- Adjusted tests to verify the new event generation logic and ensure search functionality works as intended.

## Centralized Configuration Management

### Overview
MCard has adopted a centralized configuration management approach to improve maintainability, scalability, and readability. This involves consolidating all configuration constants into a single location, making it easier to manage and update configuration values across the application.

### Configuration Constants
All configuration constants are now defined in `config_constants.py`. This file contains named constants for various configuration values, including:

- Database schema and paths
- Hash algorithm constants and hierarchy
- Environment variable names
- API configuration
- HTTP status codes
- Error messages
- Event types and structure

### Benefits
Centralized configuration management provides several benefits, including:

- **Single Source of Truth**: All configuration constants are managed in one location.
- **Type Safety**: Constants are properly typed and documented.
- **Maintainability**: Changes to configuration values only need to be made in one place.
- **Code Completion**: IDE support for constant names improves developer productivity.
- **Documentation**: Each constant group is documented with its purpose and usage.
- **Testing**: Test files use the same constants as production code, ensuring consistency.

### Implementation
The `config_constants.py` file uses an enum-based approach for hash algorithms, ensuring type safety and readability. The file is organized into logical groups, making it easier to find and update specific configuration values.
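
An enum-based layout for hash algorithms could look roughly like this. The sketch is not the contents of `config_constants.py`; the enum name, its members, and the helper `compute_hash` are hypothetical.

```python
import hashlib
from enum import Enum


class HashAlgorithm(str, Enum):
    """Hypothetical enum; member names are illustrative."""

    MD5 = "md5"
    SHA1 = "sha1"
    SHA256 = "sha256"
    SHA512 = "sha512"


DEFAULT_HASH_ALGORITHM = HashAlgorithm.SHA256


def compute_hash(content: bytes, algorithm: HashAlgorithm = DEFAULT_HASH_ALGORITHM) -> str:
    # hashlib.new accepts the algorithm name as a string.
    return hashlib.new(algorithm.value, content).hexdigest()


print(compute_hash(b"Hello, MCard!"))
```

Using an enum means an unknown algorithm name fails at lookup time (`HashAlgorithm("sha3")` raises `ValueError`) rather than deep inside hashing code.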

### Example Usage
To use a configuration constant, simply import the `config_constants` module and access the desired constant. For example:
```python
from config_constants import HASH_ALGORITHM_SHA256

# Use the SHA-256 hash algorithm
hash_algorithm = HASH_ALGORITHM_SHA256
```

By adopting a centralized configuration management approach, MCard has improved its maintainability, scalability, and readability, making it easier to manage and update configuration values across the application.

Using MCardFromData for Stored Values

When retrieving stored MCard data from the database, always use the subclass MCardFromData. This approach allows you to bypass unnecessary and unwanted algorithms, significantly speeding up the MCard instantiation process.
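
The idea can be illustrated with two small classes: a base class that hashes and timestamps on creation, and a subclass that accepts already-stored values and skips that work. Both class names below are hypothetical and only mimic the relationship between MCard and MCardFromData.

```python
import hashlib
from datetime import datetime, timezone


class BaseCardSketch:
    """Hypothetical base class: hashes content and stamps time on creation."""

    def __init__(self, content: bytes):
        self.content = content
        self.hash = hashlib.sha256(content).hexdigest()
        self.g_time = datetime.now(timezone.utc)


class CardFromDataSketch(BaseCardSketch):
    """Hypothetical subclass: trusts values already stored in the database."""

    def __init__(self, content: bytes, hash: str, g_time: datetime):
        # Deliberately skip BaseCardSketch.__init__ so nothing is recomputed.
        self.content = content
        self.hash = hash
        self.g_time = g_time


fresh = BaseCardSketch(b"Hello, MCard!")
restored = CardFromDataSketch(fresh.content, fresh.hash, fresh.g_time)
print(restored.hash == fresh.hash, restored.g_time == fresh.g_time)
```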

Project Structure

MCard_TDD/
├── mcard/
│   ├── algorithms/          # Hash algorithm implementations
│   ├── engine/             # Database engines (SQLite, DuckDB)
│   ├── model/              # Core data models
│   ├── api.py             # FastAPI endpoints
│   └── logging_config.py   # Logging configuration
├── tests/
│   ├── persistence/       # Database persistence tests
│   └── unit/             # Unit tests
├── docs/                  # Project documentation
├── data/
│   ├── db/               # Database files
│   └── files/            # General files
└── logs/                 # Application logs

Configuration

Environment Setup

Create a .env file with the following variables:

MCARD_DB_PATH=data/db/mcard_demo.db
TEST_DB_PATH=data/db/test_mcard.db
MCARD_SERVICE_LOG_LEVEL=DEBUG
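
Once the variables are in the process environment (for example after python-dotenv's `load_dotenv()` has read the .env file), they can be collected with a small helper. The sketch below uses only `os.environ`; the helper name `load_settings` and the use of the example values as fallbacks are assumptions.

```python
import os

# Fallbacks taken from the example .env above (an assumption of this sketch).
DEFAULTS = {
    "MCARD_DB_PATH": "data/db/mcard_demo.db",
    "TEST_DB_PATH": "data/db/test_mcard.db",
    "MCARD_SERVICE_LOG_LEVEL": "DEBUG",
}


def load_settings(env=os.environ):
    """Return the three settings, preferring values from the environment."""
    return {name: env.get(name, default) for name, default in DEFAULTS.items()}


settings = load_settings()
print(settings["MCARD_DB_PATH"])
```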

Development Guidelines

Using MCardFromData

When retrieving stored data, use MCardFromData instead of the base MCard class:

from mcard.model.card import MCardFromData

# content, hash, and g_time are the values previously read back from the database
stored_card = MCardFromData(content=content, hash=hash, g_time=g_time)

Hash Algorithm Configuration

The default hash algorithm is SHA-256, but it's configurable:

from mcard.algorithms import HASH_ALGORITHM_SHA256

Installation

To set up the project, follow these steps:

  1. Create a virtual environment:

    python -m venv .venv
    
  2. Activate the virtual environment:

    • On macOS and Linux:
      source .venv/bin/activate
      
    • On Windows:
      .venv\Scripts\activate
      
  3. Configure your environment:

    • Copy .env.example to create your own .env file.
    • The default configuration uses:
      • Database path: data/db/mcard_demo.db.
      • Hash algorithm: SHA-256.
      • Connection pool size: 5.
      • Connection timeout: 30 seconds.

Directory Structure

  • mcard/
    • engine/: Contains the database engine implementations, including SQLite and DuckDB.
    • model/: Contains the core data models, including MCard.
    • tests/: Contains all test cases for the MCard library, ensuring functionality and correctness.

SQLite Persistence Testing

  • tests/persistence/sqlite_test.py: Contains test cases for SQLite persistence, ensuring data integrity and consistency.

The tests in test_sqlite_persistence.py are designed to clear the database each time a test function runs, so the test_mcard.db file only contains the data from the last test executed. If the clear() call in the fixture is uncommented, the data from the last test is removed as well. This behavior ensures that each test starts with a clean database, which makes test results more reliable.
