MCard: Memory Card with TDD approach
MCard Core
A Python library implementing an algebraically closed data structure for content-addressable storage. MCard ensures that every piece of content in the system is uniquely identified by its hash and temporally ordered by its claim time, enabling robust content verification and precedence ordering. It allows for a monadic approach to namespace management and content deduplication for any type of data.
Documentation
- Card Collection Guide: Detailed guide on MCard collection management and hash collision handling
- Global Time Design: Documentation on the global time (g_time) implementation
- Test-Driven Development Guide: Guide on our TDD approach and methodology
Core Concepts
MCard implements an algebraically closed system where:
- Every MCard is uniquely identified by its content hash (configurable, defaulting to SHA-256).
- Every MCard has an associated claim time (timezone-aware timestamp with microsecond precision).
- The database maintains these invariants automatically.
- Content integrity is guaranteed through immutable hashes.
- Temporal ordering is preserved at microsecond precision.
This design provides several key guarantees:
- Content Integrity: The content hash serves as both identifier and verification mechanism.
- Temporal Signature: All cards are associated with a timestamp: g_time.
- Precedence Verification: The claim time enables determination of content presentation order.
- Algebraic Closure: Any operation on MCards produces results that maintain these properties.
- Type Safety: Built on Pydantic with strict validation and type checking.
Required Attributes for Each MCard
Each MCard must have the following three required attributes:
1. content: The actual data being stored (string or bytes).
2. hash: A cryptographic hash of the content, using SHA-256 by default (configurable to other algorithms).
3. g_time: A timezone-aware timestamp with microsecond precision, representing the global time when the card was claimed.
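The three attributes can be illustrated with a minimal standard-library sketch. `SketchCard` is a hypothetical stand-in, not the library's `MCard` class (which is described above as built on Pydantic). It only shows how a SHA-256 content hash and a timezone-aware, microsecond-precision claim time might be derived, assuming strings are encoded as UTF-8 before hashing.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Union


def _sha256_hex(content: Union[str, bytes]) -> str:
    """Hash content with SHA-256; strings are UTF-8 encoded first (an assumption)."""
    data = content.encode("utf-8") if isinstance(content, str) else content
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class SketchCard:
    """Hypothetical illustration of the three required attributes."""

    content: Union[str, bytes]
    hash: str = field(init=False)
    g_time: datetime = field(init=False)

    def __post_init__(self) -> None:
        # frozen=True blocks normal assignment, so derived fields are set explicitly.
        object.__setattr__(self, "hash", _sha256_hex(self.content))
        object.__setattr__(self, "g_time", datetime.now(timezone.utc))


card = SketchCard("Hello, MCard!")
same_content = SketchCard(b"Hello, MCard!")
```

In this sketch, two cards with the same content share a hash but each carries its own claim time, which is what makes content deduplication and precedence ordering separable concerns.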
Directory Structure
- mcard/: Contains the main application code.
- tests/: Contains test files for the application.
- logs/: Contains log files generated by the application.
- data/db/: Directory for storing database files used by the application.
- data/files/: Directory reserved for storing general files used by the application.
Database Technologies
MCard will initially use embedded databases such as SQLite, DuckDB, and LanceDB for storage. Embedded engines run in-process with no separate server, which suits content-addressable storage and keeps integration and data management simple within the application.
API Endpoint
MCard can serve as an API endpoint for serving data content. By using FastAPI with Uvicorn, you can easily create and manage API routes for accessing and manipulating MCard data. FastAPI provides automatic interactive API documentation and is designed for high performance, making it an excellent choice for this project.
FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.6+ based on standard Python type hints. Uvicorn is an ASGI server that allows you to serve the FastAPI application. To run the API server, use the following command:
uvicorn mcard.api:app --reload
This command will start the Uvicorn server with the FastAPI application defined in mcard/api.py, allowing you to access the MCard API at http://localhost:8000. You can also view the interactive API documentation at http://localhost:8000/docs.
PyTest Configuration
- The project uses PyTest for testing.
- Tests are located in the tests directory.
- The configuration file pytest.ini specifies test paths and naming conventions.
Logging Configuration
- The project uses Python's built-in logging module for logging.
- Logs are written to logs/mcard.log with a maximum size of 10MB and up to 5 backup files.
- The logging format includes timestamps, log levels, and detailed information about the source of the log messages.
- The logging level is set to DEBUG for console output and INFO for file output.
- To initialize logging, call setup_logging() from mcard.logging_config before running tests or application code.
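The settings listed above map naturally onto the standard library's `RotatingFileHandler`. The following is a rough sketch, not the actual contents of `mcard.logging_config`: the function name `setup_logging_sketch`, the logger name, and the exact format string are assumptions, and 10MB is interpreted here as 10 * 1024 * 1024 bytes.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def setup_logging_sketch(log_path: str = "logs/mcard.log") -> logging.Logger:
    """Hypothetical sketch of the rotation and level settings described above."""
    Path(log_path).parent.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger("mcard_sketch")  # logger name is an assumption
    logger.setLevel(logging.DEBUG)  # let each handler apply its own threshold
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls

    # Timestamp, level, and source location (format string is an assumption).
    formatter = logging.Formatter(
        "%(asctime)s %(levelname)s %(name)s:%(filename)s:%(lineno)d %(message)s"
    )

    console = logging.StreamHandler()
    console.setLevel(logging.DEBUG)  # DEBUG and above on the console
    console.setFormatter(formatter)

    file_handler = RotatingFileHandler(
        log_path, maxBytes=10 * 1024 * 1024, backupCount=5, encoding="utf-8"
    )
    file_handler.setLevel(logging.INFO)  # INFO and above in the file
    file_handler.setFormatter(formatter)

    logger.addHandler(console)
    logger.addHandler(file_handler)
    return logger
```

Setting the logger itself to DEBUG and filtering at the handlers is what allows the console and the file to use different levels.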
Running Tests
To run tests:
pytest
To run tests with coverage:
pytest --cov=mcard
Hegel's Dialectic in Testing and CI/CD
Hegel's dialectic is a philosophical framework that describes the process of development and change through a triadic structure: thesis, antithesis, and synthesis. Here's how it relates to software testing and Continuous Integration/Continuous Deployment (CI/CD):
- Thesis (Initial Code): Represents the initial code or feature implementation, the starting point where a developer writes code to fulfill a specific requirement or feature.
- Antithesis (Testing and Bugs): Arises during the testing phase, where tests are executed. If tests fail or bugs are discovered, they represent a challenge to the initial implementation, highlighting discrepancies between intended functionality and actual behavior.
- Synthesis (Refinement and Improvement): Occurs when developers address the issues identified during testing, leading to a refined version of the code that resolves conflicts between the initial implementation and testing outcomes.
CI/CD Integration
In a CI/CD pipeline, this dialectical process is continuous:
- Continuous Integration: Developers frequently integrate code changes into a shared repository. Each integration triggers automated tests, allowing for rapid identification of issues against the current codebase.
- Continuous Deployment: Once the code passes testing, it can be automatically deployed, representing the synthesis where refined code is made available to users.
This iterative process fosters continuous improvement, where each round of testing and deployment leads to better software quality and functionality. By applying Hegel's dialectic, teams can embrace the idea that conflict (in the form of bugs and failures) is a natural and necessary part of the development process, ultimately leading to a more robust and effective product.
Handling Duplicate Events
When a duplicate card is detected, the duplicate_event_card is assigned a new timestamp value. This ensures that even though the content is identical to the original card, the hash value will be unique due to the different timestamp. This mechanism allows for robust handling of duplicate content while maintaining the integrity of the system.
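One way to read this mechanism, assuming the new timestamp is folded into the event card's content before hashing, is sketched below. The function `make_duplicate_event` and its JSON field names are hypothetical and chosen only for illustration; the changelog later in this document notes that duplication events are returned as JSON strings, but it does not specify their fields.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_hex(text: str) -> str:
    """SHA-256 of a UTF-8 encoded string, as a hex digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def make_duplicate_event(original_hash: str) -> str:
    """Build a hypothetical duplicate-event payload as a JSON string.

    The field names are assumptions, not the library's event schema.
    """
    event = {
        "type": "duplicate",
        "hash": original_hash,
        # A fresh claim time makes the payload, and so its hash, distinct.
        "g_time": datetime.now(timezone.utc).isoformat(timespec="microseconds"),
    }
    return json.dumps(event, sort_keys=True)


original_hash = sha256_hex("Hello, MCard!")
event_content = make_duplicate_event(original_hash)
event_hash = sha256_hex(event_content)
```

Under this reading, the original card keeps its hash, and the duplicate is recorded as a separate card whose hash differs because its payload embeds the new timestamp.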
MD5 Collision Testing
The test suite includes verification of MD5 collision detection using known collision pairs from the FastColl attack. These pairs produce identical MD5 hashes despite having different content:
MD5 Collision Pair
Input 1:
4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518afbfa200a8284bf36e8e4b55b35f427593d849676da0d1555d8360fb5f07fea2
Input 2:
4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518afbfa202a8284bf36e8e4b55b35f427593d849676da0d1d55d8360fb5f07fea2
Key differences:
- 200 vs 202 (byte 00 vs 02)
- d15 vs d1d (byte 55 vs d5)
Both inputs produce the same MD5 hash value, demonstrating MD5's vulnerability to collision attacks. This is why MCard defaults to using more secure hash functions like SHA-256.
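The claim can be checked directly with `hashlib`. This short sketch decodes the two hex strings shown above and compares their MD5 and SHA-256 digests; the variable names are illustrative and this is not the project's own test code.

```python
import hashlib

# Hex strings copied from the collision pair above (64 bytes each).
INPUT_1 = bytes.fromhex(
    "4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518"
    "afbfa200a8284bf36e8e4b55b35f427593d849676da0d1555d8360fb5f07fea2"
)
INPUT_2 = bytes.fromhex(
    "4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518"
    "afbfa202a8284bf36e8e4b55b35f427593d849676da0d1d55d8360fb5f07fea2"
)

md5_1 = hashlib.md5(INPUT_1).hexdigest()
md5_2 = hashlib.md5(INPUT_2).hexdigest()
sha_1 = hashlib.sha256(INPUT_1).hexdigest()
sha_2 = hashlib.sha256(INPUT_2).hexdigest()

print("inputs differ:  ", INPUT_1 != INPUT_2)
print("MD5 collides:   ", md5_1 == md5_2)
print("SHA-256 differs:", sha_1 != sha_2)
```

A content-addressable store keyed on MD5 would treat these two inputs as the same card, whereas SHA-256 keeps them distinct.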
Testing Behavior
The current tests, particularly test_sqlite_persistence.py, clear the database between test functions, so test_mcard.db contains only the data from the last test executed. If the clear() call in the fixture is uncommented, the content of the last test is removed as well.
Core Dependencies
- SQLAlchemy==1.4.47: SQL toolkit and ORM
- aiosqlite==0.17.0: Async SQLite database driver
- python-dateutil==2.8.2: Date/time utilities
- python-dotenv==1.0.0: Environment management
Description
MCard is a project designed to facilitate card management with a focus on validation and logging features.
Installation
Using uv
You can install the MCard package from PyPI (once published):
uv pip install mcard
Installing from source
To install MCard directly from the source code:
# Clone the repository
git clone https://github.com/yourusername/MCard_TDD.git
cd MCard_TDD
# Install in development mode with uv
uv pip install -e .
# Install with development dependencies
uv pip install -e ".[dev]"
Development Environment Setup
- Set up a virtual environment using uv:
# Simply run the activate script which handles uv setup
source activate_venv.sh
This script will:
- Ensure conda is disabled (if present)
- Create a virtual environment using uv if it doesn't exist
- Activate the virtual environment
- Install dependencies from pyproject.toml using uv
Alternatively, you can manually set up the environment:
# Create and activate virtual environment with uv
uv venv .venv
source .venv/bin/activate
# Install dependencies with uv
uv pip sync pyproject.toml
Usage
After installation, you can use MCard in your Python code:
from mcard.model.card import MCard
from mcard.model.card_collection import CardCollection
# Create a new card
card = MCard(content="Hello, MCard!")
# Create a card collection
collection = CardCollection()
# Add the card to the collection
collection.add(card)
# Retrieve the card by its hash
retrieved_card = collection.get_by_hash(card.hash)
print(retrieved_card.content) # Outputs: Hello, MCard!
Or use the installed command-line entry point:
mcard
## Recent Updates
### MCard Detail View Component
- Created a new component `mcard_detail_view.html` to display detailed information about MCards, including:
- Full hash string
- g_time string
- Content type
- Appropriate content display for images, videos, PDFs, and plain text.
### Dynamic Content Loading
- Implemented functionality to dynamically load and display card details when a card entry is clicked.
- Added JavaScript functions to handle click events and fetch card details from the server.
### Error Handling and Logging
- Enhanced error handling in the Flask backend to log errors and provide better feedback.
- Added detailed logging in the JavaScript to track the fetching and rendering process.
### Template Updates
- Updated existing templates to integrate the new detail view component and ensure proper rendering.
### User Experience Improvements
- Improved visual feedback for selected cards.
- Ensured that the focused area updates correctly without becoming blank.
### Configuration Management Refactoring (2024-12-18)
- Renamed `EnvConfig` to `EnvParameters` for better clarity and consistency
- Moved configuration management from `env_config.py` to `env_parameters.py`
- Updated all references to use the new class name across the codebase
- Enhanced test coverage for configuration parameters
- Maintained singleton pattern for configuration management
- Ensured backward compatibility with existing environment variable handling
### Database Enhancements
- Implemented `get_all()` method in SQLiteEngine for efficient pagination
- Added support for page size and page number parameters
- Enhanced error handling for invalid pagination parameters
- Improved performance by optimizing SQL queries
- Added comprehensive test coverage for pagination functionality
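Page-number and page-size pagination is commonly implemented with `LIMIT` and `OFFSET`. The sketch below shows that pattern against an in-memory SQLite table; the `card` table layout and the `get_page` helper are assumptions for illustration and are not the `SQLiteEngine.get_all()` implementation.

```python
import sqlite3

# Hypothetical schema: one row per card, g_time stored as ISO-8601 text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE card (hash TEXT PRIMARY KEY, content TEXT, g_time TEXT)")
conn.executemany(
    "INSERT INTO card VALUES (?, ?, ?)",
    [(f"hash{i}", f"content {i}", f"2024-12-18T10:00:0{i}.000000+00:00") for i in range(5)],
)


def get_page(conn: sqlite3.Connection, page_number: int = 1, page_size: int = 10) -> list:
    """Return one page of card hashes, ordered by claim time."""
    if page_number < 1 or page_size < 1:
        raise ValueError("page_number and page_size must be positive integers")
    offset = (page_number - 1) * page_size
    rows = conn.execute(
        "SELECT hash FROM card ORDER BY g_time, hash LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()
    return [row[0] for row in rows]


first_page = get_page(conn, page_number=1, page_size=2)
last_page = get_page(conn, page_number=3, page_size=2)
```

Ordering by `g_time` (with `hash` as a tie-breaker) keeps page boundaries stable between calls, and a page past the end simply returns an empty list.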
## Recent Changes
### Directory Structure Updates
- The `hash_algorithms` directory has been renamed to `algorithms` for simplicity and clarity.
- The `hash_validator.py` file has been renamed to `validator.py` to simplify the naming convention.
### Updated Imports
- All relevant import statements across the codebase have been updated to reflect the new structure and naming.
### Engine Refactor
- Removed the abstract `search_by_content` method from `SQLiteEngine` and `DuckDBEngine`.
- Integrated search functionality into the `search_by_string` method, allowing searches across content, hash, and g_time fields.
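A search across the three fields could look roughly like the following. This is a sketch under assumptions: the `card` table, its text columns, and the helper name `search_by_string_sketch` are hypothetical, and the real `search_by_string` method may match differently (for example, case-insensitively).

```python
import sqlite3

# Hypothetical schema with all three searchable fields stored as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE card (hash TEXT PRIMARY KEY, content TEXT, g_time TEXT)")
conn.executemany(
    "INSERT INTO card VALUES (?, ?, ?)",
    [
        ("abc123", "Hello, MCard!", "2024-12-18T10:00:00.000001+00:00"),
        ("def456", "Another card", "2024-12-19T11:30:00.000002+00:00"),
    ],
)


def search_by_string_sketch(conn: sqlite3.Connection, needle: str) -> list:
    """Return hashes of rows where needle occurs in content, hash, or g_time."""
    rows = conn.execute(
        "SELECT hash FROM card "
        "WHERE instr(content, ?) > 0 OR instr(hash, ?) > 0 OR instr(g_time, ?) > 0 "
        "ORDER BY g_time",
        (needle, needle, needle),
    ).fetchall()
    return [row[0] for row in rows]
```

Using `instr` rather than `LIKE` means `%` and `_` in the search string are treated literally; the match is case-sensitive.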
### Event Generation
- Updated `generate_duplication_event` and `generate_collision_event` to return JSON strings.
- Enhanced event structure to include upgraded hash functions and content size.
### Logging
- Integrated logging into test cases for better traceability and debugging.
### MCard Class Update
- The `MCard` constructor now accepts a `hash_function` parameter, providing more flexibility in hash generation.
### Tests
- Adjusted tests to verify the new event generation logic and ensure search functionality works as intended.
## Centralized Configuration Management
### Overview
MCard has adopted a centralized configuration management approach to improve maintainability, scalability, and readability. This involves consolidating all configuration constants into a single location, making it easier to manage and update configuration values across the application.
### Configuration Constants
All configuration constants are now defined in `config_constants.py`. This file contains named constants for various configuration values, including:
- Database schema and paths
- Hash algorithm constants and hierarchy
- Environment variable names
- API configuration
- HTTP status codes
- Error messages
- Event types and structure
### Benefits
Centralized configuration management provides several benefits, including:
- **Single Source of Truth**: All configuration constants are managed in one location.
- **Type Safety**: Constants are properly typed and documented.
- **Maintainability**: Changes to configuration values only need to be made in one place.
- **Code Completion**: IDE support for constant names improves developer productivity.
- **Documentation**: Each constant group is documented with its purpose and usage.
- **Testing**: Test files use the same constants as production code, ensuring consistency.
### Implementation
The `config_constants.py` file uses an enum-based approach for hash algorithms, ensuring type safety and readability. The file is organized into logical groups, making it easier to find and update specific configuration values.
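An enum-based grouping of hash algorithms might look like the sketch below. The class name `HashAlgorithm`, its members, the weakest-to-strongest ordering, and the helpers are all assumptions made for illustration; they are not copied from `config_constants.py`.

```python
import hashlib
from enum import Enum
from typing import Optional


class HashAlgorithm(str, Enum):
    """Hypothetical enum; member names and values are assumptions."""

    MD5 = "md5"
    SHA1 = "sha1"
    SHA256 = "sha256"
    SHA512 = "sha512"


# One possible "hierarchy": ordered from weakest to strongest.
HASH_HIERARCHY = [
    HashAlgorithm.MD5,
    HashAlgorithm.SHA1,
    HashAlgorithm.SHA256,
    HashAlgorithm.SHA512,
]


def next_stronger(algorithm: HashAlgorithm) -> Optional[HashAlgorithm]:
    """Return the next algorithm in the hierarchy, or None at the top."""
    index = HASH_HIERARCHY.index(algorithm)
    return HASH_HIERARCHY[index + 1] if index + 1 < len(HASH_HIERARCHY) else None


def digest(content: bytes, algorithm: HashAlgorithm = HashAlgorithm.SHA256) -> str:
    """Hash content with the selected algorithm via hashlib.new()."""
    return hashlib.new(algorithm.value, content).hexdigest()
```

Because the members are a closed set, a misspelled algorithm name fails at lookup time (`HashAlgorithm("sha-256")` raises `ValueError`) instead of surfacing later as a hashing error.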
### Example Usage
To use a configuration constant, simply import the `config_constants` module and access the desired constant. For example:
```python
from config_constants import HASH_ALGORITHM_SHA256

# Use the SHA-256 hash algorithm
hash_algorithm = HASH_ALGORITHM_SHA256
```
By adopting a centralized configuration management approach, MCard has improved its maintainability, scalability, and readability, making it easier to manage and update configuration values across the application.
Using MCardFromData for Stored Values
When retrieving stored MCard data from the database, always use the subclass MCardFromData. This approach allows you to bypass unnecessary and unwanted algorithms, significantly speeding up the MCard instantiation process.
Project Structure
MCard_TDD/
├── mcard/
│ ├── algorithms/ # Hash algorithm implementations
│ ├── engine/ # Database engines (SQLite, DuckDB)
│ ├── model/ # Core data models
│ ├── api.py # FastAPI endpoints
│ └── logging_config.py # Logging configuration
├── tests/
│ ├── persistence/ # Database persistence tests
│ └── unit/ # Unit tests
├── docs/ # Project documentation
├── data/
│ ├── db/ # Database files
│ └── files/ # General files
└── logs/ # Application logs
Configuration
Environment Setup
Create a .env file with the following variables:
MCARD_DB_PATH=data/db/mcard_demo.db
TEST_DB_PATH=data/db/test_mcard.db
MCARD_SERVICE_LOG_LEVEL=DEBUG
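Once the variables are in the process environment (python-dotenv, listed among the core dependencies, is the usual way to load a .env file), they can be read with `os.environ`. The helper below is a hypothetical sketch, not the project's configuration class; its fallback values simply repeat the example values above.

```python
import os
from typing import Mapping, Optional


def load_settings(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Read the three variables shown above, with the example values as fallbacks."""
    env = os.environ if environ is None else environ
    return {
        "db_path": env.get("MCARD_DB_PATH", "data/db/mcard_demo.db"),
        "test_db_path": env.get("TEST_DB_PATH", "data/db/test_mcard.db"),
        # Normalize case so "debug" and "DEBUG" are treated alike.
        "log_level": env.get("MCARD_SERVICE_LOG_LEVEL", "DEBUG").upper(),
    }


settings = load_settings()
```

Passing a mapping explicitly, as in `load_settings({"MCARD_SERVICE_LOG_LEVEL": "info"})`, keeps the function easy to test without touching the real environment.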
Development Guidelines
Using MCardFromData
When retrieving stored data, use MCardFromData instead of the base MCard class:
from mcard.model.card import MCardFromData
stored_card = MCardFromData(content=content, hash=hash, g_time=g_time)
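To make the idea of bypassing work on load concrete, here is a simplified sketch of the pattern. `HashingCard` and `StoredCard` are hypothetical stand-ins for `MCard` and `MCardFromData`; the real classes may validate or store values differently.

```python
import hashlib
from datetime import datetime, timezone


class HashingCard:
    """Stand-in for a card that computes its hash and claim time on creation."""

    def __init__(self, content: bytes) -> None:
        self.content = content
        self.hash = hashlib.sha256(content).hexdigest()
        self.g_time = datetime.now(timezone.utc)


class StoredCard(HashingCard):
    """Stand-in for a card rebuilt from values already held in the database."""

    def __init__(self, content: bytes, hash: str, g_time: datetime) -> None:
        # Deliberately skip HashingCard.__init__: the stored hash and claim
        # time are reused, so nothing is re-hashed or re-stamped.
        self.content = content
        self.hash = hash
        self.g_time = g_time


original = HashingCard(b"Hello, MCard!")
restored = StoredCard(original.content, original.hash, original.g_time)
```

The restored card keeps the original claim time, which a freshly constructed card would overwrite with the current time.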
Hash Algorithm Configuration
The default hash algorithm is SHA-256, but it's configurable:
from mcard.algorithms import HASH_ALGORITHM_SHA256
Installation
To set up the project, follow these steps:
- Create a virtual environment:
python -m venv .venv
- Activate the virtual environment:
  - On macOS and Linux:
  source .venv/bin/activate
  - On Windows:
  .venv\Scripts\activate
- Configure your environment:
  - Copy .env.example to create your own .env file.
  - The default configuration uses:
    - Database path: data/db/mcard_demo.db.
    - Hash algorithm: SHA-256.
    - Connection pool size: 5.
    - Connection timeout: 30 seconds.
Directory Structure
- mcard/
  - engine/: Contains the database engine implementations, including SQLite and DuckDB.
  - model/: Contains the core data models, including MCard.
- tests/: Contains all test cases for the MCard library, ensuring functionality and correctness.
SQLite Persistence Testing
- tests/persistence/sqlite_test.py: Contains test cases for SQLite persistence, ensuring data integrity and consistency.
The tests in test_sqlite_persistence.py are designed to clear the database after each test function is run. This means that the test_mcard.db file will only contain the data from the last test executed. If the clear() function in the fixture is uncommented, it will remove the content of the last test as well. This behavior is intended to ensure that each test starts with a clean database, allowing for more accurate and reliable testing results.