A semantic database for auditable data with append-only corrections and full traceability
Project description
YetAnotherSchema
Version: 0.1.1 MVP
Philosophy: "Values are free ions, not tables. Everything else is a view."
A semantic database for auditable data with append-only corrections, full traceability, and on-demand views.
๐ฏ What is YetAnotherSchema?
YetAnotherSchema is a database system designed for domains where data integrity, auditability, and traceability are paramount (clinical research, financial trading, IoT monitoring, etc.).
The Problem with Traditional Databases
Example 1: Employee Database
In traditional databases, values collected once are duplicated everywhere they're needed:
Traditional Database:
payroll table:
emp_id | birth_date | salary
E001 | 1990-05-15 | 75000
benefits table:
emp_id | birth_date | plan
E001 | 1990-05-15 | Premium
reviews table:
emp_id | birth_date | rating | year
E001 | 1990-05-15 | 4.5 | 2024
E001 | 1990-05-15 | 4.7 | 2025
Issues:
- Birth date collected once at hiring, but duplicated in 4+ places
- If birth date was entered incorrectly, you'd need to update multiple tables
- Data integrity issues (what if payroll has 1990-05-15 but benefits has 1990-05-16?)
- Massive storage waste when values appear in hundreds of contexts
Example 2: Product Catalog
Traditional Database:
orders table:
order_id | product_id | product_weight | quantity
ORD001 | PROD123 | 2.5 kg | 1
ORD002 | PROD123 | 2.5 kg | 2
ORD003 | PROD123 | 2.5 kg | 1
# Product weight duplicated for every order!
Issues:
- Product weight measured once, but duplicated for every order
- If weight was measured incorrectly, you'd need to update thousands of order records
- No history of when the weight was corrected
The YetAnotherSchema Solution
Values are stored once, contexts map to them:
Example 1: Employee Database
values table:
value_id | field | raw_value | created_timestamp
1 | birth_date | 1990-05-15 | 2020-01-10 09:00
2 | salary | 75000 | 2020-01-10 09:05
3 | salary | 80000 | 2024-01-15 10:00
context_relationships table:
context_key | field | value_id
emp_id=E001,context=payroll| birth_date | 1 # Same birth_date!
emp_id=E001,context=payroll| salary | 3 # Latest salary
emp_id=E001,context=benefits| birth_date| 1 # Same birth_date!
emp_id=E001,context=review_2024| birth_date| 1 # Same birth_date!
emp_id=E001,context=review_2025| birth_date| 1 # Same birth_date!
Example 2: Product Catalog
values table:
value_id | field | raw_value | created_timestamp
1 | weight | 2.5 | 2025-01-01 08:00
2 | price | 49.99 | 2025-01-01 08:05
3 | price | 44.99 | 2025-01-15 10:00 # Price change
context_relationships table:
context_key | field | value_id
product_id=PROD123,order=ORD001| weight | 1 # Same weight!
product_id=PROD123,order=ORD001| price | 2
product_id=PROD123,order=ORD002| weight | 1 # Same weight!
product_id=PROD123,order=ORD002| price | 2
product_id=PROD123,order=ORD003| weight | 1 # Same weight!
product_id=PROD123,order=ORD003| price | 3 # New price
Benefits:
- โ Values collected once, stored once, referenced many times
- โ Birth date stored once, appears in 4+ contexts (no duplication)
- โ Product weight stored once, appears in 1000+ orders (no duplication)
- โ Corrections create new values (append-only)
- โ Full audit trail preserved
- โ Single source of truth
- โ Massive storage savings
๐งฌ Core Philosophy
"Values are free ions, not tables. Everything else is a view."
This philosophy means:
- Values are atomic and independent - Like ions in chemistry, values exist independently and can bond with different contexts
- No forced structure - Values aren't locked into rigid table structures
- Context creates meaning - The same value can exist in multiple contexts
- Views are computed on-demand - Traditional tables are materialized when needed, never stored
- Append-only corrections - Corrections create new values, preserving history
- Full traceability - Every value knows when it was created and by whom
Key Principles
- Values are atomic - No duplication, stored once
- Append-only - Corrections create new entries, never delete
- Context-driven - Values exist in contexts (subject, visit, site, etc.)
- On-demand views - Traditional tables are computed when needed
- Full audit trail - Every change is tracked with timestamps
- Metadata is informational - No validation, user can enter anything (frontend validates)
- User metadata wins - Enables unit harmonization (cm โ inches)
Three-Sphere Architecture
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ YetAnotherSchema โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ Outer Sphere: Expressions (Computed) โ โ
โ โ - BMI, derived metrics โ โ
โ โ - Polars expressions โ โ
โ โ - Never materialized โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โฒ โ
โ โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ Middle Sphere: Relationships (DuckDB) โ โ
โ โ - context_relationships table โ โ
โ โ - Indexed for fast lookups โ โ
โ โ - Maps contexts โ value indices โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โฒ โ
โ โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ Inner Sphere: Values (DuckDB) โ โ
โ โ - values table (append-only) โ โ
โ โ - Columnar storage โ โ
โ โ - Full audit trail โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
๐ Real-World Examples
Example 1: Employee Database - No Duplication
An employee's birth date is collected once at hiring, but needs to appear in payroll, benefits, and performance reviews.
Traditional Database:
-- Birth date duplicated in every table
INSERT INTO payroll VALUES ('E001', '1990-05-15', 75000);
INSERT INTO benefits VALUES ('E001', '1990-05-15', 'Premium');
INSERT INTO reviews VALUES ('E001', '1990-05-15', 4.5, 2024);
INSERT INTO reviews VALUES ('E001', '1990-05-15', 4.7, 2025);
-- Problem: Birth date stored 4 times! If it was wrong, update 4 places.
YetAnotherSchema:
# Birth date stored once
ys.write(
space_name='hr_system',
field='birth_date',
value={'raw_value': '1990-05-15', 'user_id': 'hr_admin'},
contexts={'emp_id': 'E001'}
)
# Reference birth date in payroll context
ys.write(
space_name='hr_system',
field='salary',
value={'raw_value': 75000, 'user_id': 'payroll_admin'},
contexts={'emp_id': 'E001', 'context': 'payroll'}
)
# Reference birth date in benefits context
ys.write(
space_name='hr_system',
field='plan',
value={'raw_value': 'Premium', 'user_id': 'benefits_admin'},
contexts={'emp_id': 'E001', 'context': 'benefits'}
)
# Reference birth date in review contexts
ys.write(
space_name='hr_system',
field='rating',
value={'raw_value': 4.5, 'user_id': 'manager_001'},
contexts={'emp_id': 'E001', 'context': 'review', 'year': '2024'}
)
ys.write(
space_name='hr_system',
field='rating',
value={'raw_value': 4.7, 'user_id': 'manager_001'},
contexts={'emp_id': 'E001', 'context': 'review', 'year': '2025'}
)
# Query payroll view (birth_date appears automatically)
df = ys.read(
space_name='hr_system',
fields=['emp_id', 'birth_date', 'salary'],
contexts={'context': 'payroll'}
)
# โโโโโโโโโโฌโโโโโโโโโโโโโฌโโโโโโโโโ
# โ emp_id โ birth_date โ salary โ
# โโโโโโโโโโผโโโโโโโโโโโโโผโโโโโโโโโค
# โ E001 โ 1990-05-15 โ 75000 โ
# โโโโโโโโโโดโโโโโโโโโโโโโดโโโโโโโโโ
# Query reviews view (birth_date appears automatically)
df = ys.read(
space_name='hr_system',
fields=['emp_id', 'birth_date', 'rating', 'year'],
contexts={'context': 'review'}
)
# โโโโโโโโโโฌโโโโโโโโโโโโโฌโโโโโโโโโฌโโโโโโโ
# โ emp_id โ birth_date โ rating โ year โ
# โโโโโโโโโโผโโโโโโโโโโโโโผโโโโโโโโโผโโโโโโโค
# โ E001 โ 1990-05-15 โ 4.5 โ 2024 โ
# โ E001 โ 1990-05-15 โ 4.7 โ 2025 โ
# โโโโโโโโโโดโโโโโโโโโโโโโดโโโโโโโโโดโโโโโโโ
Benefits:
- โ Birth date stored once, appears in 4+ contexts
- โ If birth date was wrong, correct it once (not 4 times)
- โ Full audit trail of correction
- โ No duplication, no data integrity issues
Example 2: Product Catalog - Massive Savings
A product's weight is measured once, but appears in thousands of orders.
Traditional Database:
-- Product weight duplicated for every order
INSERT INTO orders VALUES ('ORD001', 'PROD123', 2.5, 1);
INSERT INTO orders VALUES ('ORD002', 'PROD123', 2.5, 2);
INSERT INTO orders VALUES ('ORD003', 'PROD123', 2.5, 1);
-- ... 1000 more orders ...
-- Problem: Weight stored 1000+ times! If it was wrong, update 1000+ rows.
YetAnotherSchema:
# Product weight stored once
ys.write(
space_name='ecommerce',
field='weight',
value={'raw_value': 2.5, 'metadata': {'unit': 'kg'}, 'user_id': 'warehouse_admin'},
contexts={'product_id': 'PROD123'}
)
# Create orders (weight is referenced, not duplicated)
ys.write(
space_name='ecommerce',
field='quantity',
value={'raw_value': 1, 'user_id': 'customer_001'},
contexts={'product_id': 'PROD123', 'order_id': 'ORD001'}
)
ys.write(
space_name='ecommerce',
field='quantity',
value={'raw_value': 2, 'user_id': 'customer_002'},
contexts={'product_id': 'PROD123', 'order_id': 'ORD002'}
)
ys.write(
space_name='ecommerce',
field='quantity',
value={'raw_value': 1, 'user_id': 'customer_003'},
contexts={'product_id': 'PROD123', 'order_id': 'ORD003'}
)
# Query orders (weight appears automatically for each order)
df = ys.read(
space_name='ecommerce',
fields=['order_id', 'product_id', 'weight', 'quantity'],
contexts={'product_id': 'PROD123'}
)
# โโโโโโโโโโโโฌโโโโโโโโโโโโโฌโโโโโโโโโฌโโโโโโโโโโโ
# โ order_id โ product_id โ weight โ quantity โ
# โโโโโโโโโโโโผโโโโโโโโโโโโโผโโโโโโโโโผโโโโโโโโโโโค
# โ ORD001 โ PROD123 โ 2.5 โ 1 โ
# โ ORD002 โ PROD123 โ 2.5 โ 2 โ
# โ ORD003 โ PROD123 โ 2.5 โ 1 โ
# โโโโโโโโโโโโดโโโโโโโโโโโโโดโโโโโโโโโดโโโโโโโโโโโ
# If weight was measured incorrectly, correct it once
ys.write(
space_name='ecommerce',
field='weight',
value={'raw_value': 2.3, 'metadata': {'unit': 'kg', 'note': 'remeasured'}, 'user_id': 'warehouse_admin'},
contexts={'product_id': 'PROD123'}
)
# All orders now show corrected weight automatically
df = ys.read(
space_name='ecommerce',
fields=['order_id', 'product_id', 'weight', 'quantity'],
contexts={'product_id': 'PROD123'}
)
# โโโโโโโโโโโโฌโโโโโโโโโโโโโฌโโโโโโโโโฌโโโโโโโโโโโ
# โ order_id โ product_id โ weight โ quantity โ
# โโโโโโโโโโโโผโโโโโโโโโโโโโผโโโโโโโโโผโโโโโโโโโโโค
# โ ORD001 โ PROD123 โ 2.3 โ 1 โ # Updated!
# โ ORD002 โ PROD123 โ 2.3 โ 2 โ # Updated!
# โ ORD003 โ PROD123 โ 2.3 โ 1 โ # Updated!
# โโโโโโโโโโโโดโโโโโโโโโโโโโดโโโโโโโโโดโโโโโโโโโโโ
Benefits:
- โ Weight stored once, appears in 1000+ orders
- โ Correction updates all orders automatically
- โ Full audit trail (both 2.5 and 2.3 preserved)
- โ Massive storage savings (1 value vs 1000+ duplicates)
Example 3: Data Correction with Audit Trail
A nurse enters a patient's weight as 500 kg (typo!). Later, they realize the mistake and correct it to 50 kg.
Traditional Database:
-- Original entry
INSERT INTO measurements VALUES ('S001', 'V1', 500);
-- Correction (overwrites original)
UPDATE measurements SET weight = 50 WHERE subject_id = 'S001' AND visit = 'V1';
-- Problem: Original value (500) is lost! No audit trail.
YetAnotherSchema:
# Original entry (typo)
ys.write(
space_name='clinical_study',
field='weight',
value={'raw_value': 500, 'user_id': 'nurse_001'},
contexts={'subject_id': 'S001', 'visit': 'V1'}
)
# created_timestamp: 2026-01-30T09:00:00
# Correction (new entry)
ys.write(
space_name='clinical_study',
field='weight',
value={'raw_value': 50, 'user_id': 'nurse_001'},
contexts={'subject_id': 'S001', 'visit': 'V1'}
)
# created_timestamp: 2026-01-30T09:01:00
# Read returns latest value (50)
df = ys.read(
space_name='clinical_study',
fields=['subject_id', 'weight'],
contexts={'subject_id': 'S001'}
)
# weight = 50.0
# Time-travel to see original value (for audit)
df = ys.read(
space_name='clinical_study',
fields=['subject_id', 'weight'],
contexts={'subject_id': 'S001'},
as_of='2026-01-30T09:00:30' # Before correction
)
# weight = 500.0
Benefits:
- โ Both values preserved (500 and 50)
- โ Full audit trail (who, when, what)
- โ Latest value returned by default
- โ Time-travel for audits
- โ No data loss
๐ Quick Start
Installation
pip install duckdb polars toml python-dateutil
Basic Usage
import yetanotherschema as ys
# 1. Create a space
ys.create(
space_name='clinical_study_001',
schema_path='schema.toml'
)
# 2. Write a value
ys.write(
space_name='clinical_study_001',
field='height',
value={
'raw_value': 170,
'metadata': {'unit': 'cm'},
'user_id': 'nurse_001'
},
contexts={'subject_id': 'S001', 'visit': 'V1'}
)
# 3. Read values
df = ys.read(
space_name='clinical_study_001',
fields=['subject_id', 'visit', 'height', 'weight']
)
print(df)
๐ API Reference
1. create() - Create a Space
ys.create(
space_name='clinical_study_001',
schema_path='schema.toml'
)
2. write() - Write a Value
ys.write(
space_name='clinical_study_001',
field='height',
value={
'raw_value': 170,
'metadata': {'unit': 'cm', 'device': 'stadiometer'},
'reported_timestamp': '2026-01-30T09:00:00',
'user_id': 'nurse_001'
},
contexts={'subject_id': 'S001', 'visit': 'V1'}
)
3. write_batch() - Write Multiple Values
ys.write_batch(
space_name='clinical_study_001',
records=[
{
'field': 'height',
'value': {'raw_value': 170, 'user_id': 'nurse_001'},
'contexts': {'subject_id': 'S001', 'visit': 'V1'}
},
{
'field': 'weight',
'value': {'raw_value': 70, 'user_id': 'nurse_001'},
'contexts': {'subject_id': 'S001', 'visit': 'V1'}
}
]
)
4. read() - Read Values
# Basic read
df = ys.read(
space_name='clinical_study_001',
fields=['subject_id', 'visit', 'height', 'weight']
)
# With context filter
df = ys.read(
space_name='clinical_study_001',
fields=['subject_id', 'height', 'weight'],
contexts={'subject_id': 'S001'}
)
# With value filters
df = ys.read(
space_name='clinical_study_001',
fields=['subject_id', 'height'],
filters={'height': {'gt': 160, 'lt': 180}}
)
# Time-travel query
df = ys.read(
space_name='clinical_study_001',
fields=['subject_id', 'weight'],
as_of='2026-01-15T23:59:59'
)
5. schema() - View Schema
# Summary
schema_info = ys.schema('clinical_study_001')
# Statistics
stats = ys.schema('clinical_study_001', detail='stats')
# Full schema
full_schema = ys.schema('clinical_study_001', detail='full')
6. list_spaces() - List All Spaces
spaces = ys.list_spaces()
for space in spaces:
print(f"{space['space_name']}: {space['total_values']} values")
๐ Schema Definition
Create a schema.toml file:
[metadata]
name = "clinical_study_001"
version = "0.1.0"
[contexts]
# At least one required context
required = ["subject_id"]
optional = ["visit", "site", "country"]
[hierarchies]
geographic = ["region", "country", "site"]
lab = ["lab_category", "lab_subcategory", "lab_test"]
[values]
height = { unit = "cm", type = "numeric", category = "vital_sign" }
weight = { unit = "kg", type = "numeric", category = "vital_sign" }
hemoglobin = { unit = "g/dL", type = "numeric", category = "lab_result" }
[null_values]
not_done = "@notdone"
missing = "@null"
๐ Comprehensive Examples
Example 1: Basic Workflow
import yetanotherschema as ys
# Create a space
ys.create(
space_name='clinical_study_001',
schema_path='schema.toml'
)
# Write a value
ys.write(
space_name='clinical_study_001',
field='height',
value={
'raw_value': 170,
'metadata': {'unit': 'cm', 'device': 'stadiometer'},
'reported_timestamp': '2026-01-30T09:00:00',
'user_id': 'nurse_001'
},
contexts={'subject_id': 'S001', 'visit': 'V1'}
)
# Read values
df = ys.read(
space_name='clinical_study_001',
fields=['subject_id', 'visit', 'height']
)
print(df)
Example 2: Batch Operations
# Write multiple values at once
value_ids = ys.write_batch(
space_name='clinical_study_001',
records=[
{
'field': 'height',
'value': {'raw_value': 170, 'user_id': 'nurse_001'},
'contexts': {'subject_id': 'S001', 'visit': 'V1'}
},
{
'field': 'weight',
'value': {'raw_value': 70, 'user_id': 'nurse_001'},
'contexts': {'subject_id': 'S001', 'visit': 'V1'}
},
{
'field': 'temperature',
'value': {'raw_value': 37.0, 'user_id': 'nurse_001'},
'contexts': {'subject_id': 'S001', 'visit': 'V1'}
}
]
)
print(f"Written {len(value_ids)} values")
Example 3: Context Filtering
# Get all visits for a specific subject
df = ys.read(
space_name='clinical_study_001',
fields=['subject_id', 'visit', 'height', 'weight'],
contexts={'subject_id': 'S001'}
)
# Get specific visit
df = ys.read(
space_name='clinical_study_001',
fields=['subject_id', 'visit', 'height', 'weight'],
contexts={'subject_id': 'S001', 'visit': 'V1'}
)
# Get all subjects from a specific site
df = ys.read(
space_name='clinical_study_001',
fields=['subject_id', 'site', 'height', 'weight'],
contexts={'site': 'Boston'}
)
Example 4: Value Filtering
# Numeric filters
df = ys.read(
space_name='clinical_study_001',
fields=['subject_id', 'height'],
filters={'height': {'gt': 160, 'lt': 180}}
)
# Multiple conditions
df = ys.read(
space_name='clinical_study_001',
fields=['subject_id', 'height', 'weight'],
filters={
'height': {'gte': 165, 'lte': 175},
'weight': {'gte': 50, 'lte': 100}
}
)
# Text filtering
df = ys.read(
space_name='clinical_study_001',
fields=['subject_id', 'adverse_event'],
filters={'adverse_event': {'contains': 'headache'}}
)
Example 5: Data Corrections
# Original entry (typo)
ys.write(
space_name='clinical_study_001',
field='weight',
value={
'raw_value': 500, # Typo!
'metadata': {'unit': 'kg', 'note': 'data_entry_error'},
'reported_timestamp': '2026-01-30T09:00:00',
'user_id': 'nurse_001'
},
contexts={'subject_id': 'S001', 'visit': 'V1'}
)
# Correction (new entry)
ys.write(
space_name='clinical_study_001',
field='weight',
value={
'raw_value': 50, # Corrected
'metadata': {'unit': 'kg', 'note': 'corrected_typo'},
'reported_timestamp': '2026-01-30T09:00:00', # Same reported time
'user_id': 'nurse_001'
},
contexts={'subject_id': 'S001', 'visit': 'V1'}
)
# Read returns latest value (50)
df = ys.read(
space_name='clinical_study_001',
fields=['subject_id', 'weight'],
contexts={'subject_id': 'S001'}
)
# weight = 50.0
Example 6: Time-Travel Queries
# View data as it existed on a specific date
df = ys.read(
space_name='clinical_study_001',
fields=['subject_id', 'weight'],
contexts={'subject_id': 'S001'},
as_of='2026-01-30T09:00:30' # Before correction
)
# weight = 500.0 (shows typo)
df = ys.read(
space_name='clinical_study_001',
fields=['subject_id', 'weight'],
contexts={'subject_id': 'S001'},
as_of='2026-01-30T09:01:30' # After correction
)
# weight = 50.0 (shows corrected value)
Example 7: Metadata Inclusion
# Read with metadata
df = ys.read(
space_name='clinical_study_001',
fields=['subject_id', 'height', 'weight'],
contexts={'subject_id': 'S001'},
include_metadata=True
)
# DataFrame includes metadata column with all metadata for each field
print(df)
# โโโโโโโโโโโโโโฌโโโโโโโโโฌโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
# โ subject_id โ height โ weight โ metadata โ
# โ --- โ --- โ --- โ --- โ
# โ str โ f64 โ f64 โ struct โ
# โโโโโโโโโโโโโโชโโโโโโโโโชโโโโโโโโโชโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโก
# โ S001 โ 170.0 โ 70.0 โ {height: {unit: cm, ...}, โ
# โ โ โ โ weight: {unit: kg, ...}} โ
# โโโโโโโโโโโโโโดโโโโโโโโโดโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Example 8: Schema Inspection
# Summary view
schema_info = ys.schema('clinical_study_001')
print(schema_info)
# {
# 'space_name': 'clinical_study_001',
# 'schema_version': '0.1.0',
# 'contexts': {
# 'required': ['subject_id'],
# 'optional': ['visit', 'site']
# },
# 'value_fields': ['height', 'weight', 'temperature']
# }
# Statistics
stats = ys.schema('clinical_study_001', detail='stats')
print(stats)
# {
# 'total_values': 1500,
# 'unique_contexts': 250,
# 'value_fields': 12
# }
# Full schema
full_schema = ys.schema('clinical_study_001', detail='full')
Example 9: Multi-Site Study
# Collect data from multiple sites
ys.write(
space_name='global_study',
field='height',
value={'raw_value': 170, 'user_id': 'nurse_boston_001'},
contexts={'subject_id': 'S001', 'visit': 'V1', 'site': 'Boston'}
)
ys.write(
space_name='global_study',
field='height',
value={'raw_value': 165, 'user_id': 'nurse_ny_001'},
contexts={'subject_id': 'S002', 'visit': 'V1', 'site': 'New York'}
)
# Query by site
df_boston = ys.read(
space_name='global_study',
fields=['subject_id', 'site', 'height'],
contexts={'site': 'Boston'}
)
# Query all sites
df_all = ys.read(
space_name='global_study',
fields=['subject_id', 'site', 'height']
)
Example 10: Unit Harmonization
# User enters height in inches (overrides schema default of cm)
ys.write(
space_name='clinical_study_001',
field='height',
value={
'raw_value': 67, # inches
'metadata': {
'unit': 'inches', # User overrides schema (cm)
'device': 'tape_measure'
},
'user_id': 'nurse_001'
},
contexts={'subject_id': 'S001', 'visit': 'V1'}
)
# Later, use Additory or custom logic to harmonize units
# This is why user metadata wins - enables unit conversion
๐ง Data Corrections
YetAnotherSchema uses append-only corrections:
# Original entry (typo)
ys.write(
space_name='clinical_study_001',
field='weight',
value={'raw_value': 500, 'user_id': 'nurse_001'}, # Typo!
contexts={'subject_id': 'S001', 'visit': 'V1'}
)
# created_timestamp: 2026-01-30T09:00:00
# Correction (new entry)
ys.write(
space_name='clinical_study_001',
field='weight',
value={'raw_value': 50, 'user_id': 'nurse_001'}, # Corrected
contexts={'subject_id': 'S001', 'visit': 'V1'}
)
# created_timestamp: 2026-01-30T09:01:00
# Read returns latest value (50)
df = ys.read(
space_name='clinical_study_001',
fields=['subject_id', 'weight'],
contexts={'subject_id': 'S001'}
)
# weight = 50.0
# Time-travel to see original value
df = ys.read(
space_name='clinical_study_001',
fields=['subject_id', 'weight'],
contexts={'subject_id': 'S001'},
as_of='2026-01-30T09:00:30' # Before correction
)
# weight = 500.0
๐ฏ Use Cases
Clinical Research
- Multi-site trials
- Regulatory compliance
- Data corrections with audit trail
- Time-travel queries for audits
Financial Trading
- Trade history
- Price corrections
- Regulatory reporting
- Audit trails
IoT Monitoring
- Sensor readings
- Device calibrations
- Historical analysis
- Anomaly detection
Manufacturing
- Quality control
- Batch tracking
- Equipment monitoring
- Compliance reporting
๐งช Testing
Run the test suite:
# Basic tests
python test_basic.py
# Advanced tests
python test_advanced.py
# Example usage
python example_usage.py
๐ Performance
Current MVP performance (DuckDB + Polars):
- Write: < 1ms per value
- Batch write: < 100ms for 1000 values
- Simple query: < 10ms
- Complex query: < 50ms
- Storage: Columnar compression
๐ฎ Future Vision
Sub-Spaces for Data Residency
Enable distributed data storage while maintaining centralized relationships:
# Main space (corporate HQ - USA)
ys.create(
space_name='global_study',
schema_path='schema.toml'
)
# Sub-space for EU data (values only, no relationships)
ys.create_subspace(
parent_space='global_study',
subspace_name='global_study_eu',
storage_path='/eu_datacenter/global_study_eu.db',
mode='values_only' # Only value store, no relationships
)
# Sub-space for Asia data
ys.create_subspace(
parent_space='global_study',
subspace_name='global_study_asia',
storage_path='/asia_datacenter/global_study_asia.db',
mode='values_only'
)
# Write to EU sub-space (data stays in EU)
ys.write(
space_name='global_study_eu',
field='height',
value={...},
contexts={'subject_id': 'EU_S001', 'country': 'Germany'}
)
# Read from main space (combines all sub-spaces + relationships)
df = ys.read(
space_name='global_study', # Automatically queries all sub-spaces
fields=['subject_id', 'country', 'height', 'weight']
)
# Result includes data from USA, EU, Asia sub-spaces
# Relationships and expressions applied at main space level
Use Cases:
- GDPR compliance (EU data stays in EU)
- Data sovereignty (country-specific regulations)
- Performance (data closer to users)
- Disaster recovery (distributed backups)
Authentication System
Pattern-based authentication with audit trail:
# Every write includes auth_key
ys.write(
space_name='clinical_study',
field='height',
value={
'raw_value': 170,
'user_id': 'nurse_001',
'auth_key': 'ys_v1_user_nurse001_ts_20260130_counter_00042_sig_a3f9d2'
},
contexts={'subject_id': 'S001'}
)
# Auth key structure:
# ys_v1_user_{user_id}_ts_{timestamp}_counter_{counter}_sig_{signature}
#
# - version: v1
# - user_id: nurse001
# - timestamp: 20260130 (date)
# - counter: 00042 (increments with each write)
# - signature: cryptographic signature
# Internal counter tracking
# Every write with this user_id increments counter
# Enables audit: "nurse_001 made 42 writes today"
# Read operations also audited
df = ys.read(
space_name='clinical_study',
fields=['subject_id', 'height', 'weight'],
auth_key='ys_v1_user_analyst001_ts_20260130_counter_00015_sig_b7e4c1'
)
# System logs: "analyst001 read height/weight data at 2026-01-30 10:30:00"
Features:
- Pattern-based key generation
- Incremental counters for audit
- Read/write audit trail
- No open/close connection workflow
- Cryptographic signatures
Schema Evolution
Modify schema without breaking existing data:
# Add new field to existing space
ys.create(
space_name='clinical_study',
mode='modify',
add_fields={
'blood_pressure': {
'unit': 'mmHg',
'type': 'numeric',
'category': 'vital_sign'
}
}
)
# Remove field (flag as hidden, don't delete data)
ys.create(
space_name='clinical_study',
mode='modify',
hide_fields=['old_field']
)
Use Cases:
- Protocol amendments in clinical trials
- Evolving data requirements
- Backward compatibility
- Gradual migrations
Event-Based Time-Travel
Query data as of a specific event, not just timestamp:
# Time-travel using reported_timestamp (event time)
df = ys.read(
space_name='clinical_study',
fields=['subject_id', 'weight'],
as_of_event='2026-01-15', # Uses reported_timestamp
contexts={'subject_id': 'S001'}
)
# vs. created_timestamp (system time)
df = ys.read(
space_name='clinical_study',
fields=['subject_id', 'weight'],
as_of='2026-01-15T23:59:59', # Uses created_timestamp
contexts={'subject_id': 'S001'}
)
Use Cases:
- "What was the patient's weight on visit day?"
- "Show me data as of the database lock date"
- Event-driven queries vs. system-driven queries
Missing Value Strategies
Handle missing data with semantic meaning:
# Schema defines missing value codes
[null_values]
not_done = "@notdone" # Test not performed
missing = "@null" # Data missing
refused = "@refused" # Patient refused
not_applicable = "@na" # Not applicable
# Write with missing value code
ys.write(
space_name='clinical_study',
field='blood_pressure',
value={
'raw_value': '@notdone',
'metadata': {'reason': 'equipment_malfunction'},
'user_id': 'nurse_001'
},
contexts={'subject_id': 'S001', 'visit': 'V1'}
)
# Query distinguishes between different types of missing
df = ys.read(
space_name='clinical_study',
fields=['subject_id', 'blood_pressure'],
filters={'blood_pressure': {'eq': '@notdone'}}
)
๐ฃ๏ธ Roadmap
v0.1.1 (Current - MVP) โ
- โ Basic CRUD operations
- โ Append-only corrections
- โ Context filtering
- โ Value filtering
- โ Time-travel queries (created_timestamp)
- โ DuckDB + Polars backend
- โ Schema validation
- โ Batch operations
- โ Complete dependency management
Future
- Expression engine (BMI, derived metrics)
- Hierarchical queries (site โ country โ region)
- Performance optimizations
- Query caching
- Batch operations optimization
- More comprehensive tests
- Authentication system (auth_key)
- Read audit trail
- Sub-spaces (data residency)
- Schema evolution
- Event-based time-travel (reported_timestamp)
- Missing value strategies
- Blinded/unblinded separation
- Multi-user access control
- Advanced filtering (OR, AND, NOT)
- Aggregation functions
- Export to traditional formats
Current Status: MVP (v0.1.1) - Minimum Viable Product ready for use
Ready For:
- Development and testing
- Small to medium datasets
- Proof of concept projects
- Learning and experimentation
Not Ready For:
- Large-scale production use (optimization pending)
- Mission-critical systems (more testing needed)
- High-performance requirements (caching not yet implemented)
๐ License
MIT License
Philosophy: "Values are free ions, not tables. Everything else is a view."
This is not just a database - it's a different way of thinking about data.
Philosophy: "Values are free ions, not tables. Everything else is a view."
Status: MVP
Next: Expression engine and hierarchical queries
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file yetanotherschema-0.1.1.tar.gz.
File metadata
- Download URL: yetanotherschema-0.1.1.tar.gz
- Upload date:
- Size: 35.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
5439a112a7012564ec09ebdca21b87da613f729b115a5d23490dd79ffa51f3d4
|
|
| MD5 |
25980483d9d151a3216155e50e0cd0fa
|
|
| BLAKE2b-256 |
f733154b3abe0b119c2b30d0bf5c60304b7eb25bb61940c9025890c223b502fa
|
File details
Details for the file yetanotherschema-0.1.1-py3-none-any.whl.
File metadata
- Download URL: yetanotherschema-0.1.1-py3-none-any.whl
- Upload date:
- Size: 23.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
005aa4d1c36eb959ad9e7f093f95099840e2f84f4d5e5ee27a06fbd1843739d0
|
|
| MD5 |
27f4a44a6637b2910bffc3ed5905ba52
|
|
| BLAKE2b-256 |
408bfaee9396cb52ccdf6c353a13b344f83426b5a695b5dbb2172c5da9d6741b
|