Bitemporal fact storage
Project description
Kang - Bitemporal Fact Storage
Python library for bitemporal fact storage with audit trail, inspired by verter.
named after kang the conqueror and the TVA (time variance authority) from marvel's multiverse saga.
just like the TVA monitors and maintains the sacred timeline, kang tracks when facts were true in reality (business time) while keeping an audit trail of when you recorded them.he who remains… remembers everything…
table of contents
- the problem
- installation
- schema
- quick start
- core concepts
- methods
- advanced usage
- database connection
- error handling
- dependencies
- extending to other databases
the problem
tracking changing data over time is hard. your system faces several challenges:
- late-arriving data: information arrives after the fact
- corrections: you need to fix past mistakes without losing history
- audit requirements: you must prove what the data looked like at any point in time
- time travel queries: "what was the state at 2pm yesterday?"
without bitemporal tracking, you lose either history or query simplicity:
| approach | what you lose |
|---|---|
| update in place | history of corrections |
| version columns | simple queries for "state at time t" |
| event logs | easy reconstruction of current state |
example: cricket match scoring
imagine tracking a live cricket match:
- ball-by-ball details arrive late (drs reviews take 30+ seconds)
- scoring official corrects runs from 87 to 88 (wide ball was missed)
- you need to answer: "what was the score at 14:30?"
the solution:
kang tracks when facts were true (business time) while keeping an audit trail of when you recorded them (transaction time):
- ✅ query on business time: "what was the score at 14:30?" (time travel queries)
- ✅ audit with transaction time: "when did we record this score?" (compliance/debugging)
- ✅ backfill historical data without losing chronology
- ✅ correct past mistakes while preserving original values
- ✅ store granular changes (only what changed, not full snapshots)
installation
1. install package
cd kang
pip install -e .
2. initialize schema
the schema file uses :schema placeholder. replace it with your actual schema name:
# edit sql/schema.sql: change :schema to public (or your schema name)
# then execute it in your database
3. use in code
from kang import FactStore
# default: uses 'public' schema
store = FactStore(url="postgresql://user:pass@localhost/mydb")
# custom schema (must match what you initialized)
store = FactStore(url="postgresql://user:pass@localhost/mydb", schema="my_app_schema")
note: if the schema is not initialized, you'll get a
SchemaNotInitializedError
schema
kang uses two tables:
facts table transactions table
┌─────────────────────┐ ┌──────────────────────┐
│ id (uuid) │ │ id (uuid) │
│ key (text) │ │ hash (text) ───┼──┐
│ value (bytea) │ │ business_time │ │
│ hash (text) │◄─────────┼──────────────────────┘ │
└─────────────────────┘ │ at (tx_time) │ │
stores unique facts └──────────────────────┘ │
records when facts │
were true │
│
join: facts.hash = transactions.hash
why two tables?
- facts: stores each unique payload once (deduplication by hash)
- transactions: tracks when each fact was true (business_time) and when recorded (at)
- efficiency: same fact at multiple times → stored once in facts, referenced multiple times in transactions
- corrections: different facts at same business_time → multiple fact rows, each linked to same business_time
- audit trail: transaction time (
at) shows when each fact was recorded
see kang/sql/schema.sql for complete table definitions and indexes.
quick start
track a cricket match with corrections, late data, and time-travel queries:
from kang import FactStore
store = FactStore(url="postgresql://user:pass@localhost/mydb")
# for your application that maintains a match table:
# | id | team1 | team2 | date | venue |
# |--------------------------------------|-----------|-----------|------------|-------|
# | 550e8400-e29b-41d4-a716-446655440000 | india | australia | 2025-01-15 | mcg |
#
# this is how you track history in kang using kang_id (uuid with prefix):
match_id = "cricket.match.550e8400-e29b-41d4-a716-446655440000"
# 1. record initial score at 14:30
tx_id = store.add_fact(
{"kang_id": match_id, "runs": 87, "wickets": 2},
business_time="2025-01-15T14:30:00"
)
# returns: '1471b44c-f83e-11f0-8b9c-bafd80b8a6a7'
# 2. correction: wide ball was missed, runs should be 88
# creates a different fact at the same business time
tx_id = store.add_fact(
{"kang_id": match_id, "runs": 88},
business_time="2025-01-15T14:30:00"
)
# returns: '152c1422-f83e-11f0-8b9c-bafd80b8a6a7'
# 3. wicket falls at 14:35
tx_id = store.add_fact(
{"kang_id": match_id, "wickets": 3},
business_time="2025-01-15T14:35:00"
)
# returns: '15d81eac-f83e-11f0-8b9c-bafd80b8a6a7'
# 4. backfill: ball-by-ball data from 14:27 arrives late
tx_id = store.add_fact(
{"kang_id": match_id, "batsman": "Kohli", "bowler": "Starc"},
business_time="2025-01-15T14:27:30"
)
# returns: '168c01f6-f83e-11f0-8b9c-bafd80b8a6a7'
# get all facts (ordered by business time)
facts = store.get_facts(kang_id=match_id)
# returns:
# [
# {
# 'kang_id': 'cricket.match.550e8400-e29b-41d4-a716-446655440000',
# 'batsman': 'Kohli',
# 'bowler': 'Starc',
# 'at': '2025-01-15T14:27:30+00:00'
# },
# {
# 'kang_id': 'cricket.match.550e8400-e29b-41d4-a716-446655440000',
# 'runs': 87,
# 'wickets': 2,
# 'at': '2025-01-15T14:30:00+00:00'
# },
# {
# 'kang_id': 'cricket.match.550e8400-e29b-41d4-a716-446655440000',
# 'runs': 88,
# 'at': '2025-01-15T14:30:00+00:00'
# },
# {
# 'kang_id': 'cricket.match.550e8400-e29b-41d4-a716-446655440000',
# 'wickets': 3,
# 'at': '2025-01-15T14:35:00+00:00'
# }
# ]
# get facts up to 14:30
facts = store.get_facts(kang_id=match_id, upto="2025-01-15T14:30:00")
# returns: first 3 facts (up to and including 14:30)
# get current state (latest values for all attributes)
current_state = store.rollup(match_id)
# returns:
# {
# 'kang_id': 'cricket.match.550e8400-e29b-41d4-a716-446655440000',
# 'runs': 88,
# 'wickets': 3,
# 'batsman': 'Kohli',
# 'bowler': 'Starc',
# 'at': '2025-01-15T14:35:00+00:00'
# }
# time travel: what was the state at 14:30?
state_at_14_30 = store.as_of(match_id, "2025-01-15T14:30:00")
# returns:
# {
# 'kang_id': 'cricket.match.550e8400-e29b-41d4-a716-446655440000',
# 'runs': 88,
# 'wickets': 2,
# 'batsman': 'Kohli',
# 'bowler': 'Starc',
# 'at': '2025-01-15T14:30:00+00:00'
# }
core concepts
granular facts
facts in kang are granular—you only record the specific attributes that changed, not full snapshots.
# ❌ Don't do this (full snapshot)
store.add_fact(
{"kang_id": match_id, "runs": 88, "wickets": 2, "overs": 15, "team": "IND"},
business_time="2025-01-15T14:30:00"
)
# ✅ Do this (only what changed)
store.add_fact(
{"kang_id": match_id, "runs": 88}, # Only runs changed
business_time="2025-01-15T14:30:00"
)
why?
- storage efficiency: only changed values are stored
- semantic clarity: the fact clearly indicates "runs changed to 88"
- rollup works:
rollup()merges all facts, latest value wins per attribute
when you call rollup() or as_of(), kang automatically merges all granular facts to give you the complete state.
business time and transaction time
kang stores both times but queries only on business time:
- business time: when the fact was true in reality
- transaction time: when you recorded it in the database (stored for audit trail)
scenario 1: live scoring
at 14:30, score is 87/2, recorded immediately:
from kang import FactStore
store = FactStore("postgresql://user:pass@localhost/mydb")
store.add_fact(
{"kang_id": "cricket.match.550e8400-e29b-41d4-a716-446655440000", "runs": 87, "wickets": 2},
business_time="2025-01-15T14:30:00"
)
# business_time = 2025-01-15T14:30:00 (when it happened)
# tx_time = 2025-01-15T14:30:05 (when we recorded it)
scenario 2: late correction
at 14:35, realize the 14:30 score was actually 88 (wide ball missed):
store.add_fact(
{"kang_id": "cricket.match.550e8400-e29b-41d4-a716-446655440000", "runs": 88},
business_time="2025-01-15T14:30:00" # Still 14:30 (when it was true)
)
# business_time = 2025-01-15T14:30:00 (when it was true)
# tx_time = 2025-01-15T14:35:12 (when we corrected it)
scenario 3: backfilling
at 15:00, add ball-by-ball data from 14:27 that arrived late:
store.add_fact(
{"kang_id": "cricket.match.550e8400-e29b-41d4-a716-446655440000", "batsman": "Kohli", "bowler": "Starc"},
business_time="2025-01-15T14:27:30" # Historical time
)
# business_time = 2025-01-15T14:27:30 (when the ball was bowled)
# tx_time = 2025-01-15T15:00:00 (when we received the data)
note: business time can be in the past (backfilling) or present (live updates). transaction time is always now().
the at field
each fact returned by kang includes an at field showing its business time:
facts = store.get_facts(kang_id="cricket.match.550e8400-e29b-41d4-a716-446655440000")
# [
# {'kang_id': '...', 'runs': 87, 'wickets': 2, 'at': '2025-01-15T14:30:00'},
# {'kang_id': '...', 'runs': 88, 'at': '2025-01-15T14:30:00'},
# ...
# ]
- the
atfield always equals business_time - when you don't provide
business_time, kang uses the current time (transaction time) as the business time - so
attells you "when this fact was true in reality"
viewing transaction metadata
to see when facts were recorded (audit trail), use with_tx=True:
facts = store.get_facts(kang_id="cricket.match.550e8400-e29b-41d4-a716-446655440000", with_tx=True)
# [
# {
# 'kang_id': '...',
# 'runs': 87,
# 'wickets': 2,
# 'at': '2025-01-15T14:30:00', # When it was true
# 'tx_time': '2025-01-15T14:30:05', # When we recorded it
# 'tx_id': '5f62d4c0-...' # Transaction ID
# },
# {
# 'kang_id': '...',
# 'runs': 88,
# 'at': '2025-01-15T14:30:00', # When it was true
# 'tx_time': '2025-01-15T14:35:12', # When we corrected it
# 'tx_id': '8a93f5d1-...'
# }
# ]
deduplication behavior
kang prevents storing identical facts at the same business time:
# First time: stores the fact
store.add_fact(
{"kang_id": "cricket.match.550e8400-e29b-41d4-a716-446655440000", "runs": 87},
business_time="2025-01-15T14:30:00"
)
# Returns: [<transaction_id>]
# Second time: exact same fact at same business time
result = store.add_fact(
{"kang_id": "cricket.match.550e8400-e29b-41d4-a716-446655440000", "runs": 87},
business_time="2025-01-15T14:30:00"
)
# Returns: {"noop": "fact already exists at this business time"}
different facts at the same business time are allowed:
# Different fact (runs=88 instead of runs=87) at same business time
store.add_fact(
{"kang_id": "cricket.match.550e8400-e29b-41d4-a716-446655440000", "runs": 88},
business_time="2025-01-15T14:30:00"
)
# Returns: [<transaction_id>] - stored successfully
note: this is the correction scenario—the original score of 87 and corrected score of 88 both exist at 14:30.
methods
add_fact(fact, business_time=None)
records a single fact.
parameters:
fact(dict): Must includekang_idfield for identity trackingbusiness_time(str, optional): ISO 8601 timestamp. Defaults to current time if not provided
returns: transaction uuid or {"noop": "message"} if fact already exists at that business time
example:
# record live data
tx_id = store.add_fact(
{"kang_id": "cricket.match.550e8400-e29b-41d4-a716-446655440000", "runs": 92},
business_time="2025-01-15T15:00:00"
)
# returns: '1a2b3c4d-f83e-11f0-8b9c-bafd80b8a6a7'
# record without business_time (uses current time)
tx_id = store.add_fact(
{"kang_id": "cricket.match.550e8400-e29b-41d4-a716-446655440000", "wickets": 4}
)
# returns: '2b3c4d5e-f83e-11f0-8b9c-bafd80b8a6a7'
# duplicate fact returns noop
result = store.add_fact(
{"kang_id": "cricket.match.550e8400-e29b-41d4-a716-446655440000", "runs": 92},
business_time="2025-01-15T15:00:00"
)
# returns: {'noop': 'no changes to identities were detected...'}
add_facts(facts, business_time=None)
records multiple facts in a single transaction.
parameters:
facts(list[dict]): List of facts, each must includekang_idbusiness_time(str, optional): ISO 8601 timestamp applied to all facts
returns: list of transaction uuids or {"noop": "message"}
example:
# record multiple match updates together
tx_ids = store.add_facts([
{"kang_id": "cricket.match.550e8400-e29b-41d4-a716-446655440000", "runs": 92, "wickets": 3},
{"kang_id": "cricket.match.550e8400-e29b-41d4-a716-446655440000", "run_rate": 6.5}
], business_time="2025-01-15T15:00:00")
# returns: ['3c4d5e6f-f83e-11f0-8b9c-bafd80b8a6a7', '4d5e6f7a-f83e-11f0-8b9c-bafd80b8a6a7']
get_facts(kang_id, upto=None, with_tx=False)
retrieves facts for a specific identity.
parameters:
kang_id(str): Identity to retrieve facts forupto(str, optional): ISO 8601 timestamp. Only returns facts withbusiness_time <= uptowith_tx(bool): Include transaction metadata (tx_time,tx_id)
returns: list of facts ordered by business time
example:
# get all facts
facts = store.get_facts(kang_id="cricket.match.550e8400-e29b-41d4-a716-446655440000")
# returns:
# [
# {'kang_id': '...', 'batsman': 'Kohli', 'bowler': 'Starc', 'at': '2025-01-15T14:27:30+00:00'},
# {'kang_id': '...', 'runs': 87, 'wickets': 2, 'at': '2025-01-15T14:30:00+00:00'},
# {'kang_id': '...', 'runs': 88, 'at': '2025-01-15T14:30:00+00:00'},
# {'kang_id': '...', 'wickets': 3, 'at': '2025-01-15T14:35:00+00:00'}
# ]
# get facts up to specific time
facts = store.get_facts(
kang_id="cricket.match.550e8400-e29b-41d4-a716-446655440000",
upto="2025-01-15T14:30:00"
)
# returns: first 3 facts (business_time <= 14:30)
# include audit metadata
facts = store.get_facts(
kang_id="cricket.match.550e8400-e29b-41d4-a716-446655440000",
with_tx=True
)
# returns: facts with 'tx_time' and 'tx_id' fields
# [
# {
# 'kang_id': '...',
# 'batsman': 'Kohli',
# 'bowler': 'Starc',
# 'at': '2025-01-15T14:27:30+00:00',
# 'tx_time': '2025-01-20T14:59:56.123456+00:00',
# 'tx_id': '168c01f6-f83e-11f0-8b9c-bafd80b8a6a7'
# },
# ...
# ]
rollup(kang_id, with_nils=False)
computes the current state by merging all facts. latest value wins for each attribute.
parameters:
kang_id(str): Identity to rollupwith_nils(bool): Include attributes set toNone
returns: dictionary with latest values for all attributes
example:
# get current match state
current = store.rollup("cricket.match.550e8400-e29b-41d4-a716-446655440000")
# returns:
# {
# 'kang_id': 'cricket.match.550e8400-e29b-41d4-a716-446655440000',
# 'runs': 88,
# 'wickets': 3,
# 'batsman': 'Kohli',
# 'bowler': 'Starc',
# 'at': '2025-01-15T14:35:00+00:00'
# }
# with None values
store.add_fact({"kang_id": "cricket.match.550e8400-e29b-41d4-a716-446655440000", "rain_delay": None})
current = store.rollup("cricket.match.550e8400-e29b-41d4-a716-446655440000")
# returns: {'kang_id': '...', 'runs': 88, 'wickets': 3, ...} # rain_delay excluded
current = store.rollup("cricket.match.550e8400-e29b-41d4-a716-446655440000", with_nils=True)
# returns: {'kang_id': '...', 'runs': 88, 'wickets': 3, 'rain_delay': None, ...}
as_of(kang_id, time, with_nils=False)
time-travel query: reconstructs state at a specific business time.
parameters:
kang_id(str): Identity to querytime(str): ISO 8601 timestampwith_nils(bool): Include attributes set toNone
returns: dictionary with state at that time
example:
# what was the match state at 14:30 on jan 15?
state = store.as_of(
"cricket.match.550e8400-e29b-41d4-a716-446655440000",
"2025-01-15T14:30:00"
)
# returns:
# {
# 'kang_id': 'cricket.match.550e8400-e29b-41d4-a716-446655440000',
# 'runs': 88,
# 'wickets': 2,
# 'batsman': 'Kohli',
# 'bowler': 'Starc',
# 'at': '2025-01-15T14:30:00+00:00'
# }
# (note: wickets=2, not 3, because the wicket fell at 14:35)
advanced usage
transaction metadata (with_tx)
by default, get_facts() only returns business data. use with_tx=True to see the audit trail:
# default: business data only
facts = store.get_facts(kang_id="cricket.match.550e8400-e29b-41d4-a716-446655440000")
# returns:
# [
# {'kang_id': '...', 'runs': 87, 'wickets': 2, 'at': '2025-01-15T14:30:00+00:00'}
# ]
# with audit trail
facts = store.get_facts(kang_id="cricket.match.550e8400-e29b-41d4-a716-446655440000", with_tx=True)
# returns:
# [
# {
# 'kang_id': '...',
# 'runs': 87,
# 'wickets': 2,
# 'at': '2025-01-15T14:30:00+00:00', # when it was true (business time)
# 'tx_time': '2025-01-20T14:59:56+00:00', # when we recorded it (transaction time)
# 'tx_id': '1471b44c-f83e-11f0-8b9c-bafd80b8a6a7'
# }
# ]
use cases:
- audit compliance: "when did we record this score?"
- correction tracking: "why are there multiple facts at 14:30?" → check tx_time to see one was recorded at 14:30:05 and the correction at 14:35:12
- late data detection: compare
atvstx_timeto find backfilled data
nil values (with_nils)
control whether None values appear in rollup/as_of results:
match_id = "cricket.match.550e8400-e29b-41d4-a716-446655440000"
# record some facts
store.add_fact({"kang_id": match_id, "runs": 87})
store.add_fact({"kang_id": match_id, "rain_delay": "20 minutes"})
# later: rain delay ends, remove the attribute
store.add_fact({"kang_id": match_id, "rain_delay": None})
# default: excludes None values
current = store.rollup(match_id)
# returns: {'kang_id': '...', 'runs': 87, 'at': '...'} # no rain_delay
# include None values
current = store.rollup(match_id, with_nils=True)
# returns: {'kang_id': '...', 'runs': 87, 'rain_delay': None, 'at': '...'}
use with_nils=True to:
- distinguish "never existed" from "was explicitly deleted"
- preserve schema awareness (all possible fields visible)
- debug missing data issues
time range filtering (upto)
query facts within a business time range:
match_id = "cricket.match.550e8400-e29b-41d4-a716-446655440000"
# get facts up to a specific time
facts = store.get_facts(kang_id=match_id, upto="2025-01-15T14:30:00")
# returns: facts where business_time <= 14:30
# [
# {'kang_id': '...', 'batsman': 'Kohli', 'bowler': 'Starc', 'at': '2025-01-15T14:27:30+00:00'},
# {'kang_id': '...', 'runs': 87, 'wickets': 2, 'at': '2025-01-15T14:30:00+00:00'},
# {'kang_id': '...', 'runs': 88, 'at': '2025-01-15T14:30:00+00:00'}
# ]
# (excludes wicket update at 14:35)
# combine with as_of for time-travel
state = store.as_of(match_id, "2025-01-15T14:30:00")
# internally uses get_facts(upto="2025-01-15T14:30:00") then merges
use cases:
- historical snapshots: "what facts existed at end of day?"
- incremental processing: "give me facts since last sync"
- debugging: "which facts contributed to this state?"
database connection
FactStore accepts either a database URL or a connection pool:
- url: postgresql connection string
- pool: psycopg2 ThreadedConnectionPool instance
- schema: database schema name (default: "public")
error handling
from kang import FactStore, SchemaNotInitializedError
try:
store = FactStore("postgresql://user:pass@localhost/mydb")
except SchemaNotInitializedError:
print("schema not found. edit sql/schema.sql and run: psql -d mydb -f sql/schema.sql")
# validation error
try:
store.add_fact({"runs": 87}) # missing kang_id
except ValueError as e:
print(e) # "fact must contain a 'kang_id'"
dependencies
- psycopg2-binary >= 2.9
extending to other databases
to add support for sqlite, mysql, etc.:
- create database-specific schema: adapt
sql/schema.sql - update
_verify_schema(): change table existence check - update
add_facts(): adjust conflict handling for upsert behavior
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file kang-0.1.4.tar.gz.
File metadata
- Download URL: kang-0.1.4.tar.gz
- Upload date:
- Size: 19.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.4
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
232a330b674f0f2e7ed4a68a1841fd2192ec509202db229527fecf7641a5789d
|
|
| MD5 |
ea523389a0df6f7a5162f88a745a6f0a
|
|
| BLAKE2b-256 |
7d115d107e28c092c55c25cd09cc53341997ffec5d1ad30d639d557b90a30baf
|
File details
Details for the file kang-0.1.4-py3-none-any.whl.
File metadata
- Download URL: kang-0.1.4-py3-none-any.whl
- Upload date:
- Size: 12.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.4
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
d1bb4f78ec798b8b521faa3634fbb7ce5ddba461e4ea32443bc53d4fa06aed9b
|
|
| MD5 |
3683277eef5e036e02264b7c5dd79b13
|
|
| BLAKE2b-256 |
e39bc0d2f32175504d3521b326672ff30e20f3e90ab6e51bba9cc664b38b7eaa
|