
Interval annotation system


lacing

A standoff, interval-keyed annotation system. Pythonic core: a MutableMapping[TimeInterval, list[Annotation]] facade with rational time, ELAN-style tier stereotypes, and Allen's interval algebra. Designed for time-based media (audio, video, speech, music) but generalizes to any 1-D interval domain.
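The core abstraction is a mapping from intervals to lists of annotations. Below is a minimal sketch of that shape using the standard-library MutableMapping ABC. The class name, the tuple-of-Fraction key and the dict payload are illustrative assumptions, not lacing's actual types:

```python
from collections.abc import Iterator, MutableMapping
from fractions import Fraction

# Hypothetical key type: a half-open [start, end) interval in exact seconds.
Interval = tuple[Fraction, Fraction]


class IntervalKeyedStore(MutableMapping[Interval, list[dict]]):
    """Toy interval-keyed mapping; each key holds a list of annotation dicts."""

    def __init__(self) -> None:
        self._data: dict[Interval, list[dict]] = {}

    def __getitem__(self, key: Interval) -> list[dict]:
        return self._data[key]

    def __setitem__(self, key: Interval, value: list[dict]) -> None:
        start, end = key
        if not start < end:
            raise ValueError("interval must satisfy start < end")
        self._data[key] = value

    def __delitem__(self, key: Interval) -> None:
        del self._data[key]

    def __iter__(self) -> Iterator[Interval]:
        # Yield keys in time order (by start, then end), as a timeline would.
        return iter(sorted(self._data))

    def __len__(self) -> int:
        return len(self._data)


store = IntervalKeyedStore()
store[(Fraction(0), Fraction(1, 2))] = [{"tier": "words", "text": "hello"}]
# setdefault() comes for free from the MutableMapping mixin methods.
store.setdefault((Fraction(1, 2), Fraction(1)), []).append({"tier": "words", "text": "world"})
```

Implementing the five abstract methods is enough. The ABC supplies setdefault, update, pop, `in` and the rest, so the store works anywhere a dict-like object is expected.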

Status: Phase 0–2 complete. Core data model, in-memory + SQLite + Postgres stores, eight round-trip adapters (Praat TextGrid, WebVTT, W3C Web Annotation, .annot SQLite, ELAN EAF, JAMS, Label Studio JSON, OpenTimelineIO), body-schema registry + JSON Schema export + migrations, inter-annotator agreement metrics, a lacing CLI, a FastAPI HTTP server (REST CRUD + ETag + import/export + schemas + op-log + /state-at time-travel), an MCP server (10 tools, agents as first-class clients), a processor registry (low_confidence_review, detect_density_change_points) with optional Arq integration, and opt-in OpenTelemetry instrumentation. Frontend is on the roadmap (see misc/docs/Lacing Development Roadmap.md).

Install

pip install lacing                # core only
pip install 'lacing[textgrid]'    # + Praat TextGrid support (praatio)
pip install 'lacing[eaf]'         # + ELAN EAF support (pympi-ling)
pip install 'lacing[jams]'        # + JAMS (MIR annotation) support
pip install 'lacing[postgres]'    # + PostgresStore (psycopg + GiST + EXCLUDE)
pip install 'lacing[server]'      # + FastAPI HTTP server
pip install 'lacing[mcp]'         # + MCP server (agents as first-class clients)
pip install 'lacing[arq]'         # + Arq background workers (Redis-backed)
pip install 'lacing[otio]'        # + OpenTimelineIO adapter
pip install 'lacing[otel]'        # + OpenTelemetry instrumentation

30-second tour

from lacing.adapters import textgrid, webvtt, web_annotation  # registers each
from lacing.adapters import load, dump

# Load a Praat TextGrid → an in-memory store keyed by interval
store = load("speech.TextGrid", rate=1000)

# Query overlaps using Allen's relations
from lacing.time import RationalTime, TimeInterval
window = TimeInterval(RationalTime(500, 1000), RationalTime(1500, 1000))

for ann in store.intersects(window):
    print(ann.tier, ann.body["text"])

for ann in store.during(window):  # strictly inside the window
    ...

# Save out as WebVTT
dump(store, "speech.vtt", format="webvtt")

# Or as W3C Web Annotation JSON-LD
dump(store, "speech.jsonld", format="web_annotation")
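The difference between intersects and during shows up at the edges. Below is a stand-alone sketch of both predicates over half-open intervals, using the tour's 0.5 s to 1.5 s window. It assumes the store follows the standard half-open and Allen definitions; the helper names and the tuple representation are hypothetical:

```python
from fractions import Fraction

Span = tuple[Fraction, Fraction]  # hypothetical half-open [start, end) in seconds


def intersects(a: Span, b: Span) -> bool:
    """True if the two half-open intervals share at least one instant."""
    return a[0] < b[1] and b[0] < a[1]


def during(a: Span, b: Span) -> bool:
    """Allen's 'during': a lies strictly inside b (both endpoints differ)."""
    return b[0] < a[0] and a[1] < b[1]


window = (Fraction(1, 2), Fraction(3, 2))           # 0.5 s .. 1.5 s, as in the tour
anns = {
    "inside":   (Fraction(6, 10), Fraction(1)),      # 0.6 .. 1.0
    "straddle": (Fraction(1), Fraction(2)),          # 1.0 .. 2.0
    "adjacent": (Fraction(3, 2), Fraction(2)),       # starts exactly at the window's end
}
hits = [name for name, iv in anns.items() if intersects(iv, window)]
inside = [name for name, iv in anns.items() if during(iv, window)]
print(hits)    # ['inside', 'straddle']
print(inside)  # ['inside']
```

An interval that starts exactly at the window's end does not intersect it. An interval identical to the window is Allen's equals, not during.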

Track facades — opinionated bundles of tiers

Some tier shapes recur often enough to deserve a friendly builder. lacing.tracks.subtitle is the first: a (sections, lines, words) trio over one audio asset, with float-second times and the Annotation / MediaRef / Provenance plumbing hidden:

from lacing import MemoryStore
from lacing.tracks.subtitle import SubtitleBuilder, SubtitleTrack

store = MemoryStore()
with SubtitleBuilder(store, asset_id="song/audio.mp3") as b:
    b.section("intro",   0.0, 12.5)
    b.section("verse_1", 12.5, 35.0)
    b.line(
        "I came down to the river", 12.5, 16.2,
        section="verse_1", line_index=0,
        words=[
            ("I",     12.5, 12.7),
            ("came",  12.7, 13.0, 0.95),  # optional confidence
            ("down",  13.0, 13.3),
        ],
    )

track = SubtitleTrack(store, asset_id="song/audio.mp3")
track.lines_in(15.0, 17.0)        # lines overlapping the window
track.words_in(12.5, 13.5)        # words overlapping the window
track.sections_covering(20.0)     # sections containing this instant

The facade reuses at_tier / by_tier under the hood; anything you build with it could also be built by hand with the raw API. Body schema URIs follow a convention (annot://schema/song-section/v1, lyric-line/v1, word/v1); call register_subtitle_schemas() once if you want Pydantic body validation.
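With half-open intervals, an instant on a boundary belongs to exactly one of two adjacent sections. Below is a small sketch of the containment test that a sections_covering-style lookup implies. It reuses the example's two sections and assumes the facade follows the same [start, end) convention as TimeInterval:

```python
# Hypothetical model of the example's sections as half-open [start, end) spans
# in float seconds; a real store would use an interval index, not a list scan.
sections = [("intro", 0.0, 12.5), ("verse_1", 12.5, 35.0)]


def sections_covering(t: float) -> list[str]:
    """Names of the sections whose half-open span [start, end) contains instant t."""
    return [name for name, start, end in sections if start <= t < end]


print(sections_covering(20.0))   # ['verse_1']
print(sections_covering(12.5))   # ['verse_1']: the boundary belongs to the later section
print(sections_covering(35.0))   # []: 35.0 is the exclusive end of verse_1
```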

What's in the core

lacing/
├── time.py          RationalTime + TimeInterval — rational, half-open, never float
├── model.py         Annotation envelope + Reference union + Provenance (PROV-O subset)
├── tier.py          Tier + 5 ELAN tier stereotypes + constraint validator
├── allen.py         13 Allen relations + intersects + relate + composition
├── store/
│   ├── base.py      IntervalAnnotationStore (MutableMapping facade)
│   ├── memory.py    MemoryStore over `intervaltree`
│   ├── sqlite.py    SqliteStore — persistent backend + .annot file format
│   └── postgres.py  PostgresStore — int8range + GiST + per-tier EXCLUDE
├── adapters/
│   ├── textgrid.py        Praat .TextGrid (interval + point tiers)
│   ├── webvtt.py          .vtt subtitles/captions
│   ├── web_annotation.py  W3C Web Annotation Data Model (JSON-LD)
│   ├── annot.py           .annot SQLite portable file format (lossless)
│   ├── eaf.py             ELAN EAF (4 stereotypes verbatim)
│   └── jams.py            JAMS (Music Information Retrieval) — namespaces → tiers
├── cli.py           `lacing` CLI: convert, query, validate, list-formats
├── quality.py       Cohen's κ, Krippendorff's α, interval IoU, boundary IoU
├── schema.py        Body schema registry + JSON Schema export + migrations
├── bodies/          Built-in body schemas (word, named-entity, ...)
└── server/          FastAPI HTTP server (Phase 2)
    ├── app.py           create_app(); ready-to-run `app` for uvicorn
    ├── deps.py          dependency-injection (store factory)
    ├── etag.py          ETag computation + If-Match parsing
    └── routers/         REST endpoints: annotations, tiers, adapters, meta

Design rules in one breath

  1. Time is rational: RationalTime(value: int, rate: int). Wire format {v, r}. Never floats.
  2. Standoff: annotations reference media by (asset_id, interval); the source is immutable.
  3. One envelope, typed body: Annotation.body (a dict) is validated against its body_schema_uri (semver).
  4. Allen's algebra is the public predicate API: never write ad-hoc overlap checks.
  5. ELAN tier stereotypes verbatim: NONE, TIME_SUBDIVISION, INCLUDED_IN, SYMBOLIC_SUBDIVISION, SYMBOLIC_ASSOCIATION.
  6. PROV-O provenance inline on every annotation: was_generated_by, was_attributed_to, was_derived_from, generated_at_time.
  7. MIT/BSD/Apache licenses only.
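Rule 1 is easiest to see with a tiny model. Below is a sketch of a rational time value with the {v, r} wire format, built on the standard library's Fraction. RationalTimeSketch is a hypothetical name and this is not lacing's implementation:

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class RationalTimeSketch:
    """Hypothetical stand-in for RationalTime: `value` ticks at `rate` ticks per second."""

    value: int
    rate: int

    def __post_init__(self) -> None:
        if self.rate <= 0:
            raise ValueError("rate must be positive")

    def seconds(self) -> Fraction:
        # Exact seconds as a Fraction; no float is involved at any point.
        return Fraction(self.value, self.rate)

    def to_wire(self) -> dict[str, int]:
        # Wire format from rule 1: {"v": value, "r": rate}.
        return {"v": self.value, "r": self.rate}

    @classmethod
    def from_wire(cls, wire: dict[str, int]) -> "RationalTimeSketch":
        return cls(int(wire["v"]), int(wire["r"]))


a = RationalTimeSketch(500, 1000)   # 0.5 s at a millisecond rate
b = RationalTimeSketch(24, 48)      # 0.5 s at 48 ticks per second
print(a.seconds() == b.seconds())   # True: exact comparison across rates
print(a.to_wire())                  # {'v': 500, 'r': 1000}
print(RationalTimeSketch.from_wire(a.to_wire()) == a)  # True
print(0.1 + 0.2 == 0.3)             # False: the float drift that rule 1 rules out
```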

The full reasoning lives in misc/docs/ — four design docs covering annotation systems generally, backend architecture, frontend UI, and an OSS deep-dive of what to build on. The synthesized plan is in misc/docs/Lacing Development Roadmap.md.

Concrete recipes

Build annotations programmatically

from uuid import uuid4
from lacing import (
    Annotation, MediaRef, MemoryStore, Provenance,
    RationalTime, TimeInterval, Tier,
)

store = MemoryStore()
store.add_tier(Tier("words"))

store.add(Annotation(
    id=uuid4(),
    tier="words",
    reference=MediaRef(
        asset_id="blake3:abc123",
        interval=TimeInterval.from_seconds("0.0", "0.5", rate=1000),
    ),
    body={"text": "hello"},
    body_schema_uri="annot://schema/word/v1",
    provenance=Provenance(
        was_generated_by="user:thor",
        was_attributed_to="thor",
        generated_at_time=RationalTime.zero(1000),
    ),
))
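The recipe passes seconds as strings ("0.0", "0.5") rather than floats. The sketch below shows why that matters. It assumes from_seconds converts decimal strings exactly and rejects values that fall between ticks; the helper name and the reject-instead-of-round policy are assumptions:

```python
from decimal import Decimal


def seconds_to_ticks(seconds: str, rate: int) -> int:
    """Hypothetical helper: convert a decimal-seconds string to integer ticks, exactly."""
    ticks = Decimal(seconds) * rate
    if ticks != ticks.to_integral_value():
        # Refuse to round silently: the instant falls between two ticks.
        raise ValueError(f"{seconds} s is not a whole number of ticks at rate {rate}")
    return int(ticks)


print(seconds_to_ticks("0.0", 1000), seconds_to_ticks("0.5", 1000))  # 0 500
print(Decimal(0.1))  # the float 0.1 is really 0.1000000000000000055511151231257827...
```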

Query with Allen's relations

from lacing.allen import AllenRelation
from lacing.time import RationalTime, TimeInterval

w = TimeInterval(RationalTime(0, 1000), RationalTime(500, 1000))

list(store.intersects(w))                       # any overlap
list(store.during(w))                           # strictly inside w
list(store.contains(w))                         # strictly contains w
list(store.relate(w, [AllenRelation.MEETS]))   # ends at w.start
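The 13 Allen relations are mutually exclusive and exhaustive: exactly one holds between any two proper intervals. The stand-alone classifier below makes that concrete. The function and its lowercase relation names are illustrative, not lacing's AllenRelation enum or its relate method:

```python
from fractions import Fraction

Interval = tuple[Fraction, Fraction]  # proper interval: start < end


def allen_relation(x: Interval, y: Interval) -> str:
    """Return the single Allen relation that holds from x to y."""
    (xs, xe), (ys, ye) = x, y
    if not (xs < xe and ys < ye):
        raise ValueError("intervals must satisfy start < end")
    if xe < ys:
        return "before"
    if xe == ys:
        return "meets"           # x ends exactly where y starts
    if ye < xs:
        return "after"
    if ye == xs:
        return "met_by"
    # From here on the intervals share at least one instant.
    if xs == ys and xe == ye:
        return "equals"
    if xs == ys:
        return "starts" if xe < ye else "started_by"
    if xe == ye:
        return "finishes" if xs > ys else "finished_by"
    if ys < xs and xe < ye:
        return "during"
    if xs < ys and ye < xe:
        return "contains"
    return "overlaps" if xs < ys else "overlapped_by"


w = (Fraction(0), Fraction(1, 2))                           # the window w above, in seconds
print(allen_relation((Fraction(-1, 4), Fraction(0)), w))    # meets
```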

Persist annotations

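from lacing import Tier
from lacing.store import SqliteStore
from lacing.time import RationalTime, TimeInterval

# Open or create a .annot file (SQLite under the hood)
store = SqliteStore("project.annot")
store.add_tier(Tier("words"))
store.add(annotation)     # an Annotation built as in "Build annotations programmatically"; writes go straight to disk
store.set_meta("project", "demo")

# Same MutableMapping + Allen-relation interface as MemoryStore
window = TimeInterval(RationalTime(0, 1000), RationalTime(500, 1000))
for ann in store.intersects(window):
    print(ann.tier, ann.body)
store.close()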

The .annot file is the recommended portable handoff format — single-file SQLite, Git-trackable, lossless round-trip with MemoryStore.
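One benefit of a single-file SQLite format is that any SQLite client can read it. The sketch below shows the idea with Python's built-in sqlite3 module. The table layout is a hypothetical illustration, not the actual .annot schema, and it assumes every row uses one shared tick rate so that ticks compare directly:

```python
import json
import sqlite3

# Hypothetical layout for illustration only; it is NOT the real .annot schema.
# All times are integer ticks at one shared rate (here 1000 ticks/s).
conn = sqlite3.connect(":memory:")  # pass a file path to get a single file on disk
conn.execute(
    """CREATE TABLE annotations (
           id         TEXT PRIMARY KEY,
           tier       TEXT NOT NULL,
           start_tick INTEGER NOT NULL,
           end_tick   INTEGER NOT NULL,
           body       TEXT NOT NULL,
           CHECK (start_tick < end_tick)
       )"""
)
conn.execute("CREATE INDEX idx_tier_start ON annotations (tier, start_tick)")

rows = [
    ("a1", "words", 0, 500, {"text": "hello"}),
    ("a2", "words", 500, 900, {"text": "world"}),
    ("a3", "words", 1200, 1500, {"text": "again"}),
]
conn.executemany(
    "INSERT INTO annotations VALUES (?, ?, ?, ?, ?)",
    [(id_, tier, s, e, json.dumps(body)) for id_, tier, s, e, body in rows],
)
conn.commit()


def intersecting(tier: str, start: int, end: int) -> list[str]:
    """IDs of annotations in `tier` sharing an instant with half-open [start, end)."""
    cur = conn.execute(
        "SELECT id FROM annotations WHERE tier = ? AND start_tick < ? AND end_tick > ? "
        "ORDER BY start_tick",
        (tier, end, start),
    )
    return [row[0] for row in cur]


print(intersecting("words", 400, 1000))   # ['a1', 'a2']
print(intersecting("words", 500, 1200))   # ['a2']: both touching neighbours are excluded
```

The overlap test uses the same half-open condition as elsewhere in this README (start before the window's end, end after the window's start), so annotations that merely touch the window are excluded.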

For multi-user / production scale, the same facade is available over PostgreSQL:

from lacing.store import PostgresStore
from lacing.tier import Tier

store = PostgresStore("postgresql://localhost/myproject", rate=1000)

# Per-tier non-overlap is enforced declaratively by the database — try to
# add an overlapping annotation in this tier and Postgres rejects the insert.
store.add_tier(Tier("speakers"), enforce_no_overlap=True)

The Postgres backend uses int8range + GiST (sub-millisecond overlap queries at million-row scale) and exposes the same Allen-relation methods. Times are normalized to a project-wide rate stored in meta.
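Two behaviours described above can be modelled in a few lines of plain Python: normalizing an incoming time to the project-wide rate, and rejecting an overlapping insert within one tier. This is an in-memory emulation for illustration only. The real backend delegates the check to Postgres (int8range, GiST, per-tier EXCLUDE), and the rule that an inexact conversion is rejected rather than rounded is an assumption:

```python
from fractions import Fraction

PROJECT_RATE = 1000  # hypothetical project-wide rate (the README says it lives in meta)


def normalize(value: int, rate: int, project_rate: int = PROJECT_RATE) -> int:
    """Convert `value` ticks at `rate` into ticks at `project_rate`, exactly."""
    ticks = Fraction(value, rate) * project_rate
    if ticks.denominator != 1:
        # Assumption: reject rather than round an instant that falls between ticks.
        raise ValueError(f"{value}/{rate} s is not representable at rate {project_rate}")
    return ticks.numerator


class NoOverlapTier:
    """In-memory emulation of a per-tier no-overlap constraint (illustration only)."""

    def __init__(self) -> None:
        self.spans: list[tuple[int, int]] = []

    def insert(self, start: int, end: int) -> None:
        for s, e in self.spans:
            # Half-open ranges, like int8range with '[)' bounds: touching is not overlapping.
            if start < e and s < end:
                raise ValueError(f"[{start}, {end}) overlaps existing [{s}, {e})")
        self.spans.append((start, end))


speakers = NoOverlapTier()
speakers.insert(normalize(0, 1000), normalize(5, 1))    # [0, 5000)
speakers.insert(normalize(240, 48), normalize(10, 1))   # [5000, 10000): adjacent, accepted
try:
    speakers.insert(normalize(4, 1), normalize(6, 1))   # [4000, 6000): rejected
except ValueError as err:
    print(err)
```

The linear scan is only for clarity; the database answers the same question through its GiST index.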

CLI

After installing (pip install lacing, or pip install -e . from a source checkout), the lacing command is on your PATH:

lacing list-formats                                          # show registered adapters
lacing convert speech.TextGrid speech.annot                  # convert between formats
lacing query speech.annot --start 1.0 --end 5.0 --rate 1000  # JSON-lines
lacing validate speech.annot                                 # parse + summary

Body schemas, validation, migrations

Every annotation has a body: dict validated against the schema named by its body_schema_uri (e.g., annot://schema/named-entity/v2). Register your own with a Pydantic v2 model:

from pydantic import BaseModel, Field
from lacing.schema import register_body_schema, register_migration, validate, migrate

class WordBodyV1(BaseModel):
    model_config = {"frozen": True, "extra": "forbid"}
    text: str = Field(...)
    speaker: str | None = None

register_body_schema("annot://schema/word/v1", WordBodyV1)

# Validate at runtime:
validate({"text": "hello"}, "annot://schema/word/v1")

# Register a forward migration v1 -> v2:
@register_migration(schema_name="word", from_version=1, to_version=2)
def _v1_to_v2(body: dict) -> dict:
    return {**body, "lemma": None}

# Migrate stored data:
migrated = migrate({"text": "ran"},
                   from_uri="annot://schema/word/v1",
                   to_uri="annot://schema/word/v2")
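Forward migrations compose. A body stored at v1 can reach v3 by applying v1 to v2 and then v2 to v3. Below is a self-contained sketch of that chaining. The registry layout, the URI regex and the v2 to v3 rename are hypothetical, and lacing.schema.migrate may resolve paths differently:

```python
import re
from typing import Callable

Migration = Callable[[dict], dict]

# Hypothetical registry: (schema name, from_version) -> (to_version, function).
_MIGRATIONS: dict[tuple[str, int], tuple[int, Migration]] = {}
_URI = re.compile(r"^annot://schema/(?P<name>[^/]+)/v(?P<version>\d+)$")


def register(name: str, from_version: int, to_version: int) -> Callable[[Migration], Migration]:
    def decorator(fn: Migration) -> Migration:
        _MIGRATIONS[(name, from_version)] = (to_version, fn)
        return fn
    return decorator


def parse_uri(uri: str) -> tuple[str, int]:
    m = _URI.match(uri)
    if m is None:
        raise ValueError(f"not a schema URI: {uri}")
    return m["name"], int(m["version"])


def migrate_chain(body: dict, from_uri: str, to_uri: str) -> dict:
    """Apply registered forward migrations one step at a time until `to_uri` is reached."""
    name, version = parse_uri(from_uri)
    target_name, target = parse_uri(to_uri)
    if name != target_name:
        raise ValueError("cannot migrate between different schema names")
    while version < target:
        step = _MIGRATIONS.get((name, version))
        if step is None:
            raise LookupError(f"no migration registered from {name}/v{version}")
        version, fn = step
        body = fn(body)
    if version != target:
        raise LookupError(f"no forward path to {name}/v{target}")
    return body


@register("word", 1, 2)
def _v1_to_v2(body: dict) -> dict:
    return {**body, "lemma": None}          # the same step as the example above


@register("word", 2, 3)
def _v2_to_v3(body: dict) -> dict:
    # Hypothetical v3: rename "text" to "form".
    out = {k: v for k, v in body.items() if k != "text"}
    out["form"] = body["text"]
    return out


print(migrate_chain({"text": "ran"}, "annot://schema/word/v1", "annot://schema/word/v3"))
# {'lemma': None, 'form': 'ran'}
```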

Export every registered schema to JSON Schema (the source for downstream Zod codegen):

from lacing.schema import export_json_schemas
export_json_schemas("./schema/")  # writes <name>/v<N>.json + index.json

Built-in body schemas live under lacing/bodies/ (word, named-entity). They register themselves on import.

Run the HTTP server

pip install 'lacing[server]'
uvicorn lacing.server:app --reload

By default the server starts with an in-memory SqliteStore. To use your own backend (e.g., a PostgresStore or an .annot file), use FastAPI's dependency overrides:

from lacing.server import create_app
from lacing.server.deps import get_store
from lacing.store import SqliteStore

store = SqliteStore("project.annot", check_same_thread=False)
app = create_app()
app.dependency_overrides[get_store] = lambda: store

The REST surface (Phase 2.0):

GET    /health
GET    /tiers                              list
POST   /tiers                              create or update
GET    /tiers/{name}                       get one
POST   /annotations                        create (returns ETag)
GET    /annotations                        list with optional ?tier&start&end&relation&rate
GET    /annotations/{id}                   get one (returns ETag)
PATCH  /annotations/{id}                   partial update; If-Match required
DELETE /annotations/{id}
POST   /import?format=webvtt               upload a file in any registered format
GET    /export?format=eaf                  dump store as a file
GET    /formats                            list registered adapters
GET    /schemas                            list registered body_schema_uris
GET    /schemas/{uri}                      JSON Schema for a URI
GET    /meta, PUT /meta/{key}              key/value metadata
GET    /oplog                              list mutations (filterable by clock)
GET    /oplog/latest-clock                 current Lamport clock value
GET    /state-at?clock=N                   replay log to clock N → snapshot
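PATCH requires If-Match, which is standard optimistic concurrency. The client echoes the ETag it last saw, and the server refuses the write (typically with 412 Precondition Failed) if the resource has changed since. Below is a sketch of both halves. Hashing canonical JSON with SHA-256 is an assumption about how etag.py computes tags, not a description of it:

```python
import hashlib
import json
from typing import Optional


def compute_etag(resource: dict) -> str:
    """Strong ETag from a canonical JSON encoding (sorted keys, compact separators)."""
    canonical = json.dumps(resource, sort_keys=True, separators=(",", ":"))
    return '"' + hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16] + '"'


def if_match_ok(header: Optional[str], current_etag: str) -> bool:
    """Decide whether a write guarded by If-Match may proceed."""
    if header is None:
        return False                      # If-Match is mandatory for PATCH in this API
    header = header.strip()
    if header == "*":
        return True                       # "*" matches any current representation
    # If-Match uses strong comparison, so weak tags (W/"...") never match.
    # Splitting on commas is adequate here because these tags are plain hex.
    return current_etag in (tag.strip() for tag in header.split(","))


ann = {"id": "a1", "tier": "words", "body": {"text": "hello"}}
seen = compute_etag(ann)                  # what the client received from GET /annotations/a1
ann["body"]["text"] = "hullo"             # someone else edits the annotation
print(if_match_ok(seen, seen))            # True
print(if_match_ok(seen, compute_etag(ann)))  # False: the server would answer 412
```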

Every mutation is stamped with a Lamport clock value, returned in the X-Lacing-Clock response header. The op-log and the /state-at endpoint give you time-travel debugging: pick any past clock value and reconstruct exactly what the store contained at that point.
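The time-travel mechanism rests on two small ideas. Every mutation is appended to a log stamped with a Lamport clock, and the state at clock N is the replay of all entries up to N. Below is a minimal sketch. The class and op shapes are hypothetical, and lacing's InMemoryOpLog may record richer entries:

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Op:
    clock: int
    kind: str                 # "put" or "delete"
    key: str
    value: Any = None


@dataclass
class OpLogSketch:
    """Hypothetical append-only op-log with a Lamport clock (not lacing.oplog)."""

    clock: int = 0
    ops: list[Op] = field(default_factory=list)

    def record(self, kind: str, key: str, value: Any = None, remote_clock: int = 0) -> int:
        # Lamport rule: move past both our own clock and any clock we have observed.
        self.clock = max(self.clock, remote_clock) + 1
        self.ops.append(Op(self.clock, kind, key, value))
        return self.clock

    def state_at(self, clock: int) -> dict[str, Any]:
        """Replay every op with clock <= `clock` onto an empty snapshot."""
        state: dict[str, Any] = {}
        for op in self.ops:
            if op.clock > clock:
                break                 # ops are appended in increasing clock order
            if op.kind == "put":
                state[op.key] = op.value
            else:
                state.pop(op.key, None)
        return state


log = OpLogSketch()
c1 = log.record("put", "a1", {"text": "hello"})
c2 = log.record("put", "a1", {"text": "hullo"})
c3 = log.record("delete", "a1")
print(c1, c2, c3)          # 1 2 3
print(log.state_at(c1))    # {'a1': {'text': 'hello'}}
print(log.state_at(c3))    # {}
print(log.record("put", "a2", {"text": "hi"}, remote_clock=10))  # 11
```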

MCP server — agents as first-class clients

from lacing.oplog import InMemoryOpLog
from lacing.server.mcp import build_mcp_server
from lacing.store import SqliteStore

store = SqliteStore("project.annot", check_same_thread=False)
oplog = InMemoryOpLog()
server = build_mcp_server(store, oplog)
server.run()  # stdio transport by default

Tools registered (all take seconds, so there is no need to construct rational-time wire dicts): add_annotation, query_annotations, get_annotation, delete_annotation, accept_ai_suggestion, add_tier, list_tiers, list_formats, latest_clock, state_at. The MCP server shares the same store and op-log as the FastAPI app, so a human edit via REST and an agent edit via MCP land in the same op-log, ordered by one shared Lamport clock.

Inter-annotator agreement

from lacing.quality import cohen_kappa, krippendorff_alpha, boundary_iou

# Two annotators on a categorical task
kappa = cohen_kappa(["A", "B", "A", "B"], ["A", "A", "A", "B"])

# Three annotators with missing data
alpha = krippendorff_alpha([
    ["A", "B", None, "C"],
    ["A", "B", "B",  "C"],
    ["A", "A", "B",  "C"],
])

# Compare two segmentations (store_a and store_b hold each annotator's version of the tier)
score = boundary_iou(
    [a.interval for a in store_a.by_tier("speakers")],
    [a.interval for a in store_b.by_tier("speakers")],
)
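For intuition about the numbers these functions return, below is a stand-alone sketch of two of the simpler metrics: Cohen's kappa from its textbook definition, and intersection-over-union for a single pair of intervals. These are illustrative re-implementations, not lacing.quality's code. They do not cover Krippendorff's alpha or boundary_iou:

```python
from collections import Counter
from fractions import Fraction


def cohen_kappa_sketch(a: list[str], b: list[str]) -> float:
    """Cohen's kappa, (p_o - p_e) / (1 - p_e), for two annotators on the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n            # observed agreement
    ca, cb = Counter(a), Counter(b)
    # Expected chance agreement from each annotator's own label frequencies.
    p_e = sum(ca[label] * cb[label] for label in ca.keys() | cb.keys()) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def interval_iou(x: tuple[Fraction, Fraction], y: tuple[Fraction, Fraction]) -> Fraction:
    """Intersection-over-union of two half-open intervals, as an exact fraction."""
    inter = max(Fraction(0), Fraction(min(x[1], y[1]) - max(x[0], y[0])))
    union = Fraction(x[1] - x[0]) + Fraction(y[1] - y[0]) - inter
    return inter / union


print(cohen_kappa_sketch(["A", "B", "A", "B"], ["A", "A", "A", "B"]))  # 0.5
print(interval_iou((Fraction(0), Fraction(2)), (Fraction(1), Fraction(3))))  # 1/3
```

For the README's kappa example, the annotators agree on 3 of 4 items (p_o = 0.75), chance agreement is (2·3 + 2·1)/16 = 0.5, and kappa = (0.75 − 0.5)/(1 − 0.5) = 0.5.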

License

MIT.
