Reference vocabulary and pydantic data model for media cataloguing.
mediavocab
Reference vocabulary and pydantic data model for cataloguing media works: movies, music, books, comics, games, podcasts, audio dramas, radio, sound effects, and procedural ambient streams — all in a single shared schema.
mediavocab is a foundation library. It defines the vocabulary (enums,
genre constants) and the structural models (Work, Release, Entity, Credit,
Membership, Appearance). Application logic — provider clients, resolvers,
playback, UI — lives outside this package.
Install
pip install mediavocab
The only runtime dependency is pydantic>=2. The taxonomy/ and text/
subpackages import nothing beyond the stdlib, so they are safe to vendor in
minimal environments.
Quickstart
from mediavocab import (
    Credit, CreditSection, EntityKind, EntityRef, MediaType,
    RelationRole, Release, VariantKind, Work, WorkRelation, WorkRelationKind,
)
from mediavocab.text import score, work_hash

# Each cut is its own Work (spec §3.4); director's cut links via WorkRelation.
theatrical = Work(
    title="Blade Runner", media_type=MediaType.MOVIE,
    year=1982, runtime=117 * 60.0, production_country="US",
    variant_kind=VariantKind.THEATRICAL,
    credits=[Credit(
        entity=EntityRef(name="Ridley Scott", kind=EntityKind.PERSON),
        role="Director", relation_role=RelationRole.DIRECTOR,
        section=CreditSection.PRINCIPAL,
    )],
)
directors = Work(
    title="Blade Runner", media_type=MediaType.MOVIE,
    year=1992, runtime=116 * 60.0, production_country="US",
    variant_kind=VariantKind.DIRECTORS,
    relations=[WorkRelation(kind=WorkRelationKind.DERIVED_FROM, target=theatrical)],
)

# A Release manifests a Work: many formats, mirrors, packages per Work.
bluray = Release(work=theatrical, container="Blu-ray", region="US",
                 uri="file:///library/blade-runner.mkv")

print(work_hash(theatrical))          # stable SHA-256 identity hash
print(score(theatrical, theatrical))  # 1.0 (self-match)
More worked examples live in examples/, covering albums, band lineups,
radio stations, IoT device routing, work comparison, the
pipeline-sentinel NOT_MEDIA / CONTROL flow, and broadcast schedules.
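The quickstart describes work_hash as a stable SHA-256 identity hash. The sketch below illustrates the general idea with the standard library only; it is not mediavocab's implementation. The function names (normalise, identity_hash) and the choice of hashed fields (title, media type, year) are assumptions made for illustration.

```python
import hashlib
import json
from typing import Optional


def normalise(text: str) -> str:
    """Case-fold and collapse whitespace so cosmetic differences vanish."""
    return " ".join(text.casefold().split())


def identity_hash(title: str, media_type: str, year: Optional[int]) -> str:
    """Hash a canonical JSON encoding of the (assumed) identity fields.

    Sorted keys and fixed separators keep the byte string, and therefore
    the digest, identical across runs and machines.
    """
    payload = json.dumps(
        {"title": normalise(title), "media_type": media_type, "year": year},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Two spellings of the same cut hash alike; a different year does not.
    print(identity_hash("Blade Runner", "movie", 1982))
    print(identity_hash("  blade   RUNNER ", "movie", 1982))
    print(identity_hash("Blade Runner", "movie", 1992))
```

Hashing a normalised, canonically serialised record rather than the raw object is what makes the digest usable as a cross-source identity key.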
What's in the box
| Module | Contents |
|---|---|
| mediavocab.taxonomy | MediaType (+ PIPELINE_SENTINELS), VariantKind, ReleasePackaging, EntityKind, OrganisationKind, RelationRole, CreditSection, MembershipKind, TemporalState, ReleaseStatus, StreamMode, WorkRelationKind, ReleaseRelationKind, ContentForm, ProgrammeFormat, AccessibilityKind, PlaybackType, plus GENRE_* string constants. Zero deps. |
| mediavocab.models | Work, Release, Appearance, Chapter, AccessibilityTrack, AvailabilityWindow, LocalizedTitle, WorkRelation, ReleaseRelation, Entity, EntityRef, Membership, Credit, Programme, Schedule, ExternalIds, License, Signals. Pydantic v2. |
| mediavocab.text | Normalisation, fuzzy matching, work / release comparison and scoring, SHA-256 identity hashes (work_hash / release_hash), merge with MergeStrategy / IdentityConflict, title parser, content classifier, ISO 639 / 3166 / 8601 / ISBN helpers. Stdlib only. |
| mediavocab.helpers | Classifier predicates (is_not_media, is_device_entity, is_continuous_release), credit lookups (director, author, performers, filmography_of, episodes_of), and release ranking (quality_score, best_release). Non-normative. |
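To give a feel for what stdlib-only normalisation and fuzzy matching can look like, here is a minimal sketch built on unicodedata and difflib. It does not reproduce mediavocab.text; the function names normalise_title and title_similarity are hypothetical, and the library's own scoring may weigh fields quite differently.

```python
import difflib
import unicodedata


def normalise_title(title: str) -> str:
    """Strip accents and punctuation, case-fold, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Replace anything that is not a letter or digit with a space.
    cleaned = "".join(
        ch if ch.isalnum() else " " for ch in without_marks.casefold()
    )
    return " ".join(cleaned.split())


def title_similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0.0, 1.0] between two titles."""
    matcher = difflib.SequenceMatcher(
        None, normalise_title(a), normalise_title(b)
    )
    return matcher.ratio()


if __name__ == "__main__":
    print(title_similarity("Amélie", "amelie"))
    print(title_similarity("Blade Runner", "Blade Runner: The Final Cut"))
    print(title_similarity("Blade Runner", "Alien"))
```

Normalising before comparing means accents, case, and punctuation never count against a match; only genuinely different wording lowers the ratio.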
Design highlights
- A type earns its place by changing the schema (A1). SOUND_EFFECT, PROCEDURAL_AMBIENT, AUDIO_DRAMA, MUSIC_VIDEO, etc. each catalogue against different external databases or with different runtime tolerances.
- Devices are entities, not works (A3). EntityKind.DEVICE represents physical playback endpoints. The Work is still a RADIO / MOVIE / MUSIC; the device is how the consumer routes playback. A receiver-class device additionally has a Work counterpart for "turn on the radio" invocation.
- Pipeline sentinels never reach a canonical Work (T8). MediaType.GENERIC, NOT_MEDIA, and CONTROL live on the resolver bag and are rejected at Work construction.
- Each cut is its own Work (§3.4). Theatrical, director's, extended, remaster, fanedit: restructurings of the canonical artefact each get a new Work linked by WorkRelation. ReleasePackaging (deluxe / reissue / box-set / bootleg) is independent; that's how an edition ships.
- PlaybackType is derived from MediaType (A6). AUDIO / VIDEO / PAGED / INTERACTIVE routes resolver dispatch by playback intent. Never persisted on Work or Release. Declare playback_type: ClassVar[Set[PlaybackType]] on each provider.
- Genre is a free List[str] with canonical spellings in mediavocab.taxonomy.genre. ASMR, ambient, anime, adult, etc. are genre tags applied across multiple media types, not types of their own (T1). Programme formats (documentary, concert, talk show) live in ProgrammeFormat, not in genres.
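Two of the rules above, sentinel rejection (T8) and derived playback types (A6), can be sketched with plain dataclasses and enums. The classes below are stand-ins, not mediavocab's pydantic models: the enum string values, the media-to-playback mapping, the provider classes, and the providers_for helper are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import ClassVar, List, Set, Type


class MediaType(str, Enum):
    MOVIE = "movie"          # string values are assumed
    MUSIC = "music"
    RADIO = "radio"
    GENERIC = "generic"      # the last three are pipeline sentinels
    NOT_MEDIA = "not_media"
    CONTROL = "control"


class PlaybackType(str, Enum):
    AUDIO = "audio"
    VIDEO = "video"
    PAGED = "paged"
    INTERACTIVE = "interactive"


PIPELINE_SENTINELS = frozenset(
    {MediaType.GENERIC, MediaType.NOT_MEDIA, MediaType.CONTROL}
)

# Assumed mapping; the real derivation may differ.
_PLAYBACK_BY_MEDIA = {
    MediaType.MOVIE: PlaybackType.VIDEO,
    MediaType.MUSIC: PlaybackType.AUDIO,
    MediaType.RADIO: PlaybackType.AUDIO,
}


@dataclass(frozen=True)
class Work:
    title: str
    media_type: MediaType

    def __post_init__(self) -> None:
        # Sentinels belong to the resolver stage, never to a stored Work.
        if self.media_type in PIPELINE_SENTINELS:
            raise ValueError(f"{self.media_type.name} is a pipeline sentinel")

    @property
    def playback_type(self) -> PlaybackType:
        # Computed on demand from media_type, so it is never persisted.
        return _PLAYBACK_BY_MEDIA[self.media_type]


class VideoProvider:
    playback_type: ClassVar[Set[PlaybackType]] = {PlaybackType.VIDEO}


class AudioProvider:
    playback_type: ClassVar[Set[PlaybackType]] = {PlaybackType.AUDIO}


def providers_for(work: Work, providers: List[Type]) -> List[Type]:
    """Keep only the providers that declare the work's playback type."""
    return [p for p in providers if work.playback_type in p.playback_type]


if __name__ == "__main__":
    movie = Work("Blade Runner", MediaType.MOVIE)
    print(providers_for(movie, [VideoProvider, AudioProvider]))
```

Declaring the supported playback types as a class-level set lets the dispatcher filter providers without instantiating any of them.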
See docs/ for full reference and pattern guides.
Workspace position
mediavocab sits at the bottom of the stack. Every other package in
this workspace depends on it:
                     mediavocab
                          ▲
    ┌───────────┬─────────┼─────────┬───────────┐
    │           │         │         │           │
 tutubo     pyfanedit  pymetal    pyo*…   py_bandcamp / nuvem-de-som
    ▲           ▲         ▲                     ▲
    └────────┬──┴─────────┴─────────────────────┘
             │
        metadatarr  ◄── canonical resolver, ships every provider above
             ▲
             │
      media-archivist  ◄── source-DB orchestrator + sidecars + CLI/server
- mediavocab: vocabulary + structural models (this package).
- tutubo, pyfanedit, pymetal, py_bandcamp, nuvem_de_som, radiosoma, tunein, audiobooker: API clients / scrapers. Each emits mediavocab.Work / Release / Entity directly.
- metadatarr: cross-source resolver framework. Bundles every first-party scraper as a hard runtime dep (no extras juggling) and ships ~24 providers under metadatarr.resolve.providers.
- media-archivist: local source-DB indexer / canonicalizer / CLI / web server. Consumes metadatarr's resolver.
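The layering described above can be checked mechanically. The sketch below encodes a subset of the workspace as a dependency graph and asks the standard library's graphlib for a valid build order. The edges are read off the description in this section rather than from any package metadata, and only four of the scrapers are listed for brevity.

```python
from graphlib import TopologicalSorter

# Each key depends on the packages in its value set.
DEPENDS_ON = {
    "mediavocab": set(),
    "tutubo": {"mediavocab"},
    "pyfanedit": {"mediavocab"},
    "pymetal": {"mediavocab"},
    "py_bandcamp": {"mediavocab"},
    "metadatarr": {
        "mediavocab", "tutubo", "pyfanedit", "pymetal", "py_bandcamp",
    },
    "media-archivist": {"metadatarr"},
}

# static_order() yields every package after all of its dependencies.
build_order = list(TopologicalSorter(DEPENDS_ON).static_order())

if __name__ == "__main__":
    print(build_order)
```

Whatever order the scrapers come out in, mediavocab is always first and media-archivist always last, which is the "bottom of the stack" property in practice.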
Testing
pip install -e ".[test]"
pytest -q
License
Apache 2.0. See LICENSE.