mediavocab

Reference vocabulary and pydantic data model for cataloguing media works: movies, music, books, comics, games, podcasts, audio dramas, radio, sound effects, and ambient soundscapes — all in a single shared schema.

mediavocab is a foundation library. It defines the vocabulary (enums, genre constants) and the structural models (Work, Release, Entity, Credit, Membership, Appearance). Application logic — provider clients, resolvers, playback, UI — lives outside this package.

Install

pip install mediavocab

The only runtime dependency is pydantic>=2. The taxonomy/ and text/ subpackages import nothing beyond the stdlib, so they are safe to vendor in minimal environments.

Quickstart

from mediavocab import MediaType, Work, Release, VariantKind
from mediavocab.helpers import make_movie, make_release
from mediavocab.text import score, work_hash

work = make_movie("Blade Runner", year=1982, runtime=117 * 60.0,
                  director="Ridley Scott")
theatrical = make_release(work, "file:///library/blade-runner/theatrical.mkv")
directors  = make_release(work, "file:///library/blade-runner/directors.mkv",
                          variant_kind=VariantKind.DIRECTORS)

print(work_hash(work))                           # stable identity hash
print(score(work, work))                         # 1.0 (self-match)
print(work.model_dump_json())                    # pydantic JSON
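
The "stable identity hash" printed above can be pictured with a stdlib-only stand-in. This is a minimal sketch, not the library's implementation: the Work and Release dataclasses, normalise(), and sketch_work_hash() are hypothetical names, and the choice of fields (media type, title, year) and of SHA-256 is an assumption made purely for illustration. The one property it tries to show is that the hash is computed from Work fields only, so two releases of the same work share it.

```python
import hashlib
import unicodedata
from dataclasses import dataclass
from typing import Optional


@dataclass
class Work:
    # Hypothetical stand-in, not mediavocab.Work.
    title: str
    year: Optional[int] = None
    media_type: str = "movie"


@dataclass
class Release:
    # Hypothetical stand-in: a release points at its work, never the reverse.
    work: Work
    uri: str
    variant_kind: Optional[str] = None


def normalise(text: str) -> str:
    """Case-fold and collapse whitespace so cosmetic differences vanish."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    return " ".join(folded.split())


def sketch_work_hash(work: Work) -> str:
    """Hash Work-level fields only; no Release field is ever read (assumed field set)."""
    key = "|".join([work.media_type, normalise(work.title), str(work.year or "")])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


work = Work("Blade Runner", 1982)
theatrical = Release(work, "file:///library/blade-runner/theatrical.mkv")
directors = Release(work, "file:///library/blade-runner/directors.mkv",
                    variant_kind="directors")

# Both releases resolve to one identity because only the Work is hashed.
hashes = {sketch_work_hash(r.work) for r in (theatrical, directors)}

# Cosmetic differences in the title do not change the identity either.
same_after_normalising = (
    sketch_work_hash(Work("  blade   RUNNER ", 1982)) == sketch_work_hash(work)
)
print(len(hashes), same_after_normalising)
```

Keeping the release out of the hash input is what would let a new cut or a re-encoded file be attached to an existing work without changing its identity.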

More worked examples live in examples/, covering albums, band lineups, radio stations, IoT device routing, work comparison, and the NOT_MEDIA classifier sentinel.

What's in the box

  • mediavocab.taxonomy: MediaType, VariantKind, EntityKind, RelationRole, CreditSection, MembershipStatus, ReleaseStatus, StreamMode, WorkRelationKind, PlaybackModality, plus GENRE_* string constants. Zero deps.
  • mediavocab.models: Work, Release, Appearance, WorkRelation, ReleaseRelation, Entity, EntityRef, Membership, Credit, Programme, Schedule, License. Pydantic v2.
  • mediavocab.text: Normalisation, fuzzy matching, work comparison/scoring, ISO 639/3166 helpers. Stdlib only.
  • mediavocab.helpers: Convenience builders and classifier predicates. Non-normative.
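
To give a feel for what stdlib-only normalisation and fuzzy matching can look like, here is a minimal sketch built on difflib and unicodedata. It is not the algorithm used by mediavocab.text, whose internals are not described here; normalise() and title_score() are hypothetical names, and the decision to strip accents and punctuation is an assumption.

```python
import difflib
import re
import unicodedata


def normalise(title: str) -> str:
    """Strip accents and punctuation, case-fold, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^\w\s]", " ", stripped.casefold())
    return " ".join(cleaned.split())


def title_score(a: str, b: str) -> float:
    """Similarity in [0.0, 1.0] between two titles after normalisation."""
    return difflib.SequenceMatcher(None, normalise(a), normalise(b)).ratio()


self_match = title_score("Blade Runner", "Blade Runner")
accent_and_case = title_score("Amélie", "AMELIE!")
different = title_score("Blade Runner", "Alien")
print(self_match, accent_and_case, different < 1.0)
```

A self-match scores 1.0 here, in line with the self-match shown in the Quickstart, and so does a pair that differs only in accents, case, or punctuation. A real work comparison would presumably weigh more than the title (year, runtime, credits), which this sketch leaves out.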

Design highlights

  • A type earns its place by changing the schema. SOUND_EFFECT, AMBIENT_SOUNDS, AUDIO_DRAMA, MUSIC_VIDEO, etc. are each catalogued against different external databases or matched with different runtime tolerances.
  • Devices are entities, not works. EntityKind.DEVICE represents physical playback endpoints (smart speakers, smart plugs, cast targets). The Work is still a RADIO/MOVIE/MUSIC; the device is how the consumer routes playback.
  • NOT_MEDIA is a terminal sentinel for the classifier — distinct from GENERIC, which is a transient "type unknown, may resolve" state.
  • Work is canonical, Release is the manifestation. A director's cut and a bootleg are each a different Release of the same Work. The Work's identity hash never depends on Release metadata.
  • PlaybackModality is orthogonal to MediaType. AUDIO / VIDEO / TEXT / INTERACTIVE routes resolver dispatch by playback intent. A Signals(modality=AUDIO) query never touches video-only providers, even if medium=GENERIC. Declare modality: ClassVar[Set[PlaybackModality]] on each provider; empty means universal.
  • Genre is a free List[str] with canonical spellings in mediavocab.taxonomy.genre. ASMR, ambient, anime, adult, etc. are genre tags applied across multiple media types — not types of their own.
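
The modality rule in the list above can be sketched without importing anything from mediavocab. In this minimal stand-in the enum mirrors the four modalities named above, while Provider, the three provider classes, and eligible() are hypothetical names invented for illustration. The actual dispatch lives in the resolver, outside this package, and may well differ.

```python
from enum import Enum
from typing import ClassVar, List, Set, Type


class PlaybackModality(Enum):
    # Stand-in enum; the member values are assumptions.
    AUDIO = "audio"
    VIDEO = "video"
    TEXT = "text"
    INTERACTIVE = "interactive"


class Provider:
    # An empty set means "universal": eligible for every modality.
    modality: ClassVar[Set[PlaybackModality]] = set()


class RadioProvider(Provider):
    modality = {PlaybackModality.AUDIO}


class TrailerProvider(Provider):
    modality = {PlaybackModality.VIDEO}


class GenericSearchProvider(Provider):
    pass  # inherits the empty set, so it serves any modality


def eligible(providers: List[Type[Provider]],
             wanted: PlaybackModality) -> List[Type[Provider]]:
    """Keep universal providers plus those that declare the wanted modality."""
    return [p for p in providers if not p.modality or wanted in p.modality]


ALL_PROVIDERS = [RadioProvider, TrailerProvider, GenericSearchProvider]
audio_names = [p.__name__ for p in eligible(ALL_PROVIDERS, PlaybackModality.AUDIO)]
print(audio_names)  # ['RadioProvider', 'GenericSearchProvider']
```

The filter never looks at the media type, which is the sense in which modality is orthogonal to it: an audio query skips the video-only provider even when the medium is still unknown.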

See docs/ for full reference and pattern guides.
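
One of the highlights above, the split between the terminal NOT_MEDIA and the transient GENERIC, is easy to get wrong in consuming code, so a small sketch may help. The enum below is a cut-down stand-in with assumed member values, and needs_resolution() and refine() are hypothetical helpers, not part of mediavocab.

```python
from enum import Enum
from typing import Optional


class MediaType(Enum):
    # Cut-down stand-in; the string values are assumptions.
    GENERIC = "generic"      # transient: type unknown, may still resolve
    NOT_MEDIA = "not_media"  # terminal: the classifier has ruled media out
    MOVIE = "movie"
    MUSIC = "music"


def needs_resolution(media_type: MediaType) -> bool:
    """Only the transient state is worth sending back to a classifier."""
    return media_type is MediaType.GENERIC


def refine(current: MediaType, guess: Optional[MediaType]) -> MediaType:
    """Apply a classifier guess, but only to items still in the GENERIC state."""
    if current is not MediaType.GENERIC or guess is None:
        return current
    return guess


resolved = refine(MediaType.GENERIC, MediaType.MOVIE)
still_unknown = refine(MediaType.GENERIC, None)
stays_rejected = refine(MediaType.NOT_MEDIA, MediaType.MOVIE)
print(resolved.name, still_unknown.name, stays_rejected.name)
```

Treating NOT_MEDIA as a dead end means a rejected item is never re-queued, while a GENERIC item stays eligible until some classifier commits to a type.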

Workspace position

mediavocab sits at the bottom of the stack. Every other package in this workspace depends on it:

                          mediavocab
                              ▲
        ┌───────────┬─────────┼─────────┬───────────┐
        │           │         │         │           │
      tutubo   pyfanedit   pymetal   pyo*…       py_bandcamp / nuvem-de-som
        ▲           ▲         ▲                       ▲
        └────────┬──┴─────────┴───────────────────────┘
                 │
              metadatarr  ◄── canonical resolver, ships every provider above
                 ▲
                 │
           media-archivist  ◄── source-DB orchestrator + sidecars + CLI/server

  • mediavocab: vocabulary + structural models (this package).
  • tutubo, pyfanedit, pymetal, py_bandcamp, nuvem_de_som, radiosoma, tunein, audiobooker: API clients / scrapers. Each emits mediavocab.Work / Release / Entity directly.
  • metadatarr: cross-source resolver framework. Bundles every first-party scraper as a hard runtime dep (no extras juggling) and ships ~24 providers under metadatarr.resolve.providers.
  • media-archivist: local source-DB indexer / canonicalizer / CLI / web server. Consumes metadatarr's resolver.

Testing

pip install -e ".[test]"
pytest -q

License

Apache 2.0. See LICENSE.
