
Pydantic-powered Python clients and cross-source resolver for media metadata (Servarr proxies, OpenLibrary, MusicBrainz, TMDB, TVmaze, AudioDB, Anna's Archive, Wikidata, Bandcamp, SoundCloud, YouTube, Metal Archives)


metadatarr

One library. Every catalogue. Zero API keys.


Pydantic-powered Python clients and a cross-source entity resolver for media metadata. Talk to the public catalogues that the *arr ecosystem, media managers, and libraries rely on — then fuse the answers into a single, de-duplicated record with a canonical set of external IDs.

TL;DR (60 seconds)

pip install metadatarr
from metadatarr.resolve import resolve
from mediavocab import Signals, MediaType

ids = resolve(Signals(title="Inception", year=2010, medium=MediaType.MOVIE)).external_ids
print(ids.tmdb_movie, ids.imdb, ids.wikidata)   # 27205 tt1375666 Q25188

That's it: no API keys, no config. resolve() fans out to every relevant catalogue, conflict-checks the answers, and hands you one merged set of external IDs. Need a different medium? Swap MediaType.MOVIE for MediaType.MUSIC, MediaType.BOOK, MediaType.PODCAST, and so on.

Got nothing back, or None? See docs/troubleshooting.md. Empty results are by design (providers fail silently), and the troubleshooting guide explains why and how to debug them.
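To make "silent failure" concrete, here is a minimal sketch of the pattern, assuming each provider call is wrapped so that any exception is logged and converted into None rather than propagated. The logger name, safe_lookup(), and the two toy providers are hypothetical illustrations, not metadatarr's API.

```python
import logging
from typing import Callable, Optional

log = logging.getLogger("silent_failure_sketch")  # hypothetical logger name


def safe_lookup(name: str, lookup: Callable[[str], Optional[dict]], title: str) -> Optional[dict]:
    """Run one provider lookup; any exception becomes None instead of propagating."""
    try:
        return lookup(title)
    except Exception as exc:  # network errors, parse errors, rate limits, ...
        log.debug("provider %s failed for %r: %s", name, title, exc)
        return None


def flaky(title: str) -> Optional[dict]:
    # Toy provider that always fails, standing in for a timed-out request.
    raise TimeoutError("simulated network timeout")


def working(title: str) -> Optional[dict]:
    # Toy provider that always answers.
    return {"title": title, "id": 42}


results = {
    "flaky": safe_lookup("flaky", flaky, "Inception"),
    "working": safe_lookup("working", working, "Inception"),
}
```

With this pattern, calling logging.basicConfig(level=logging.DEBUG) before the lookup is what surfaces the swallowed errors; the caller only ever sees None.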


Why metadatarr?

Most media tools need to cross-reference the same work across Sonarr, MusicBrainz, Discogs, and Wikidata — but every API has a different shape, auth model, and concept of "the same thing." metadatarr handles all of that:

  • Typed clients — every response parsed into Pydantic V2 models; no dict spelunking.
  • Keyless by default — every built-in provider works without registration or tokens.
  • Cross-source resolver — fans out to every relevant provider in parallel, conflict-checks the results, and merges winners into one ResolveResult with ExternalIds.
  • Variant fan-out — one flag (include_variants=True) and the resolver collects every known cut, edition, or fanedit of a work.
  • Batteries-included — pyfanedit, pymetal, tutubo, py_bandcamp, and nuvem_de_som are all core dependencies; no optional-extra juggling required.

Installation

pip install metadatarr

All first-party scrapers (pyfanedit, pymetal, tutubo, py_bandcamp, nuvem_de_som) are core dependencies — no extras required. The only optional extra is [test] for running the test suite.


Direct clients

Each client is a thin, typed wrapper around one data source.

| Client | Source | What you get |
|---|---|---|
| ArrMetadataClient | Servarr proxies (Skyhook / Radarr / Lidarr) | TV shows, movies, artists; the same data that powers Sonarr/Radarr/Lidarr |
| OpenLibraryClient | openlibrary.org | Works, editions, authors, ISBN lookup, covers |
| BookInfoClient | rreading-glasses (Goodreads / Hardcover) | Book metadata via Goodreads / Hardcover |
| AnnasArchiveClient | Anna's Archive mirrors | Book search (HTML scrape) |
| AudioDBClient | theaudiodb.com | Artists, albums, tracks |
| TVmazeClient | tvmaze.com | Shows, seasons, episodes, cast, people |
| BlurayComClient | blu-ray.com | Physical Blu-ray specs: audio tracks, region codes, extras |
| DVDCompareClient | dvdcompare.net | Regional release comparison, cut runtimes, version notes |
| DiscogsClient | discogs.com | Vinyl, CD, cassette releases; search_video() for LaserDiscs / concert VHS / music DVDs |

from metadatarr import ArrMetadataClient, OpenLibraryClient, AudioDBClient, TVmazeClient

# Movies & TV via Servarr proxies
arr = ArrMetadataClient()
movie  = arr.search_movie("Alien")[0]
series = arr.search_series("The Boys")[0]
artist = arr.search_artist("Moonsorrow")[0]
print(movie.tmdb_id, series.tvdb_id, artist.mb_id)

# Books
ol  = OpenLibraryClient()
hit = ol.search("The Hobbit", limit=1)[0]
print(hit.key, hit.first_publish_year)

# Music
db  = AudioDBClient()
alb = db.search_album("Voimasta ja Kunniasta")[0]
print(alb.id_album, alb.str_genre)

# TV
tv   = TVmazeClient()
show = tv.singlesearch("Severance")
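# Streaming-only shows (Severance is on Apple TV+) have a web channel, not a network
print(show.id, show.network.name if show.network else None)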

Cross-source resolver

When you have a title, a year, or a noisy filename and need a canonical identity across every platform, the resolver fans out, conflict-checks, and merges:

from metadatarr.resolve import resolve
from mediavocab import Signals, MediaType

# A basic lookup — metadatarr queries all active providers concurrently
result = resolve(Signals(title="OK Computer", artist="Radiohead", medium=MediaType.MUSIC))

print(result.external_ids.musicbrainz_release_group)  # MusicBrainz MBID
print(result.external_ids.wikidata)                   # Wikidata Q-id
print(result.external_ids.extra.get("bandcamp_album_id"))

# Inspect what was accepted and what was rejected
for m in result.accepted:
    print(f"  ✓ {m.provider:<20} confidence={m.confidence:.2f}")
for d in result.conflicts:
    fields = ", ".join(f"{c.signal}({c.ours} != {c.theirs})" for c in d.fields)
    print(f"  ✗ {d.provider:<20} clashed on {fields}")
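The accepted/conflicts split can be pictured with a small standalone sketch. It assumes a provider is rejected when any signal that both the query and the provider set has a different value, and that accepted matches are ranked by confidence. FieldConflict mirrors the signal/ours/theirs attributes used above, but the classes, the strict equality test, and the data are illustrative; the real resolver likely normalises titles before comparing.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class FieldConflict:
    signal: str
    ours: Any
    theirs: Any


@dataclass
class Match:
    provider: str
    confidence: float
    signals: Dict[str, Any]
    conflicts: List[FieldConflict] = field(default_factory=list)


def conflict_check(query: Dict[str, Any], matches: List[Match]) -> Tuple[List[Match], List[Match]]:
    """Split matches into accepted and conflicting, comparing only signals both sides set."""
    accepted: List[Match] = []
    rejected: List[Match] = []
    for m in matches:
        m.conflicts = [
            FieldConflict(key, query[key], m.signals[key])
            for key in query
            if query[key] is not None
            and m.signals.get(key) is not None
            and query[key] != m.signals[key]
        ]
        (rejected if m.conflicts else accepted).append(m)
    accepted.sort(key=lambda m: m.confidence, reverse=True)  # best evidence first
    return accepted, rejected


query = {"title": "OK Computer", "artist": "Radiohead", "year": 1997}
matches = [
    Match("musicbrainz", 0.9, {"title": "OK Computer", "artist": "Radiohead", "year": 1997}),
    Match("audiodb", 0.7, {"title": "OK Computer", "artist": "Radiohead"}),  # no year: not a clash
    Match("bandcamp", 0.6, {"title": "OK Computer", "artist": "Some Tribute Band"}),
]
accepted, conflicts = conflict_check(query, matches)
```

A missing field never counts as a conflict here, which is why adding more signals makes rejection more aggressive.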

Signals — tell the resolver what you know

from mediavocab import Signals, MediaType

signals = Signals(
    title    = "Alien",
    year     = 1979,
    medium   = MediaType.MOVIE,
    runtime  = 6900,          # seconds — used for cut-disambiguation
    language = "en",
    country  = "US",
)

Pass as much or as little as you have. Every field is optional. The more context you provide, the better providers can filter and the more aggressively conflicts are detected.
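As an example of how one extra signal narrows results, here is a sketch of runtime-based cut disambiguation, assuming the nearest runtime within a fixed tolerance wins. The function name, the 120-second tolerance, and the cut runtimes are made-up illustrations; metadatarr's actual rule may differ.

```python
from typing import Dict, Optional


def pick_cut(runtime_s: int, cuts: Dict[str, int], tolerance_s: int = 120) -> Optional[str]:
    """Return the cut whose runtime is closest to runtime_s, or None if none is within tolerance."""
    if not cuts:
        return None
    name, cut_runtime = min(cuts.items(), key=lambda kv: abs(kv[1] - runtime_s))
    return name if abs(cut_runtime - runtime_s) <= tolerance_s else None


# Hypothetical runtimes in seconds, for illustration only.
cuts = {"theatrical": 6960, "extended": 7800}
```

With runtime=6900 from the Signals example, pick_cut(6900, cuts) selects "theatrical"; a runtime far from every known cut returns None instead of guessing.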

MediaType values come from mediavocab, which defines 18 canonical values (MOVIE, EPISODIC_SERIES, TV, MUSIC, MUSIC_VIDEO, PODCAST, BOOK, COMIC, GAME, AUDIOBOOK, AUDIO_DRAMA, RADIO, INTERACTIVE_FICTION, SOUND_EFFECT, AMBIENT_SOUNDS, PLAYLIST, GENERIC, NOT_MEDIA). See the mediavocab spec §4.1.

Variant fan-out — editions, cuts, fanedits

from metadatarr.resolve import resolve
from mediavocab import Signals, MediaType
from metadatarr.resolve.entities import EntityRole

result = resolve(Signals(
    title           = "Alien",
    year            = 1979,
    medium          = MediaType.MOVIE,
    include_variants= True,       # ← triggers second pass
))

for entity in result.variants:
    print(entity.name, entity.external_ids.fanedit_id)
    # Alien: Covenant Cut, Alien: The Director's Cut, ...

With include_variants=True the resolver runs a second pass, calling list_variants() on every active provider. The providers that return variants are:

  • pyfanedit — queries fanedit.org (IFDB) for fan-edited cuts of the movie
  • musicbrainz — expands a release-group MBID to its individual releases (editions, remasters, regional pressings)

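The two-pass flow can be sketched as follows, assuming pass one is an ordinary lookup on every provider and pass two only asks providers that implement list_variants(). Entity, both provider classes, and the placeholder IDs are hypothetical; the real resolver may expand variants from the merged result rather than from each provider's own match.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Entity:
    name: str
    ids: Dict[str, str]


class FanEditSource:
    """Hypothetical provider that can list variants of a resolved work."""

    def lookup(self, title: str) -> Optional[Entity]:
        return Entity(title, {"imdb": "tt0000001"})  # placeholder ID

    def list_variants(self, primary: Entity) -> List[Entity]:
        return [Entity(f"{primary.name} (fanedit)", {"fanedit_id": "1"})]


class PlainSource:
    """Hypothetical provider with no variant support."""

    def lookup(self, title: str) -> Optional[Entity]:
        return Entity(title, {"wikidata": "Q1"})  # placeholder ID


def resolve_with_variants(title: str, providers: list, include_variants: bool = False) -> Tuple[List[Entity], List[Entity]]:
    # Pass 1: ordinary lookup on every provider.
    primaries = []
    for p in providers:
        match = p.lookup(title)
        if match is not None:
            primaries.append((p, match))
    variants: List[Entity] = []
    if include_variants:
        # Pass 2: only providers that implement list_variants() contribute.
        for p, match in primaries:
            if hasattr(p, "list_variants"):
                variants.extend(p.list_variants(match))
    return [m for _, m in primaries], variants
```

Without the flag the second pass never runs, so the extra network cost is only paid when variants are requested.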
ExternalIds — every platform in one object

from mediavocab import ExternalIds

ids = result.external_ids
print(ids.tmdb_movie)                          # int
print(ids.imdb)                                # "tt0078748"
print(ids.musicbrainz_release_group)           # UUID str
print(ids.wikidata)                            # "Q103569"
print(ids.extra.get("bandcamp_album_id"))      # platform extras

First-class typed fields: musicbrainz_*, imdb, tmdb_movie, tmdb_tv, tvdb, isbn_10, isbn_13, olid, goodreads, wikidata, metal_archives_*, fanedit_id, derived_from_imdb, discogs_release, bluray_com_id, dvdcompare_id, … plus an extra dict for platform-specific IDs (Bandcamp, SoundCloud, YouTube Music, …).
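The merge of per-provider IDs into one object can be sketched like this, assuming ID sets arrive in confidence order, the first value for a typed field wins, and later disagreements are reported rather than applied. Ids is a hypothetical stand-in for ExternalIds with only three typed fields; the extra key is made up.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List, Optional, Tuple


@dataclass
class Ids:
    """Hypothetical stand-in for mediavocab.ExternalIds: a few typed fields plus `extra`."""
    imdb: Optional[str] = None
    tmdb_movie: Optional[int] = None
    wikidata: Optional[str] = None
    extra: Dict[str, str] = field(default_factory=dict)


def merge_ids(ranked: List[Ids]) -> Tuple[Ids, List[str]]:
    """Merge ID sets in confidence order; first value wins, disagreements are reported."""
    merged = Ids()
    clashes: List[str] = []
    for ids in ranked:
        for f in fields(Ids):
            if f.name == "extra":
                continue
            ours, theirs = getattr(merged, f.name), getattr(ids, f.name)
            if theirs is None:
                continue
            if ours is None:
                setattr(merged, f.name, theirs)
            elif ours != theirs:
                clashes.append(f.name)
        # Platform-specific IDs: keep the first value seen for each key.
        for key, value in ids.extra.items():
            merged.extra.setdefault(key, value)
    return merged, clashes


a = Ids(imdb="tt0078748", wikidata="Q103569")
b = Ids(imdb="tt0078748", tmdb_movie=348, extra={"example_platform_id": "123"})
c = Ids(imdb="tt9999999")  # a disagreeing, lower-confidence source
merged, clashes = merge_ids([a, b, c])
```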


Built-in providers

All providers are keyless. All dependencies are bundled in the core install.

Routing is three-axis: media, modality, and genre_filter. Pass modality on Signals to route a MediaType.GENERIC query to audio-only or video-only providers. See docs/resolve.md for details.

| Provider | Source | MediaType | Modality |
|---|---|---|---|
| skyhook | Servarr proxies | Movie, EpisodicSeries, Music, Book | universal |
| musicbrainz | MusicBrainz API | Music | AUDIO |
| audiodb | TheAudioDB | Music | AUDIO |
| tvmaze | TVmaze public API | EpisodicSeries | VIDEO |
| anilist | AniList GraphQL API | Movie, EpisodicSeries, Comic | VIDEO + TEXT |
| jikan_anime | Jikan (MyAnimeList) | Movie, EpisodicSeries | VIDEO |
| jikan_manga | Jikan (MyAnimeList) | Comic | TEXT |
| librivox | LibriVox API | Audiobook | AUDIO |
| apple_podcasts | Apple Podcasts search | Podcast, AudioDrama | AUDIO |
| wikidata | Wikidata API | All | universal |
| discogs | Discogs API | Music, MusicVideo, Generic | AUDIO + VIDEO |
| bluray_com | blu-ray.com scraper | Movie | VIDEO |
| dvdcompare | dvdcompare.net scraper | Movie | VIDEO |
| pyfanedit | fanedit.org / IFDB | Movie (variants) | VIDEO |
| bandcamp | Bandcamp | Music | AUDIO |
| soundcloud | SoundCloud | Music | AUDIO |
| youtube_music | YouTube Music | Music | AUDIO |
| youtube | YouTube | Video, Podcast, Generic | universal |
| metal_archives | Encyclopaedia Metallum | Music | AUDIO |
| openlibrary | OpenLibrary | Book | TEXT |
| annas_archive | Anna's Archive | Book | TEXT |

YouTube vs YouTube Music — these are intentionally separate providers. youtube only emits channel IDs and refuses MediaType.MUSIC lookups (video IDs aren't canonical music identities). youtube_music has proper entity records — stable browseId values for artists and albums that are safe to treat as cross-references.
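A sketch of the media and modality axes, assuming a provider is selected when the medium is in its media set and, if a modality is given, the provider either matches it or is universal. The ProviderSpec class and the abridged registry (taken from a few rows of the table above, with media as plain strings) are illustrative; the genre_filter axis is left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List, Optional


class Modality(Enum):
    AUDIO = "audio"
    VIDEO = "video"
    TEXT = "text"
    UNIVERSAL = "universal"


@dataclass(frozen=True)
class ProviderSpec:
    name: str
    media: FrozenSet[str]
    modalities: FrozenSet[Modality]


# Abridged, hypothetical registry based on a few rows of the provider table.
PROVIDERS = [
    ProviderSpec("musicbrainz", frozenset({"MUSIC"}), frozenset({Modality.AUDIO})),
    ProviderSpec("tvmaze", frozenset({"EPISODIC_SERIES"}), frozenset({Modality.VIDEO})),
    ProviderSpec("discogs", frozenset({"MUSIC", "MUSIC_VIDEO", "GENERIC"}),
                 frozenset({Modality.AUDIO, Modality.VIDEO})),
    ProviderSpec("youtube", frozenset({"PODCAST", "GENERIC"}), frozenset({Modality.UNIVERSAL})),
    ProviderSpec("youtube_music", frozenset({"MUSIC"}), frozenset({Modality.AUDIO})),
]


def route(medium: str, modality: Optional[Modality] = None) -> List[str]:
    """Names of providers that accept this medium (and modality, when one is given)."""
    selected = []
    for p in PROVIDERS:
        if medium not in p.media:
            continue  # e.g. youtube refuses MUSIC lookups
        if modality is not None and Modality.UNIVERSAL not in p.modalities and modality not in p.modalities:
            continue
        selected.append(p.name)
    return selected
```

Note how route("MUSIC") reaches youtube_music but not youtube, matching the split described above.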


Identity mappings

Some artists and labels are the same entity across platforms but no database records the link. Declare it once in a TOML file and every resolver run picks it up automatically:

# ~/.config/metadatarr/mappings.toml

[[artist]]
name                = "Acidkid / Piratech"
soundcloud_artist_url = "https://soundcloud.com/acidkid"
bandcamp_artist_url   = "https://piratech.bandcamp.com/"

[[artist]]
name               = "Moonsorrow"
musicbrainz_artist = "6a0a7b9b-9e12-4e1c-b91d-67cedf98a6c3"
bandcamp_band_id   = "3498887240"
metal_archives_band = 27

The package ships a curated metadatarr/data/mappings.toml. Your user file at ~/.config/metadatarr/mappings.toml extends it — entries that share any identifier are merged, new entries are appended. Send a PR to add publicly-verifiable cross-platform links to the package file.
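The "share any identifier" rule can be sketched on entries that have already been parsed from TOML (for example with tomllib.loads on Python 3.11+). This assumes name is a label rather than an identifier, values are scalars, and existing values are kept when a user entry disagrees; merge_mappings() is hypothetical and the real merge policy may differ.

```python
from typing import Dict, List

NON_ID_KEYS = {"name"}  # assumption: "name" is a label, not an identifier


def merge_mappings(package: List[Dict[str, object]], user: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Fold user entries into package entries that share any (key, value) identifier."""
    merged = [dict(e) for e in package]  # copy so the package data is not mutated
    for entry in user:
        ids = {(k, v) for k, v in entry.items() if k not in NON_ID_KEYS}
        for existing in merged:
            existing_ids = {(k, v) for k, v in existing.items() if k not in NON_ID_KEYS}
            if ids & existing_ids:
                # Same entity: add the user's new keys, keep existing values.
                for k, v in entry.items():
                    existing.setdefault(k, v)
                break
        else:
            merged.append(dict(entry))  # no shared identifier: a new entity
    return merged


package = [{"name": "Moonsorrow",
            "musicbrainz_artist": "6a0a7b9b-9e12-4e1c-b91d-67cedf98a6c3",
            "metal_archives_band": 27}]
user = [
    {"name": "Moonsorrow", "metal_archives_band": 27, "bandcamp_band_id": "3498887240"},
    {"name": "Acidkid / Piratech", "soundcloud_artist_url": "https://soundcloud.com/acidkid"},
]
merged = merge_mappings(package, user)
```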


Writing a custom provider

from typing import Optional

from mediavocab import ExternalIds, MediaType, Signals
from metadatarr.resolve.base import MetadataProvider, ProviderMatch, register

import my_api  # hypothetical: stands in for your platform's client library


class MyProvider(MetadataProvider):
    name  = "my_provider"
    media = {MediaType.MUSIC}

    def is_available(self) -> bool:
        return True

    def lookup(self, signals: Signals) -> Optional[ProviderMatch]:
        # Refuse mediums this provider doesn't cover (see the guidelines below).
        if signals.medium not in self.media:
            return None
        if not signals.title:
            return None
        result = my_api.search(signals.title)  # assumed to return a dict or None
        if not result:
            return None
        return ProviderMatch(
            provider     = self.name,
            confidence   = 0.7,  # strong-signal search
            signals      = Signals(title=result["title"], medium=MediaType.MUSIC),
            external_ids = ExternalIds(
                musicbrainz_artist = result.get("mbid"),
                extra = {"my_platform_id": str(result["id"])},
            ),
        )


register(MyProvider())

Provider guidelines:

  • Guard optional imports — wrap import my_lib in try/except ImportError, set self._available = False on failure.
  • Canonical IDs only — numeric platform IDs are stable; URL slugs are not. Store URLs as *_url extra keys.
  • Refuse wrong mediums — return None if signals.medium isn't in your media set.
  • Confidence guide — 0.9 for exact-ID lookups, 0.7 for strong-signal search, 0.5–0.6 for fuzzy/unreliable sources.
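Two of these guidelines in runnable form: a guarded optional import that drives is_available(), and extracting a stable numeric ID from a URL while rejecting slug-only URLs. The module name, class name, regex, and confidence constants are illustrative assumptions, not metadatarr code.

```python
import re
from typing import Optional

try:
    import metadatarr_hypothetical_optional_dep  # noqa: F401  (hypothetical optional library)
    _AVAILABLE = True
except ImportError:
    _AVAILABLE = False

# Confidence levels from the guide above.
CONF_EXACT_ID = 0.9
CONF_STRONG_SEARCH = 0.7
CONF_FUZZY = 0.5

_NUMERIC_TAIL = re.compile(r"/(\d+)/?$")


class GuardedProvider:
    name = "guarded_example"

    def __init__(self) -> None:
        # Record the import outcome instead of crashing at import time.
        self._available = _AVAILABLE

    def is_available(self) -> bool:
        return self._available


def canonical_id(url: str) -> Optional[str]:
    """Return the trailing numeric ID from a URL, or None for slug-only URLs (slugs can change)."""
    m = _NUMERIC_TAIL.search(url)
    return m.group(1) if m else None
```

For example, canonical_id("https://www.metal-archives.com/bands/Moonsorrow/27") yields "27", while a Bandcamp subdomain URL yields None and would be stored under a *_url extra key instead.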

Physical media

BlurayComClient and DVDCompareClient expose Blu-ray and DVD edition data that no structured API covers: region codes, audio track specs, cut runtimes, regional extras. The matching resolver providers can also be queried directly:

from metadatarr.resolve.providers.bluray_com import BlurayComProvider
from metadatarr.resolve.providers.dvdcompare import DVDCompareProvider
from mediavocab import Signals, MediaType

signals = Signals(title="Moon", year=2009, medium=MediaType.MOVIE)

bluray = BlurayComProvider()
match  = bluray.lookup(signals)
if match:
    print(match.external_ids.bluray_com_id)

dvd    = DVDCompareProvider()
match  = dvd.lookup(signals)
if match:
    print(match.external_ids.dvdcompare_id)

See docs/physical-disc.md for a full walkthrough.


Caching and concurrency

resolve() is concurrent (default 8 workers via ThreadPoolExecutor) and process-level cached:

from metadatarr.resolve._cache import cache

cache().hits    # int — cached lookups served
cache().misses  # int — network hits
cache().clear() # force re-fetch (e.g. after adding a new provider)

Both successful and empty lookups are cached, so a lookup that found nothing doesn't hit the network again on retry. Pass resolve(signals, max_workers=N) to tune parallelism.
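A sketch of the two ideas together: a process-level cache that also stores None results (negative caching) and a ThreadPoolExecutor fan-out with a max_workers knob. LookupCache and fan_out() are hypothetical; this is not metadatarr's _cache module, and two threads racing on the same uncached key may both fetch it.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock
from typing import Callable, Dict, Hashable, Optional

_MISSING = object()  # sentinel so a cached None is distinguishable from "not cached"


class LookupCache:
    """Process-level memo that also remembers empty (None) results."""

    def __init__(self) -> None:
        self._data: Dict[Hashable, object] = {}
        self._lock = Lock()
        self.hits = 0
        self.misses = 0

    def get_or_fetch(self, key: Hashable, fetch: Callable[[], Optional[dict]]):
        with self._lock:
            value = self._data.get(key, _MISSING)
            if value is not _MISSING:
                self.hits += 1
                return value  # may be None: a cached empty result
            self.misses += 1
        value = fetch()  # the network call happens outside the lock
        with self._lock:
            self._data[key] = value
        return value

    def clear(self) -> None:
        with self._lock:
            self._data.clear()


def fan_out(title: str, providers: Dict[str, Callable[[str], Optional[dict]]],
            cache: LookupCache, max_workers: int = 8) -> Dict[str, Optional[dict]]:
    """Query every provider concurrently, going through the cache."""
    def run(name: str):
        return name, cache.get_or_fetch((name, title), lambda: providers[name](title))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run, providers))
```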


Documentation

Doc Contents
docs/getting-started.md Install, first calls, common patterns
docs/models.md Full Pydantic model reference
docs/resolve.md Signals, providers, ResolveResult, conflict detection
docs/providers.md Provider catalogue — config, optional deps, caveats
docs/recipes.md End-to-end snippets for common tasks
docs/physical-disc.md Blu-ray / DVD edition data
docs/troubleshooting.md Gotchas and FAQ
docs/add-provider.md Checklist for adding a new resolver provider
docs/testing.md Offline-fixture / mocked-HTTP test pattern
CONTRIBUTING.md Branch/PR flow, conventional commits → versioning
docs/clients/ Per-client deep dives
examples/ One focused script per use case

Contributing

PRs welcome. Branch off dev, keep changes small, keep tests green. Commit messages use conventional commits — the version bumps automatically, so never edit version.py. To add a new source to the resolver, follow docs/add-provider.md. See CONTRIBUTING.md for the full flow.
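The Conventional Commits specification maps fix to a patch release, feat to a minor release, and a "!" after the type or a BREAKING CHANGE footer to a major release. The sketch below computes the highest bump implied by a set of commit messages under that mapping; bump_level() is hypothetical, and the project's release tooling (not named here) may treat pre-releases such as 0.5.0a1 differently.

```python
import re
from typing import Iterable

_HEADER = re.compile(r"^(?P<type>\w+)(\([^)]*\))?(?P<bang>!)?:")
_RANK = {"none": 0, "patch": 1, "minor": 2, "major": 3}


def bump_level(messages: Iterable[str]) -> str:
    """Highest semver bump implied by Conventional Commits messages."""
    level = "none"
    for msg in messages:
        m = _HEADER.match(msg)
        if not m:
            continue  # not a conventional commit header
        if m.group("bang") or "BREAKING CHANGE:" in msg:
            new = "major"
        elif m.group("type") == "feat":
            new = "minor"
        elif m.group("type") == "fix":
            new = "patch"
        else:
            new = "none"  # docs, chore, test, ... don't bump
        if _RANK[new] > _RANK[level]:
            level = new
    return level
```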


Testing

pip install -e ".[test]"
PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 pytest -q

Tests are fully offline — all HTTP calls are stubbed with fixture files. The same command runs in CI. See docs/testing.md.
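The offline pattern can be illustrated with the standard library alone: a tiny client calls urllib.request.urlopen, and the test patches that function to return canned JSON. The fetch_show() client and the fixture payload are hypothetical (the suite reads fixtures from files; the bytes are inlined here to keep the sketch self-contained); api.tvmaze.com/singlesearch/shows is TVmaze's public endpoint.

```python
import io
import json
import urllib.parse
import urllib.request
from unittest import mock

# In a real suite this would be read from a fixture file.
FIXTURE = b'{"id": 1, "name": "Example Show"}'


def fetch_show(name: str) -> dict:
    """Tiny client: one GET against TVmaze's singlesearch endpoint."""
    url = "https://api.tvmaze.com/singlesearch/shows?q=" + urllib.parse.quote(name)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def test_fetch_show_offline() -> None:
    fake = mock.MagicMock()
    fake.__enter__.return_value = io.BytesIO(FIXTURE)  # what `with ... as resp` yields
    fake.__exit__.return_value = False                 # don't swallow exceptions
    with mock.patch("urllib.request.urlopen", return_value=fake) as urlopen:
        show = fetch_show("Example Show")
    assert show == {"id": 1, "name": "Example Show"}
    assert "q=Example%20Show" in urlopen.call_args.args[0]  # no real network call made
```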


License

MIT — see LICENSE.



Download files


Source Distribution

metadatarr-0.5.0a1.tar.gz (174.7 kB)

Uploaded Source

Built Distribution


metadatarr-0.5.0a1-py3-none-any.whl (226.9 kB)

Uploaded Python 3

File details

Details for the file metadatarr-0.5.0a1.tar.gz.

File metadata

  • Download URL: metadatarr-0.5.0a1.tar.gz
  • Size: 174.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for metadatarr-0.5.0a1.tar.gz
Algorithm Hash digest
SHA256 f7de93dd5136f70588053bec06209ccd429929f83b5c4883b37da796b4c20118
MD5 417f1e58fdf1d4becb7753766e7e87ae
BLAKE2b-256 40d839d8a2ff534bcce97484d93d0a43f8e9144a1cc8f45628d27aa97a35ea1b


File details

Details for the file metadatarr-0.5.0a1-py3-none-any.whl.

File metadata

  • Download URL: metadatarr-0.5.0a1-py3-none-any.whl
  • Size: 226.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for metadatarr-0.5.0a1-py3-none-any.whl
Algorithm Hash digest
SHA256 57331adb3a0c9eab71eebcb4db1a4e86a25d488abcfe38a45c4123acbde3ad1c
MD5 404feaaf5d685cfc6bc749cc2094e21d
BLAKE2b-256 d435db06f41c71963da6e084b15bfd433c921caccf739ccf7d0f001b584c3654

