Skip to main content

The verifiable data layer for Korean culture & commerce, callable by any AI agent (MCP).

Project description

KoreaAPI

The verifiable data layer for Korean culture & commerce, callable by any AI agent. The MCP gateway to Korea — verifiable.

KoreaAPI exposes Korean culture, entertainment, and commerce data to AI agents via Anthropic's Model Context Protocol (MCP). Every response carries machine-readable provenance and a Skill Score so an agent can decide whether to trust and cite it.

Status: Phase 1 (cold-start). The locked spec is in SCOPE.md; what is built / decided and why is in ROADMAP.md. Live, verified, public data (Schema.org JSON-LD + /llms.txt): https://kwangdol-star.github.io/koreaapi/ Repository: kwangdol-star/koreaapi — a standalone repo, split out from its incubation home with full git history preserved.

What's live now (verified, on the public page + via MCP)

  • Cross-verification — Wikidata + Wikipedia must agree on the canonical bilingual name before a fact clears the single-source cap (high Skill Score = independent concurrence).
  • Identity guard (rejects a contradictory label) + hallucination guard (LLM-extracted data must appear verbatim in its source, else dropped — caught a fabricated chart entry live).
  • 소속사/Agency hub — each artist anchored to its label (Wikidata P264); the roster grows by discovering cross-verified labelmates (SPARQL) and is queryable via get_agency.
  • YouTube official-channel release/stats (live-state) · LLM romanization at ingest.
  • GEO/AEO — JSON-LD (incl. recordLabel) + a ready-to-cite line on every record + /llms.txt.

Why this exists

Raw Korean API wrappers are a commodity (20+ already exist on GitHub). Our moat is the combination nobody else ships:

  • Aggregation of fragmented K-culture / commerce sources
  • Verification — Skill Score + provenance, exactly where LLMs confidently hallucinate
  • Append-only time-series — a latecomer cannot reconstruct our history
  • Behavioral signal — what agents query / buy through us becomes trend data

The customer is the AI agent (consumer); humans / brands / enterprises pay.

Why now — the land-grab window

The compounding assets accrue to early, high-quality entrants: only ~13% of public MCP servers are high-trust, and AI answer engines concentrate citations on content refreshed in the last 1–3 years (Seer Interactive). A verified hub that re-verifies daily compounds a citation lead latecomers can't backfill. We are "picks-and-shovels" — the data agents consume, not a chat wrapper (a category the same market analyses find largely fails to monetize). (An independent 2026 AI-agent opportunity ranking places this exact model at its top — see ROADMAP.md.)

Revenue flywheel (engines ① + ②)

K-culture current-state is the magnet. ① commerce commission + ② trend-intelligence subscription reinforce each other: transactions generate the behavioral signal that becomes the trend product, which improves commerce conversion. See SCOPE.md §3.

The heart: append-only ingestion (component A)

fetch → LLM-extract → cross-verify → bilingual-normalize → append (+ Skill Score)

Overwrite = wrapper. Append timestamped snapshots = an asset.

Bilingual by design

Korean = canonical (provenance anchor). English = distribution layer. Names carry ko / en_official / romanized. See SCOPE.md §5.

Layout

koreaapi/
├── SCOPE.md                 # locked Phase 1 spec
├── llms.txt                 # agent-facing description
├── pyproject.toml
└── src/koreaapi/
    ├── models.py            # bilingual records + Provenance (the data contract)
    ├── skill_score.py       # transparent 0–1 quality score
    ├── pipeline/            # component A: append-only ingestion (the heart)
    │   ├── ingest.py        # fetch→extract→verify→translate→append
    │   ├── store.py         # APPEND-ONLY store (the moat)
    │   └── scheduler.py     # tiered collection cadence
    └── sources/             # source adapters (official APIs first)
        └── base.py

Dev

cd koreaapi
uv sync                      # or: pip install pydantic pytest

# run the offline end-to-end pipeline test (no API keys / network needed)
PYTHONPATH=src python -m pytest tests -q

The append-only ingestion heart (store + ingest + Skill Score + bilingual normalization) is implemented and tested offline via a MockSource. Real source adapters, all with pure fixture-tested parse steps + best-effort live fetch (graceful when egress/keys are absent):

  • Wikidata (#1) — bilingual labels via a curated entity→Q-id fast path (each anchor's identity verified, so a contradictory label is rejected, not ingested) + live wbsearchentities. Also pulls the 소속사/label (P264) and discovers labelmates (SPARQL).
  • Wikipedia (#2) — independent cross-check; when both agree on the bilingual name the Skill Score clears the single-source cap (the verification moat).
  • YouTube Data API (#3.5) — official-channel stats + latest release (live-state event data), identity-guarded; deliberately not a name cross-verifier.
  • Circle Chart (#3) — official chart, LLM-extracted with an anti-hallucination grounding guard (entries must appear verbatim in the page HTML). The page is JS-rendered, so the raw chart awaits a data endpoint; the guard ensures it ships nothing over anything false.
  • LLM romanization (Haiku) fills romanized at ingest — "cheap AI as collection labor".

Spotify is skipped (its Web API now requires Premium, 2026); a keyless EN-mostly source would only lower the cross-verified scores. See ROADMAP.md for the full log.

Egress note: the live pull needs outbound access to *.wikidata.org. In the web/sandbox environment egress is allowlist-gated — if Wikidata isn't allowlisted the live test skips (HTTP 403 host_not_allowed) while the offline parser tests still cover correctness.

Viewing & managing it (human console)

The product is agent-facing (MCP), but you (human) need a cockpit. There are two faces over one source of truth (the append-only store): the MCP server for agents, and a read-only console for you.

cd koreaapi
PYTHONPATH=src python -m koreaapi.admin seed     # populate koreaapi.db (offline sample)
PYTHONPATH=src python -m koreaapi.admin pull     # LIVE: Wikidata+Wikipedia cross-verified snapshots (+agency)
PYTHONPATH=src python -m koreaapi.admin sweep    # LIVE: discover labelmates from each anchored agency (SPARQL)
PYTHONPATH=src python -m koreaapi.admin youtube  # LIVE: official-channel release snapshots (needs YOUTUBE_API_KEY)
PYTHONPATH=src python -m koreaapi.admin chart    # LIVE: Circle Chart (LLM-extract, grounding-guarded; needs key)
PYTHONPATH=src python -m koreaapi.admin export   # write data/ asset (history + latest.json)
PYTHONPATH=src python -m koreaapi.admin signals  # top behavioral signals (engine 2: what agents query)
PYTHONPATH=src python -m koreaapi.admin stats    # data-quality summary
PYTHONPATH=src python -m koreaapi.admin dump     # print recent snapshots
PYTHONPATH=src python -m koreaapi.admin report   # -> report.html (open in a browser)

# zero-code interactive browse + query + JSON API over the same DB:
pip install datasette && datasette koreaapi.db

Automated collection (cron). .github/workflows/collect.yml runs admin pull + admin export daily (and on manual dispatch) and commits the growing data asset back to the repo: koreaapi/data/snapshots.jsonl (append-only history) + latest.json (current state, crawlable for GEO). It runs on GitHub's runners — open network, so the live pull works there even though the dev sandbox blocks Wikidata egress. Production scales this to Postgres behind the same insert-only contract (see pipeline/store.py); the repo file set is the zero-cost cold-start "database".

Public GEO page. .github/workflows/pages.yml builds report.html from live data and deploys it to GitHub Pages (one-time enable: Settings → Pages → Source: GitHub Actions) — a public, crawlable, JSON-LD-bearing URL so answer engines can surface and cite the verified data.

Watch the headline metric of a verifiable-data business: avg Skill Score, freshness, and source agreement - that is literally watching the moat.

Agent face (MCP server)

The product itself: an MCP server exposing 5 tools, each returning verified, bilingual, provenance-bearing data (with a ready-to-cite line) from the same store the console reads.

Tool Returns
get_artist_status(artist_id) latest status across kinds + verified facts + agency
get_kculture_calendar(window_days) upcoming comebacks / releases / concerts
get_agency(name) artists verified under a 소속사/label (the agency hub)
get_korea_rising(category, limit) what's rising now, ranked by observed demand + Skill Score
get_buy_options(item) where to buy (Phase 1: rail pending; logs buy-intent)
cd koreaapi
pip install fastmcp                           # use a venv if system deps clash
PYTHONPATH=src python -m koreaapi.server      # serves over MCP (stdio)

Logic lives in service.py (pure, offline-tested); server.py is the thin MCP binding. Tools register cleanly (verified in an isolated venv).

Install / connect it in your agent: see docs/MCP_INSTALL.md (run command, Claude-Desktop config, and smithery.yaml for the Smithery registry).

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

koreaapi-0.1.0.tar.gz (190.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

koreaapi-0.1.0-py3-none-any.whl (53.8 kB view details)

Uploaded Python 3

File details

Details for the file koreaapi-0.1.0.tar.gz.

File metadata

  • Download URL: koreaapi-0.1.0.tar.gz
  • Upload date:
  • Size: 190.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.25 {"installer":{"name":"uv","version":"0.11.25","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for koreaapi-0.1.0.tar.gz
Algorithm Hash digest
SHA256 18fb045b505dcf21f908bcdeaf3115bc20fd985d6981b952140e3de001a7e45e
MD5 45c2ebd147ccfcbcacda74ea713d4163
BLAKE2b-256 7b0f2bb4db1bfe14c45c23776879fa56e4749ba0a03b37bdd568be78c9d03de0

See more details on using hashes here.

File details

Details for the file koreaapi-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: koreaapi-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 53.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.25 {"installer":{"name":"uv","version":"0.11.25","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for koreaapi-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 f3a8d4ce1ae542dda570e86ab32134003a7b94bf7dc6974730916c6ab5126012
MD5 9c5c2de45022be5c751caadcc06aedb4
BLAKE2b-256 471534935618db16bebee2627438dbe674515cfac24a79470d3760c0ca651c71

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page