
MCP server for searching and downloading files from the Internet Archive (archive.org)


mcarchive-org

An MCP (Model Context Protocol) server that lets an LLM search, inspect, and download content from the Internet Archive.

Built on FastMCP + httpx. No API key required — archive.org's read endpoints are public.

Tools

Tool               Purpose
search_items       Small Solr-style search via advancedsearch.php (1–200 rows, paginated)
scrape_items       Bulk cursor-paginated search via the Scrape API (count ≥ 100)
get_item_metadata  Metadata for one item; skips the (possibly huge) files list by default
list_files         Files array with optional format / glob filtering; includes a download_url per file
get_file_url       Build a canonical download URL without hitting the network
download_file      Stream a file to disk with resume support and optional MD5 verification
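
As a rough illustration of what get_file_url does, the sketch below builds a download URL purely from strings, with no network access. The helper name build_download_url is hypothetical, and the https://archive.org/download/<identifier>/<filename> layout is assumed here rather than taken from the server's source.

```python
from urllib.parse import quote

# Assumed public download prefix on archive.org.
DOWNLOAD_BASE = "https://archive.org/download"


def build_download_url(identifier: str, filename: str) -> str:
    """Return a download URL for one file of an item, without any network I/O."""
    # Percent-encode both parts; keep "/" in the filename so files that live
    # in subdirectories of the item keep their path structure.
    return f"{DOWNLOAD_BASE}/{quote(identifier, safe='')}/{quote(filename, safe='/')}"


print(build_download_url("gd77-05-08.sbd.hicks.4982.sbeok.shnf", "gd1977-05-08d1t01.mp3"))
```

Encoding the filename matters in practice, since archive.org filenames often contain spaces.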

Also exposes an MCP resource template: archive://item/{identifier}.

Install & run

# From a checkout:
uv sync
uv run mcarchive-org

# Or from PyPI:
uvx mcarchive-org

Register with Claude Code:

claude mcp add archive-org -- uvx mcarchive-org
# or, from a local checkout:
claude mcp add archive-org -- uv run --directory /path/to/mcarchive-org mcarchive-org

Environment

Variable                 Default      Purpose
MCARCHIVE_DOWNLOAD_ROOT  ./downloads  Base directory for download_file
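
A minimal sketch of how a download path could be derived from this variable. The function resolve_download_path is hypothetical; the <root>/<identifier>/<filename> layout is inferred from the example output in this README, and the path-traversal guard is an added assumption, not a documented behavior of the server.

```python
import os
from pathlib import Path


def resolve_download_path(identifier: str, filename: str) -> Path:
    """Map an item file to a location under the configured download root."""
    # Fall back to ./downloads when the variable is unset.
    root = Path(os.environ.get("MCARCHIVE_DOWNLOAD_ROOT", "./downloads")).resolve()
    target = (root / identifier / filename).resolve()
    # Refuse names such as "../../etc/passwd" that would escape the root.
    if not target.is_relative_to(root):
        raise ValueError(f"{filename!r} resolves outside {root}")
    return target


print(resolve_download_path("some-item", "track01.mp3"))
```

Path.is_relative_to requires Python 3.9 or newer.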

Example flow

search_items(query='mediatype:audio AND creator:"Grateful Dead"', sort=['downloads desc'])
  → identifier 'gd77-05-08.sbd.hicks.4982.sbeok.shnf' (among others)

list_files(identifier='gd77-05-08.sbd.hicks.4982.sbeok.shnf', formats=['VBR MP3'])
  → [{ name: 'gd1977-05-08d1t01.mp3', size: 6342912, md5: '…', download_url: '…' }, …]

download_file(identifier='gd77-…', filename='gd1977-05-08d1t01.mp3', verify_md5='…')
  → { path: './downloads/gd77-…/gd1977-…mp3', bytes: 6342912, md5_ok: True }
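
The last two steps of the flow can be approximated offline. The sketch below is an illustration under assumptions, not the server's code: filter_files, md5_matches, and resume_headers are hypothetical helpers, and file entries are assumed to be dicts carrying "name" and "format" keys.

```python
import hashlib
from fnmatch import fnmatch
from pathlib import Path


def filter_files(files, formats=None, glob=None):
    """Keep entries whose format is in `formats` and whose name matches `glob`."""
    selected = []
    for entry in files:
        if formats and entry.get("format") not in formats:
            continue
        if glob and not fnmatch(entry.get("name", ""), glob):
            continue
        selected.append(entry)
    return selected


def md5_matches(path, expected_md5, chunk_size=1 << 20):
    """Hash the file in chunks so large downloads never sit in memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5.lower()


def resume_headers(partial_path):
    """Build a Range header requesting only the bytes not yet on disk."""
    path = Path(partial_path)
    offset = path.stat().st_size if path.exists() else 0
    return {"Range": f"bytes={offset}-"} if offset else {}
```

A resumed download would send the headers from resume_headers, append the response body to the partial file, and call md5_matches only once the file is complete.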

Query syntax notes

archive.org uses a Solr/Lucene dialect:

  • mediatype:(audio OR movies) — restrict to media types
  • collection:etree — items in a specific collection
  • date:[1977-01-01 TO 1977-12-31] — date ranges
  • creator:"Grateful Dead" — phrase match
  • -subject:bootleg — exclusion
  • Sort by downloads desc, date asc, addeddate desc, etc.

See archive.org's search docs for the full grammar.
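
To show how such a query might reach advancedsearch.php, here is a small sketch that only assembles the request URL. The helper build_search_url is hypothetical, and the parameter names (q, fl[], sort[], rows, page, output) reflect the public endpoint as assumed here; the 1–200 row limit mirrors the search_items description above.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://archive.org/advancedsearch.php"


def build_search_url(query, fields=("identifier",), sort=(), rows=50, page=1):
    """Assemble a search URL; no request is sent."""
    if not 1 <= rows <= 200:
        raise ValueError("rows must be between 1 and 200")
    # Repeated keys (fl[], sort[]) are expressed as a list of pairs.
    params = [("q", query)]
    params += [("fl[]", name) for name in fields]
    params += [("sort[]", order) for order in sort]
    params += [("rows", rows), ("page", page), ("output", "json")]
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


# Combine individual clauses with AND, as in the notes above.
query = " AND ".join(['creator:"Grateful Dead"', "date:[1977-01-01 TO 1977-12-31]"])
print(build_search_url(query, fields=("identifier", "title"), sort=("downloads desc",)))
```

urlencode handles the quoting of spaces, colons, and brackets, so the query can be written exactly as it would be typed into the search box.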

License

MIT


