MCP server exposing the Irish National Archives census APIs (1821-1926) to LLM clients.

Project description

irish-census-mcp

An MCP server that exposes the National Archives of Ireland census APIs (1821, 1831, 1841, 1851, 1901, 1911, 1926) to LLM clients, with tools shaped specifically for genealogy queries — finding people, reconstructing households, and tracing the same person across censuses.

Built on FastMCP 2.

What it does

The three underlying APIs cover different periods, different counties, and have three different schemas:

Period	Host	Rows	Coverage
1821 / 1831 / 1841 / 1851	`api-census.nationalarchives.ie` (`/census/query_c19`)	~497k	Surviving pre-Famine fragments only
1901 / 1911	`api-census.nationalarchives.ie` (`/census/query`)	~8.83M	All 32 counties (whole island)
1926	`c26-api.nationalarchives.ie` (`/api/census/query_c26a`)	~2.97M	Free State only (26 counties)

This server hides all three behind six MCP tools that work in terms an LLM-driven family-tree query needs: places, people, households, relatives.

The driving use case

"My great-grandfather was Patrick Murphy from Skibbereen, Co. Cork. His wife was Mary O'Brien, whose mother Catherine was from Strabane, Co. Tyrone. Find me possible relatives."

An LLM client using this server can resolve those places (and know that Tyrone isn't in the 1926 census because it's in Northern Ireland), search for candidates across all years, reconstruct households to find a Mary in Patrick's family, then jump to the O'Brien household in Tyrone to find Catherine and her other children — all in a handful of compact tool calls.

A full investigation of that shape lands at ~3,200 tokens of tool output, leaving the LLM room to reason.

Before you use this — please read

This package is an unofficial client over open APIs operated by the National Archives of Ireland (NAI). The fact that the APIs are unauthenticated does not mean the data is freely reusable, and it does not mean you have unlimited rights to the infrastructure.

What the NAI actually says

The National Archives' Site Usage Policy states that:

"copyright in all content contained in this website remains with the National Archives."

and:

"The contents of this website may be freely accessed and downloaded for personal use."

but also:

"any form of unauthorised reproduction, including the extraction and/or storage in any retrieval system or inclusion in any other computer program or work is prohibited."

For publication (any kind), see Copies, Publication and Copyright and its Permission to Publish sub-page:

"If you wish to use material from the National Archives in publications for commercial or non-commercial purposes you must get permission."

"There is a fee for publication rights."

Attribution must include "the correct document reference code."

What that means for users of this server

Personal genealogical research is explicitly allowed — searching for your own ancestors, browsing the data, downloading scans for your own family tree is fine and that's the main use case here.
Storing the data systematically may not be allowed. The "extraction and storage in any retrieval system or inclusion in any other computer program" clause is broad. Caching a few query results in memory while an LLM works through your question is one thing; building a derivative database is a different thing and almost certainly needs permission.
Publishing any of the records requires permission and a fee — whether commercial or not. This includes blog posts that reproduce scans, articles citing transcribed fields, and websites republishing search results. Get permission first; cite the document reference code.
The data names real people, including living descendants of those enumerated in 1926. Even where copying is legally permitted, treat names, addresses, and family relationships with the care you'd give any genealogical source. Don't publish derived material about identifiable individuals without consent.
This server itself is in murky territory under a strict reading of the policy (it does extract data into another computer program at request time). The argument that it's a permitted use is that it doesn't store anything — every call is a fresh pass-through to the NAI API, and the LLM client decides what to do with the response. If you wrap this server in something that caches, archives, or republishes, the argument gets weaker. If you have any doubt, ask the National Archives directly before you ship.

Infrastructure etiquette

The National Archives is a small public institution running these APIs as a public service. They are not a commercial API provider, and the 1926 release coincides with the centennial — they are seeing genuine research traffic spikes already.

Rate-limit yourself. This server caps per-host concurrency at 4, but if you wrap it in a loop or batch script, add your own throttle. A few requests per second is fine; sustained tens-of-thousands per hour is not.
Cache results client-side when you can. The upstream is CloudFront-cached, but repeated identical queries still travel through the API. ResponseCachingMiddleware is on the roadmap (see Known limitations) but not yet wired up — for now, if you're iterating on the same query, save the JSON locally. (Caching is a grey area under the policy quoted above — keep caches small, ephemeral, and scoped to your current research.)
Don't scrape the whole corpus. The ~12 million rows can in principle be paginated through, but please don't. If you have a legitimate research need for a bulk extract, contact the archive directly.
Identify yourself. This client sends a User-Agent of irish-census-mcp/0.1. If you fork or repackage, use your own identifier so any problems can be traced to you and not here.
Stop on errors. If the API starts returning 429s, 5xxs, or other signs of stress, back off — don't retry tightly. If the National Archives ever asks you to stop or to identify yourself, do so immediately.

If you publish anything that relies on this server, please link to the National Archives 1926 search page so readers can verify against the primary source, and follow the attribution/permission process before reproducing any records.

Quickstart

Requires Python 3.11+.

Easiest: run straight from PyPI with `uvx`

If you have uv installed, no clone needed — the server runs from the published package in one command:

uvx irish-census-mcp

uvx (a shortcut for uv tool run) downloads the package into an isolated environment on first use and runs the entry point. Subsequent runs are cached. Pin a version with uvx irish-census-mcp@0.2.0.

From source (for development)

git clone https://github.com/dmarkey/irish-census-mcp.git
cd irish-census-mcp
uv sync
uv run pytest            # 11 live-API smoke tests
uv run python -m irish_census_mcp

or, via the FastMCP CLI:

uv run fastmcp run src/irish_census_mcp/server.py

Connecting from an MCP client

Claude Desktop / Claude Code

Add to your MCP config (~/.config/claude/mcp_servers.json or whatever your client uses).

Recommended — run from PyPI via uvx (no clone, no venv to manage):

{
  "mcpServers": {
    "irish-census": {
      "command": "uvx",
      "args": ["irish-census-mcp"]
    }
  }
}

Pin a version if you want reproducible behaviour: "args": ["irish-census-mcp@0.2.0"].

Or, from a local clone (useful when you're hacking on the server):

{
  "mcpServers": {
    "irish-census": {
      "command": "uv",
      "args": [
        "--directory", "/path/to/irish-census-mcp",
        "run", "python", "-m", "irish_census_mcp"
      ]
    }
  }
}

Remote HTTP transport

For multi-client deployments, run with the streamable HTTP transport:

uv run fastmcp run src/irish_census_mcp/server.py --transport http --port 8765

Clients then point at http://host:8765/mcp. Be aware: the upstream National Archives APIs are open, so the server doing fan-out is what needs rate protection.

Tools

Six tools, all async, all accept and return plain JSON.

`resolve_place(query) -> list[Place]`

Free-text place → canonical {county, sub_place, available_in, confidence}. The available_in field tells the LLM which censuses cover this county — Northern Ireland counties return [1821..1911] (no 1926).

resolve_place("Strabane Co Tyrone")
# [{"county": "Tyrone", "sub_place": "Strabane",
#   "available_in": [1821, 1831, 1841, 1851, 1901, 1911], "confidence": 0.95}]

`search_people(...) -> SearchResult`

Person search across all three corpora. Filters: surname, first_name, year, county, place, age, age_range, sex, religion, fuzzy, detail, limit, page.

When year="all" (default), results from different censuses that plausibly represent the same person are merged into one row with seen_in: [1901, 1911, 1926] and related_refs pointing at the merged-in records.

search_people(surname="Murphy", first_name="Patrick", county="Cork", year="all", limit=10)
# {
#   "results": [
#     {"ref": "1926:...", "name": "Patrick Murphy", "age": 58,
#      "place": "Townland, DED, Cork",
#      "relation": "Head", "household_key": "1926:...",
#      "seen_in": [1901, 1911, 1926],
#      "related_refs": ["1911:...", "1901:..."]},
#     ...
#   ],
#   "meta": {"page": 0, "more_available": false, ...}
# }

detail="brief" cuts each row in half (~46% fewer tokens) — useful when the LLM is scanning many candidates before drilling in.

`get_household(household_key) -> Household`

Given a household_key from a search row, returns everyone enumerated together plus scan URLs. Members capped at 30 (returns members_truncated: N for institutional records).

get_household("1926:...")
# {
#   "household_key": "1926:...",
#   "year": 1926,
#   "place": "Townland, DED, Cork",
#   "members": [
#     {"ref": "1926:...", "name": "Patrick Murphy",  "age": 58, "relation": "Head"},
#     {"ref": "1926:...", "name": "Mary Murphy",     "age": 54, "relation": "Wife"},
#     {"ref": "1926:...", "name": "Michael Murphy",  "age": 22, "relation": "Son"},
#     {"ref": "1926:...", "name": "Bridget Murphy",  "age": 18, "relation": "Daughter"}
#   ],
#   "scans": {"form_a": ["https://.../image_c26/...pdf"], "form_b": [...]}
# }

`get_person(ref, include_raw=False) -> Person`

Full normalized record for one person. include_raw=True adds the underlying API row (null-stripped) when the LLM needs fields not exposed by the normalized projection — education, deafdumb, children_born for 1911; folio_num, barony for pre-Famine.

`find_relatives(ref, spread=1) -> FamilyTree`

Bundles the common chains around one person:

spread=0 — just the household
spread=1 — + same person in adjacent census years (capped at 3)
spread=2 — + best-guess parent-household candidates (capped at 5, scored)

Parent candidates are heuristic (surname match + plausible parental age band + county match) — the LLM should present them as candidates to verify, not assertions.

`get_scan_url(ref, form="A") -> ScanRef`

Returns the stable API URL for a scan PDF. Forms supported: A, B, B1, B2, N (1901/1911 has all five; 1926 has A and B; pre-Famine has one folio). Each URL 307-redirects to a signed Linode URL valid for 30 minutes — pass the API URL to the user, not the redirect target.

How a real query flows

For the Patrick-Murphy-from-Skibbereen query at the top of this README:

resolve_place("Skibbereen Co Cork") → Cork + sub_place Skibbereen, available in all years.
resolve_place("Strabane Co Tyrone") → Tyrone + sub_place Strabane, available 1821–1911 (not 1926, because Tyrone is in Northern Ireland).
search_people(surname="Murphy", first_name="Patrick", county="Cork", year="all") → candidate Patricks; the dedup collapses cross-census appearances into multi-year rows with seen_in.
For each promising candidate: get_household(household_key) → look for a Mary as wife.
search_people(surname="O'Brien", first_name="Mary", county="Tyrone", year="all") (1926 auto-skipped — Tyrone is NI).
For each Mary O'Brien: get_household → look for Catherine as her mother.
find_relatives(mary_ref, spread=2) → siblings of Mary (maternal aunts/uncles) and possible parent households (maternal grandparents).
get_scan_url(ref) per pivotal record → citation links for the user.

Context-efficiency guarantees

The architecture explicitly budgets for LLM context. Measured response sizes on real queries:

Tool call	~Tokens
`resolve_place`	~50
`search_people` (default, 20 rows)	~700–1,500
`search_people` (`detail="brief"`)	~400–800
`get_household` (3 members)	~310
`get_household` (30 members, capped)	~2,000
`get_person` (default)	~70
`get_person` (`include_raw=True`)	~200
`find_relatives` spread=1	~475
`find_relatives` spread=2	~700

Mechanisms:

Compact-by-default rows (~6 visible fields, not the 25-field raw row).
strip_nulls and strip_internals drop empty/internal keys.
Opaque "<year>:<id>" refs save tokens versus nested objects.
Cross-year dedup with capped related_refs (top 3).
Hard caps: 30 household members, 3 earlier_self, 5 parent candidates.
get_person opt-in raw payload (deep-null-stripped when included).
meta strips zero-count corpora and empty arrays.

A full investigation of the shape above lands at ~3,200 tokens of tool output.

Project layout

irish_census_mcp/
├── README.md                    # this file
├── 1926_CENSUS.md               # 1926 API reference
├── 1901_1911_CENSUS.md          # 1901/1911 API reference
├── PRE_FAMINE_CENSUS.md         # 1821-1851 fragments reference
├── MCP_ARCHITECTURE.md          # design rationale for this server
├── pyproject.toml
├── fastmcp.json                 # FastMCP run configuration
├── uv.lock
├── src/irish_census_mcp/
│   ├── __init__.py
│   ├── __main__.py              # `python -m irish_census_mcp`
│   ├── server.py                # FastMCP instance + 6 @mcp.tool functions
│   ├── gateway.py               # CensusGateway: orchestration, dedup, caps
│   ├── api.py                   # Census1926 / Census19011911 / Census19th
│   ├── places.py                # County list, alias map, fuzzy resolver
│   ├── normalize.py             # Schema flattener (three schemas → one)
│   └── matching.py              # Cross-year dedup + parent-household scoring
└── tests/
    ├── test_live_smoke.py       # 11 tests against the live APIs
    └── test_john_markey_query.py # End-to-end walk-through of the use case

The four *_CENSUS.md files document the underlying APIs directly — read those if you want to call the National Archives endpoints without this MCP server.

Development

uv sync                          # install
uv run pytest                    # full test suite (live API, ~2s)
uv run pytest tests/test_john_markey_query.py -s  # see the walkthrough

The tests hit the real National Archives APIs — they're idempotent reads, no auth required. Set SKIP_LIVE=1 to skip them in CI:

SKIP_LIVE=1 uv run pytest

Verifying the MCP surface

import asyncio
from fastmcp import Client
from irish_census_mcp.server import mcp

async def main():
    async with Client(mcp) as client:
        r = await client.call_tool("resolve_place",
                                   {"query": "Skibbereen Co Cork"})
        print(r.content[0].text)

asyncio.run(main())

Known limitations

Northern Ireland 1926 census is held by PRONI in Belfast (separate jurisdiction) — not available here.
Civil registration (births/marriages/deaths) and church records are not in scope. Those live at irishgenealogy.ie with a different API.
Pre-Famine household reconstruction is best-effort — the underlying schema doesn't have an image_group, so we group by first_image + townland which can miss split households.
Surname phonetic matching is not implemented. fuzzy=True does substring + Levenshtein, but won't catch e.g. O'Brien/OBrien/Brien or Smith/Smyth. Workaround: search each spelling explicitly.
Place-name resolution does not have a townland index. The sub_place field is passed through as __icontains against both townland and ded (or parish for pre-Famine), unioned. Works well for common names but can miss obscure townlands with similar names in neighbouring DEDs.
Parent-household candidate scoring in find_relatives(spread=2) is a heuristic (surname + age band + county). Treat as candidates to verify against scans, not assertions.
No caching middleware yet. FastMCP ships ResponseCachingMiddleware but it's not wired up. Repeated calls will repeatedly hit the API. Each HTTP response is CDN-cached upstream, so it's cheap, but still not free.

Data attribution and reuse

All census data is provided by the National Archives of Ireland. The APIs are open and unauthenticated, but the data is not public domain — check the National Archives' terms-of-use and reuse policies before any use beyond personal genealogical research. See the "Before you use this" section above for specifics.

This server is an unofficial client. It is not endorsed by, affiliated with, or supported by the National Archives of Ireland. Issues with the underlying data should be reported to the archive directly, not to this repository.

The 1926 search interface was launched in 2026 for the centennial of the 1926 census, which had been sealed for 100 years.

Publishing

Releases go to PyPI as irish-census-mcp via a GitHub Actions workflow (.github/workflows/publish.yml) that runs on push of any v* tag.

The workflow uses PyPI Trusted Publishing (OIDC) — no API tokens, no secrets in GitHub. One-time setup on PyPI is required: at https://pypi.org/manage/account/publishing/, add a pending publisher with:

Field	Value
PyPI Project Name	`irish-census-mcp`
Owner	`dmarkey`
Repository name	`irish-census-mcp`
Workflow name	`publish.yml`
Environment name	`pypi`

Cutting a release after that is just:

# 1. Bump the version in pyproject.toml
# 2. Commit
git commit -am "Release v0.2.0"
# 3. Tag and push
git tag v0.2.0
git push origin main --tags

The workflow validates that the tag matches pyproject.toml's version before publishing, so a forgotten bump fails fast rather than shipping the wrong artefact. Manual workflow_dispatch runs build and verify but skip the publish step (no tag → nothing to ship).

License

This licence covers only the code in this repository. It does not cover the underlying census data, which remains under the National Archives of Ireland's terms — see "Before you use this".

Project details

Release history Release notifications | RSS feed

0.3.0

May 10, 2026

This version

0.2.0

May 10, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

irish_census_mcp-0.2.0.tar.gz (121.4 kB view details)

Uploaded May 10, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

irish_census_mcp-0.2.0-py3-none-any.whl (27.8 kB view details)

Uploaded May 10, 2026 Python 3

File details

Details for the file irish_census_mcp-0.2.0.tar.gz.

File metadata

Download URL: irish_census_mcp-0.2.0.tar.gz
Upload date: May 10, 2026
Size: 121.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for irish_census_mcp-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`b736795f75e54122420a00840616f5db55e7655ab3a4691994130f7df1a0b97e`
MD5	`a7bd17fab38f707b850f41d9bf887699`
BLAKE2b-256	`3ec3a8844e08c7b2a0702bb44391024c64f04c6339f8db1164e6d783d30e89cd`

See more details on using hashes here.

Provenance

The following attestation bundles were made for irish_census_mcp-0.2.0.tar.gz:

Publisher: publish.yml on dmarkey/irish-census-mcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: irish_census_mcp-0.2.0.tar.gz
- Subject digest: b736795f75e54122420a00840616f5db55e7655ab3a4691994130f7df1a0b97e
- Sigstore transparency entry: 1498721949
- Sigstore integration time: May 10, 2026
Source repository:
- Permalink: dmarkey/irish-census-mcp@4fa6d2443273e20711529a784864c5e43e64fedc
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/dmarkey
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@4fa6d2443273e20711529a784864c5e43e64fedc
- Trigger Event: push

File details

Details for the file irish_census_mcp-0.2.0-py3-none-any.whl.

File metadata

Download URL: irish_census_mcp-0.2.0-py3-none-any.whl
Upload date: May 10, 2026
Size: 27.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for irish_census_mcp-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6f46584cd7071f68b119edd4988ed0a154cbd64b926f92427062643dcdae2680`
MD5	`65e574554d85a45721781cbcd5698729`
BLAKE2b-256	`5b5bdc2cf39230ada39552f83bd3af328bfd05532e7f61eb39da48e96326acd0`

See more details on using hashes here.

Provenance

The following attestation bundles were made for irish_census_mcp-0.2.0-py3-none-any.whl:

Publisher: publish.yml on dmarkey/irish-census-mcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: irish_census_mcp-0.2.0-py3-none-any.whl
- Subject digest: 6f46584cd7071f68b119edd4988ed0a154cbd64b926f92427062643dcdae2680
- Sigstore transparency entry: 1498722066
- Sigstore integration time: May 10, 2026
Source repository:
- Permalink: dmarkey/irish-census-mcp@4fa6d2443273e20711529a784864c5e43e64fedc
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/dmarkey
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@4fa6d2443273e20711529a784864c5e43e64fedc
- Trigger Event: push

irish-census-mcp 0.2.0

Navigation

Verified details

Maintainers

Unverified details

Meta

Project description

irish-census-mcp

What it does

The driving use case

Before you use this — please read

What the NAI actually says

What that means for users of this server

Infrastructure etiquette

Quickstart

Easiest: run straight from PyPI with uvx

From source (for development)

Connecting from an MCP client

Claude Desktop / Claude Code

Remote HTTP transport

Tools

resolve_place(query) -> list[Place]

search_people(...) -> SearchResult

get_household(household_key) -> Household

get_person(ref, include_raw=False) -> Person

find_relatives(ref, spread=1) -> FamilyTree

get_scan_url(ref, form="A") -> ScanRef

How a real query flows

Context-efficiency guarantees

Project layout

Development

Verifying the MCP surface

Known limitations

Data attribution and reuse

Publishing

License

See also

Project details

Verified details

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance

Easiest: run straight from PyPI with `uvx`

`resolve_place(query) -> list[Place]`

`search_people(...) -> SearchResult`

`get_household(household_key) -> Household`

`get_person(ref, include_raw=False) -> Person`

`find_relatives(ref, spread=1) -> FamilyTree`

`get_scan_url(ref, form="A") -> ScanRef`