An MCP server that exposes the corpus of Atatürk's speeches, statements, telegrams and proclamations (1906-1938) for researchers worldwide. Created by bugraayan.com.

Atatürk MCP

Created by Buğra Ayan — bugraayan.com

An open-source bridge for everyone around the world who researches Atatürk.

A Model Context Protocol server that exposes the complete corpus of Mustafa Kemal Atatürk's speeches, statements, telegrams and proclamations (1906-1938) to any MCP-aware AI client (Claude Desktop, Cursor, Cline, Windsurf, …).

Built for researchers, historians, journalists and students anywhere in the world who want to ask LLMs questions like:

  • "What did Atatürk say about women's rights in 1923?"
  • "Find quotes about education and modernisation."
  • "Show me his opening address to the Grand National Assembly on 1 March 1922."
  • "Compare the 1927 Nutuk's treatment of the War of Independence with his 1933 10th Year Speech."

Corpus

Source | Coverage | Format | Speeches
ATAM (Atatürk Araştırma Merkezi) | "Söylev ve Demeçleri" Cilt I-III (2024 edition), 1906-1938, the definitive corpus | PDF | 366
Vikikaynak | Individual speeches, telegrams, all TBMM opening addresses (1920-1938) | HTML | 45
Internet Archive: Nutuk (English) | The 1927 Nutuk in English | PDF / DjVu | 20 chapters
Total | 1906-1938 | SQLite + FTS5 | 411 speeches + Nutuk

Atatürk passed away in 1938; his words are in the public domain. Editorial annotations from the ATAM edition are not redistributed — only the speech bodies themselves, with a source_ref pointing back to the corresponding ATAM volume and page number for academic citation.

Quick start

1. Install

From PyPI (recommended for end users):

pip install ataturk-mcp                # runtime only
pip install "ataturk-mcp[etl]"         # also includes the ETL scripts

From source (for hacking on the ETL):

git clone https://github.com/bugraayan/ataturk-mcp.git
cd ataturk-mcp
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[etl]"

2. Build the database (one-off, ~1 minute)

python scripts/fetch_atam.py            # ~10 MB of PDFs from atam.gov.tr
python scripts/fetch_wikisource.py      # MediaWiki API, ~45 pages
python scripts/fetch_nutuk_en.py        # ~2 MB DjVu text from Internet Archive
python scripts/build_db.py              # produces data/speeches.db (~11 MB)

Or, if you only want the core corpus:

python scripts/fetch_atam.py
python scripts/build_db.py --skip-wikisource --skip-nutuk

A pre-built speeches.db is also published on the GitHub Releases page so end users do not need to run the ETL themselves.

3. Run the MCP server

ataturk-mcp                  # stdio transport (default; how desktop MCP clients launch the server)
# or
fastmcp run src/ataturk_mcp/server.py:mcp
# or
python -m ataturk_mcp.server

Connecting to MCP clients

Claude Desktop

Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or the equivalent on your platform and add:

{
  "mcpServers": {
    "ataturk": {
      "command": "/absolute/path/to/.venv/bin/ataturk-mcp"
    }
  }
}

If your database lives somewhere other than the repo, set the path explicitly:

{
  "mcpServers": {
    "ataturk": {
      "command": "/absolute/path/to/.venv/bin/ataturk-mcp",
      "env": {
        "ATATURK_MCP_DB": "/path/to/speeches.db"
      }
    }
  }
}

Restart Claude Desktop; the hammer icon now exposes the Atatürk tools.
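
How the server resolves the database location is not spelled out here. A minimal sketch of one plausible lookup, assuming ATATURK_MCP_DB takes precedence and the fallback is the data/speeches.db file produced by build_db.py. The function name resolve_db_path and the fallback path are illustrative assumptions, not taken from the package.

```python
import os
from pathlib import Path

ENV_VAR = "ATATURK_MCP_DB"                  # variable name from the config above
DEFAULT_DB = Path("data") / "speeches.db"   # assumed fallback, relative to the repo


def resolve_db_path(environ=os.environ) -> Path:
    """Return the override from the environment if set, else the assumed default."""
    override = environ.get(ENV_VAR)
    if override:
        # Expand "~" so paths typed by hand in a config file still work.
        return Path(override).expanduser()
    return DEFAULT_DB


if __name__ == "__main__":
    print(resolve_db_path())
```

Passing the environment as a parameter keeps the lookup easy to check without touching the real process environment.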

Cursor

Edit ~/.cursor/mcp.json (or the project-level .cursor/mcp.json):

{
  "mcpServers": {
    "ataturk": {
      "command": "/absolute/path/to/.venv/bin/ataturk-mcp"
    }
  }
}

Cline / Continue / any MCP host

Use the same command with whatever JSON configuration format the host expects. The server speaks stdio by default and follows the MCP 2024-11-05 spec.

Tools exposed

Tool | Purpose
search_speeches(query, lang, year_from, year_to, kind, limit) | Full-text BM25 search over the entire corpus, with snippets. Diacritic-insensitive; supports FTS5 operators (AND, OR, NEAR, "phrases", prefix*).
get_speech(speech_id, lang) | Return one speech in full, in Turkish or English.
list_speeches(year, year_from, year_to, kind, source, limit, offset) | Browse the corpus chronologically.
random_speech(lang, kind) | Pick a random speech (useful for daily-quote agents).
list_topics() / speeches_by_topic(topic_id, limit) | Topical browsing (available when the ATAM Konular İndeksi, the topic index, is loaded).
nutuk_search(query, lang, limit) | Search within Nutuk (1927).
nutuk_chapter(chapter, lang) | Return one Nutuk chapter (1-20).
cite(speech_id) | Generate APA / MLA / Chicago citations.
corpus_stats() | Summary statistics about the corpus.
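
MCP hosts invoke these tools with JSON-RPC 2.0 tools/call requests. The sketch below only builds such a request for search_speeches. The helper name tools_call_request and the argument values are illustrative; the accepted values for parameters such as lang or kind are not documented here, so they are left out.

```python
import json


def tools_call_request(name: str, arguments: dict, request_id: int = 1) -> str:
    """Serialise an MCP tools/call request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # ensure_ascii=False keeps Turkish characters readable in the payload.
    return json.dumps(request, ensure_ascii=False)


if __name__ == "__main__":
    # Example values only; parameter names come from the tool signature above.
    print(tools_call_request(
        "search_speeches",
        {"query": "education", "year_from": 1923, "year_to": 1927, "limit": 5},
    ))
```

In practice an MCP client library sends this for you, after the initialize handshake; building the message by hand is mainly useful for debugging a host integration.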

Resources

URI | Description
ataturk://speech/{speech_id} | Plain-text rendering of a single speech with header.
ataturk://nutuk/{chapter}/{lang} | One Nutuk chapter.
ataturk://corpus/stats | Statistics as JSON.

Prompts

Prompt | Use
analyze_speech(speech_id) | Scholarly analysis template (context, rhetoric, themes, citation).
find_quote(theme, n_quotes) | Theme-based quote hunter across the corpus.

Development

pip install -e ".[etl,dev]"
pytest -q
ruff check .

The test suite uses an in-memory seeded SQLite DB and FastMCP's in-process client, so it runs in under a second and does not require the production DB to be built.
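
The seeded in-memory database idea can be sketched with the standard library alone. The schema, the placeholder rows, the kind values and the helper names below are assumptions made for illustration; the project's real fixtures and table layout may differ.

```python
import sqlite3

# Placeholder rows, not real corpus entries.
SEED_ROWS = [
    (1, 1920, "speech", "Placeholder A", "placeholder body one"),
    (2, 1923, "telegram", "Placeholder B", "placeholder body two"),
    (3, 1933, "speech", "Placeholder C", "placeholder body three"),
]


def make_seeded_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database pre-filled with SEED_ROWS."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE speeches ("
        "id INTEGER PRIMARY KEY, year INTEGER, kind TEXT, title TEXT, body TEXT)"
    )
    conn.executemany("INSERT INTO speeches VALUES (?, ?, ?, ?, ?)", SEED_ROWS)
    conn.commit()
    return conn


def titles_between(conn: sqlite3.Connection, year_from: int, year_to: int) -> list:
    """Return titles in chronological order for an inclusive year range."""
    cur = conn.execute(
        "SELECT title FROM speeches WHERE year BETWEEN ? AND ? ORDER BY year, id",
        (year_from, year_to),
    )
    return [row[0] for row in cur]
```

Because nothing touches the disk, each test can build its own copy of the database and discard it afterwards.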

Architecture

ATAM PDFs  ─┐
Vikikaynak ─┼─► ETL scripts ─► SQLite (FTS5) ─► FastMCP stdio server ─► AI clients
Nutuk EN   ─┘                  speeches.db

  • ETL (scripts/) is fully decoupled from the server (src/ataturk_mcp/).
  • The server opens the DB read-only and is safe to run in parallel from multiple clients.
  • Turkish search quality: FTS5 with unicode61 remove_diacritics 2 plus application-level I/İ normalisation in db.normalise_query.
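
The last point can be illustrated with a small in-memory index. This is a sketch under assumptions, not the project's code: the normalise_query below is a hypothetical stand-in for db.normalise_query, the table layout and sample rows are invented, and it assumes Python's sqlite3 module is linked against SQLite 3.27 or newer with FTS5 enabled. The sketch folds the Turkish I variants at the application level for both indexed text and queries, and leaves the remaining letters (ş, ğ, ü, ö, ç) to the tokenizer.

```python
import sqlite3

# Fold I (U+0049), İ (U+0130) and ı (U+0131) to ASCII "i".
# Everything else, including case folding, is left to the FTS5 tokenizer.
_TURKISH_I = str.maketrans({"I": "i", "\u0130": "i", "\u0131": "i"})


def normalise_query(text: str) -> str:
    """Hypothetical stand-in for db.normalise_query."""
    return text.translate(_TURKISH_I)


def build_index(rows) -> sqlite3.Connection:
    """Index (title, body) pairs in an in-memory FTS5 table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE speeches_fts USING fts5("
        "title UNINDEXED, body, "
        "tokenize = 'unicode61 remove_diacritics 2')"
    )
    conn.executemany(
        "INSERT INTO speeches_fts (title, body) VALUES (?, ?)",
        [(title, normalise_query(body)) for title, body in rows],
    )
    return conn


def search(conn: sqlite3.Connection, query: str) -> list:
    """Return matching titles, best BM25 score first."""
    cur = conn.execute(
        "SELECT title FROM speeches_fts WHERE speeches_fts MATCH ? "
        "ORDER BY bm25(speeches_fts)",
        (normalise_query(query),),
    )
    return [row[0] for row in cur]


if __name__ == "__main__":
    # Invented sample rows, not corpus text.
    index = build_index([("row-1", "ışık ve bilgi"), ("row-2", "Cumhuriyet eğitim")])
    print(search(index, "ISIK"), search(index, "EĞİTİM"))
```

Mapping only the I variants, rather than lowercasing the whole query, keeps uppercase FTS5 operators such as AND, OR and NEAR intact.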

Publishing to PyPI

The project is wired to publish from GitHub Actions (see .github/workflows/release.yml) when you push a tag of the form vX.Y.Z. For manual publishing:

pip install build twine
python -m build                          # produces dist/*.whl and dist/*.tar.gz
twine check dist/*
twine upload dist/*                      # uploads to https://pypi.org/project/ataturk-mcp/
# or for a dry run on TestPyPI:
twine upload --repository testpypi dist/*

Credits & author

  • Author: Buğra Ayan — bugraayan.com
  • Email: hello@bugraayan.com
  • Atatürk Araştırma Merkezi (ATAM) — for digitising and editing the Söylev ve Demeçleri corpus, the gold-standard source for this project.
  • Vikikaynak / Türkçe Wikisource contributors — for transcribing individual addresses and the TBMM opening speeches.
  • Internet Archive — for hosting the public-domain English Nutuk.

If you build research or journalism on top of this server, please cite both Atatürk's words (via the in-tool cite command) and the original sources (ATAM volume + page, or the Wikisource URL).

License

MIT for the code (© Buğra Ayan / bugraayan.com). The speech texts themselves are in the public domain (Atatürk died in 1938). ATAM, TBMM and Wikisource are credited as the digital sources used to build the corpus; please respect each source's terms when redistributing.
