pyaegean
A specialist Python toolkit for Ancient Greek — alphabetic Greek and the Aegean syllabic scripts (Linear A / Linear B). pyaegean focuses narrowly and deeply on Greek and the Aegean world: a script-agnostic corpus data layer, the analytical methods from the Linear A Research Workbench, translation, and a pluggable multi-provider AI layer. The excellent CLTK already serves many ancient languages broadly; pyaegean is intentionally narrower, and uses CLTK as a friendly benchmark to measure its Greek coverage against.
Status: v0.1 (alpha). Script-agnostic core + Linear A fully implemented; the Greek NLP track and the AI layer are landing across v0.1–v0.2. See the roadmap. Analytical output on the undeciphered Linear A material is exploratory — see the methodology/limitations.
Install
pip install pyaegean # core + Linear A + Greek
pip install "pyaegean[ai]" # + Anthropic / OpenAI / Grok / Gemini clients
pip install "pyaegean[all]" # everything
Not on PyPI yet. Until the first release, install from source instead:
pip install git+https://github.com/ryanpavlicek/pyaegean
New to Python, or not a programmer? You're exactly who this tool is for. The Getting Started guide walks you from "I have nothing installed" to your first result — no prior coding assumed.
Quick start
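Prefer to learn by doing? Run the guided tour in your browser with nothing to install: the example notebook opens in Colab. To work locally instead, start with the bundled Linear A corpus: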
import aegean
corpus = aegean.load("lineara") # 1,721 inscriptions, bundled, offline
print(len(corpus)) # 1721
ht = corpus.filter(site="Haghia Triada") # filter by metadata (full site name)
df = corpus.to_dataframe(level="word") # pandas-native, one row per word
from aegean.analysis import balance_check, word_matches_sign_pattern
checks = balance_check(corpus.get("HT13")) # KU-RO accounting reconciliation
hits = [w for w, _ in corpus.word_frequencies()
if word_matches_sign_pattern(w, "KU-*-RO")] # wildcard sign search
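To make the wildcard search above concrete, here is a minimal, self-contained sketch of sign-pattern matching. It is not pyaegean's implementation: the helper name `matches_sign_pattern` is hypothetical, and the sketch assumes that signs are hyphen-separated and that `*` stands for exactly one sign. The library's actual wildcard semantics may differ.

```python
def matches_sign_pattern(word: str, pattern: str) -> bool:
    """Return True if a hyphen-separated word fits a sign pattern.

    Assumption (not taken from pyaegean): '*' matches exactly one sign,
    and every other pattern element must equal the sign literally.
    """
    signs = word.split("-")
    wanted = pattern.split("-")
    if len(signs) != len(wanted):
        return False
    return all(w == "*" or w == s for s, w in zip(signs, wanted))


# Illustrative sign sequences; not a claim about what the corpus contains.
words = ["KU-RO", "KU-MA-RO", "KI-RO", "KU-PA-NU"]
hits = [w for w in words if matches_sign_pattern(w, "KU-*-RO")]
print(hits)  # ['KU-MA-RO']
```

Under this one-sign reading of `*`, the two-sign word KU-RO does not match `KU-*-RO`; a design where `*` may match zero or more signs would accept it.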
And a taste of the Greek pipeline:
from aegean import greek
greek.betacode_to_unicode("mh=nin") # 'μῆνιν' (type Greek in plain ASCII)
greek.syllabify("ἄνθρωπος") # ['ἄν', 'θρω', 'πος']
greek.scan_hexameter("ἄνδρα μοι ἔννεπε, Μοῦσα, πολύτροπον, ὃς μάλα πολλὰ").pattern
# '—⏑⏑|—⏑⏑|—⏑⏑|—⏑⏑|—⏑⏑|—×' (Odyssey 1.1)
[str(a) for a in greek.analyze("λόγον")][:2]
# ['λόγος [NOUN acc sg masc]', 'λόγος [NOUN acc sg fem]']
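For readers curious about what the Beta Code conversion above involves, the following is a rough sketch using only the standard library. The function name `beta_to_greek` and its tables are hypothetical and are not pyaegean's code. The sketch assumes lowercase input and covers only letters and the common diacritics; capitals (the `*` prefix) and editorial symbols are not handled.

```python
import re
import unicodedata

# Beta Code letter -> suffix of the Unicode character name (lowercase only).
_NAMES = {
    "a": "ALPHA", "b": "BETA", "g": "GAMMA", "d": "DELTA", "e": "EPSILON",
    "z": "ZETA", "h": "ETA", "q": "THETA", "i": "IOTA", "k": "KAPPA",
    "l": "LAMDA", "m": "MU", "n": "NU", "c": "XI", "o": "OMICRON",
    "p": "PI", "r": "RHO", "s": "SIGMA", "t": "TAU", "u": "UPSILON",
    "f": "PHI", "x": "CHI", "y": "PSI", "w": "OMEGA",
}
LETTERS = {k: unicodedata.lookup(f"GREEK SMALL LETTER {v}") for k, v in _NAMES.items()}
FINAL_SIGMA = unicodedata.lookup("GREEK SMALL LETTER FINAL SIGMA")

# Beta Code diacritic -> Unicode combining mark.
DIACRITICS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute accent
    "\\": "\u0300",  # grave accent
    "=": "\u0342",   # circumflex (perispomeni)
    "|": "\u0345",   # iota subscript
    "+": "\u0308",   # diaeresis
}


def beta_to_greek(text: str) -> str:
    """Convert lowercase Beta Code to precomposed (NFC) Unicode Greek."""
    chars = [LETTERS.get(ch) or DIACRITICS.get(ch) or ch for ch in text.lower()]
    greek = "".join(chars)
    # A sigma with no word character after it is treated as word-final.
    greek = re.sub(LETTERS["s"] + r"(?!\w)", FINAL_SIGMA, greek)
    # NFC folds each base letter and its combining marks into one code point.
    return unicodedata.normalize("NFC", greek)


print(beta_to_greek("mh=nin"))  # μῆνιν
```

The key design choice is to emit combining marks and let NFC normalization do the composing, which avoids a hand-written table of every accented letter.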
The full Linear A facsimile mirror (3,368 images, ~116 MB) is not bundled;
fetch it on demand: aegean.data.fetch("lineara-images") (downloaded from the
workbench repo, sha256-verified, cached locally — never re-hosted).
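The download above is described as sha256-verified. The following is a minimal sketch of that kind of check using only the standard library. `sha256_of` and `verify_download` are hypothetical helper names, not pyaegean functions, and the library's real download and caching logic is not shown in this README.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_sha256: str) -> None:
    """Raise ValueError if the file's digest differs from the expected one."""
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"checksum mismatch for {path.name}: got {actual}")


# Demonstration on a throwaway file rather than a real asset.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"abc")
    verify_download(sample, hashlib.sha256(b"abc").hexdigest())  # passes silently
```

A cache built this way would typically verify once after download and refuse to use a file whose digest does not match the pinned value.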
What's here (v0.1)
- aegean.core: the script-agnostic model (Corpus, Document, Token, Sign, SignInventory, Numeral), the Script plugin registry, and provenance.
- aegean.scripts.lineara: Linear A, with the bundled corpus, an 84-sign inventory, a sign→sound map, and transliteration.
- aegean.analysis: ported from the workbench. Accounting reconciliation, wildcard sign-pattern search, weighted phonetic distance + alignment, morphology clustering, collocation statistics, a compound-query engine, and heuristic tablet-structure classification (all with golden-fixture parity).
- aegean.greek: the Greek NLP track. Unicode/Beta Code normalization, word/sentence tokenization, syllabification, accent and prosody analysis, metrical scansion (dactylic hexameter + elegiac pentameter), reconstructed IPA, POS tagging, a rule-based morphological analyzer (with an optional Perseus-treebank–backed lexicon for attested, accented lemmas), and baseline lemmatization. aegean.load("greek") loads a small bundled sample corpus (Archaic→Koine).
- aegean.data: bundled-data access + download-to-cache for large assets.
- aegean.ai (v0.2): the multi-provider AI layer. A provider-agnostic LLMClient (Anthropic default, plus OpenAI, xAI Grok, Gemini; SDKs optional), response caching, corpus grounding, and capabilities (translate, gloss, decipherment hypotheses, NLP-assist, ask/summarize). Every generative result is labeled exploratory with provenance. aegean.translate is the hybrid lexicon+LLM front end.
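The accounting reconciliation listed under aegean.analysis can be pictured as summing a tablet's itemized quantities and comparing the result with the total the tablet itself states (KU-RO is commonly read as a "total" marker). The sketch below shows only that arithmetic. `BalanceResult` and `check_balance` are hypothetical names, the quantities are invented, and pyaegean's `balance_check` may work differently.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class BalanceResult:
    computed: Fraction  # sum of the itemized entries
    stated: Fraction    # total written on the tablet

    @property
    def difference(self) -> Fraction:
        return self.stated - self.computed

    @property
    def balanced(self) -> bool:
        return self.difference == 0


def check_balance(entries, stated_total) -> BalanceResult:
    """Compare the sum of (label, quantity) entries with a stated total."""
    computed = sum((Fraction(q) for _, q in entries), Fraction(0))
    return BalanceResult(computed=computed, stated=Fraction(stated_total))


# Hypothetical tablet: invented labels and quantities, not corpus data.
entries = [("item-a", Fraction(5)), ("item-b", Fraction(11, 2)), ("item-c", Fraction(20))]
result = check_balance(entries, stated_total=30)
print(result.balanced, result.difference)  # False -1/2
```

Exact fractions are used instead of floats so that fractional quantities reconcile without rounding error.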
Documentation
Full documentation lives in the project wiki:
- Getting Started — for newcomers to Python
- Example notebook — a runnable guided tour (open in Colab)
- Tutorial — two guided, end-to-end research walkthroughs
- Linear A · Analysis · Greek NLP · AI Layer — reference per domain
- Data & Provenance · FAQ
Roadmap
v0.1 core + Linear A (+ Greek start) → v0.2 AI layer (multi-provider) + translation → v0.3 deep Greek NLP (benchmarked against CLTK) → v0.4 Linear B (DAMOS/LiBER) → v0.5 Cypriot/Cypro-Minoan → v1.0 stable.
License
Apache-2.0. Corpus data is GORILA (Godart & Olivier 1976–1985) via
mwenge/lineara.xyz; facsimile imagery © École Française d'Athènes (referenced,
not redistributed). See NOTICE.