A specialist Python toolkit for Ancient Greek — alphabetic Greek and the Aegean syllabic scripts (Linear A/B).

pyaegean

A specialist Python toolkit for Ancient Greek, covering alphabetic Greek and the Aegean syllabic scripts (Linear A / Linear B). pyaegean focuses narrowly and deeply on Greek and the Aegean world: a script-agnostic corpus data layer, the analytical methods from the Linear A Research Workbench, translation, and a pluggable multi-provider AI layer. The excellent CLTK already serves many ancient languages broadly; pyaegean is intentionally narrower, and uses CLTK as a friendly benchmark against which to measure its Greek coverage.

Status: v0.2.0 (alpha). The script-agnostic core, Linear A, the full Greek NLP track (including opt-in Perseus-treebank lemmas/POS, LSJ glossing, a baseline dependency parser, and a CLTK benchmark harness), and the multi-provider AI layer are all implemented. Analytical output on the undeciphered Linear A material is exploratory; see the methodology/limitations.

Install

pip install pyaegean            # core + Linear A + Greek
pip install "pyaegean[ai]"      # + Anthropic / OpenAI / Grok / Gemini clients
pip install "pyaegean[all]"     # everything
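
After installing, one quick way to see what is available is to probe for the importable modules without actually importing them. The sketch below is a convenience under stated assumptions, not part of pyaegean: the helper name `probe_modules` is hypothetical, `aegean` is the import name used in the Quick start, and `anthropic` / `openai` are assumed to be the SDK modules that the `[ai]` extra brings in.

```python
from importlib.util import find_spec


def probe_modules(names):
    """Return {module_name: True/False} without importing the modules."""
    return {name: find_spec(name) is not None for name in names}


# "aegean" comes from the Quick start; the two SDK module names are assumptions.
for name, present in probe_modules(["aegean", "anthropic", "openai"]).items():
    print(f"{name:10s} {'installed' if present else 'missing'}")
```

Only top-level module names are probed here; dotted names would need extra error handling.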

New to Python, or not a programmer? You're exactly who this tool is for. The Getting Started guide walks you from "I have nothing installed" to your first result — no prior coding assumed.

Quick start

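Prefer to learn by doing? The guided tour (the example notebook listed under Documentation) runs in your browser with nothing to install. To try pyaegean locally instead, start with the bundled Linear A corpus: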

import aegean

corpus = aegean.load("lineara")          # 1,721 inscriptions, bundled, offline
print(len(corpus))                       # 1721

ht = corpus.filter(site="Haghia Triada") # filter by metadata (full site name)
df = corpus.to_dataframe(level="word")   # pandas-native, one row per word

from aegean.analysis import balance_check, word_matches_sign_pattern
checks = balance_check(corpus.get("HT13"))          # KU-RO accounting reconciliation
hits = [w for w, _ in corpus.word_frequencies()
        if word_matches_sign_pattern(w, "KU-*-RO")] # wildcard sign search
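
The two analysis calls above can be pictured with a small stand-alone sketch. It does not use pyaegean and is not its implementation: `reconcile` and `matches_sign_pattern` are hypothetical helpers, the quantities and sign sequences are invented for illustration, and the sketch assumes that `*` stands for exactly one sign (the library's wildcard may behave differently). KU-RO is conventionally read as a "total" marker, so reconciliation here simply means comparing the summed entries with the stated total.

```python
from fractions import Fraction


def reconcile(entries, stated_total):
    """Compare the sum of itemised quantities with the stated total."""
    computed = sum(entries, Fraction(0))
    return {
        "computed": computed,
        "stated": stated_total,
        "balanced": computed == stated_total,
        "difference": stated_total - computed,
    }


def matches_sign_pattern(word, pattern):
    """Match a hyphen-separated sign sequence; '*' stands for any one sign."""
    signs, wanted = word.split("-"), pattern.split("-")
    if len(signs) != len(wanted):
        return False
    return all(w == "*" or w == s for s, w in zip(signs, wanted))


# Invented quantities (not taken from any tablet); fractions avoid float rounding.
entries = [Fraction(11, 2), Fraction(56), Fraction(3, 2)]
print(reconcile(entries, Fraction(63))["balanced"])   # True
print(matches_sign_pattern("KU-MA-RO", "KU-*-RO"))    # True
print(matches_sign_pattern("KU-RO", "KU-*-RO"))       # False
```

Exact fractions are used because the entries may carry fractional quantities, and floating-point sums could report a spurious imbalance.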

And a taste of the Greek pipeline:

from aegean import greek

greek.betacode_to_unicode("mh=nin")     # 'μῆνιν'   (type Greek in plain ASCII)
greek.syllabify("ἄνθρωπος")             # ['ἄν', 'θρω', 'πος']
greek.scan_hexameter("ἄνδρα μοι ἔννεπε, Μοῦσα, πολύτροπον, ὃς μάλα πολλὰ").pattern
# '—⏑⏑|—⏑⏑|—⏑⏑|—⏑⏑|—⏑⏑|—×'             (Odyssey 1.1)
[str(a) for a in greek.analyze("λόγον")][:2]
# ['λόγος [NOUN acc sg masc]', 'λόγος [NOUN acc sg fem]']
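
For readers curious about what Beta Code conversion involves, here is a deliberately tiny sketch covering lowercase letters, breathings, the three accents, iota subscript, and final sigma. It is not `greek.betacode_to_unicode`: the function name `betacode_sketch` and its mapping tables are illustrative assumptions, and real Beta Code has further rules (capitals marked with `*`, diaeresis, punctuation) that are ignored here.

```python
import re
import unicodedata

# Beta Code letter -> Greek letter (lowercase only in this sketch).
LETTERS = dict(zip("abgdezhqiklmncoprstufxyw", "αβγδεζηθικλμνξοπρστυφχψω"))

# Beta Code diacritic -> Unicode combining mark.
MARKS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex (perispomeni)
    "|": "\u0345",   # iota subscript
}


def betacode_sketch(text):
    """Convert a small subset of Beta Code to precomposed Unicode Greek."""
    out = "".join(LETTERS.get(ch) or MARKS.get(ch, ch) for ch in text)
    out = re.sub(r"σ\b", "ς", out)              # sigma at word end becomes final sigma
    return unicodedata.normalize("NFC", out)    # compose letter + combining marks


print(betacode_sketch("mh=nin"))      # μῆνιν
print(betacode_sketch("a)/nqrwpos"))  # ἄνθρωπος
```

The sketch leans on Unicode NFC normalization to combine each base letter with its marks, which keeps the tables short.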

The full Linear A facsimile mirror (3,368 images, ~116 MB) is not bundled; fetch it on demand: aegean.data.fetch("lineara-images") (downloaded from the workbench repo, sha256-verified, cached locally — never re-hosted). The opt-in Greek backends likewise fetch large CC BY-SA assets to cache on first use (never bundled): the Perseus AGDT treebank (~75 MB, greek.use_treebank()) and the full Perseus LSJ (~270 MB, greek.use_lsj()).
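
The fetch, verify, and cache behaviour described above can be sketched with the standard library. This illustrates the general pattern only, not the code behind `aegean.data.fetch`: the function name `fetch_to_cache`, its parameters, and the injected `download` callable are all hypothetical.

```python
import hashlib
from pathlib import Path


def fetch_to_cache(name, cache_dir, expected_sha256, download):
    """Return a cached file path, downloading and verifying it on first use.

    `download` is any zero-argument callable returning the asset's bytes
    (for example a thin wrapper around urllib.request.urlopen).
    """
    target = Path(cache_dir) / name
    if target.exists():
        return target                      # cache hit: no network access
    payload = download()
    digest = hashlib.sha256(payload).hexdigest()
    if digest != expected_sha256:
        raise ValueError(f"sha256 mismatch for {name}: got {digest}")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)            # only verified bytes reach the cache
    return target
```

In this sketch a cache hit trusts the file already on disk; re-hashing the cached file on every load would be a stricter alternative at the cost of extra I/O.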

What's here

aegean.core — script-agnostic model: Corpus, Document, Token, Sign, SignInventory, Numeral, the Script plugin registry, provenance.
aegean.scripts.lineara — Linear A: bundled corpus + 84-sign inventory + sign→sound map + transliteration.
aegean.analysis — ported from the workbench: accounting reconciliation, wildcard sign-pattern search, weighted phonetic distance + alignment, morphology clustering, collocation statistics, a compound-query engine, and heuristic tablet-structure classification (all with golden-fixture parity).
aegean.greek — the Greek NLP track: Unicode/Beta Code normalization, word/sentence tokenization, syllabification, accent and prosody analysis, metrical scansion (dactylic hexameter + elegiac pentameter), reconstructed IPA, POS tagging, a rule-based morphological analyzer (with an optional Perseus-treebank–backed lexicon for attested, accented lemmas), baseline lemmatization, opt-in LSJ glossing (use_lsj → gloss/lookup), an opt-in baseline dependency parser (use_parser → parse; ~0.67 UAS / 0.57 LAS on projective AGDT), and a CLTK benchmark harness (the opt-in treebank lifts lemma 28%→100% and POS 50%→100% on the gold set). aegean.load("greek") loads a small bundled sample corpus (Archaic→Koine).
aegean.data — bundled-data access + download-to-cache for large assets.
aegean.ai — multi-provider AI layer: a provider-agnostic LLMClient (Anthropic default, plus OpenAI, xAI Grok, Gemini — SDKs optional), response caching, corpus grounding, and capabilities (translate, gloss, decipherment hypotheses, NLP-assist, ask/summarize). Every generative result is labeled exploratory with provenance. aegean.translate is the hybrid lexicon+LLM front end.
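
The UAS / LAS figures quoted for the baseline parser are standard attachment scores: UAS counts tokens whose predicted head is correct, and LAS additionally requires the correct relation label. A minimal sketch of the computation follows; the function name and the toy four-token example are invented for illustration and are unrelated to pyaegean's benchmark harness or to the AGDT data.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) for parallel lists of (head_index, relation) pairs."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and equal length")
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    full_hits = sum(g == p for g, p in zip(gold, predicted))
    return head_hits / len(gold), full_hits / len(gold)


# Toy example: head 0 marks the root; labels are illustrative.
gold = [(2, "SBJ"), (0, "PRED"), (2, "OBJ"), (3, "ATR")]
pred = [(2, "SBJ"), (0, "PRED"), (2, "ADV"), (2, "ATR")]
uas, las = attachment_scores(gold, pred)
print(uas, las)  # 0.75 0.5
```

Because every LAS hit is also a UAS hit, LAS can never exceed UAS, which matches the ordering of the two figures quoted above.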

Documentation

Full documentation lives in the project wiki:

Getting Started — for newcomers to Python
Example notebook — a runnable guided tour (open in Colab)
Tutorial — two guided, end-to-end research walkthroughs
Linear A · Analysis · Greek NLP · AI Layer — reference per domain
Data & Provenance · FAQ

Roadmap

Shipped: v0.1 core + Linear A + Greek start. v0.2 (current): multi-provider AI layer + translation and deep Greek NLP — Perseus-treebank lemmas/POS, LSJ glossing, a baseline dependency parser, and a CLTK benchmark harness. Next: v0.3 grow the gold/corpus + a live CLTK head-to-head → v0.4 Linear B (DAMOS/LiBER) → v0.5 Cypriot/Cypro-Minoan → v1.0 stable.

License

Apache-2.0. Corpus data is GORILA (Godart & Olivier 1976–1985) via mwenge/lineara.xyz; facsimile imagery © École Française d'Athènes (referenced, not redistributed). The opt-in Greek backends fetch the Perseus AGDT treebank (CC BY-SA 3.0) and Perseus LSJ (CC BY-SA 4.0) to cache — built locally, never bundled or re-hosted. See NOTICE.

Project details

Maintainers

ryanpavlicek


Release history

0.8.0 (Jun 12, 2026)
0.7.0 (Jun 10, 2026)
0.6.0 (Jun 10, 2026)
0.5.0 (Jun 10, 2026)
0.4.0 (Jun 10, 2026)
0.3.0 (Jun 10, 2026)
0.2.0 (Jun 8, 2026), this version
0.1.0 (Jun 8, 2026)

Download files

Source Distribution

pyaegean-0.2.0.tar.gz (187.7 kB), uploaded Jun 8, 2026, Source

Built Distribution

pyaegean-0.2.0-py3-none-any.whl (180.5 kB), uploaded Jun 8, 2026, Python 3

File details

Details for the file pyaegean-0.2.0.tar.gz.

File metadata

Download URL: pyaegean-0.2.0.tar.gz
Upload date: Jun 8, 2026
Size: 187.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pyaegean-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`615550444252d2a3b9776e41170f5fbacc0aed1f58823f9f4a338a0afa997a70`
MD5	`7d911e5f856437be3ac87cd0385dee53`
BLAKE2b-256	`718e685aa2cb1340898819037b480b6260a861a01b2e717b0a19fcecc71f6533`

Provenance

The following attestation bundles were made for pyaegean-0.2.0.tar.gz:

Publisher: release.yml on ryanpavlicek/pyaegean

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pyaegean-0.2.0.tar.gz
- Subject digest: 615550444252d2a3b9776e41170f5fbacc0aed1f58823f9f4a338a0afa997a70
- Sigstore transparency entry: 1758611913
- Sigstore integration time: Jun 8, 2026
Source repository:
- Permalink: ryanpavlicek/pyaegean@73f97d03670d03acd23336ce3f0b8eaf47c51fae
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/ryanpavlicek
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@73f97d03670d03acd23336ce3f0b8eaf47c51fae
- Trigger Event: release

File details

Details for the file pyaegean-0.2.0-py3-none-any.whl.

File metadata

Download URL: pyaegean-0.2.0-py3-none-any.whl
Upload date: Jun 8, 2026
Size: 180.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pyaegean-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`08449fcfe5611849894fb4d5d2c76a1442a84f030e3e0ee4139081196e5478a0`
MD5	`fdb80c43df6add883960f42e73be3ea8`
BLAKE2b-256	`1135803364d6869be19ad50a57ca107c049912a99d49da1659b21ce901f1f564`

Provenance

The following attestation bundles were made for pyaegean-0.2.0-py3-none-any.whl:

Publisher: release.yml on ryanpavlicek/pyaegean

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pyaegean-0.2.0-py3-none-any.whl
- Subject digest: 08449fcfe5611849894fb4d5d2c76a1442a84f030e3e0ee4139081196e5478a0
- Sigstore transparency entry: 1758611945
- Sigstore integration time: Jun 8, 2026
Source repository:
- Permalink: ryanpavlicek/pyaegean@73f97d03670d03acd23336ce3f0b8eaf47c51fae
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/ryanpavlicek
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@73f97d03670d03acd23336ce3f0b8eaf47c51fae
- Trigger Event: release
