Aeneas library for ancient text restoration and attribution.

Contextualising ancient texts with generative neural networks

Yannis Assael¹,*, Thea Sommerschield²,*, Alison Cooley³, Brendan Shillingford¹, John Pavlopoulos⁴, Priyanka Suresh¹, Bailey Herms⁵, Justin Grayston⁵, Benjamin Maynard⁵, Nicholas Dietrich¹, Robbe Wulgaert⁶, Jonathan Prag⁷, Alex Mullen², Shakir Mohamed¹

¹ Google DeepMind
² University of Nottingham, UK
³ University of Warwick, UK
⁴ Athens University of Economics and Business, Greece
⁵ Google
⁶ Sint-Lievenscollege, Belgium
⁷ University of Oxford, UK

* Authors contributed equally to this work.

Citation

When using any of the source code or outputs of this project, please cite:

@article{asssome2025contextualising,
  title={Contextualising ancient texts with generative neural networks},
  author={Assael*, Yannis and Sommerschield*, Thea and Cooley, Alison and Pavlopoulos, John and Shillingford, Brendan and Herms, Bailey and Suresh, Priyanka and Maynard, Benjamin and Grayston, Justin and Wulgaert, Robbe and Prag, Jonathan and Mullen, Alex and Mohamed, Shakir},
  journal={Nature},
  volume={643},
  number={8073},
  year={2025},
  publisher={Nature Publishing}
}

Human history is born in writing. Inscriptions, among the earliest written forms, offer direct insights into the thought, language, and history of ancient civilisations. Historians capture these insights by identifying parallels - inscriptions with shared phrasing, function, or cultural setting - to enable the contextualisation of texts within broader historical frameworks, and perform key tasks such as restoration and geographical or chronological attribution. However, current digital methods are restricted to literal matches and narrow historical scopes. We introduce Aeneas, the first generative neural network for contextualising ancient texts. Aeneas retrieves textual and contextual parallels, leverages visual inputs, handles arbitrary-length text restoration, and advances the state-of-the-art in key tasks.

Restoration of damaged inscription
Fragment of a bronze military diploma from Sardinia, issued by the Emperor Trajan to a sailor on a warship. 113/14 CE (CIL XVI, 60, The Metropolitan Museum of Art, Public Domain).

To evaluate its impact, we conduct the largest Historian-AI study to date, with historians considering Aeneas’ retrieved parallels useful research starting points in 90% of cases, improving their confidence in key tasks by 44%. Restoration and geographical attribution tasks yielded superior results when historians were paired with Aeneas, outperforming both humans and AI alone. For dating, Aeneas achieved a 13-year distance from ground-truth ranges. We demonstrate Aeneas’ contribution to historical workflows through analysis of key traits in the Res Gestae Divi Augusti, the most renowned Roman inscription, showing how integrating Science and Humanities can create transformative tools to assist historians and advance our understanding of the past.

Aeneas model architecture diagram
Given the image and textual transcription of an inscription (with damaged sections of unknown length marked with the "#" character), Aeneas uses a transformer-based decoder, the "torso", to process the text. Specialised networks, called "heads", handle character restoration, date attribution, and geographical attribution (the latter also incorporating visual features). The torso's intermediate representations are merged into a unified, historically enriched embedding, which is used to retrieve similar inscriptions from the Latin Epigraphic Dataset (LED), ranked by relevance.
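
The retrieval step described above can be illustrated with a toy sketch. It assumes that relevance is measured by cosine similarity between embedding vectors; the text above does not state the similarity measure, so treat that as an assumption. The function names, inscription identifiers, and three-dimensional vectors are all hypothetical and are not part of the predictingthepast library.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_parallels(query, corpus, top_k=3):
    """Return up to top_k (identifier, score) pairs, most similar first."""
    scored = [(key, cosine_similarity(query, emb)) for key, emb in corpus.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy embeddings with made-up identifiers (assumption: real embeddings are much larger).
toy_corpus = {
    "inscription_a": [1.0, 0.0, 0.0],
    "inscription_b": [0.7, 0.7, 0.0],
    "inscription_c": [0.0, 0.0, 1.0],
}
ranked = rank_parallels([1.0, 0.1, 0.0], toy_corpus, top_k=2)
for identifier, score in ranked:
    print(f"{identifier}: {score:.3f}")
```

In this toy corpus the query points almost entirely along the first axis, so inscription_a ranks first and inscription_b second.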

References

Aeneas Inference Online

To aid further research in the field, we created an online interactive Python notebook, where researchers can query one of our trained models to get text restorations, visualise attention weights, and more.

Aeneas Inference Offline

Advanced users who want to run inference with the trained models locally can use the predictingthepast library directly.

First, install the predictingthepast library and its dependencies by running the following from the root of the repository:

pip install .

Then, download the model files.

Latin Model

curl --output aeneas_117149994_2.pkl \
    https://storage.googleapis.com/ithaca-resources/models/aeneas_117149994_2.pkl
curl --output led.json \
    https://storage.googleapis.com/ithaca-resources/models/led.json
curl --output led_emb_xid117149994.pkl \
    https://storage.googleapis.com/ithaca-resources/models/led_emb_xid117149994.pkl

Ancient Greek Model

curl --output ithaca_153143996_2.pkl \
    https://storage.googleapis.com/ithaca-resources/models/ithaca_153143996_2.pkl
curl --output iphi.json \
    https://storage.googleapis.com/ithaca-resources/models/iphi.json
curl --output iphi_emb_xid153143996.pkl \
    https://storage.googleapis.com/ithaca-resources/models/iphi_emb_xid153143996.pkl
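
For environments without curl, the same downloads could be scripted in Python. This is a minimal sketch using only the standard library; the file names and base URL are taken from the commands above, while the "latin" and "greek" dictionary keys and the function names are hypothetical labels chosen for this example.

```python
import urllib.request
from pathlib import Path

BASE_URL = "https://storage.googleapis.com/ithaca-resources/models/"

# File names copied from the curl commands; the dictionary keys are illustrative labels.
MODEL_FILES = {
    "latin": ["aeneas_117149994_2.pkl", "led.json", "led_emb_xid117149994.pkl"],
    "greek": ["ithaca_153143996_2.pkl", "iphi.json", "iphi_emb_xid153143996.pkl"],
}


def model_urls(model):
    """Return the download URLs for one model's checkpoint, dataset, and embeddings."""
    return [BASE_URL + name for name in MODEL_FILES[model]]


def download_model(model, target_dir="."):
    """Download any missing files for the given model and return their local paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for url in model_urls(model):
        destination = target / url.rsplit("/", 1)[-1]
        if not destination.exists():  # skip files that are already present
            urllib.request.urlretrieve(url, destination)
        paths.append(destination)
    return paths
```

Calling download_model("latin") would fetch the three Latin files into the current directory. Nothing is downloaded until the function is called.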

Inference Example

An example of using the library can be run via:

python inference_example.py \
    --input_file="example_input.txt" \
    --checkpoint_path="aeneas_117149994_2.pkl" \
    --dataset_path="led.json" \
    --retrieval_path="led_emb_xid117149994.pkl" \
    --language="latin"

This will run restoration and attribution on the text in example_input.txt.

To run it with different input text, use the --input argument:

python inference_example.py \
    --input="..." \
    --checkpoint_path="aeneas_117149994_2.pkl" \
    --dataset_path="led.json" \
    --retrieval_path="led_emb_xid117149994.pkl" \
    --language="latin"

Or use text in a UTF-8 encoded text file:

python inference_example.py \
    --input_file="some_other_input_file.txt" \
    --checkpoint_path="aeneas_117149994_2.pkl" \
    --dataset_path="led.json" \
    --retrieval_path="led_emb_xid117149994.pkl" \
    --language="latin"
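
When several input files need processing, the command above can be assembled programmatically. The sketch below only wraps the command-line flags shown in this section; build_command and run_batch are hypothetical helper names, and the script is assumed to sit in the current working directory.

```python
import subprocess
import sys


def build_command(input_file, checkpoint_path, dataset_path, retrieval_path,
                  language="latin"):
    """Build the argument list for one inference_example.py run."""
    return [
        sys.executable,
        "inference_example.py",
        f"--input_file={input_file}",
        f"--checkpoint_path={checkpoint_path}",
        f"--dataset_path={dataset_path}",
        f"--retrieval_path={retrieval_path}",
        f"--language={language}",
    ]


def run_batch(input_files, checkpoint_path, dataset_path, retrieval_path,
              language="latin"):
    """Run the example script once per input file, stopping at the first failure."""
    for input_file in input_files:
        command = build_command(input_file, checkpoint_path, dataset_path,
                                retrieval_path, language)
        subprocess.run(command, check=True)
```

For example, run_batch(["a.txt", "b.txt"], "aeneas_117149994_2.pkl", "led.json", "led_emb_xid117149994.pkl") would invoke the script twice with the Latin model files. Passing arguments as a list avoids shell quoting issues with unusual file names.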

The restoration and attribution results can each be saved to a JSON file with the --restore_json and --attribute_json arguments:

python inference_example.py \
    --input_file="example_input.txt" \
    --checkpoint_path="aeneas_117149994_2.pkl" \
    --dataset_path="led.json" \
    --retrieval_path="led_emb_xid117149994.pkl" \
    --language="latin" \
    --attribute_json="attribute.json" \
    --restore_json="restore.json"
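
The structure of the saved JSON files is not documented in this section, so a cautious first step is to inspect them without assuming any particular keys. The following sketch uses only the standard library; summarise_json is a hypothetical helper name.

```python
import json
from pathlib import Path


def summarise_json(path):
    """Load a JSON file and describe its top-level structure without assuming a schema."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        # Map each top-level key to the type name of its value.
        return {key: type(value).__name__ for key, value in data.items()}
    if isinstance(data, list):
        return {"<list>": len(data)}
    return {"<scalar>": type(data).__name__}


# File names match the command above; files that do not exist are skipped.
for name in ("attribute.json", "restore.json"):
    if Path(name).exists():
        print(name, summarise_json(name))
```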

For full help, run:

python inference_example.py --help

Dataset Generation

For Latin, Aeneas was trained on data from:

Epigraphic Database Roma (EDR)¹: Made available pursuant to a Creative Commons Attribution 4.0 International License (CC-BY) on Zenodo. EDR is also available at edr-edr.it.
Epigraphic Database Heidelberg (EDH)²: Made available pursuant to a Creative Commons Attribution-ShareAlike 4.0 International License (CC-BY-SA) on Zenodo. EDH is also available at edh.ub.uni-heidelberg.de.
ETL repository for Epigraphic Database Clauss Slaby (EDCS_ETL)³: Made available pursuant to a Creative Commons Attribution 4.0 International License (CC-BY) on Zenodo. EDCS_ETL is also available at manfredclauss.de and github.com/sdam-au/EDCS_ETL.

For ancient Greek, Aeneas was trained on Searchable Greek Inscriptions of The Packard Humanities Institute. The processed version is available at: I.PHI dataset.

Training Aeneas

See train/README.md for instructions.

License & Disclaimer

All software is licensed under the Apache License, Version 2.0 (Apache 2.0); you may not use this file except in compliance with the Apache 2.0 license. You may obtain a copy of the Apache 2.0 license at: https://www.apache.org/licenses/LICENSE-2.0

All other materials are licensed under the Creative Commons Attribution 4.0 International License (CC-BY). You may obtain a copy of the CC-BY license at: https://creativecommons.org/licenses/by/4.0/legalcode

The dataset contains modified data from the Epigraphic Database Heidelberg dataset. That data is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License (CC-BY-SA). You may obtain a copy of the CC-BY-SA license at: https://creativecommons.org/licenses/by-sa/4.0/legalcode.en

Unless required by applicable law or agreed to in writing, all software and materials distributed here under the Apache 2.0, CC-BY-SA or CC-BY licenses are distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the licenses for the specific language governing permissions and limitations under those licenses.

This is not an official Google product.

¹ Silvio Panciera, Giuseppe Camodeca, Giovanni Cecconi, Silvia Orlandi, Lanfranco Fabriani, & Silvia Evangelisti. (2019). EDR - Epigraphic Database Roma EpiDoc files [Data set]. Zenodo.
² James M.S. Cowey, Francisca Feraudi-Gruénais, Brigitte Gräf, Frank Grieshaber, Regine Klar, & Jonas Osnabrügge. (2019). Epigraphic Database Heidelberg EpiDoc files [Data set]. Zenodo.
³ Heřmánková, P. (2022). EDCS (2.0) [Data set]. Zenodo.

Project details

Release history

This version: 0.1.0 (Jul 1, 2026)

Download files

Source Distribution

predictingthepast-0.1.0.tar.gz (56.8 kB), uploaded Jul 1, 2026, Source

Built Distribution

predictingthepast-0.1.0-py3-none-any.whl (64.5 kB), uploaded Jul 1, 2026, Python 3

File details

Details for the file predictingthepast-0.1.0.tar.gz.

File metadata

Download URL: predictingthepast-0.1.0.tar.gz
Upload date: Jul 1, 2026
Size: 56.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for predictingthepast-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`3f2f1647e4d179547ad4d057a0c9148c316069134dcb6a6f4ec2ba07a0bc754b`
MD5	`55d52df86221b9ce1100ff4b21484d94`
BLAKE2b-256	`05ace5e08e9a1f718e80ad0eb839f6d19d2fabda933ce6848c156834f9c4e409`
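
A downloaded file can be checked against the SHA256 digest listed above. This is a minimal sketch using the standard hashlib module; sha256_of and verify are hypothetical helper names, and the expected digest is copied from the table for predictingthepast-0.1.0.tar.gz.

```python
import hashlib

# SHA256 digest for predictingthepast-0.1.0.tar.gz, copied from the table above.
EXPECTED_SHA256 = "3f2f1647e4d179547ad4d057a0c9148c316069134dcb6a6f4ec2ba07a0bc754b"


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 digest matches the expected value."""
    return sha256_of(path) == expected.lower()
```

For example, verify("predictingthepast-0.1.0.tar.gz") should return True for an intact download of the source distribution.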


Provenance

The following attestation bundles were made for predictingthepast-0.1.0.tar.gz:

Publisher: release.yml on google-deepmind/predictingthepast

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: predictingthepast-0.1.0.tar.gz
- Subject digest: 3f2f1647e4d179547ad4d057a0c9148c316069134dcb6a6f4ec2ba07a0bc754b
- Sigstore transparency entry: 2040104563
- Sigstore integration time: Jul 1, 2026
Source repository:
- Permalink: google-deepmind/predictingthepast@b978add9efd10a7bd2a5f97b9ec0ec184a2bef9b
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/google-deepmind
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@b978add9efd10a7bd2a5f97b9ec0ec184a2bef9b
- Trigger Event: release

File details

Details for the file predictingthepast-0.1.0-py3-none-any.whl.

File metadata

Download URL: predictingthepast-0.1.0-py3-none-any.whl
Upload date: Jul 1, 2026
Size: 64.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for predictingthepast-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fe85f7e3b8bbaf76fd912f8b26ff784cbb08e7791e669a33b00edc7fce1c3b86`
MD5	`5b661382ba5028ace870ddf4214ff167`
BLAKE2b-256	`c8a9c9357af28d86dbb62373aea75afeb499560e68fafb012c0bac4f37f87492`


Provenance

The following attestation bundles were made for predictingthepast-0.1.0-py3-none-any.whl:

Publisher: release.yml on google-deepmind/predictingthepast

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: predictingthepast-0.1.0-py3-none-any.whl
- Subject digest: fe85f7e3b8bbaf76fd912f8b26ff784cbb08e7791e669a33b00edc7fce1c3b86
- Sigstore transparency entry: 2040104616
- Sigstore integration time: Jul 1, 2026
Source repository:
- Permalink: google-deepmind/predictingthepast@b978add9efd10a7bd2a5f97b9ec0ec184a2bef9b
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/google-deepmind
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@b978add9efd10a7bd2a5f97b9ec0ec184a2bef9b
- Trigger Event: release
