Translating Akkadian signs to transliteration using NLP algorithms

Translating-Akkadian-using-NLP

Translating Akkadian signs to transliteration using NLP algorithms such as HMM, MEMM and BiLSTM neural networks.

Getting Started

There are three main ways to use the project:

  • Website
  • Python package
  • GitHub clone

Website

Use this link to access the website: https://babylonian.herokuapp.com/#/

Go to the "Translit" tab and enter signs to see their transliteration.

Python Package

These instructions let you transliterate on your local machine using the "akkadian" Python package, which is based on this project.

Prerequisites

Install Python 3.6 or 3.7, for example version 3.7.1: https://www.python.org/downloads/release/python-371/.
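To confirm that the active interpreter falls in the range listed above, a small check like the following may help. This is a minimal sketch: the helper name is_supported_python is hypothetical and not part of the akkadian package, and the 3.6 to 3.7 range is simply taken from the prerequisite above.

```python
import sys

# Supported range taken from the prerequisite above (Python 3.6 or 3.7).
MIN_VERSION = (3, 6)
MAX_VERSION = (3, 7)


def is_supported_python(version_info=None):
    """Return True if the (major, minor) version is 3.6 or 3.7.

    Hypothetical helper for illustration; not part of the akkadian package.
    """
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    status = "listed as supported" if is_supported_python() else "not listed as supported"
    print("Python %d.%d is %s" % (sys.version_info[0], sys.version_info[1], status))
```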

Installing

Install the akkadian package. One way to do so is with pip:

pip install akkadian

Running

The following examples show a few typical sessions.

Transliterating Akkadian signs:

import akkadian.transliterate as akk
print(akk.transliterate("𒁹𒀭𒌍𒋀𒈨𒌍𒌷𒁀"))

Transliterating Akkadian signs using BiLSTM:

import akkadian.transliterate as akk
print(akk.transliterate_bilstm("𒁹𒀭𒌍𒋀𒈨𒌍𒌷𒁀"))

Top three transliteration options for Akkadian signs using BiLSTM:

import akkadian.transliterate as akk
print(akk.transliterate_bilstm_top3("𒁹𒀭𒌍𒋀𒈨𒌍𒌷𒁀"))

Transliterating Akkadian signs using MEMM:

import akkadian.transliterate as akk
print(akk.transliterate_memm("𒁹𒀭𒌍𒋀𒈨𒌍𒌷𒁀"))

Transliterating Akkadian signs using HMM:

import akkadian.transliterate as akk
print(akk.transliterate_hmm("𒁹𒀭𒌍𒋀𒈨𒌍𒌷𒁀"))
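The calls above can be combined to compare the models on the same input. The following is a minimal sketch under stated assumptions: compare_models and load_akkadian_models are hypothetical helpers, not part of the akkadian package, and only the function names shown in the examples above are used. The return types of those functions are not documented here, so the results are simply printed.

```python
def compare_models(signs, models):
    """Run each transliteration callable on `signs` and collect the results.

    `models` maps a label to a callable taking a string of signs. A model that
    raises is recorded as an error string instead of aborting the comparison.
    """
    results = {}
    for name, func in models.items():
        try:
            results[name] = func(signs)
        except Exception as exc:  # keep going if one model fails
            results[name] = "error: %s" % exc
    return results


def load_akkadian_models():
    """Collect the functions shown in the examples above (needs the package)."""
    # Imported lazily so this file can be loaded without akkadian installed.
    import akkadian.transliterate as akk
    return {
        "default": akk.transliterate,
        "bilstm": akk.transliterate_bilstm,
        "memm": akk.transliterate_memm,
        "hmm": akk.transliterate_hmm,
    }


if __name__ == "__main__":
    try:
        models = load_akkadian_models()
    except ImportError:
        print("akkadian is not installed; run 'pip install akkadian' first")
    else:
        for name, output in compare_models("𒁹𒀭𒌍𒋀𒈨𒌍𒌷𒁀", models).items():
            print(name, "->", output)
```

Passing the callables in as a dictionary keeps the comparison logic independent of the package, so it can be tried with stand-in functions first.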

GitHub

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

Install Python 3.6 or 3.7, for example version 3.7.1: https://www.python.org/downloads/release/python-371/.

If you don't have git installed, install it from https://git-scm.com/downloads (choose the installer for your operating system).

If you don't have a GitHub account, create one at https://github.com/join?source=header-home.

Installing the python dependencies

Install torch.

Windows:

pip install torch===1.3.1 torchvision===0.4.2 -f https://download.pytorch.org/whl/torch_stable.html

Linux and macOS:

pip install torch torchvision

Install allennlp:

pip install allennlp==0.8.5

Cloning the project

Clone the project:

git clone https://github.com/gaigutherz/Translating-Akkadian-using-NLP.git

Running

Now you can develop in the Translating-Akkadian-using-NLP repository and add your improvements!

Training

Use train.py to train the models on the datasets. It has one function per model; each function trains the model, stores it as a pickle, and tests its performance on the given corpora.

The functions are as follows:

hmm_train_and_test(corpora)
memm_train_and_test(corpora)
biLSTM_train_and_test(corpora)
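To give a feel for what training and decoding an HMM tagger involves, here is a toy sketch. It is not the code in hmm.py: it is an independent, first-order HMM with add-one smoothing and Viterbi decoding, and the names train_hmm, log_prob, viterbi and TOY_CORPUS, as well as the placeholder signs and readings, are invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Invented toy data: each sentence is a list of (sign, reading) pairs.
# The sign "S2" is read "x" after "a" and "y" after "b".
TOY_CORPUS = [
    [("S1", "a"), ("S2", "x")],
    [("S1", "a"), ("S2", "x")],
    [("S3", "b"), ("S2", "y")],
    [("S3", "b"), ("S2", "y")],
]


def train_hmm(tagged_sentences):
    """Count transitions and emissions from (sign, reading) sentences."""
    transitions = defaultdict(Counter)  # previous reading -> next reading counts
    emissions = defaultdict(Counter)    # reading -> sign counts
    for sentence in tagged_sentences:
        previous = "<s>"  # sentence-start marker
        for sign, reading in sentence:
            transitions[previous][reading] += 1
            emissions[reading][sign] += 1
            previous = reading
    return transitions, emissions


def log_prob(counter, key, vocab_size, alpha=1.0):
    """Add-alpha smoothed log probability of `key` under `counter`."""
    total = sum(counter.values())
    return math.log((counter[key] + alpha) / (total + alpha * vocab_size))


def viterbi(signs, transitions, emissions):
    """Return the most probable reading sequence for `signs`."""
    readings = sorted(emissions)
    n_readings = len(readings)
    n_signs = len({sign for counts in emissions.values() for sign in counts})
    # best[r] = (log score of the best path ending in reading r, that path)
    best = {"<s>": (0.0, [])}
    for sign in signs:
        step = {}
        for reading in readings:
            emit = log_prob(emissions[reading], sign, n_signs)
            score, path = max(
                (prev_score + log_prob(transitions[prev], reading, n_readings) + emit,
                 prev_path)
                for prev, (prev_score, prev_path) in best.items()
            )
            step[reading] = (score, path + [reading])
        best = step
    return max(best.values())[1]


if __name__ == "__main__":
    transitions, emissions = train_hmm(TOY_CORPUS)
    print(viterbi(["S1", "S2"], transitions, emissions))
    print(viterbi(["S3", "S2"], transitions, emissions))
```

The point of the toy corpus is that the same sign receives a different reading depending on the preceding reading, which is the kind of context a sequence model can exploit.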

Transliterating

Use transliterate.py to transliterate with the trained models. It has one function per model; each takes a string of Akkadian signs and returns its transliteration.

Example of usage:

akkadian_signs = "𒁹𒀭𒌍𒋀𒈨𒌍𒌷𒁀"
print(transliterate(akkadian_signs))
print(transliterate_bilstm(akkadian_signs))
print(transliterate_bilstm_top3(akkadian_signs))
print(transliterate_hmm(akkadian_signs))
print(transliterate_memm(akkadian_signs))

Datasets

The main datasets used for training and testing are:

Dataset   King                                    Period       Number of Lines   Percentage of Corpus
RINAP 1   Tiglath-pileser III and Shalmaneser V   744-722 BC   1125              4.78%
RINAP 3   Sennacherib                             704-681 BC   7131              30.31%
RINAP 4   Esarhaddon                              680-669 BC   6018              25.58%
RINAP 5   Ashurbanipal and Successors             668-612 BC   9252              39.33%
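The percentages follow directly from the line counts: each dataset's lines divided by the total of 23526 lines. A short sketch of that arithmetic, where corpus_shares is a hypothetical helper name and the counts are copied from the table:

```python
# Line counts copied from the table above.
LINE_COUNTS = {
    "RINAP 1": 1125,
    "RINAP 3": 7131,
    "RINAP 4": 6018,
    "RINAP 5": 9252,
}


def corpus_shares(line_counts):
    """Return each dataset's share of the total line count, in percent."""
    total = sum(line_counts.values())
    return {
        name: round(100.0 * count / total, 2)
        for name, count in line_counts.items()
    }


if __name__ == "__main__":
    for name, share in corpus_shares(LINE_COUNTS).items():
        print("%s: %.2f%%" % (name, share))
```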

More datasets used:

  • RIAO - This project intends to present annotated editions of the entire corpus of Assyrian royal inscriptions, texts that were published in RIMA 1-3.

  • RIBO - This project intends to present annotated editions of the entire corpus of Babylonian royal inscriptions from the Second Dynasty of Isin to the Neo-Babylonian Dynasty (1157-539 BC).

  • SAAO - The online counterpart to the State Archives of Assyria series.

  • SUHU - This project presents annotated editions of the officially commissioned texts of the extant, first-millennium-BC inscriptions of the rulers of Suhu, texts published in Frame, RIMB 2 pp. 275-331.

  • TEI - Databases used for full translation.

Datasets deployment

The datasets are taken from the ORACC project and can be downloaded from the following link: http://oracc.museum.upenn.edu/rinap/rinapdownloads/index.html.

In our repository the datasets are located in the "raw_data" directory. They can also be obtained from the GitHub repository using git clone or a zip download.

Project structure

BiLSTM_input:

Contains dictionaries used for transliteration by BiLSTM.

NMT_input:

Contains dictionaries used for neural machine translation.

akkadian.egg-info:

Information and settings for the akkadian Python package.

akkadian:

Source code and training output.

output: Training output for HMM, MEMM and BiLSTM, mostly pickles.

__init__.py: Init script for akkadian python package. Initializes global variables.

bilstm.py: Class for BiLSTM training and prediction using the AllenNLP implementation.

build_data.py: Code for organizing the data in dictionaries.

check_translation.py: Code for checking translation accuracy.

combine_algorithms.py: Code for prediction using HMM, MEMM and BiLSTM together.

data.py: Utilities for accuracy checks and dictionary interpretation.

full_translation_build_data.py: Code for organizing the data for the full translation task.

get_texts_details.py: Utility for getting more information about a text.

hmm.py: Implementation of HMM training and prediction.

memm.py: Implementation of MEMM training and prediction.

parse_json: JSON parsing used for organizing the data.

parse_xml.py: XML parsing used for organizing the data.

train.py: API for training all 3 algorithms and storing the output.

translation_tokenize.py: Code for tokenization for the translation task.

transliterate.py: API for transliterating using all 3 algorithms.

build/lib/akkadian:

Information and settings for the akkadian Python package.

dist:

Akkadian python package - wheel and tar.

raw_data:

Databases used for training the models.

random: Four texts used for cross-era testing.

riao: This project intends to present annotated editions of the entire corpus of Assyrian royal inscriptions, texts that were published in RIMA 1-3.

ribo: This project intends to present annotated editions of the entire corpus of Babylonian royal inscriptions from the Second Dynasty of Isin to the Neo-Babylonian Dynasty (1157-539 BC).

rinap: Presents fully searchable, annotated editions of the royal inscriptions of Neo-Assyrian kings Tiglath-pileser III (744-727 BC), Shalmaneser V (726-722 BC), Sennacherib (704-681 BC), Esarhaddon (680-669 BC), Ashurbanipal (668-631 BC), Aššur-etel-ilāni (630-627 BC), and Sîn-šarra-iškun (626-612 BC).

saao: The online counterpart to the State Archives of Assyria series.

suhu: This project presents annotated editions of the officially commissioned texts of the extant, first-millennium-BC inscriptions of the rulers of Suhu, texts published in Frame, RIMB 2 pp. 275-331.

tei: Databases used for full translation.
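After cloning, it can be useful to check that the dataset directories listed above are present. The following is a minimal sketch, assuming the sub-directory names under raw_data match the list above exactly; missing_datasets is a hypothetical helper and not part of the repository.

```python
from pathlib import Path

# Sub-directory names as listed under raw_data above.
EXPECTED_DATASETS = ("random", "riao", "ribo", "rinap", "saao", "suhu", "tei")


def missing_datasets(raw_data_dir):
    """Return the expected dataset directories absent from `raw_data_dir`."""
    root = Path(raw_data_dir)
    return [name for name in EXPECTED_DATASETS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumes the script is run from the root of the cloned repository.
    missing = missing_datasets("raw_data")
    print("missing datasets:", ", ".join(missing) if missing else "none")
```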

Authors

  • Gai Gutherz

  • Ariel Elazary
