Translating Akkadian signs to transliteration using NLP algorithms
Translating-Akkadian-using-NLP
Translating Akkadian signs to transliteration using NLP algorithms such as HMM, MEMM and BiLSTM neural networks.
Getting Started
There are 3 main ways to deploy the project:
- Website
- Python package
- Github clone
Website
Use this link to access the website: https://babylonian.herokuapp.com/#/
Go to the "Translit" tab and enter signs to see them transliterated.
Python Package
These instructions let you transliterate on your local machine with the "akkadian" Python package, which is based on this project.
Prerequisites
Install Python 3.6 or 3.7. For example, version 3.7.1 is available at https://www.python.org/downloads/release/python-371/
Installing
Install the akkadian package. One way to do so is with pip:
pip install akkadian
Running
The following examples show a few typical sessions.
Transliterating Akkadian signs:
import akkadian.transliterate as akk
print(akk.transliterate("๐น๐ญ๐๐๐จ๐๐ท๐"))
Transliterating Akkadian signs using BiLSTM:
import akkadian.transliterate as akk
print(akk.transliterate_bilstm("๐น๐ญ๐๐๐จ๐๐ท๐"))
Top three transliteration options for Akkadian signs using BiLSTM:
import akkadian.transliterate as akk
print(akk.transliterate_bilstm_top3("๐น๐ญ๐๐๐จ๐๐ท๐"))
Transliterating Akkadian signs using MEMM:
import akkadian.transliterate as akk
print(akk.transliterate_memm("๐น๐ญ๐๐๐จ๐๐ท๐"))
Transliterating Akkadian signs using HMM:
import akkadian.transliterate as akk
print(akk.transliterate_hmm("๐น๐ญ๐๐๐จ๐๐ท๐"))
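To compare the models side by side, the calls above can be collected in a small helper. This is only a sketch: `compare_models` is a hypothetical name that is not part of the akkadian package, and it assumes nothing about the functions except that each takes a string of signs and returns something printable.

```python
def compare_models(signs, models):
    """Run every transliteration function in `models` on the same signs.

    `models` maps a label to a callable that takes a string of signs.
    A failure in one model is recorded instead of stopping the comparison.
    """
    results = {}
    for label, func in models.items():
        try:
            results[label] = func(signs)
        except Exception as error:  # keep going if one model fails
            results[label] = "failed: {}".format(error)
    return results


if __name__ == "__main__":
    try:
        import akkadian.transliterate as akk
    except ImportError:
        print("The akkadian package is not installed; run: pip install akkadian")
    else:
        models = {
            "default": akk.transliterate,
            "BiLSTM": akk.transliterate_bilstm,
            "MEMM": akk.transliterate_memm,
            "HMM": akk.transliterate_hmm,
        }
        signs = "..."  # placeholder: replace with a string of cuneiform signs
        for label, output in compare_models(signs, models).items():
            print(label, "->", output)
```

Passing the functions in as a dictionary keeps the helper independent of the package, so it can also be tried with stand-in functions.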
Github
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
Prerequisites
Install Python 3.6 or 3.7. For example, version 3.7.1 is available at https://www.python.org/downloads/release/python-371/
If you don't have git installed, install it from https://git-scm.com/downloads (choose the appropriate operating system).
If you don't have a GitHub account, create one at https://github.com/join?source=header-home
Installing the python dependencies
Install torch.
On Windows:
pip install torch===1.3.1 torchvision===0.4.2 -f https://download.pytorch.org/whl/torch_stable.html
On Linux and macOS:
pip install torch torchvision
Install allennlp:
pip install allennlp==0.8.5
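After installing, a quick check can confirm that the interpreter version and the dependencies are in place. The sketch below uses only the standard library; the function names are hypothetical and the list of package names is taken from the pip commands above.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the names from `names` that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_version_supported(version_info=sys.version_info):
    """True for Python 3.6 or 3.7, the versions listed in the prerequisites."""
    return tuple(version_info[:2]) in {(3, 6), (3, 7)}


if __name__ == "__main__":
    print("Python 3.6/3.7:", python_version_supported())
    print("Missing packages:", missing_packages(["torch", "torchvision", "allennlp"]))
```

`find_spec` only looks the packages up without importing them, so the check stays fast even when torch is installed.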
Cloning the project
Clone the project:
git clone https://github.com/gaigutherz/Translating-Akkadian-using-NLP.git
Running
Now you can develop for the Translating-Akkadian-using-NLP repository and add your improvements!
Training
Use train.py to train the models on the datasets. For each model there is a function that trains it, stores the result as a pickle, and tests its performance on the given corpora.
The functions are as follows:
hmm_train_and_test(corpora)
memm_train_and_test(corpora)
biLSTM_train_and_test(corpora)
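To give a feel for what a train-and-test step involves in the HMM case, here is a toy sketch. It is not the code in hmm.py: all names are hypothetical, the sign labels (`S1`, `S2`, `S3`) and readings are invented, and it takes explicit lists of sentences rather than the `corpora` argument used by the functions above. Readings are treated as hidden states, signs as observations, and decoding uses the Viterbi algorithm with add-k smoothing.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # artificial reading that precedes every sentence


def train_hmm(tagged_sentences):
    """Count reading-to-reading transitions and reading-to-sign emissions."""
    transitions = defaultdict(Counter)  # previous reading -> next reading counts
    emissions = defaultdict(Counter)    # reading -> sign counts
    for sentence in tagged_sentences:
        previous = START
        for sign, reading in sentence:
            transitions[previous][reading] += 1
            emissions[reading][sign] += 1
            previous = reading
    return transitions, emissions


def log_prob(counts, key, vocab_size, k=1e-6):
    """Add-k smoothed log probability of `key` under `counts`."""
    return math.log((counts.get(key, 0) + k) / (sum(counts.values()) + k * vocab_size))


def viterbi(signs, transitions, emissions):
    """Return the most probable sequence of readings for `signs`."""
    if not signs:
        return []
    readings = sorted(emissions)
    n_readings = len(readings)
    # +1 leaves probability mass for signs never seen in training
    n_signs = len({s for counts in emissions.values() for s in counts}) + 1

    def step(previous, reading, sign):
        return (log_prob(transitions.get(previous, {}), reading, n_readings)
                + log_prob(emissions[reading], sign, n_signs))

    # Each column maps a reading to (best score so far, previous reading).
    best = [{r: (step(START, r, signs[0]), None) for r in readings}]
    for sign in signs[1:]:
        column = {}
        for r in readings:
            column[r] = max((best[-1][p][0] + step(p, r, sign), p) for p in readings)
        best.append(column)

    # Follow the back-pointers from the best final reading.
    reading = max(readings, key=lambda r: best[-1][r][0])
    path = [reading]
    for column in reversed(best[1:]):
        reading = column[reading][1]
        path.append(reading)
    return path[::-1]


def train_and_test(train_sentences, test_sentences):
    """Train on one set of sentences and return per-sign accuracy on another."""
    transitions, emissions = train_hmm(train_sentences)
    correct = total = 0
    for sentence in test_sentences:
        signs = [sign for sign, _ in sentence]
        gold = [reading for _, reading in sentence]
        predicted = viterbi(signs, transitions, emissions)
        correct += sum(p == g for p, g in zip(predicted, gold))
        total += len(gold)
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Invented data: sign S2 is read "b" after "a" and "d" after "c".
    toy = [
        [("S1", "a"), ("S2", "b")],
        [("S1", "a"), ("S2", "b")],
        [("S3", "c"), ("S2", "d")],
        [("S3", "c"), ("S2", "d")],
    ]
    print("accuracy:", train_and_test(toy, toy))
```

The toy data shows why context matters: `S2` alone is ambiguous, and only the transition counts let the decoder choose between its two readings.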
Transliterating
Use transliterate.py to transliterate with the models. For each model there is a function that takes a sentence of Akkadian signs as its parameter and returns its transliteration.
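Example of usage:
# Assumption: run from the repository root, so that the akkadian directory is importable.
from akkadian.transliterate import (
    transliterate,
    transliterate_bilstm,
    transliterate_bilstm_top3,
    transliterate_hmm,
    transliterate_memm,
)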
akkadian_signs = "๐น๐ญ๐๐๐จ๐๐ท๐"
print(transliterate(akkadian_signs))
print(transliterate_bilstm(akkadian_signs))
print(transliterate_bilstm_top3(akkadian_signs))
print(transliterate_hmm(akkadian_signs))
print(transliterate_memm(akkadian_signs))
Datasets
The main datasets used for training and testing are:
Dataset | King | Period | Number of Lines | Percentage of Corpus |
---|---|---|---|---|
RINAP 1 | Tiglath-pileser III and Shalmaneser V | 744-722 BC | 1125 | 4.78% |
RINAP 3 | Sennacherib | 704-681 BC | 7131 | 30.31% |
RINAP 4 | Esarhaddon | 680-669 BC | 6018 | 25.58% |
RINAP 5 | Ashurbanipal and Successors | 668-612 BC | 9252 | 39.33% |
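The percentages in the table appear to be each dataset's share of the 23,526 lines in the four RINAP datasets combined. The short check below reproduces them from the line counts; the function name is hypothetical.

```python
line_counts = {
    "RINAP 1": 1125,
    "RINAP 3": 7131,
    "RINAP 4": 6018,
    "RINAP 5": 9252,
}


def corpus_shares(counts):
    """Return each dataset's share of the total line count, in percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


for name, share in corpus_shares(line_counts).items():
    print("{}: {:.2f}%".format(name, share))
```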
More datasets used:
- RIAO - This project intends to present annotated editions of the entire corpus of Assyrian royal inscriptions, texts that were published in RIMA 1-3.
- RIBO - This project intends to present annotated editions of the entire corpus of Babylonian royal inscriptions from the Second Dynasty of Isin to the Neo-Babylonian Dynasty (1157-539 BC).
- SAAO - The online counterpart to the State Archives of Assyria series.
- SUHU - This project presents annotated editions of the officially commissioned texts of the extant, first-millennium-BC inscriptions of the rulers of Suhu, texts published in Frame, RIMB 2 pp. 275-331.
- TEI - Databases used for full translation.
Datasets deployment
The datasets are taken from the ORACC project and can be downloaded from the following link: http://oracc.museum.upenn.edu/rinap/rinapdownloads/index.html.
In our repository the datasets are located in the "raw_data" directory. They can also be downloaded from the GitHub repository using git clone or a zip download.
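Once the repository is cloned, a quick way to see what was downloaded is to count the files under each corpus directory in raw_data. This sketch assumes only that raw_data contains one sub-directory per corpus; the function name is hypothetical and the file formats inside are not inspected.

```python
from pathlib import Path


def count_files_per_corpus(raw_data_dir):
    """Map each sub-directory of `raw_data_dir` to the number of files below it."""
    root = Path(raw_data_dir)
    counts = {}
    for corpus_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[corpus_dir.name] = sum(1 for f in corpus_dir.rglob("*") if f.is_file())
    return counts


if __name__ == "__main__":
    raw_data = Path("raw_data")  # assumes the current directory is the repository root
    if raw_data.is_dir():
        for corpus, n_files in count_files_per_corpus(raw_data).items():
            print(corpus, n_files)
    else:
        print("raw_data directory not found; run this from the cloned repository")
```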
Project structure
BiLSTM_input:
Contains dictionaries used for transliteration by BiLSTM.
NMT_input:
Contains dictionaries used for neural machine translation.
akkadian.egg-info:
Information and settings for the akkadian Python package.
akkadian:
Source code and training output.
output: Training output for HMM, MEMM and BiLSTM, mostly pickles.
__init__.py: Init script for akkadian python package. Initializes global variables.
bilstm.py: Class for BiLSTM train and prediction using AllenNLP implementation.
build_data.py: Code for organizing the data in dictionaries.
check_translation.py: Code for translation accuracy checking.
combine_algorithms.py: Code for prediction using HMM, MEMM and BiLSTM together.
data.py: Utils for accuracy checks and dictionaries interpretations.
full_translation_build_data.py: Code for organizing the data for full translation task.
get_texts_details.py: Util for getting more information about the text.
hmm.py: Implementation of HMM for train and prediction.
memm.py: Implementation of MEMM for train and prediction.
parse_json: JSON parsing used for data organizing.
parse_xml.py: XML parsing used for data organizing.
train.py: API for training all 3 algorithms and storing the output.
translation_tokenize.py: Code for tokenization for translation task.
transliterate.py: API for transliterating using all 3 algorithms.
build/lib/akkadian:
Information and settings for the akkadian Python package.
dist:
Akkadian python package - wheel and tar.
raw_data:
Databases used for training the models.
random: 4 texts used for cross-era testing.
riao: This project intends to present annotated editions of the entire corpus of Assyrian royal inscriptions, texts that were published in RIMA 1-3.
ribo: This project intends to present annotated editions of the entire corpus of Babylonian royal inscriptions from the Second Dynasty of Isin to the Neo-Babylonian Dynasty (1157-539 BC).
rinap: Presents fully searchable, annotated editions of the royal inscriptions of Neo-Assyrian kings Tiglath-pileser III (744-727 BC), Shalmaneser V (726-722 BC), Sennacherib (704-681 BC), Esarhaddon (680-669 BC), Ashurbanipal (668-631 BC), Aššur-etel-ilāni (630-627 BC), and Sîn-šarra-iškun (626-612 BC).
saao: The online counterpart to the State Archives of Assyria series.
suhu: This project presents annotated editions of the officially commissioned texts of the extant, first-millennium-BC inscriptions of the rulers of Suhu, texts published in Frame, RIMB 2 pp. 275-331.
tei: Databases used for full translation.
Authors
- Gai Gutherz
- Ariel Elazary