Translating Akkadian signs to transliteration using NLP algorithms

Translating-Akkadian-using-NLP

Translating Akkadian signs to transliteration using NLP algorithms such as HMM, MEMM and BiLSTM neural networks.

Getting Started

There are 3 main ways to deploy the project:

  • Website
  • Python package
  • GitHub clone

Website

Use this link to access the website: https://babylonian.herokuapp.com/#/

Go to the "Akkademia" tab and enter signs to see them transliterated.

Python Package

These instructions let you transliterate on your local machine using the "akkadian" Python package, which is based on our project.

Prerequisites

Install Python 3.6 or 3.7 - Link for example (version 3.7.1): https://www.python.org/downloads/release/python-371/.
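One way to confirm that the interpreter matches the versions listed above is a small check like the following. This is a minimal sketch: the helper name is_supported_python is hypothetical and is not part of the akkadian package.

```python
import sys

# Versions named in the prerequisites above, as (major, minor) pairs.
SUPPORTED_VERSIONS = {(3, 6), (3, 7)}


def is_supported_python(version_info=None):
    """Return True if the given (or running) interpreter is Python 3.6 or 3.7."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) in SUPPORTED_VERSIONS


if __name__ == "__main__":
    status = "supported" if is_supported_python() else "not listed as supported"
    print("Python %d.%d is %s" % (sys.version_info[0], sys.version_info[1], status))
```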

Installing

Install the akkadian package (this may take a while). One way to do so is with pip:

pip install akkadian

Running

The following examples show typical sessions.

Transliterating Akkadian signs:

import akkadian.transliterate as akk
print(akk.transliterate("𒁹𒀭𒌍𒋀𒈨𒌍𒌷𒁀"))

Transliterating Akkadian signs using BiLSTM:

import akkadian.transliterate as akk
print(akk.transliterate_bilstm("𒁹𒀭𒌍𒋀𒈨𒌍𒌷𒁀"))

Top three transliteration options for Akkadian signs using BiLSTM:

import akkadian.transliterate as akk
print(akk.transliterate_bilstm_top3("𒁹𒀭𒌍𒋀𒈨𒌍𒌷𒁀"))

Transliterating Akkadian signs using MEMM:

import akkadian.transliterate as akk
print(akk.transliterate_memm("𒁹𒀭𒌍𒋀𒈨𒌍𒌷𒁀"))

Transliterating Akkadian signs using HMM:

import akkadian.transliterate as akk
print(akk.transliterate_hmm("𒁹𒀭𒌍𒋀𒈨𒌍𒌷𒁀"))
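To see how the models differ on the same input, the calls above can be wrapped in a small comparison helper. The sketch below is an illustration only: load_akkadian_models, compare_models and print_comparison are hypothetical names, and only the akk.transliterate* functions come from the package as described above. transliterate_bilstm_top3 is left out because it reports several options rather than a single transliteration.

```python
def load_akkadian_models():
    """Import the package lazily and give a label to each function shown above."""
    import akkadian.transliterate as akk  # requires `pip install akkadian`
    return {
        "default": akk.transliterate,
        "bilstm": akk.transliterate_bilstm,
        "memm": akk.transliterate_memm,
        "hmm": akk.transliterate_hmm,
    }


def compare_models(signs, models):
    """Run every transliteration function in `models` on the same input.

    A failing model is reported as an error string instead of aborting the run.
    """
    results = {}
    for name, transliterate_fn in models.items():
        try:
            results[name] = transliterate_fn(signs)
        except Exception as exc:
            results[name] = "error: %s" % exc
    return results


def print_comparison(signs):
    """Print one line per model for the given string of signs."""
    for name, output in compare_models(signs, load_akkadian_models()).items():
        print("%-8s %s" % (name, output))
```

Calling print_comparison with a string of signs prints one labelled line per model. Because compare_models accepts any mapping of names to callables, it can also be tried with stand-in functions before the package is installed.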

GitHub

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

Install Python 3.6 or 3.7 - Link for example (version 3.7.1): https://www.python.org/downloads/release/python-371/.

If you don't have git installed, install it from https://git-scm.com/downloads (choose the appropriate operating system).

If you don't have a GitHub account, create one at https://github.com/join?source=header-home.

Installing the python dependencies

Install torch.

Windows:

pip install torch===1.3.1 torchvision===0.4.2 -f https://download.pytorch.org/whl/torch_stable.html

Linux and macOS:

pip install torch torchvision

Install allennlp:

pip install allennlp==0.8.5

Cloning the project

Clone the project:

git clone https://github.com/gaigutherz/Translating-Akkadian-using-NLP.git

Running

Now you can develop for the Translating-Akkadian-using-NLP repository and add your improvements!

Training

Use train.py to train the models on the datasets. It provides one function per model; each one trains the model, stores it as a pickle, and tests its performance on the given corpora.

The functions are as follows:

hmm_train_and_test(corpora)
memm_train_and_test(corpora)
biLSTM_train_and_test(corpora)
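To make the train-then-test cycle concrete, here is a toy first-order HMM tagger that maps signs to readings. It is a simplified sketch written for illustration and is not the code in hmm.py: the function names, the add-one smoothing, and the placeholder sign and reading labels (S1, a, and so on) are all assumptions.

```python
import math
from collections import Counter, defaultdict

# Toy corpus: each sentence is a list of (sign, reading) pairs.
# The sign S2 is ambiguous: it is read "b" after "a" and "d" after "c".
TOY_CORPUS = [
    [("S1", "a"), ("S2", "b")],
    [("S1", "a"), ("S2", "b")],
    [("S3", "c"), ("S2", "d")],
    [("S3", "c"), ("S2", "d")],
]


def train_hmm(tagged_sentences):
    """Count reading-to-reading transitions and reading-to-sign emissions."""
    transitions = defaultdict(Counter)  # previous reading -> next reading counts
    emissions = defaultdict(Counter)    # reading -> sign counts
    for sentence in tagged_sentences:
        previous = "<s>"  # sentence-start marker
        for sign, reading in sentence:
            transitions[previous][reading] += 1
            emissions[reading][sign] += 1
            previous = reading
    return transitions, emissions


def log_prob(counter, key, vocab_size):
    """Add-one smoothed log probability of `key` under `counter`."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + vocab_size))


def viterbi(signs, transitions, emissions):
    """Return the most likely sequence of readings for `signs`."""
    if not signs:
        return []
    readings = list(emissions)
    n_readings = len(readings)
    n_signs = len({s for c in emissions.values() for s in c}) + 1  # +1 for unseen signs
    # best[i][r] = (score of the best path ending in reading r at position i, backpointer)
    best = [{}]
    for r in readings:
        score = (log_prob(transitions["<s>"], r, n_readings)
                 + log_prob(emissions[r], signs[0], n_signs))
        best[0][r] = (score, None)
    for i in range(1, len(signs)):
        best.append({})
        for r in readings:
            emit = log_prob(emissions[r], signs[i], n_signs)
            score, prev = max(
                (best[i - 1][p][0] + log_prob(transitions[p], r, n_readings) + emit, p)
                for p in readings
            )
            best[i][r] = (score, prev)
    # Follow the backpointers from the best final reading.
    last = max(readings, key=lambda r: best[-1][r][0])
    path = [last]
    for i in range(len(signs) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]


def accuracy(predicted, gold):
    """Fraction of positions where the predicted reading matches the gold reading."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

Training is a single call, transitions, emissions = train_hmm(TOY_CORPUS), after which viterbi(["S3", "S2"], transitions, emissions) decodes a new sign sequence and accuracy compares the result with a gold transliteration.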

Transliterating

Use transliterate.py to transliterate with the trained models. It provides one function per model; each one takes a sentence of Akkadian signs as its parameter and returns its transliteration.

Example of usage:

akkadian_signs = "𒁹𒀭𒌍𒋀𒈨𒌍𒌷𒁀"
print(transliterate(akkadian_signs))
print(transliterate_bilstm(akkadian_signs))
print(transliterate_bilstm_top3(akkadian_signs))
print(transliterate_hmm(akkadian_signs))
print(transliterate_memm(akkadian_signs))

Datasets

The main datasets used for training and testing are:

Dataset   King                                    Period       Number of lines   Percentage of corpora
RINAP 1   Tiglath-pileser III and Shalmaneser V   744-722 BC   1125              4.78%
RINAP 3   Sennacherib                             704-681 BC   7131              30.31%
RINAP 4   Esarhaddon                              680-669 BC   6018              25.58%
RINAP 5   Ashurbanipal and Successors             668-612 BC   9252              39.33%
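The percentage column follows directly from the line counts: each dataset's count is divided by the total across the four RINAP datasets. A short sketch of that arithmetic, where the helper name corpus_shares is hypothetical:

```python
# Line counts copied from the table above.
LINE_COUNTS = {
    "RINAP 1": 1125,
    "RINAP 3": 7131,
    "RINAP 4": 6018,
    "RINAP 5": 9252,
}


def corpus_shares(line_counts):
    """Return each dataset's share of the total line count, as a percentage."""
    total = sum(line_counts.values())
    return {
        name: round(100.0 * count / total, 2)
        for name, count in line_counts.items()
    }


if __name__ == "__main__":
    for name, share in corpus_shares(LINE_COUNTS).items():
        print("%s: %.2f%%" % (name, share))
```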

More datasets used:

  • RIAO - This project intends to present annotated editions of the entire corpus of Assyrian royal inscriptions, texts that were published in RIMA 1-3.

  • RIBO - This project intends to present annotated editions of the entire corpus of Babylonian royal inscriptions from the Second Dynasty of Isin to the Neo-Babylonian Dynasty (1157-539 BC).

  • SAAO - The online counterpart to the State Archives of Assyria series.

  • SUHU - This project presents annotated editions of the officially commissioned texts of the extant, first-millennium-BC inscriptions of the rulers of Suhu, texts published in Frame, RIMB 2 pp. 275-331.

  • TEI - Databases used for full translation.

Datasets deployment

The datasets are taken from the ORACC project and can be downloaded from the following link: http://oracc.museum.upenn.edu/rinap/rinapdownloads/index.html.

In our repository the datasets are located in the "raw_data" directory. They can also be downloaded from the GitHub repository using git clone or zip download.
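After cloning, a quick sanity check is to count the files under each dataset folder in raw_data. The sketch below assumes it is run from the root of the cloned repository; the function name count_files_per_dataset is hypothetical, and nothing is assumed about the file formats inside each folder.

```python
import os


def count_files_per_dataset(raw_data_dir):
    """Map each immediate subdirectory of `raw_data_dir` to its total file count."""
    counts = {}
    for entry in sorted(os.listdir(raw_data_dir)):
        dataset_dir = os.path.join(raw_data_dir, entry)
        if not os.path.isdir(dataset_dir):
            continue  # skip loose files at the top level
        # os.walk also visits nested folders, so files at any depth are counted.
        counts[entry] = sum(len(files) for _, _, files in os.walk(dataset_dir))
    return counts


if __name__ == "__main__":
    # Assumption: the current working directory is the repository root.
    if os.path.isdir("raw_data"):
        for dataset, n_files in count_files_per_dataset("raw_data").items():
            print("%-8s %d files" % (dataset, n_files))
```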

Project structure

BiLSTM_input:

Contains dictionaries used for transliteration by BiLSTM.

NMT_input:

Contains dictionaries used for neural machine translation.

akkadian.egg-info:

Information and settings for the akkadian Python package.

akkadian:

Source code and training output.

output: Training output for HMM, MEMM and BiLSTM, mostly pickles.

__init__.py: Init script for akkadian python package. Initializes global variables.

bilstm.py: Class for BiLSTM training and prediction using the AllenNLP implementation.

build_data.py: Code for organizing the data in dictionaries.

check_translation.py: Code for translation accuracy checking.

combine_algorithms.py: Code for prediction using HMM, MEMM and BiLSTM together.

data.py: Utilities for accuracy checks and dictionary interpretation.

full_translation_build_data.py: Code for organizing the data for full translation task.

get_texts_details.py: Util for getting more information about the text.

hmm.py: Implementation of HMM for train and prediction.

memm.py: Implementation of MEMM for train and prediction.

parse_json: JSON parsing used for organizing the data.

parse_xml.py: XML parsing used for organizing the data.

train.py: API for training all 3 algorithms and storing the output.

translation_tokenize.py: Code for tokenization for translation task.

transliterate.py: API for transliterating using all 3 algorithms.

build/lib/akkadian:

Information and settings for the akkadian Python package.

dist:

Akkadian python package - wheel and tar.

raw_data:

Databases used for training the models.

random: Four texts used for cross-era testing.

riao: This project intends to present annotated editions of the entire corpus of Assyrian royal inscriptions, texts that were published in RIMA 1-3.

ribo: This project intends to present annotated editions of the entire corpus of Babylonian royal inscriptions from the Second Dynasty of Isin to the Neo-Babylonian Dynasty (1157-539 BC).

rinap: Presents fully searchable, annotated editions of the royal inscriptions of Neo-Assyrian kings Tiglath-pileser III (744-727 BC), Shalmaneser V (726-722 BC), Sennacherib (704-681 BC), Esarhaddon (680-669 BC), Ashurbanipal (668-631 BC), Aššur-etel-ilāni (630-627 BC), and Sîn-šarra-iškun (626-612 BC).

saao: The online counterpart to the State Archives of Assyria series.

suhu: This project presents annotated editions of the officially commissioned texts of the extant, first-millennium-BC inscriptions of the rulers of Suhu, texts published in Frame, RIMB 2 pp. 275-331.

tei: Databases used for full translation.

Authors

  • Gai Gutherz

  • Ariel Elazary
