A package collecting various functions to work with ancient Mediterranean datasets (textual, spatial, etc.)

License
- CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

anda

pip install anda

This is a Python package for collecting, manipulating, and visualizing various ancient Mediterranean data. It focuses on their temporal, textual, and spatial aspects. It is structured into several gradually evolving submodules, namely gr, imda, concs, and textnet.

anda.gr

from anda import gr

This module is dedicated to preprocessing ancient Greek textual data. It contains functions for lemmatization, POS tagging, and translation. It relies heavily on the Morpheus dictionary.

Lemmatization

The minimal use case is to lemmatize an individual word. You can ask either for the first lemma only (return_first_lemma()) or for all possibilities (return_all_unique_lemmata()). In most cases, the outcome is the same:

gr.return_first_lemma("ἐπιστήμην")
> 'ἐπιστήμη'

gr.return_all_unique_lemmata("ἐπιστήμην")
> 'ἐπιστήμη'

Built on top of these are the functions lemmatize_string() and get_lemmatized_sentences(). Both work with a string of any length. The first returns a flat list of lemmata. The second returns a list of lemmatized sentences, each of which is a list of lemmata.

string = "Πρότασις μὲν οὖν ἐστὶ λόγος καταφατικὸς ἢ ἀποφατικὸς τινὸς κατά τινος. Οὗτος δὲ ἢ καθόλου ἢ ἐν μέρει ἢ ἀδιόριστος. Λέγω δὲ καθόλου μὲν τὸ παντὶ ἢ μηδενὶ ὑπάρχειν, ἐν μέρει δὲ τὸ τινὶ ἢ μὴ τινὶ ἢ μὴ παντὶ ὑπάρχειν, ἀδιόριστον δὲ τὸ ὑπάρχειν ἢ μὴ ὑπάρχειν ἄνευ τοῦ καθόλου, ἢ κατὰ μέρος, οἷον τὸ τῶν ἐναντίων εἶναι τὴν αὐτὴν ἐπιστήμην ἢ τὸ τὴν ἡδονὴν μὴ εἶναι ἀγαθόν."

gr.lemmatize_string(string)
> ['πρότασις', 'λόγος', 'καταφατικός', 'ἀποφατικός', 'καθόλου', 'μέρος', 'ἀδιόριστος', 'λέγω', 'καθόλου', 'πᾶς', 'μηδείς', 'ὑπάρχω', 'μέρος', 'πᾶς', 'ὑπάρχω', 'ἀδιόριστον', 'ὑπάρχω', 'ὑπάρχω', 'ἄνευ', 'καθόλου', 'μέρος', 'οἷος', 'ἐναντίος', 'αὐτην', 'ἐπιστήμη', 'ἡδονην', 'ἀγαθός']

gr.get_lemmatized_sentences(string)
> [['πρότασις', 'λόγος', 'καταφατικός', 'ἀποφατικός'], ['καθόλου', 'μέρος', 'ἀδιόριστος'], ['λέγω', 'καθόλου', 'πᾶς', 'μηδείς', 'ὑπάρχω', 'μέρος', 'πᾶς', 'ὑπάρχω', 'ἀδιόριστον', 'ὑπάρχω', 'ὑπάρχω', 'ἄνευ', 'καθόλου', 'μέρος', 'οἷος', 'ἐναντίος', 'αὐτην', 'ἐπιστήμη', 'ἡδονην', 'ἀγαθός']]

All lemmatization functions can be further parametrized by several arguments:

all_lemmata=False: if True, all unique lemmata are returned for each word instead of only the first one
filter_by_postag=["n","a","v"]: returns only nouns ("n"), adjectives ("a") and verbs ("v")
involve_unknown=True: if False, only words found in the dictionary are returned

Thus, you can run:

lemmatized_sentences = gr.get_lemmatized_sentences(string, all_lemmata=False, filter_by_postag=["n","a","v"], involve_unknown=False)
print(lemmatized_sentences)
> [['λόγος'], ['μέρος'], ['πᾶς', 'μηδείς', 'ὑπάρχω', 'μέρος', 'πᾶς', 'ὑπάρχω', 'ὑπάρχω', 'ὑπάρχω', 'ἄνω/ἀνίημι', 'μέρος', 'οἷος', 'ἐναντίος', 'ἐπιστήμη', 'ἀγαθός']]

get_lemmatized_sentences(string, all_lemmata=False, filter_by_postag=None, involve_unknown=False) receives a raw Greek text of any kind and extent as its input. The input is processed by a series of functions nested within each other, which can also be used independently:

(1) get_sentences() splits the string into sentences by common sentence separators.

(2) lemmatize_string(sentence) first calls tokenize_string(), which performs basic cleaning and stopword filtering on the sentence and returns a list of words. Each word from the tokenized sentence is then sent either to return_first_lemma() or to return_all_unique_lemmata(), depending on the value of the parameter all_lemmata= (set to False by default).

(3) return_all_unique_lemmata() looks the word up in the morpheus_dict values and returns all unique lemmata.

(4) The parameter filter_by_postag= (default None) lets you sub-select chosen word types from the tokens, based on the first character of the tag "p". Thus, to choose only nouns, adjectives, and verbs, you can set filter_by_postag=["n", "a", "v"]. Preference: if verb, noun, and adjective variants are all available, only the noun and adjective forms are returned. If both a noun and an adjective are available, only the noun is returned.
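The nesting described above can be illustrated with a small, self-contained sketch. It does not use the package's code or its Morpheus data: the toy dictionary, the stopword set, the tag strings for nouns and adverbs, and the function names split_sentences, tokenize, lemmatize_word, and lemmatize_sentences are all hypothetical stand-ins, and the noun/adjective preference rule is not modeled.

```python
import re
import unicodedata


def nfc(text):
    """Normalize to NFC so visually identical Greek strings compare equal."""
    return unicodedata.normalize("NFC", text)


# Hypothetical toy data: word form -> list of (lemma, postag) analyses.
# The real package ships its own dictionary; this layout is an assumption.
RAW_DICT = {
    "πρότασις": [("πρότασις", "n-s---fn-")],
    "ἐστὶ": [("εἰμί", "v3spia---")],
    "λόγος": [("λόγος", "n-s---mn-")],
    "ἐπιστήμην": [("ἐπιστήμη", "n-s---fa-")],
    "καθόλου": [("καθόλου", "d--------")],
}
TOY_DICT = {nfc(f): [(nfc(l), t) for l, t in a] for f, a in RAW_DICT.items()}
STOPWORDS = {nfc(w) for w in ("μὲν", "οὖν", "τὴν")}  # illustrative only


def split_sentences(text):
    """Split on common sentence separators and drop empty pieces."""
    return [s.strip() for s in re.split(r"[.;·]", nfc(text)) if s.strip()]


def tokenize(sentence):
    """Basic cleaning (lowercase, strip commas) plus stopword filtering."""
    words = (w.strip(",").lower() for w in sentence.split())
    return [w for w in words if w and w not in STOPWORDS]


def lemmatize_word(word, all_lemmata=False, filter_by_postag=None,
                   involve_unknown=True):
    """Return a list of lemmata for one word (possibly empty)."""
    analyses = TOY_DICT.get(word)
    if analyses is None:
        # Unknown words are kept as-is only when involve_unknown is True.
        return [word] if involve_unknown else []
    if filter_by_postag is not None:
        # Keep analyses whose tag starts with one of the requested letters.
        analyses = [(l, t) for l, t in analyses if t[0] in filter_by_postag]
    lemmata = list(dict.fromkeys(lemma for lemma, _ in analyses))
    return lemmata if all_lemmata else lemmata[:1]


def lemmatize_sentences(text, **options):
    """Sentence splitting -> tokenizing -> per-word lemmatization."""
    return [
        [lemma for word in tokenize(sentence)
         for lemma in lemmatize_word(word, **options)]
        for sentence in split_sentences(text)
    ]


TEXT = "Πρότασις μὲν οὖν ἐστὶ λόγος. καθόλου τὴν αὐτὴν ἐπιστήμην."
default = lemmatize_sentences(TEXT)
filtered = lemmatize_sentences(
    TEXT, filter_by_postag=["n", "a", "v"], involve_unknown=False
)
print(default)
print(filtered)
```

In this sketch, the filtered call drops the adverb and the word missing from the toy dictionary, which mirrors how filter_by_postag and involve_unknown narrow the output in the example above.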

Translation

Alongside lemmatization, there is also a series of functions for translation, such as return_all_unique_translations(word, filter_by_postag=None, involve_unknown=False), which works for any word form, and lemma_translator(word), for cases where you already have a lemma.

gr.return_all_unique_translations("ὑπάρχειν", filter_by_postag=None, involve_unknown=False)
> 'to begin, make a beginning'

gr.lemma_translator("λόγος")
> 'the word'

Morphological analysis

You can also run a morphological analysis of a string:

gr.morphological_analysis(string)[1:4]
> [{'i': '564347',
  'f': 'μέν',
  'b': 'μεν',
  'l': 'μέν',
  'e': 'μεν',
  'p': 'g--------',
  'd': '20753',
  's': 'on the one hand, on the other hand',
  'a': None},
 {'i': '642363',
  'f': 'οὖν',
  'b': 'ουν',
  'l': 'οὖν',
  'e': 'ουν',
  'p': 'g--------',
  'd': '23870',
  's': 'really, at all events',
  'a': None},
 {'i': '264221',
  'f': 'ἐστί',
  'b': 'εστι',
  'l': 'εἰμί',
  'e': 'ειμι',
  'p': 'v3spia---',
  'd': '9722',
  's': 'I have',
  'a': None}]
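Because each entry is a plain dictionary, the result can be post-processed with ordinary Python. The following is a minimal sketch that selects entries by the first character of the "p" tag. The records are copied from the output above with only some keys kept; reading "f" as the word form and "l" as the lemma is an interpretation of that output, and select_by_postag is a hypothetical helper, not part of the package.

```python
# Records copied from the example output above (subset of keys).
records = [
    {"f": "μέν", "l": "μέν", "p": "g--------",
     "s": "on the one hand, on the other hand"},
    {"f": "οὖν", "l": "οὖν", "p": "g--------",
     "s": "really, at all events"},
    {"f": "ἐστί", "l": "εἰμί", "p": "v3spia---", "s": "I have"},
]


def select_by_postag(entries, postags):
    """Keep entries whose "p" tag starts with one of the given letters."""
    return [e for e in entries if e.get("p") and e["p"][0] in postags]


verbs = select_by_postag(records, {"v"})
for entry in verbs:
    # Print word form, lemma, and short translation of each verb.
    print(entry["f"], entry["l"], entry["s"])
```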

imda

This module is intended for importing various ancient Mediterranean resources. Most of them will be imported directly from open third-party online resources. However, some of them have been preprocessed as part of the SDAM project.

The intended usage is as follows:

imda.list_datasets()
> ['roman_provinces_117', 'EDH', 'roman_cities_hanson', 'orbis_network']

And:

rp = imda.import_dataset("roman_provinces_117", "gdf")
type(rp)
> geopandas.geodataframe.GeoDataFrame

concs

This module contains functions for working

textnet

This module contains functions for generating, analyzing and visualizing word co-occurrence networks. It has been designed especially for working with textual data in ancient Greek.
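The general idea of a word co-occurrence network can be sketched as follows. This is not the textnet API, which is not documented here: cooccurrence_counts is a hypothetical helper, the input sentences are toy data, and counting a pair once per sentence is only one possible definition of co-occurrence.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(lemmatized_sentences):
    """Count how many sentences each unordered pair of lemmata shares."""
    counts = Counter()
    for sentence in lemmatized_sentences:
        # Deduplicate and sort so each pair has one canonical ordering.
        for pair in combinations(sorted(set(sentence)), 2):
            counts[pair] += 1
    return counts


# Toy input in the shape returned by a sentence-level lemmatizer.
sentences = [
    ["λόγος", "ἐπιστήμη", "ἀγαθός"],
    ["λόγος", "ἐπιστήμη"],
    ["ἀγαθός"],
]

counts = cooccurrence_counts(sentences)
for (left, right), weight in counts.most_common():
    # Each line is one weighted edge of the co-occurrence network.
    print(left, right, weight)
```

The resulting pairs and weights can be read as a weighted edge list, which is the usual input for a graph library.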

Version history

0.0.8 - bugs removed
0.0.7 - filter_by_postag with preference of nouns and adjectives by default
0.0.6 - greek dictionaries included within the package
0.0.5 - experimenting with data inclusion
0.0.4 - docs

Release history

0.0.8

May 21, 2020

0.0.6

May 7, 2020

0.0.5

May 7, 2020

0.0.4

May 5, 2020

0.0.3

May 5, 2020

0.0.2

Jan 31, 2020

0.0.1

Jan 31, 2020

Download files

Source Distribution

anda-0.0.8.tar.gz (19.0 MB)

Uploaded May 21, 2020 Source

Built Distribution

anda-0.0.8-py3-none-any.whl (21.3 MB)

Uploaded May 21, 2020 Python 3

File details

Details for the file anda-0.0.8.tar.gz.

File metadata

Download URL: anda-0.0.8.tar.gz
Upload date: May 21, 2020
Size: 19.0 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.9.1 tqdm/4.40.0 CPython/3.6.8

File hashes

Hashes for anda-0.0.8.tar.gz
Algorithm	Hash digest
SHA256	`7c716cef26ea7018d14f2fd3f185d37e5e0bd8c4446bc769cdb28b45d5c765f0`
MD5	`b1b59977327ade20f01e13da2299cf16`
BLAKE2b-256	`8dfbf8180c60ef32d11bd64193b39fcadda05993151974d715b0e7b644e66761`

File details

Details for the file anda-0.0.8-py3-none-any.whl.

File metadata

Download URL: anda-0.0.8-py3-none-any.whl
Upload date: May 21, 2020
Size: 21.3 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.9.1 tqdm/4.40.0 CPython/3.6.8

File hashes

Hashes for anda-0.0.8-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ca95612596380bee3005d6fcb012c932ad8dc51238e63bf73f7ca805662724b3`
MD5	`2d1dcffc8162cb38686db1933cdc25c3`
BLAKE2b-256	`a452a0bf69e5d1f865b505ac42a4701ea0fbdf2c7f97b0c11acf60d40f286760`
