EIS1600 project tools and utilities

Development Status
- 1 - Planning
Intended Audience
- Science/Research
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

EIS1600 Tools

Workflow
Process
Installation
Set Up
Working Directory Structure
Usage

Workflow

(so that we do not forget again...)

Double-check the text in the Google Spreadsheet and tag it as “double-checked” (column PREPARED).

The double-checked files are converted to *.EIS1600 format.

The names of these files are then collected into AUTOREPORT.md under DOUBLE-CHECKED Files (XX) - ready for MIU.
Running disassemble_into_miu_files takes the list from AUTOREPORT.md, disassembles these files into MIUs, and stores them in the MIU repo.

Process

Convert from mARkdown to EIS1600TMP with convert_mARkdown_to_EIS1600
Check the .EIS1600TMP
Run insert_uids on the checked .EIS1600TMP
Check again. If anything was changed in the EIS1600 file, run update_uids
After double-check, the file can be disassembled by disassemble_into_miu_files <uri_of_that_file>.EIS1600

Installation

After creating and activating the eis1600_env (see Set Up), use:

$ pip install eis1600

In case you have an older version installed, use:

$ pip install --upgrade eis1600

The package comes with different options. To install it together with camel-tools, use the command below. Also check the camel-tools installation instructions, because at the moment they require additional packages: https://camel-tools.readthedocs.io/en/latest/getting_started.html#installation

$ pip install eis1600[NER]

If you want to run the annotation pipeline, you also need to download camel-tools data:

$ camel_data -i all

Note. You can use pip freeze to check the versions of all installed packages, including eis1600.
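
The same check can also be done from Python. The following is a minimal sketch, assuming Python 3.8 or newer (where importlib.metadata is part of the standard library); the helper name installed_version is only illustrative and is not part of eis1600.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the eis1600 version, or a hint if the package is missing.
    print(installed_version("eis1600") or "eis1600 is not installed in this environment")
```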

Set Up Virtual Environment and Install the EIS1600 PKG there

To avoid interfering with other Python installations, we recommend installing the package in a virtual environment. To create a new virtual environment with Python, run:

python3 -m venv eis1600_env

NB: when creating your new virtual environment, you must use Python 3.7 or 3.8, as these are the versions required by CAMeL-Tools.
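
To confirm which interpreter you are about to use, a small check like the following may help. It is only a sketch: the supported versions are taken from the note above, and the function name is_supported_python is an assumption made for this example.

```python
import sys

# Versions named in this README as required by CAMeL-Tools.
SUPPORTED = {(3, 7), (3, 8)}


def is_supported_python(version_info=sys.version_info) -> bool:
    """True if the (major, minor) pair is one of the supported versions."""
    return tuple(version_info[:2]) in SUPPORTED


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "supported:", is_supported_python())
```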

After creation of the environment it can be activated by:

source eis1600_env/bin/activate

The environment is now activated and the eis1600 package can be installed into that environment with pip:

$ pip install eis1600

This command installs all dependencies as well, so you should see lots of other libraries being installed. If you do not, you must have used a wrong version of Python while creating your virtual environment.

You can now use the commands listed in this README.

To use the environment, you have to activate it for every session, by:

source eis1600_env/bin/activate

After successful activation, your shell prompt is prefixed with (eis1600_env).
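
If you are unsure whether the environment is active, you can also ask Python itself. This is a minimal sketch relying on the standard behaviour of venv (sys.prefix differs from sys.base_prefix inside an environment); the helper name in_virtualenv is illustrative.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv(), "| prefix:", sys.prefix)
```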

You will probably want to create an alias for the source command by adding the following line to your alias file:

alias eis="source eis1600_env/bin/activate"

Alias files:

On Linux:
- ~/.bash_aliases
On Mac:
- ~/.zshrc if you use zsh (the default in recent versions of macOS)

Structure of the working directory

The working directory is always the main EIS1600 directory which is a parent to all the different repositories. The EIS1600 directory has the following structure:

|
|---| eis_env
|---| EIS1600_MIUs
|---| EIS1600_Pretrained_Models (optional)
|---| gazetteers
|---| Master_Chronicle
|---| OpenITI_EIS1600_Texts
|---| Training_Data

Path variables are in the module eis1600/helper/repo.
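
Before running any command, it can be useful to verify that the working directory has the layout shown above. The sketch below is not part of eis1600. It assumes that every listed repository except the optional EIS1600_Pretrained_Models is needed, and it skips the virtual environment folder; the names REQUIRED_DIRS and missing_repos are made up for this example.

```python
from pathlib import Path
from typing import List

# Sub-directories from the structure above; the pretrained models are optional
# and the virtual environment folder is not checked.
REQUIRED_DIRS = [
    "EIS1600_MIUs",
    "gazetteers",
    "Master_Chronicle",
    "OpenITI_EIS1600_Texts",
    "Training_Data",
]


def missing_repos(working_dir: Path) -> List[str]:
    """Return the expected sub-directories that do not exist in working_dir."""
    return [name for name in REQUIRED_DIRS if not (working_dir / name).is_dir()]


if __name__ == "__main__":
    missing = missing_repos(Path.cwd())
    print("all repositories found" if not missing else f"missing: {missing}")
```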

Usage

Convert mARkdown to EIS1600 files

Converts a mARkdown file to EIS1600TMP (without inserting UIDs). The .EIS1600TMP file is created next to the .mARkdown file (.inProcess and .completed files are accepted as input as well). This command can be run from anywhere within the text repo; use autocomplete (Tab) to get the correct path to the file. Alternatively, open the command line in the folder that contains the file to be converted.

$ convert_mARkdown_to_EIS1600TMP <uri>.mARkdown

EIS1600TMP files do not contain UIDs yet. To insert UIDs, run insert_uids on the .EIS1600TMP file. This command can be run from anywhere within the text repo; use autocomplete (Tab) to get the correct path to the file.

$ insert_uids <uri>.EIS1600TMP

Batch processing of mARkdown files

Use the -e option to process all files from the EIS1600 repo.

$ convert_mARkdown_to_EIS1600 -e <EIS1600_repo>
$ insert_uids -e <EIS1600_repo>

To process all mARkdown files in a directory, give an input AND an output directory. Resulting .EIS1600TMP files are stored in the output directory.

$ convert_mARkdown_to_EIS1600 <input_dir> <output_dir>
$ insert_uids <input_dir> <output_dir>
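
To preview which files a directory run would touch, something like the following can be used. It is a hedged sketch only: it does not call the eis1600 tools, and it assumes that only files directly inside the input directory are considered and that each output file keeps the name of its input with the extension swapped. The function name planned_outputs is hypothetical.

```python
from pathlib import Path
from typing import Dict


def planned_outputs(input_dir: Path, output_dir: Path) -> Dict[Path, Path]:
    """Map each *.mARkdown file in input_dir to its assumed .EIS1600TMP path."""
    return {
        # Path.stem drops only the final extension, so dots inside the URI survive.
        src: output_dir / (src.stem + ".EIS1600TMP")
        for src in sorted(input_dir.glob("*.mARkdown"))
    }


if __name__ == "__main__":
    for src, dst in planned_outputs(Path("input_dir"), Path("output_dir")).items():
        print(src, "->", dst)
```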

Disassembling

Disassemble files into individual MIU files. When run from the parent directory EIS1600, this disassembles all files listed in the AUTOREPORT.

$ disassemble_into_miu_files

Can also be run from anywhere within the EIS1600_MIUs/ directory with a single file as input, e.g.:

$ disassemble_into_miu_files <uri_of_the_text>.EIS1600

Reassembling

Run inside the MIU repo. Files are reassembled into the TEXT repo, so the TEXT repo has to be located next to the MIU repo.

$ reassemble_from_miu_files <uri>.IDs

Use the -e option to process all files from the MIU repo. Must be run from the root of MIU repo.

$ reassemble_from_miu_files -e <MIU_repo>

Annotation

NER annotation for persons, toponyms, and miscellaneous entities, as well as dates, the beginning and end of onomastic information (NASAB), and onomastic information.

Note: Can only be run if the package was installed with the NER option AND if the ML models are in the EIS1600_Pretrained_Models directory.

If no input is given, annotation is run for the whole repository. Can be used with -p option for parallelization. Run from the parent directory EIS1600 (internally used path starts with: EIS1600_MIUs/).

$ annotate_mius -p

To annotate all MIU files of a text give the IDs file as argument. Can be used with -p option to run in parallel.

$ annotate_mius <uri>.IDs

To annotate an individual MIU file, give MIU file as argument.

$ annotate_mius <uri>/MIUs/<uri>.<UID>.EIS1600
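
The path pattern in the command above can be assembled programmatically. This is a small sketch under the assumption that the pattern is exactly `<uri>/MIUs/<uri>.<UID>.EIS1600` as shown; the helper miu_file_path and the placeholder values in the usage line are hypothetical.

```python
from pathlib import Path


def miu_file_path(uri: str, uid: str) -> Path:
    """Build the relative path of one MIU file from a text URI and a UID."""
    return Path(uri) / "MIUs" / f"{uri}.{uid}.EIS1600"


if __name__ == "__main__":
    # Placeholder values, not a real text or UID.
    print(miu_file_path("SomeAuthor.SomeBook.SomeVersion-ara1", "000000000001"))
```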

Only Onomastic Annotation

Only for test purposes! Can be run with -D to process one file at a time, otherwise runs in parallel. Can be run with -T to use gold-standard data as input. Run from the parent directory EIS1600.

$ onomastic_annotation

MIU revision

Run the following command from the root of the MIU repo to revise automatically annotated files:

$ miu_random_revisions

When first run, the file file_picker.yml is added to the root of the MIU repository. Make sure to specify your operating system and to set your initials and the path/command to/for Kate in this YAML file.

system: ... # options: mac, lin, win;
reviewer: eis1600researcher # change this to your name;
path_to_kate: kate # add absolute path to Kate on your machine; or a working alias (kate should already work)

Optionally, you can specify a path from which to open files, e.g. if you only want to open training data, set:

miu_main_path: ./training_data/
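
A quick sanity check of these settings might look like the sketch below. It assumes the YAML file has already been loaded into a plain dict (for example with a YAML parser of your choice) and that system, reviewer and path_to_kate must all be set; the function config_problems is an illustration, not the check eis1600 performs.

```python
from typing import Dict, List

# Options for `system` as listed in the comment of file_picker.yml.
VALID_SYSTEMS = {"mac", "lin", "win"}
# Assumption: these three keys must be filled in; miu_main_path is optional.
REQUIRED_KEYS = ["system", "reviewer", "path_to_kate"]


def config_problems(config: Dict[str, str]) -> List[str]:
    """Return human-readable problems found in the loaded settings."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if not config.get(key)]
    system = config.get("system")
    if system and system not in VALID_SYSTEMS:
        problems.append(f"system must be one of {sorted(VALID_SYSTEMS)}, got {system!r}")
    return problems


if __name__ == "__main__":
    print(config_problems({"system": "lin", "reviewer": "AB", "path_to_kate": "kate"}))
```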

When revising files, remember to change

reviewed    : NOT REVIEWED

to

reviewed    : REVIEWED

Collect YAMLHeaders into JSON

Run from the parent directory EIS1600:

$ yml_to_json

Release history

1.6.9

Aug 9, 2024

1.6.8

May 14, 2024

1.6.7

May 8, 2024

1.6.6

May 6, 2024

1.6.5

May 3, 2024

1.6.4

May 3, 2024

1.6.3

Apr 29, 2024

1.6.2

Apr 4, 2024

1.6.1

Mar 22, 2024

1.6.0

Mar 16, 2024

1.5.9

Mar 12, 2024

1.5.8

Mar 12, 2024

1.5.7

Mar 4, 2024

1.5.6

Mar 1, 2024

1.5.5

Feb 29, 2024

1.5.3

Feb 29, 2024

1.5.2

Feb 29, 2024

1.5.1

Feb 29, 2024

1.5.0

Feb 29, 2024

1.4.9

Feb 29, 2024

1.4.8

Feb 27, 2024

1.4.7

Feb 26, 2024

1.4.6

Feb 22, 2024

1.4.5

Feb 19, 2024

1.4.4

Feb 16, 2024

1.4.3

Feb 16, 2024

1.4.1

Jan 24, 2024

1.4.0

Jan 23, 2024

1.3.9

Jan 22, 2024

1.3.8

Jan 22, 2024

1.3.7

Jan 22, 2024

1.3.6

Jan 22, 2024

1.3.5

Jan 19, 2024

1.3.4

Jan 19, 2024

1.3.3

Jan 19, 2024

1.3.2

Jan 19, 2024

1.3.1

Jan 19, 2024

1.3.0

Jan 18, 2024

1.2.8

Jan 15, 2024

1.2.6

Jan 12, 2024

1.2.5

Jan 12, 2024

1.2.4

Jan 12, 2024

1.2.3

Jan 12, 2024

1.2.2

Jan 12, 2024

1.2.1

Jan 10, 2024

1.2.0

Jan 10, 2024

1.1.9

Jan 2, 2024

1.1.7

Dec 22, 2023

1.1.6

Dec 21, 2023

1.1.5

Dec 21, 2023

1.1.4

Dec 4, 2023

1.1.2

Nov 15, 2023

1.1.1

Nov 10, 2023

1.1.0

Nov 9, 2023

1.0.9

Nov 6, 2023

1.0.8

Nov 6, 2023

1.0.7

Nov 6, 2023

1.0.6

Oct 11, 2023

1.0.5

Oct 6, 2023

1.0.4

Aug 22, 2023

1.0.2

Aug 4, 2023

1.0.1

Aug 4, 2023

0.9.7

Jul 12, 2023

0.9.5

Jun 21, 2023

0.9.4

Jun 8, 2023

0.9.3

Jun 5, 2023

0.9.2

May 24, 2023

0.9.1

May 17, 2023

0.9.0

May 11, 2023

0.8.9

May 4, 2023

0.8.7

Apr 28, 2023

0.8.6

Apr 27, 2023

0.8.5

Apr 20, 2023

0.8.4

Apr 19, 2023

0.8.3

Apr 19, 2023

0.8.2

Apr 17, 2023

0.8.1

Apr 17, 2023

0.8.0

Mar 31, 2023

0.7.7

Feb 23, 2023

0.7.6

Feb 17, 2023

0.7.5

Feb 6, 2023

0.7.3

Jan 18, 2023

0.7.2

Jan 9, 2023

0.7.1

Jan 6, 2023

0.7.0

Dec 16, 2022

0.6.9

Dec 16, 2022

0.6.8

Dec 16, 2022

0.6.7

Dec 16, 2022

0.6.6

Dec 16, 2022

0.6.5

Dec 16, 2022

0.6.4

Dec 15, 2022

0.6.3

Dec 15, 2022

0.6.2

Dec 14, 2022

0.6.1

Dec 14, 2022

0.6.0

Dec 14, 2022

0.5.9

Dec 14, 2022

0.5.8

Dec 14, 2022

0.5.7

Dec 14, 2022

0.5.6

Dec 14, 2022

0.5.5

Dec 14, 2022

0.5.4

Dec 2, 2022

0.5.3

Dec 1, 2022

0.5.2

Dec 1, 2022

0.5.1

Dec 1, 2022

0.5.0

Dec 1, 2022

0.4.9

Nov 30, 2022

0.4.8

Nov 29, 2022

0.4.7

Nov 29, 2022

0.4.6

Nov 28, 2022

0.4.5

Nov 25, 2022

0.4.4

Nov 24, 2022

0.4.3

Nov 17, 2022

0.4.2

Nov 17, 2022

0.4.1

Nov 10, 2022

0.4.0

Nov 4, 2022

0.3.9

Nov 2, 2022

0.3.8

Nov 1, 2022

0.3.7

Oct 27, 2022

0.3.6

Oct 27, 2022

0.3.5

Oct 26, 2022

0.3.4

Oct 26, 2022

0.3.3

Oct 20, 2022

0.3.2

Oct 19, 2022

0.3.1

Oct 17, 2022

0.3.0

Oct 17, 2022

0.2.9

Oct 14, 2022

0.2.8

Oct 13, 2022

0.2.7

Oct 13, 2022

0.2.6

Oct 7, 2022

0.2.5

Oct 6, 2022

0.2.4

Oct 5, 2022

0.2.3

Oct 5, 2022

0.2.2

Sep 30, 2022

0.2.1

Sep 30, 2022

0.2.0

Sep 29, 2022

0.1.0

Sep 23, 2022

Download files

Source Distribution

eis1600-1.0.7.tar.gz (90.1 kB)

Uploaded Nov 6, 2023 Source

Built Distribution

eis1600-1.0.7-py3-none-any.whl (444.3 kB)

Uploaded Nov 6, 2023 Python 3

Hashes for eis1600-1.0.7.tar.gz
Algorithm	Hash digest
SHA256	`5a9a995974f688b23938655dab307ae1b7dfb9d6f1284160374e1420856d6431`
MD5	`4b7e929877dd7c6798b728ca0a5cdad9`
BLAKE2b-256	`03f75aa56b7feca3c6f90a6e82af643dc635a439c383434a4d4a312c0baec87b`

Hashes for eis1600-1.0.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8d09522aea8e9184965c6a55e07c1d58709dab59da8f63f66dd5d6bbec0e698a`
MD5	`397ae0517e26577a09cbc72f4bb628be`
BLAKE2b-256	`5b04034619352c90e2155813b95a6cba38d994552472801aa2d3b269bea4ea12`