A Python library for collating eScriptorium documents.

eScriptorium Collate

A Python library for collating eScriptorium documents. This is a pre-release version in public alpha.

Installation

Requirements

Python 3
Java Runtime Environment (< 15)

Vendored Binaries

This package uses the CollateX Java archive collatex-tools-1.7.1.jar. The JAR file is bundled with the package, so there is no need to download it separately. However, your system must have a working Java Runtime Environment (version < 15) accessible under JAVA_HOME. See the CollateX documentation for more information about CollateX.
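One way to check the Java requirement before running a collation is to parse the banner printed by `java -version`. The sketch below is not part of escriptorium-collate: `parse_java_major` and `java_is_supported` are hypothetical helper names, and the upper bound of 15 is taken from the requirement stated above.

```python
import re


def parse_java_major(version_banner: str) -> int:
    """Extract the major Java version from the first line of `java -version`."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_banner)
    if match is None:
        raise ValueError(f"Unrecognized Java version banner: {version_banner!r}")
    major = int(match.group(1))
    # Java 8 and earlier report themselves as 1.x (e.g. "1.8.0_292").
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def java_is_supported(version_banner: str, upper_bound: int = 15) -> bool:
    """Return True if the major version is below the bound (< 15, as stated above)."""
    return parse_java_major(version_banner) < upper_bound
```

The banner itself can be captured with `subprocess.run(["java", "-version"], capture_output=True, text=True).stderr`, since the JVM writes it to standard error.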

Virtual Environment

Before installing the package, it is a good idea to create a Python virtual environment:

pip install virtualenv
virtualenv -p python3 venv
source venv/bin/activate

Alternatively:

python3 -m venv venv
source venv/bin/activate

For a more detailed guide, see the Python documentation on virtual environments.

Install

Once the virtual environment is activated, install the package:

pip install "escriptorium-connector @ git+https://gitlab.com/oeshera/escriptorium_python_connector"
pip install escriptorium-collate

Note: This package depends on escriptorium-connector. However, the version of escriptorium-connector currently published on PyPI is not up to date with the latest development version of eScriptorium. Depending on the version of eScriptorium you are using, the PyPI version of escriptorium-connector may fail. As a temporary solution, the fork of escriptorium-connector installed above can be used. It will work in most cases.

Quick Start

Instantiate the eScriptorium Connector

import os

from dotenv import load_dotenv
from escriptorium_connector import EscriptoriumConnector

# Load credentials from the .env file, overriding existing environment variables
load_dotenv(override=True)
url = os.environ["ESCRIPTORIUM_URL"]  # Required; raises KeyError if missing
api_key = os.getenv("ESCRIPTORIUM_API_KEY")

if api_key:
    # Authenticate with an API key
    escr = EscriptoriumConnector(url, api_key=api_key)
else:
    # Fall back to username and password
    username = os.environ["ESCRIPTORIUM_USERNAME"]
    password = os.environ["ESCRIPTORIUM_PASSWORD"]
    escr = EscriptoriumConnector(url, username, password)

The .env file should look like this:

ESCRIPTORIUM_URL=your_escriptorium_url
ESCRIPTORIUM_API_KEY=your_escriptorium_api_key
ESCRIPTORIUM_USERNAME=your_escriptorium_username
ESCRIPTORIUM_PASSWORD=your_escriptorium_password

You need only provide ESCRIPTORIUM_API_KEY or both ESCRIPTORIUM_USERNAME and ESCRIPTORIUM_PASSWORD.
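The rule above can be checked up front, before any connection is attempted. The following is a minimal sketch, not part of the library; `pick_auth_mode` is a hypothetical helper that only inspects a mapping of environment variables.

```python
from collections.abc import Mapping


def pick_auth_mode(env: Mapping[str, str]) -> str:
    """Return "api_key" or "password" depending on which credentials are set."""
    if not env.get("ESCRIPTORIUM_URL"):
        raise ValueError("ESCRIPTORIUM_URL is required")
    # An API key takes priority, mirroring the connector example above.
    if env.get("ESCRIPTORIUM_API_KEY"):
        return "api_key"
    if env.get("ESCRIPTORIUM_USERNAME") and env.get("ESCRIPTORIUM_PASSWORD"):
        return "password"
    raise ValueError(
        "Set ESCRIPTORIUM_API_KEY, or both ESCRIPTORIUM_USERNAME "
        "and ESCRIPTORIUM_PASSWORD"
    )
```

After `load_dotenv(override=True)`, calling `pick_auth_mode(os.environ)` fails early with a readable message if the .env file is incomplete.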

Instantiate the Witnesses to be Collated

from escriptorium_collate.collate import Witness

witnesses = [
    Witness(
        doc_pk=1,
        siglum="A",
        diplomatic_transcription_name="diplomatic",
        normalized_transcription_name="normalized",
    ),
    Witness(
        doc_pk=2,
        siglum="B",
        diplomatic_transcription_name="diplomatic",
        normalized_transcription_name="normalized",
    ),
    Witness(
        doc_pk=3,
        siglum="C",
        diplomatic_transcription_name="diplomatic",
        normalized_transcription_name="normalized",
    ),
]

Instantiate the Arguments to be Passed to CollateX

from escriptorium_collate.collate import CollatexArgs

collatex_args = CollatexArgs()

Return the CollateX Results as a Python Dictionary

from escriptorium_collate.collate import collate

collatex_output = collate(escr=escr, witnesses=witnesses, collatex_args=collatex_args)
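With JSON output, CollateX emits an alignment table: a `witnesses` list of sigla and a `table` whose rows each hold one cell of tokens per witness. Assuming `collate` returns that structure unchanged (the shape of the dictionary is not spelled out here), the sketch below lists the rows where the witnesses disagree. The sample dictionary is hand-written for illustration, and `cell_reading` and `find_variants` are hypothetical helpers.

```python
def cell_reading(cell) -> str:
    """Join the normalized forms ("n") of a cell's tokens; gaps become ""."""
    return " ".join(token["n"] for token in (cell or []))


def find_variants(collatex_output: dict) -> list:
    """Return one {siglum: reading} dict per table row where readings differ."""
    sigla = collatex_output["witnesses"]
    variants = []
    for row in collatex_output["table"]:
        readings = [cell_reading(cell) for cell in row]
        if len(set(readings)) > 1:
            variants.append(dict(zip(sigla, readings)))
    return variants


# Hand-written sample in the assumed shape, not real output.
sample_output = {
    "witnesses": ["A", "B"],
    "table": [
        [[{"t": "The", "n": "the"}], [{"t": "the", "n": "the"}]],
        [[{"t": "black", "n": "black"}], [{"t": "white", "n": "white"}]],
        [[{"t": "cat", "n": "cat"}], []],
    ],
}
print(find_variants(sample_output))
```

Comparing on `n` rather than `t` means that purely orthographic differences in the diplomatic text (here "The" versus "the") are not reported as variants.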

API

This package contains two modules: escriptorium_collate/collate.py and escriptorium_collate/transcription_layers.py.

`escriptorium_collate.collate`

`escriptorium_collate.collate.Witness`

An interface for defining an eScriptorium document as a witness to be passed to CollateX.

class Witness(BaseModel):
  doc_pk: int # Primary key of an eScriptorium document (int)
  siglum: str # Arbitrary siglum to be used in the critical apparatus (str)
  diplomatic_transcription_pk: int | None
  diplomatic_transcription_name: str | None
  normalized_transcription_pk: int | None
  normalized_transcription_name: str | None

If diplomatic_transcription_pk is provided, diplomatic_transcription_name is ignored. Likewise, if normalized_transcription_pk is provided, normalized_transcription_name is ignored.
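The precedence rule can be pictured as follows. This is a minimal sketch of the described behavior, not the library's actual code; `resolve_transcription_pk` and its `lookup_by_name` callback are hypothetical names.

```python
from typing import Callable, Optional


def resolve_transcription_pk(
    pk: Optional[int],
    name: Optional[str],
    lookup_by_name: Callable[[str], int],
) -> int:
    """Prefer an explicit primary key; fall back to a lookup by layer name."""
    if pk is not None:
        return pk  # The name is ignored when a primary key is given
    if name is not None:
        return lookup_by_name(name)
    raise ValueError("Provide either a transcription pk or a transcription name")
```

In practice, `lookup_by_name` could be a small wrapper around `transcription_layers.get_transcription_pk_by_name` for a fixed document.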

The "diplomatic" transcription is not collated; it is simply "passed through" to the CollateX output. Only the "normalized" transcription is collated.
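CollateX's JSON input supports this split for pre-tokenized witnesses: each token carries a `t` key (the original reading) and an optional `n` key (the normalized form used for alignment). Whether escriptorium-collate builds its input exactly this way is an assumption. The sketch below only illustrates the idea with naive whitespace tokenization, and `build_witness` is a hypothetical helper.

```python
import json


def build_witness(siglum: str, diplomatic: str, normalized: str) -> dict:
    """Pair diplomatic and normalized words into CollateX-style tokens."""
    diplomatic_words = diplomatic.split()
    normalized_words = normalized.split()
    if len(diplomatic_words) != len(normalized_words):
        raise ValueError("Both transcriptions must have the same number of words")
    tokens = [
        {"t": d, "n": n}  # "t" is passed through, "n" is compared
        for d, n in zip(diplomatic_words, normalized_words)
    ]
    return {"id": siglum, "tokens": tokens}


# Invented sample text, for illustration only.
collatex_input = {
    "witnesses": [
        build_witness("A", "Ye olde kyng", "the old king"),
        build_witness("B", "The old king", "the old king"),
    ]
}
print(json.dumps(collatex_input, indent=2))
```

Because both witnesses share the same `n` values, they would align without variants, while the differing `t` values remain available in the output.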

`escriptorium_collate.collate.CollatexArgs`

A (Python) interface for passing arguments to the CollateX command line interface. For more details about the arguments accepted by the CollateX Jar CLI, consult CollateX's documentation.

class CollatexArgs(BaseModel):
  algorithm: Literal["needleman-wunsch", "medite", "dekker"] = "needleman-wunsch"
  distance: int | None
  dot_path: str | None
  format: Literal["tei", "json", "dot", "graphml"] = "json"
  input: str | None
  input_encoding: str | None
  max_collation_size: int | None
  max_parallel_collations: int | None
  output_encoding: str | None
  output: str | None
  tokenized: bool = False
  token_comparator: Literal["equality", "levenshtein"] = "equality"

`escriptorium_collate.collate.get_collatex_input`

Given two or more Witness instances and a set of CollateX arguments, return the input JSON that will later be passed to CollateX.

from escriptorium_collate.collate import get_collatex_input

collatex_input = get_collatex_input(
  escr=escr, # An EscriptoriumConnector instance
  witnesses=witnesses, # A list of two or more Witness instances to be collated
  collatex_args=collatex_args, # An instance of CollatexArgs
)

`escriptorium_collate.collate.get_collatex_output`

Pass a given instance of CollatexArgs to the CollateX JAR.

from escriptorium_collate.collate import get_collatex_output

collatex_output = get_collatex_output(
  collatex_args=collatex_args, # An instance of CollatexArgs
)

In this case, CollatexArgs.input is mandatory; in other words, the CollateX input JSON must be manually passed in.

`escriptorium_collate.collate.collate`

Run the complete collation pipeline via one function call. See the "Quick Start" section above.

`escriptorium_collate.transcription_layers`

This module contains helper functions for dealing with the transcription layers of any given eScriptorium document.

`escriptorium_collate.transcription_layers.create`

Create and initialize an arbitrarily named transcription layer within a given eScriptorium document.

from escriptorium_collate import transcription_layers

transcription_layers.create(
  escr=escr, # EscriptoriumConnector instance
  doc_pk=1, # Primary key of an eScriptorium document (int)
  layer_name="New Layer" # Name of the transcription layer to be created (str)
)

`escriptorium_collate.transcription_layers.copy`

Copy the content of one transcription layer to another transcription layer in a given eScriptorium document.

from escriptorium_collate import transcription_layers

transcription_layers.copy(
  escr=escr, # EscriptoriumConnector instance
  doc_pk=1, # Primary key of an eScriptorium document (int)
  source_transcription_layer_name="Source Layer", # Name of the transcription layer to be copied (str)
  target_transcription_layer_name="Target Layer", # Name of the transcription layer to be written into (str)
  overwrite=True, # If True, content of the target transcription layer is overwritten (default: False)
)

`escriptorium_collate.transcription_layers.get_transcription_pk_by_name`

Each transcription layer is assigned a unique identifier (primary key) by eScriptorium, but it is not easy to retrieve the primary key via eScriptorium's user interface. This simple helper function returns the transcription layer's primary key, given its name and the primary key of the document to which it belongs.

from escriptorium_collate import transcription_layers

transcription_layers.get_transcription_pk_by_name(
  escr=escr, # EscriptoriumConnector instance
  doc_pk=1, # Primary key of an eScriptorium document (int)
  transcription_name="Source Layer" # Name of the desired transcription layer (str)
)

License

escriptorium-collate is distributed under the terms of the MIT license.

Release history

0.1.15: Apr 10, 2025
0.1.14: Jan 30, 2025
0.1.13: Nov 18, 2024
0.1.12: Aug 16, 2024
0.1.11: Aug 1, 2024 (this version)
0.1.10: Aug 1, 2024
0.1.9: Aug 1, 2024
0.1.8: Aug 1, 2024
0.1.7: Jul 31, 2024
0.1.6: Jun 12, 2024
0.1.5: Jun 12, 2024
0.1.4: Jun 12, 2024
0.1.3: Jun 11, 2024
0.1.2: Jun 11, 2024
0.1.1: Jun 11, 2024
0.1.0: Jun 11, 2024
