Convert digital documents in METS/MODS format to TEI

mets-mods2tei

Convert bibliographic metadata in METS/MODS format to TEI headers and optionally serialize linked ALTO-encoded OCR to TEI text.

Background

MODS is the de facto standard for encoding bibliographic metadata in libraries. It is usually embedded as a separate section in METS XML files. The physical and logical structure of a document is expressed in terms of structural maps (structMap elements).
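To make this layout concrete, the following minimal sketch parses a heavily abbreviated METS document with Python's standard library and reports the MODS title and the types of the structMap elements. The sample document and the helper name summarize_mets are made up for illustration; this is not how mets-mods2tei itself reads METS files.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of METS and MODS (version 3).
NS = {
    "mets": "http://www.loc.gov/METS/",
    "mods": "http://www.loc.gov/mods/v3",
}

# Hypothetical, heavily abbreviated METS document for illustration only.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                            xmlns:mods="http://www.loc.gov/mods/v3">
  <mets:dmdSec ID="DMDLOG_0001">
    <mets:mdWrap MDTYPE="MODS">
      <mets:xmlData>
        <mods:mods>
          <mods:titleInfo><mods:title>Sample title</mods:title></mods:titleInfo>
        </mods:mods>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  <mets:structMap TYPE="PHYSICAL">
    <mets:div ID="PHYS_0000" TYPE="physSequence"/>
  </mets:structMap>
  <mets:structMap TYPE="LOGICAL">
    <mets:div ID="LOG_0000" TYPE="monograph"/>
  </mets:structMap>
</mets:mets>"""


def summarize_mets(xml_text):
    """Return the first MODS title and the TYPE of every structMap."""
    root = ET.fromstring(xml_text)
    # The MODS record sits inside a descriptive metadata section (dmdSec).
    title = root.findtext(".//mods:mods/mods:titleInfo/mods:title", namespaces=NS)
    # structMap elements are direct children of the METS root element.
    struct_types = [sm.get("TYPE") for sm in root.findall("mets:structMap", NS)]
    return title, struct_types


title, struct_types = summarize_mets(SAMPLE_METS)
print(title, struct_types)
```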

TEI is the de facto standard for representing digital text for research purposes. It usually includes detailed bibliographic metadata in its header.
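As a rough illustration of where such metadata lives, the sketch below builds a skeletal TEI document whose header carries a title. The function name build_minimal_tei and the placeholder texts are assumptions made for this example; the header produced by mets-mods2tei is considerably richer.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialize TEI elements without a prefix (default namespace).
ET.register_namespace("", TEI_NS)


def tei(tag):
    """Qualify a tag name with the TEI namespace."""
    return "{%s}%s" % (TEI_NS, tag)


def build_minimal_tei(title):
    """Build a skeletal TEI tree: a header with a file description, plus an empty text."""
    root = ET.Element(tei("TEI"))
    header = ET.SubElement(root, tei("teiHeader"))
    file_desc = ET.SubElement(header, tei("fileDesc"))
    # The file description holds title, publication and source statements.
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title
    publication_stmt = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(publication_stmt, tei("p")).text = "Placeholder publication statement"
    source_desc = ET.SubElement(file_desc, tei("sourceDesc"))
    ET.SubElement(source_desc, tei("p")).text = "Placeholder source description"
    text = ET.SubElement(root, tei("text"))
    ET.SubElement(text, tei("body"))
    return root


tei_root = build_minimal_tei("Sample title")
serialized = ET.tostring(tei_root, encoding="unicode")
print(serialized)
```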

Since these standards allow considerable degrees of freedom, the conversion relies on well-defined subsets. For MODS, this is the MODS Anwendungsprofil für digitalisierte Medien. For METS, it is the METS Anwendungsprofil für digitalisierte Medien 2.1. For the TEI header, the conversion is roughly based on the DTA base format.

mets-mods2tei is developed at the Saxon State and University Library in Dresden.

Installation

mets-mods2tei is implemented in Python 3. The following instructions assume a working Python 3 installation (tested with versions 3.5, 3.6 and 3.7).

Setup Python

Using virtual environments is highly recommended, although not strictly necessary for installing mets-mods2tei.

To create a virtual environment in a subdirectory of your choice (e.g. env), run

python3 -m venv env

(once) and then activate it (each time you open the shell) via

. env/bin/activate

Depending on how old the packages provided by your base system are, you might have to update pip first:

pip install -U pip setuptools

Get Python package

mets-mods2tei can be installed via pip3 directly. You can install from either the repository sources or the prebuilt distribution on PyPI:

From PyPI

If you have an active virtual environment, do

pip install mets-mods2tei

Otherwise, try

pip3 install --user mets-mods2tei
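If you are unsure which of the two commands applies, the following small sketch checks from within Python whether an environment created with venv is active. The helper names are hypothetical and not part of mets-mods2tei.

```python
import sys


def in_virtualenv():
    """Return True if the interpreter runs inside an environment created by venv."""
    # Inside a venv, sys.prefix points to the environment directory,
    # while sys.base_prefix still points to the base installation.
    return sys.prefix != sys.base_prefix


def suggested_install_command(package="mets-mods2tei"):
    """Pick the install command that matches the current environment."""
    if in_virtualenv():
        return "pip install " + package
    return "pip3 install --user " + package


print(suggested_install_command())
```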

From source

Get the repository:

git clone https://github.com/slub/mets-mods2tei.git
cd mets-mods2tei

If you have an active virtual environment, do

pip install .

Otherwise, try

pip3 install --user .

Testing

mets-mods2tei uses pytest-based testing.

To install the prerequisites for testing (in your venv), run

pip install -r requirements-test.txt

(once) and then run the tests via:

pytest

Code coverage

Determine code coverage by running

make coverage

Invocation

Installing mets-mods2tei makes the command-line tool mm2tei available:

mm2tei --help
Usage: mm2tei [OPTIONS] METS

  METS: File containing or URL pointing to the METS/MODS XML to be converted

  Parse given METS and its meta-data, and convert it to TEI.

  If `--ocr` is given, then also read the ALTO full-text files from the
  fileGrp in `--text-group`, and convert page contents accordingly (in
  physical order). Decorate page boundaries with image and page numbers, and
  reference the corresponding base image files from `--img-group`.

  Output XML to `--output` (use '-' for stdout), log to stderr.

Options:
  -O, --output FILENAME           File path to write TEI output to
  -o, --ocr                       Serialize OCR into resulting TEI
  -T, --text-group TEXT           File group which contains the full text
  -I, --img-group TEXT            File group which contains the images
  -l, --log-level [DEBUG|INFO|WARN|ERROR|OFF]
  -h, --help                      Show this message and exit.

It reads METS XML from a URL or file argument and outputs the resulting TEI, including the information extracted from the MODS part of the METS.

Example:

mm2tei -O tei.xml "https://digital.slub-dresden.de/oai/?verb=GetRecord&metadataPrefix=mets&identifier=oai:de:slub-dresden:db:id-453779263"
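For scripted use, the same call can be assembled in Python. The sketch below only builds the OAI request URL and the argument list from the options shown in the help text above; the helper names oai_mets_url and mm2tei_command are made up for this example, and actually running the command requires mm2tei on your PATH and network access.

```python
import shlex
from urllib.parse import urlencode


def oai_mets_url(identifier, base="https://digital.slub-dresden.de/oai/"):
    """Build an OAI GetRecord URL that requests a record in METS format."""
    query = urlencode(
        {"verb": "GetRecord", "metadataPrefix": "mets", "identifier": identifier},
        safe=":",  # keep the colons of the OAI identifier readable
    )
    return base + "?" + query


def mm2tei_command(mets, output="-", ocr=False, text_group=None, img_group=None):
    """Assemble the mm2tei argument list from the documented options."""
    cmd = ["mm2tei", "--output", output]
    if ocr:
        cmd.append("--ocr")
    if text_group:
        cmd += ["--text-group", text_group]
    if img_group:
        cmd += ["--img-group", img_group]
    cmd.append(mets)
    return cmd


url = oai_mets_url("oai:de:slub-dresden:db:id-453779263")
cmd = mm2tei_command(url, output="tei.xml")
# Print a shell-safe rendering of the command.
print(" ".join(shlex.quote(part) for part in cmd))
# To run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with the ampersands in the URL.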
