
Convert digital documents in METS/MODS format to TEI


mets-mods2tei


Convert bibliographic metadata in METS/MODS format to TEI headers and optionally serialize linked ALTO-encoded OCR to TEI text.

Background

MODS is the de facto standard for encoding bibliographic metadata in libraries. It is usually embedded as a separate section in METS XML files. The physical and logical structure of a document is expressed through structural maps (structMap elements).
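To make the relationship between METS, MODS and structMap concrete, here is a minimal sketch that parses a tiny hand-written METS document with Python's standard library. The sample record, its IDs and the helper names (read_title, struct_map_types) are made up for illustration; this is not how mets-mods2tei itself reads its input.

```python
import xml.etree.ElementTree as ET

# Official namespace URIs of METS and MODS (version 3).
NS = {
    "mets": "http://www.loc.gov/METS/",
    "mods": "http://www.loc.gov/mods/v3",
}

# Hypothetical, heavily reduced METS record for illustration only.
SAMPLE_METS = """\
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:mods="http://www.loc.gov/mods/v3">
  <mets:dmdSec ID="DMDLOG_0001">
    <mets:mdWrap MDTYPE="MODS">
      <mets:xmlData>
        <mods:mods>
          <mods:titleInfo>
            <mods:title>Sample title</mods:title>
          </mods:titleInfo>
        </mods:mods>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  <mets:structMap TYPE="LOGICAL">
    <mets:div ID="LOG_0000" TYPE="monograph" DMDID="DMDLOG_0001"/>
  </mets:structMap>
  <mets:structMap TYPE="PHYSICAL">
    <mets:div ID="PHYS_0000" TYPE="physSequence"/>
  </mets:structMap>
</mets:mets>
"""


def read_title(root):
    """Return the first MODS title found in the METS document, or None."""
    title = root.find(".//mods:mods/mods:titleInfo/mods:title", NS)
    return title.text if title is not None else None


def struct_map_types(root):
    """Return the TYPE attribute of every top-level structMap element."""
    return [sm.get("TYPE") for sm in root.findall("mets:structMap", NS)]


root = ET.fromstring(SAMPLE_METS)
print(read_title(root))        # Sample title
print(struct_map_types(root))  # ['LOGICAL', 'PHYSICAL']
```

The MODS record lives inside a dmdSec of the METS file, while the logical and physical structure are carried by separate structMap elements, which is why the two lookups use different paths.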

TEI is the de facto standard for representing digital text for research purposes. It usually includes detailed bibliographic metadata in its header.
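As an illustration of where bibliographic metadata sits in a TEI document, the following sketch builds a minimal teiHeader with the standard library. The function name build_tei_header and the placeholder texts are assumptions for this example; the header that mets-mods2tei produces follows the DTA base format and is considerably richer.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialize TEI elements in the default namespace (no prefix).
ET.register_namespace("", TEI_NS)


def q(tag):
    """Qualify a tag name with the TEI namespace."""
    return "{%s}%s" % (TEI_NS, tag)


def build_tei_header(title, author=None):
    """Build a TEI root element with a minimal header (hypothetical helper)."""
    tei = ET.Element(q("TEI"))
    header = ET.SubElement(tei, q("teiHeader"))
    file_desc = ET.SubElement(header, q("fileDesc"))

    # titleStmt carries the title and, optionally, the author.
    title_stmt = ET.SubElement(file_desc, q("titleStmt"))
    ET.SubElement(title_stmt, q("title")).text = title
    if author:
        ET.SubElement(title_stmt, q("author")).text = author

    # TEI also expects publicationStmt and sourceDesc inside fileDesc;
    # the paragraph texts below are placeholders.
    publication = ET.SubElement(file_desc, q("publicationStmt"))
    ET.SubElement(publication, q("p")).text = "Unpublished sketch"
    source = ET.SubElement(file_desc, q("sourceDesc"))
    ET.SubElement(source, q("p")).text = "Derived from a METS/MODS record"
    return tei


tei = build_tei_header("Sample title", author="Jane Doe")
print(ET.tostring(tei, encoding="unicode"))
```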

Since these standards allow considerable degrees of freedom, the conversion relies on well-defined subsets. For MODS, this is the MODS Anwendungsprofil für digitalisierte Medien (MODS application profile for digitized media). For METS, it is the METS Anwendungsprofil für digitalisierte Medien 2.1. For the TEI header, the conversion is roughly based on the DTA base format.

mets-mods2tei is developed at the Saxon State and University Library in Dresden.

Installation

mets-mods2tei is implemented in Python 3. The following steps assume a working Python 3 installation (tested with versions 3.5, 3.6 and 3.7).

Clone the repository

First, clone the repository:

$ git clone https://github.com/wrznr/mets-mods2tei.git
$ cd mets-mods2tei

Setup Python

Using virtual environments is highly recommended, although not strictly necessary for installing mets-mods2tei.

To create a virtual environment in a subdirectory of your choice (e.g. env), run

python3 -m venv env

(once) and then activate it (each time you open the shell) via

. env/bin/activate

If the packages provided by your base system are outdated, you might have to update pip first:

pip install -U pip setuptools
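If you are unsure whether the virtual environment is actually active before installing, a small check like the following can help. It is a generic sketch based on the standard library; the function name in_virtualenv is made up here and is not part of mets-mods2tei.

```python
import sys


def in_virtualenv():
    """Return True if the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
```

Run it with the same python3 you intend to use for pip; if it reports False, activate the environment first.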

Python requirements

mets-mods2tei can be installed via pip3 directly. If you have an active virtual environment, do

pip install .

Otherwise, try

pip3 install --user .

Testing

mets-mods2tei uses pytest-based testing.

To install the prerequisites for testing (in your venv), run

pip install -r requirements-test.txt

(once) and then run the tests via:

pytest

Code coverage

Determine code coverage by running

make coverage

Invocation

Installing mets-mods2tei makes the command-line tool mm2tei available:

mm2tei --help
Usage: mm2tei [OPTIONS] METS

  METS: File containing or URL pointing to the METS/MODS XML to be converted

Options:
  -o, --ocr                       Serialize OCR into resulting TEI
  -l, --log-level [DEBUG|INFO|WARN|ERROR|OFF]
  --help                          Show this message and exit.

It reads METS XML from a URL or file argument and prints the resulting TEI, including the information extracted from the MODS part of the METS.

Example:

mm2tei "https://digital.slub-dresden.de/oai/?verb=GetRecord&metadataPrefix=mets&identifier=oai:de:slub-dresden:db:id-453779263"
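To call the tool from a script, one option is a thin wrapper around subprocess. The sketch below uses only the options listed in the help text above (--ocr and --log-level). The function names build_command and convert are hypothetical, and the wrapper assumes that mm2tei writes the TEI to standard output, which the description suggests but does not spell out.

```python
import shutil
import subprocess

# Log levels as listed in the mm2tei help output.
LOG_LEVELS = ("DEBUG", "INFO", "WARN", "ERROR", "OFF")


def build_command(mets, ocr=False, log_level=None):
    """Assemble the mm2tei argument list for a METS file path or URL."""
    cmd = ["mm2tei"]
    if ocr:
        cmd.append("--ocr")
    if log_level is not None:
        if log_level not in LOG_LEVELS:
            raise ValueError("unsupported log level: %s" % log_level)
        cmd.extend(["--log-level", log_level])
    cmd.append(mets)
    return cmd


def convert(mets, ocr=False, log_level=None):
    """Run mm2tei and return whatever it prints (assumed to be the TEI)."""
    if shutil.which("mm2tei") is None:
        raise RuntimeError("mm2tei not found on PATH; is the package installed?")
    result = subprocess.run(
        build_command(mets, ocr=ocr, log_level=log_level),
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.5+)
        check=True,               # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


print(build_command("mets.xml", ocr=True, log_level="INFO"))
# ['mm2tei', '--ocr', '--log-level', 'INFO', 'mets.xml']
```

Passing the arguments as a list avoids shell quoting issues with URLs that contain characters such as & or ?.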
