
Convert digital documents in METS/MODS format to TEI

mets-mods2tei

Convert bibliographic metadata in METS/MODS format to TEI headers and optionally serialize linked ALTO-encoded OCR to TEI text.

Background

MODS is the de facto standard for encoding bibliographic metadata in libraries. It is usually embedded as a separate section in METS XML files. The physical and logical structure of a document is expressed in terms of structural maps (structMap elements).
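To make the relationship between METS, MODS and structMap concrete, the following minimal sketch parses a small, invented METS document with the Python standard library. The sample XML, its IDs, and the helper name `summarize_mets` are illustrative assumptions and are not taken from mets-mods2tei itself; only the METS and MODS namespace URIs are the standard ones.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for METS and MODS version 3.
NS = {
    "mets": "http://www.loc.gov/METS/",
    "mods": "http://www.loc.gov/mods/v3",
}

# Invented sample: one MODS record in a dmdSec plus two structural maps.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                     xmlns:mods="http://www.loc.gov/mods/v3">
  <mets:dmdSec ID="DMDLOG_0000">
    <mets:mdWrap MDTYPE="MODS">
      <mets:xmlData>
        <mods:mods>
          <mods:titleInfo><mods:title>Sample title</mods:title></mods:titleInfo>
        </mods:mods>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  <mets:structMap TYPE="LOGICAL">
    <mets:div ID="LOG_0000" TYPE="monograph" DMDID="DMDLOG_0000"/>
  </mets:structMap>
  <mets:structMap TYPE="PHYSICAL">
    <mets:div ID="PHYS_0000" TYPE="physSequence"/>
  </mets:structMap>
</mets:mets>"""


def summarize_mets(xml_text):
    """Return (MODS titles, structMap TYPE values) found in a METS document."""
    root = ET.fromstring(xml_text)
    # MODS lives inside the descriptive metadata section of the METS file.
    title_path = ("mets:dmdSec/mets:mdWrap/mets:xmlData/"
                  "mods:mods/mods:titleInfo/mods:title")
    titles = [node.text for node in root.findall(title_path, NS)]
    # Each structMap declares which kind of structure it describes.
    struct_types = [sm.get("TYPE") for sm in root.findall("mets:structMap", NS)]
    return titles, struct_types


titles, struct_types = summarize_mets(SAMPLE_METS)
print(titles)
print(struct_types)
```

Real METS files from library systems are considerably larger and may contain several dmdSec elements, so this is only a way to see where the two kinds of information sit.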

TEI is the de facto standard for representing digital text for research purposes. It usually includes detailed bibliographic metadata in its header.
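As a rough illustration of what a TEI header holds, the sketch below builds a skeletal teiHeader with the Python standard library. It is not the output of mets-mods2tei and does not follow the DTA base format in detail; the helper name `build_tei_header` and the sample values are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialize TEI elements with a default namespace instead of a prefix.
ET.register_namespace("", TEI_NS)


def tei(tag):
    """Qualify a tag name with the TEI namespace."""
    return "{%s}%s" % (TEI_NS, tag)


def build_tei_header(title, publisher):
    """Build a TEI root element containing a skeletal teiHeader."""
    root = ET.Element(tei("TEI"))
    header = ET.SubElement(root, tei("teiHeader"))
    file_desc = ET.SubElement(header, tei("fileDesc"))
    # fileDesc carries the bibliographic description of the electronic file.
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title
    publication = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(publication, tei("publisher")).text = publisher
    source_desc = ET.SubElement(file_desc, tei("sourceDesc"))
    ET.SubElement(source_desc, tei("p")).text = "Converted from METS/MODS."
    return root


root = build_tei_header("Sample title", "Sample library")
print(ET.tostring(root, encoding="unicode"))
```

A header produced by an actual conversion would carry far more detail, such as authors, dates, and licensing information.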

Since these standards leave considerable degrees of freedom, the conversion relies on well-defined subsets. For MODS, this is the MODS Anwendungsprofil für digitalisierte Medien. For METS, it is the METS Anwendungsprofil für digitalisierte Medien 2.1. For the TEI header, the conversion is roughly based on the DTA base format.

mets-mods2tei is developed at the Saxon State and University Library in Dresden.

Installation

mets-mods2tei is implemented in Python 3. The following assumes a working Python 3 installation (tested with versions 3.5, 3.6 and 3.7).

Setup Python

Using virtual environments is highly recommended, although not strictly necessary for installing mets-mods2tei.

To create a virtual environment in a subdirectory of your choice (e.g. env), run

python3 -m venv env

(once) and then activate it (each time you open the shell) via

. env/bin/activate

Depending on the age of the packages your base system provides, you might have to update pip first:

pip install -U pip setuptools

Get Python package

mets-mods2tei can be installed directly via pip3, either from the prebuilt distribution on PyPI or from the repository sources:

From PyPI

If you have an active virtual environment, do

pip install mets-mods2tei

Otherwise, try

pip3 install --user mets-mods2tei

From source

Get the repository:

git clone https://github.com/slub/mets-mods2tei.git
cd mets-mods2tei

If you have an active virtual environment, do

pip install .

Otherwise, try

pip3 install --user .

Testing

mets-mods2tei uses pytest-based testing.

To install the prerequisites for testing (in your venv), do

pip install -r requirements-test.txt

(once) and then run the tests via:

pytest

Code coverage

Determine code coverage by running

make coverage

Invocation

Installing mets-mods2tei makes the command-line tool mm2tei available:

mm2tei --help
Usage: mm2tei [OPTIONS] METS

  METS: File containing or URL pointing to the METS/MODS XML to be converted

  Parse given METS and its meta-data, and convert it to TEI.

  If `--ocr` is given, then also read the ALTO full-text files from the
  fileGrp in `--text-group`, and convert page contents accordingly (in
  physical order). Decorate page boundaries with image and page numbers, and
  reference the corresponding base image files from `--img-group`.

  Output XML to `--output` (use '-' for stdout), log to stderr.

Options:
  -O, --output FILENAME           File path to write TEI output to
  -o, --ocr                       Serialize OCR into resulting TEI
  -T, --text-group TEXT           File group which contains the full text
  -I, --img-group TEXT            File group which contains the images
  -l, --log-level [DEBUG|INFO|WARN|ERROR|OFF]
  -h, --help                      Show this message and exit.

mm2tei reads METS XML from a URL or file argument and outputs the resulting TEI, including the information extracted from the MODS part of the METS.

Example:

mm2tei -O tei.xml "https://digital.slub-dresden.de/oai/?verb=GetRecord&metadataPrefix=mets&identifier=oai:de:slub-dresden:db:id-453779263"
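When the conversion is part of a larger Python workflow, the command line can be assembled and run with the standard library. The sketch below uses only the options listed in the help text above; the helper names `build_mm2tei_command` and `run_mm2tei`, the file names, and the file group names FULLTEXT and DEFAULT are assumptions for illustration, so check the fileGrp USE values in your own METS before relying on them.

```python
import shlex
import subprocess


def build_mm2tei_command(mets, output, ocr=False, text_group=None,
                         img_group=None, log_level=None):
    """Assemble an argument list for mm2tei from the options in its help text."""
    cmd = ["mm2tei", "--output", output]
    if ocr:
        cmd.append("--ocr")
    if text_group is not None:
        cmd += ["--text-group", text_group]
    if img_group is not None:
        cmd += ["--img-group", img_group]
    if log_level is not None:
        cmd += ["--log-level", log_level]
    # The METS file path or URL is the single positional argument.
    cmd.append(mets)
    return cmd


def run_mm2tei(*args, **kwargs):
    """Run mm2tei; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_mm2tei_command(*args, **kwargs), check=True)


# Hypothetical file names and file group names, shown without running the tool.
cmd = build_mm2tei_command("mets.xml", "tei.xml", ocr=True,
                           text_group="FULLTEXT", img_group="DEFAULT")
print(" ".join(shlex.quote(part) for part in cmd))
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain characters such as `&` and `?`, as in the example above.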
