Convert digital documents in METS/MODS format to TEI
mets-mods2tei
Convert bibliographic metadata in METS/MODS format to TEI headers and optionally serialize linked ALTO-encoded OCR to TEI text.
Background
MODS is the de-facto standard for encoding bibliographic
metadata in libraries. It is usually embedded as a separate section in
METS XML files. The physical and logical structure of a document
is expressed in terms of structural maps (structMap
elements).
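As a rough illustration of this layout, the following sketch uses Python's standard xml.etree.ElementTree module to read the first MODS title and the structMap types from a METS document. The sample document, its IDs and the function name summarize_mets are invented for this example; it is not how mets-mods2tei itself is implemented.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the METS and MODS schemas.
NS = {
    "mets": "http://www.loc.gov/METS/",
    "mods": "http://www.loc.gov/mods/v3",
}

# Hypothetical, heavily shortened METS document (illustration only).
SAMPLE_METS = """
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:mods="http://www.loc.gov/mods/v3">
  <mets:dmdSec ID="DMDLOG_0000">
    <mets:mdWrap MDTYPE="MODS">
      <mets:xmlData>
        <mods:mods>
          <mods:titleInfo>
            <mods:title>Example title</mods:title>
          </mods:titleInfo>
        </mods:mods>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  <mets:structMap TYPE="LOGICAL">
    <mets:div ID="LOG_0000" TYPE="monograph" DMDID="DMDLOG_0000"/>
  </mets:structMap>
  <mets:structMap TYPE="PHYSICAL">
    <mets:div ID="PHYS_0000" TYPE="physSequence"/>
  </mets:structMap>
</mets:mets>
"""


def summarize_mets(xml_text):
    """Return (first MODS title, list of structMap TYPE values)."""
    root = ET.fromstring(xml_text)
    # The MODS record sits inside a METS descriptive metadata section.
    title = root.findtext(
        "mets:dmdSec/mets:mdWrap/mets:xmlData/mods:mods/mods:titleInfo/mods:title",
        default=None,
        namespaces=NS,
    )
    # Each structMap declares whether it describes the logical or physical structure.
    struct_types = [sm.get("TYPE") for sm in root.findall("mets:structMap", NS)]
    return title, struct_types
```

Calling summarize_mets(SAMPLE_METS) yields the title string together with the types of both structural maps in document order.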
TEI is the de-facto standard for representing digital text for research purposes. It usually includes detailed bibliographic metadata in its header.
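To give an idea of where such metadata lives, here is a minimal sketch that builds a bare TEI header (teiHeader with fileDesc, titleStmt, publicationStmt and sourceDesc) using the standard library. The helper names q and build_tei_header and the placeholder texts are assumptions made for this example; the header produced by mets-mods2tei is considerably richer.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialize TEI elements with a default namespace instead of an ns0: prefix.
ET.register_namespace("", TEI_NS)


def q(tag):
    """Return the namespace-qualified name of a TEI element."""
    return "{%s}%s" % (TEI_NS, tag)


def build_tei_header(title):
    """Build a minimal TEI document containing only a header with a title."""
    tei = ET.Element(q("TEI"))
    header = ET.SubElement(tei, q("teiHeader"))
    file_desc = ET.SubElement(header, q("fileDesc"))

    # titleStmt carries the title of the electronic text.
    title_stmt = ET.SubElement(file_desc, q("titleStmt"))
    ET.SubElement(title_stmt, q("title")).text = title

    # publicationStmt and sourceDesc are filled with placeholder prose here.
    publication = ET.SubElement(file_desc, q("publicationStmt"))
    ET.SubElement(publication, q("p")).text = "Placeholder publication statement"
    source = ET.SubElement(file_desc, q("sourceDesc"))
    ET.SubElement(source, q("p")).text = "Placeholder source description"
    return tei
```

The result can be serialized with ET.tostring(build_tei_header("Example title"), encoding="unicode").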
Since these standards leave considerable degrees of freedom, the conversion relies on well-defined subsets. For MODS, this is the MODS Anwendungsprofil für digitalisierte Medien. For METS, it is the METS Anwendungsprofil für digitalisierte Medien 2.1. For the TEI header, the conversion is roughly based on the DTA base format.
mets-mods2tei is developed at the Saxon State and University Library in Dresden.
Installation
mets-mods2tei is implemented in Python 3. The following assumes a working Python 3
installation (tested versions: 3.5, 3.6 and 3.7).
Setup Python
Using virtual environments is highly recommended,
although not strictly necessary for installing mets-mods2tei.
To create a virtual environment in a subdirectory of your choice (e.g. env), run
python3 -m venv env
(once) and then activate it (each time you open the shell) via
. env/bin/activate
Depending on how old the packages provided by your base system are, you might have to update pip and setuptools first:
pip install -U pip setuptools
Get Python package
mets-mods2tei can be installed via pip3 directly.
You can install from either the repository sources or the
prebuilt distribution on PyPI:
From PyPI
If you have an active virtual environment, do
pip install mets-mods2tei
Otherwise, try
pip3 install --user mets-mods2tei
From source
Get the repository:
git clone https://github.com/slub/mets-mods2tei.git
cd mets-mods2tei
If you have an active virtual environment, do
pip install .
Otherwise, try
pip3 install --user .
Testing
mets-mods2tei uses pytest-based testing.
To install the prerequisites for testing (in your venv), do
pip install -r requirements-test.txt
(once) and then run the tests via:
pytest
Code coverage
Determine code coverage by running
make coverage
Invocation
Installing mets-mods2tei makes the command-line tool mm2tei available:
mm2tei --help
Usage: mm2tei [OPTIONS] METS
METS: File containing or URL pointing to the METS/MODS XML to be converted
Parse given METS and its meta-data, and convert it to TEI.
If `--ocr` is given, then also read the ALTO full-text files from the
fileGrp in `--text-group`, and convert page contents accordingly (in
physical order). Decorate page boundaries with image and page numbers, and
reference the corresponding base image files from `--img-group`.
Output XML to `--output` (use '-' for stdout), log to stderr.
Options:
  -O, --output FILENAME     File path to write TEI output to
  -o, --ocr                 Serialize OCR into resulting TEI
  -T, --text-group TEXT     File group which contains the full text
  -I, --img-group TEXT      File group which contains the images
  -l, --log-level [DEBUG|INFO|WARN|ERROR|OFF]
  -h, --help                Show this message and exit.
It reads METS XML via URL or file argument and prints the resulting TEI, including the extracted information from the MODS part of the METS.
Example:
mm2tei -O tei.xml "https://digital.slub-dresden.de/oai/?verb=GetRecord&metadataPrefix=mets&identifier=oai:de:slub-dresden:db:id-453779263"
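When the conversion has to be driven from a script, the command line above can be assembled programmatically. The sketch below only uses the options listed in the help text; the function names build_mm2tei_command and run_mm2tei are hypothetical, and the file group name FULLTEXT in the usage note is an assumption that depends on the METS file at hand.

```python
import subprocess


def build_mm2tei_command(mets, output="-", ocr=False,
                         text_group=None, img_group=None, log_level=None):
    """Assemble the argument list for an mm2tei call (nothing is executed)."""
    cmd = ["mm2tei", "--output", output]
    if ocr:
        # Also serialize the linked ALTO full text into the TEI.
        cmd.append("--ocr")
    if text_group:
        cmd += ["--text-group", text_group]
    if img_group:
        cmd += ["--img-group", img_group]
    if log_level:
        cmd += ["--log-level", log_level]
    # METS is the single positional argument: a file path or a URL.
    cmd.append(mets)
    return cmd


def run_mm2tei(mets, **options):
    """Run mm2tei and raise CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_mm2tei_command(mets, **options), check=True)
```

For instance, run_mm2tei("mets.xml", output="tei.xml", ocr=True, text_group="FULLTEXT") would write the TEI including OCR to tei.xml, provided mm2tei is installed and on the PATH.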