Convert digital documents in METS/MODS format to TEI
mets-mods2tei
Convert bibliographic metadata in METS/MODS format to TEI headers and optionally serialize linked ALTO-encoded OCR to TEI text.
Background
MODS is the de facto standard for encoding bibliographic metadata in libraries. It is usually embedded as a separate section in METS XML files. The physical and logical structure of a document is expressed in terms of structural mappings (structMap elements).
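To illustrate how these pieces fit together, the following minimal sketch parses a tiny hand-written METS document with Python's standard library and lists the MODS titles and the structMap types it finds. The sample document and the helper name `summarize_mets` are made up for illustration and are not part of mets-mods2tei.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the METS and MODS (version 3) schemas.
NS = {
    "mets": "http://www.loc.gov/METS/",
    "mods": "http://www.loc.gov/mods/v3",
}

# A hand-written, heavily reduced METS document (illustrative only).
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:mods="http://www.loc.gov/mods/v3">
  <mets:dmdSec ID="DMDLOG_0000">
    <mets:mdWrap MDTYPE="MODS">
      <mets:xmlData>
        <mods:mods>
          <mods:titleInfo>
            <mods:title>Sample title</mods:title>
          </mods:titleInfo>
        </mods:mods>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  <mets:structMap TYPE="LOGICAL">
    <mets:div ID="LOG_0000" TYPE="monograph" DMDID="DMDLOG_0000"/>
  </mets:structMap>
  <mets:structMap TYPE="PHYSICAL">
    <mets:div ID="PHYS_0000" TYPE="physSequence"/>
  </mets:structMap>
</mets:mets>"""


def summarize_mets(xml_text):
    """Return the MODS titles and structMap types found in a METS document."""
    root = ET.fromstring(xml_text)
    # MODS records live inside the descriptive metadata sections of METS.
    titles = [
        title.text
        for title in root.findall(".//mods:mods/mods:titleInfo/mods:title", NS)
    ]
    # Each structMap carries a TYPE attribute such as LOGICAL or PHYSICAL.
    struct_maps = [sm.get("TYPE") for sm in root.findall("mets:structMap", NS)]
    return {"titles": titles, "struct_maps": struct_maps}


if __name__ == "__main__":
    print(summarize_mets(SAMPLE_METS))
```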
TEI is the de facto standard for representing digital text for research purposes. It usually includes detailed bibliographic metadata in its header.
Since these standards allow considerable degrees of freedom, the conversion relies on well-defined subsets. For MODS, this is the MODS Anwendungsprofil für digitalisierte Medien. For METS, the METS Anwendungsprofil für digitalisierte Medien 2.1 is consulted. For the TEI Header, the conversion is roughly based on the DTA base format.
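As a rough picture of the target side, the sketch below builds a bare-bones TEI header (a `fileDesc` with `titleStmt`, `publicationStmt` and `sourceDesc`) using Python's standard library. It only shows the general shape of a TEI header. It does not reproduce the DTA base format or the output of mets-mods2tei, and the function name `build_tei_header` and the placeholder texts are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def q(tag):
    """Qualify a tag name with the TEI namespace."""
    return "{%s}%s" % (TEI_NS, tag)


def build_tei_header(title, author=None):
    """Build a minimal TEI element tree carrying only header metadata."""
    # Serialize TEI elements without a prefix (default namespace).
    ET.register_namespace("", TEI_NS)

    tei = ET.Element(q("TEI"))
    header = ET.SubElement(tei, q("teiHeader"))
    file_desc = ET.SubElement(header, q("fileDesc"))

    # titleStmt: title and, if known, the author.
    title_stmt = ET.SubElement(file_desc, q("titleStmt"))
    ET.SubElement(title_stmt, q("title")).text = title
    if author is not None:
        ET.SubElement(title_stmt, q("author")).text = author

    # publicationStmt and sourceDesc with placeholder paragraphs.
    publication = ET.SubElement(file_desc, q("publicationStmt"))
    ET.SubElement(publication, q("p")).text = "Unpublished example"
    source = ET.SubElement(file_desc, q("sourceDesc"))
    ET.SubElement(source, q("p")).text = "Derived from a METS/MODS record"

    return tei


if __name__ == "__main__":
    tree = build_tei_header("Sample title", author="Jane Doe")
    print(ET.tostring(tree, encoding="unicode"))
```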
mets-mods2tei is developed at the Saxon State and University Library in Dresden.
Installation
mets-mods2tei is implemented in Python 3. The following steps assume a working Python 3 installation (tested with versions 3.5, 3.6 and 3.7).
Clone the repository
First, clone the repository and change into its directory:
$ git clone https://github.com/wrznr/mets-mods2tei.git
$ cd mets-mods2tei
Setup Python
Using virtual environments is highly recommended, although not strictly necessary for installing mets-mods2tei.
To create a virtual environment in a subdirectory of your choice (e.g. env), run
python3 -m venv env
(once) and then activate it (each time you open the shell) via
. env/bin/activate
If the packages provided by your base system are outdated, you might have to update pip and setuptools first:
pip install -U pip setuptools
Python requirements
mets-mods2tei can be installed directly via pip3.
If you have an active virtual environment, run
pip install .
Otherwise, try
pip3 install --user .
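The choice between the two install commands can be sketched in Python as follows. This is only an illustration of the decision described above: the helper names `in_virtualenv` and `install_command` are hypothetical, and the detection relies on `sys.prefix` differing from `sys.base_prefix`, which holds for environments created with `python3 -m venv` (older versions of the third-party virtualenv tool set `sys.real_prefix` instead).

```python
import sys


def in_virtualenv():
    """Best-effort check whether the interpreter runs inside a virtual environment."""
    # venv: sys.prefix points to the environment, sys.base_prefix to the base install.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    # Older virtualenv releases expose sys.real_prefix instead.
    return sys.prefix != base_prefix or hasattr(sys, "real_prefix")


def install_command(in_venv):
    """Return the install command from this README that matches the situation."""
    if in_venv:
        return ["pip", "install", "."]
    return ["pip3", "install", "--user", "."]


if __name__ == "__main__":
    print(" ".join(install_command(in_virtualenv())))
```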
Testing
mets-mods2tei uses pytest-based testing.
To install the prerequisites for testing (in your venv), run
pip install -r requirements-test.txt
(once) and then run the tests via:
pytest
Code coverage
Determine code coverage by running
make coverage
Invocation
Installing mets-mods2tei makes the command-line tool mm2tei available:
mm2tei --help
Usage: mm2tei [OPTIONS] METS

  METS: File containing or URL pointing to the METS/MODS XML to be converted

Options:
  -o, --ocr                       Serialize OCR into resulting TEI
  -l, --log-level [DEBUG|INFO|WARN|ERROR|OFF]
  --help                          Show this message and exit.
mm2tei reads METS XML from a URL or file argument and prints the resulting TEI, including the information extracted from the MODS part of the METS.
Example:
mm2tei "https://digital.slub-dresden.de/oai/?verb=GetRecord&metadataPrefix=mets&identifier=oai:de:slub-dresden:db:id-453779263"
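Since mm2tei prints the TEI to standard output, it can also be driven from a script. The following sketch wraps the command line shown above with Python's `subprocess` module. It assumes that mm2tei is installed and on the `PATH`. The helper names `build_mm2tei_command` and `convert` are hypothetical and not part of the package, and only the options listed in the help output are used.

```python
import subprocess

# Log levels as listed in the mm2tei help output.
LOG_LEVELS = ("DEBUG", "INFO", "WARN", "ERROR", "OFF")


def build_mm2tei_command(mets, ocr=False, log_level=None):
    """Assemble the mm2tei argument list for a METS file path or URL."""
    cmd = ["mm2tei"]
    if ocr:
        # Serialize linked OCR into the resulting TEI.
        cmd.append("--ocr")
    if log_level is not None:
        if log_level not in LOG_LEVELS:
            raise ValueError("unsupported log level: %r" % (log_level,))
        cmd.extend(["--log-level", log_level])
    cmd.append(mets)
    return cmd


def convert(mets, tei_path, **options):
    """Run mm2tei and write its standard output (the TEI) to tei_path."""
    cmd = build_mm2tei_command(mets, **options)
    with open(tei_path, "wb") as out:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(cmd, stdout=out, check=True)


if __name__ == "__main__":
    print(build_mm2tei_command("mets.xml", ocr=True, log_level="INFO"))
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain characters such as `&` and `?`.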