Skip to main content

Utility class wrapping lxml for reading data from MODS v3.4 XML metadata into Python data types.

Project description

pymods is utility module for working with the Library of Congress’s MODS XML standard: Metadata Description Schema (MODS). It is a utility wrapper for the lxml module specific to deserializing data out of MODSXML into python data types.

If you need a module to serialize data into MODSXML, see the other pymods by Matt Cordial.

Installing

Recommended:

pip install pymods

Using

Basics

XML is parsed using the MODSReader class:

mods_records = pymods.MODSReader('some_file.xml')

Individual records are stored as an iterator of the MODSRecord object:

In [5]: for record in mods_records:
  ....:    print(record)
  ....:
<Element {http://www.loc.gov/mods/v3}mods at 0x47a69f8>
<Element {http://www.loc.gov/mods/v3}mods at 0x47fd908>
<Element {http://www.loc.gov/mods/v3}mods at 0x47fda48>

MODSReader will work with mods:modsCollection documents, outputs from OAI-PMH feeds, or individual MODSXML documents with mods:mods as the root element.

pymods.MODSRecord

The MODSReader class parses each mods:mods element into a pymods.MODSRecord object. pymods.MODSRecord is a custom wrapper class for the lxml.ElementBase class. All children of pymods.Record inherit the lxml._Element and lxml.ElementBase methods.

In [6]: record = next(pymods.MODSReader('example.xml'))
In [7]: print(record.nsmap)
{'dcterms': 'http://purl.org/dc/terms/', 'xsi': 'http://www.w3.org/2001/XMLSchema-instance', None: 'http://www.loc.gov/mods/v3', 'flvc': 'info:flvc/manifest/v1', 'xlink': 'http://www.w3.org/1999/xlink', 'mods': 'http://www.loc.gov/mods/v3'}
In [8]: for child in record.iterdescendants():
  ....:    print(child.tag)

{http://www.loc.gov/mods/v3}identifier
{http://www.loc.gov/mods/v3}extension
{info:flvc/manifest/v1}flvc
{info:flvc/manifest/v1}owningInstitution
{info:flvc/manifest/v1}submittingInstitution
{http://www.loc.gov/mods/v3}titleInfo
{http://www.loc.gov/mods/v3}title
{http://www.loc.gov/mods/v3}name
{http://www.loc.gov/mods/v3}namePart
{http://www.loc.gov/mods/v3}role
{http://www.loc.gov/mods/v3}roleTerm
{http://www.loc.gov/mods/v3}roleTerm
{http://www.loc.gov/mods/v3}typeOfResource
{http://www.loc.gov/mods/v3}genre
...

Methods

All functions return data either as a string, list, list of named tuples. See the appropriate docstring for details.

>>> record.genre?
Type:        property
String form: <property object at 0x0000000004812C78>
Docstring:
Accesses mods:genre element.
:return: A list containing Genre elements with term, authority,
    authorityURI, and valueURI attributes.

Examples

Importing

from pymods import MODSReader, MODSRecord

Parsing a file

In [10]: mods = MODSReader('example.xml')
In [11]: for record in mods:
   ....:    print(record.dates)
   ....:
[Date(text='1966-12-08', type='{http://www.loc.gov/mods/v3}dateCreated')]
None
[Date(text='1987-02', type='{http://www.loc.gov/mods/v3}dateIssued')]

Simple tasks

Generating a title list

In [14]: for record in mods:
   ....:     print(record.titles)
   ....:
['Fire Line System']
['$93,668.90. One Mill Tax Apportioned by Various Ways Proposed']
['Broward NOW News: National Organization for Women, February 1987']

Creating a subject list

In [17]: for record in mods:
   ....:     for subject in record.subjects:
   ....:         print(subject.text)
   ....:
Concert halls
Architecture
Architectural drawings
Structural systems
Structural systems drawings
Structural drawings
Safety equipment
Construction
Mechanics
Structural optimization
Architectural design
Fire prevention--Safety measures
Taxes
Tax payers
Tax collection
Organizations
Feminism
Sex discrimination against women
Women's rights
Equal rights amendments
Women--Societies and clubs
National Organization for Women

More complex tasks

Creating a list of subject URI’s only for LCSH subjects

In [18]: for record in mods:
   ....:     for subject in record.subjects:
   ....:         if 'lcsh' == subject.authority:
   ....:             print(subject.uri)
   ....:
http://id.loc.gov/authorities/subjects/sh85082767
http://id.loc.gov/authorities/subjects/sh88004614
http://id.loc.gov/authorities/subjects/sh85132810
http://id.loc.gov/authorities/subjects/sh85147343

Get URLs for objects using a No Copyright US rightsstatement.org URI

In [23]: for record in mods:
   ....:     for rights_elem in record.rights
   ....:         if rights_elem.uri == 'http://rightsstatements.org/vocab/NoC-US/1.0/':
   ....:             print(record.purl)
   ....:
http://purl.flvc.org/fsu/fd/FSU_MSS0204_B01_F10_09
http://purl.flvc.org/fsu/fd/FSU_MSS2008003_B18_F01_004

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pymods-2.0.3.tar.gz (14.7 kB view details)

Uploaded Source

Built Distribution

pymods-2.0.3-py3-none-any.whl (18.2 kB view details)

Uploaded Python 3

File details

Details for the file pymods-2.0.3.tar.gz.

File metadata

  • Download URL: pymods-2.0.3.tar.gz
  • Upload date:
  • Size: 14.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for pymods-2.0.3.tar.gz
Algorithm Hash digest
SHA256 001865fd94747bf543e7fe1a589f04a6563a1bde5f8f22d6852d7918ff33e0b7
MD5 fe745f888eb39489d8129b96aae04679
BLAKE2b-256 02cbb1f15d0b2613eca2a2fe6c0f184f311689fb6c4cce17967f098e2c08e849

See more details on using hashes here.

File details

Details for the file pymods-2.0.3-py3-none-any.whl.

File metadata

File hashes

Hashes for pymods-2.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 cc3f4a65c1deb2cf5fb3033609325fa921cf34a8668ba6249c3b4845a1783651
MD5 e4f370f699d17dbe83789f8c6d6a6800
BLAKE2b-256 35fc2e9b67623c0596677b5a61c1be05b840f65ba773b646429afa5c3f8e819c

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page