Utility class wrapping lxml for reading data from MODS v3.4 XML metadata into Python data types.

Project description

pymods is a utility module for working with the Library of Congress’s MODS XML standard: Metadata Object Description Schema (MODS). It is a utility wrapper for the lxml module, specific to deserializing data out of MODSXML into Python data types.

If you need a module to serialize data into MODSXML, see the other pymods by Matt Cordial.



pip install pymods



XML is parsed using the MODSReader class:

mods_records = pymods.MODSReader('some_file.xml')

MODSReader is an iterator that yields each record as a MODSRecord object:

In [5]: for record in mods_records:
  ....:    print(record)
<Element {}mods at 0x47a69f8>
<Element {}mods at 0x47fd908>
<Element {}mods at 0x47fda48>

MODSReader will work with mods:modsCollection documents, outputs from OAI-PMH feeds, or individual MODSXML documents with mods:mods as the root element.
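The three input shapes differ only in where the mods:mods elements sit in the tree. As a rough illustration, and not a description of how pymods does it internally, the sketch below uses the standard library's xml.etree.ElementTree to collect every mods:mods element regardless of the root element. The helper name find_mods_records and the inline sample documents are made up for this example; the namespace URI is the standard MODS v3 namespace.

```python
import xml.etree.ElementTree as ET

# Standard MODS v3 namespace URI.
MODS_NS = "http://www.loc.gov/mods/v3"
MODS_TAG = f"{{{MODS_NS}}}mods"


def find_mods_records(xml_text):
    """Return every mods:mods element in the document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # Element.iter() visits the root itself and all descendants, so a bare
    # mods:mods root and a wrapping collection are handled the same way.
    return list(root.iter(MODS_TAG))


# Made-up sample documents for illustration only.
single = f'<mods xmlns="{MODS_NS}"><titleInfo><title>A</title></titleInfo></mods>'
collection = (
    f'<modsCollection xmlns="{MODS_NS}">'
    "<mods><titleInfo><title>A</title></titleInfo></mods>"
    "<mods><titleInfo><title>B</title></titleInfo></mods>"
    "</modsCollection>"
)

print(len(find_mods_records(single)))      # 1
print(len(find_mods_records(collection)))  # 2
```

An OAI-PMH response nests mods:mods deeper inside its own envelope, so the same descendant search would apply there too.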


The MODSReader class parses each mods:mods element into a pymods.MODSRecord object. pymods.MODSRecord is a custom wrapper class for the lxml.etree.ElementBase class. All children of pymods.Record inherit the lxml.etree._Element and lxml.etree.ElementBase methods.

In [6]: record = next(pymods.MODSReader('example.xml'))
In [7]: print(record.nsmap)
{'dcterms': '', 'xsi': '', None: '', 'flvc': 'info:flvc/manifest/v1', 'xlink': '', 'mods': ''}
In [8]: for child in record.iterdescendants():
  ....:    print(child.tag)



All functions return data as a string, a list, or a list of named tuples. See the appropriate docstring for details.

In [9]: record.genre?
Type:        property
String form: <property object at 0x0000000004812C78>
Accesses mods:genre element.
:return: A list containing Genre elements with term, authority,
    authorityURI, and valueURI attributes.



from pymods import MODSReader, MODSRecord

Parsing a file

In [10]: mods = MODSReader('example.xml')
In [11]: for record in mods:
   ....:    print(record.dates)
[Date(text='1966-12-08', type='{}dateCreated')]
[Date(text='1987-02', type='{}dateIssued')]
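The type field in the output above looks like an lxml-style qualified tag of the form {namespace}localname. Assuming that is the case, a small helper can reduce it to the local element name for filtering. The Date tuple below is a stand-in that mirrors the fields printed above, not the class pymods defines, and local_name is a hypothetical helper.

```python
from collections import namedtuple

# Stand-in with the same fields as the Date tuples printed above.
Date = namedtuple("Date", ["text", "type"])


def local_name(tag):
    """Strip a leading '{namespace}' from a qualified tag name."""
    return tag.rsplit("}", 1)[-1]


# Sample values taken from the session output; the namespace is the
# standard MODS v3 namespace.
dates = [
    Date(text="1966-12-08", type="{http://www.loc.gov/mods/v3}dateCreated"),
    Date(text="1987-02", type="{http://www.loc.gov/mods/v3}dateIssued"),
]

# Keep only the creation dates.
created = [d.text for d in dates if local_name(d.type) == "dateCreated"]
print(created)  # ['1966-12-08']
```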

Simple tasks

Generating a title list

In [14]: for record in mods:
   ....:     print(record.titles)
['Fire Line System']
['$93,668.90. One Mill Tax Apportioned by Various Ways Proposed']
['Broward NOW News: National Organization for Women, February 1987']

Creating a subject list

In [17]: for record in mods:
   ....:     for subject in record.subjects:
   ....:         print(subject.text)
Concert halls
Architectural drawings
Structural systems
Structural systems drawings
Structural drawings
Safety equipment
Structural optimization
Architectural design
Fire prevention--Safety measures
Tax payers
Tax collection
Sex discrimination against women
Women's rights
Equal rights amendments
Women--Societies and clubs
National Organization for Women
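When the same heading appears on several records, a tally is often more useful than a flat printout. The sketch below counts subject headings with collections.Counter. It relies only on the record.subjects and subject.text attributes used above; Subject and Record are stand-in tuples so the example runs without an XML file, and subject_counts is a hypothetical helper name.

```python
from collections import Counter, namedtuple

# Stand-ins exposing only the attributes the loop above relies on.
Subject = namedtuple("Subject", ["text"])
Record = namedtuple("Record", ["subjects"])


def subject_counts(records):
    """Count how often each subject heading occurs across all records."""
    return Counter(
        subject.text
        for record in records
        for subject in record.subjects
        if subject.text  # skip empty or missing headings
    )


records = [
    Record([Subject("Structural systems"), Subject("Architectural design")]),
    Record([Subject("Structural systems"), Subject("Tax collection")]),
]

counts = subject_counts(records)
print(counts.most_common(1))  # [('Structural systems', 2)]
```

With a real file, the records argument would be the MODSReader iterator itself.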

More complex tasks

Creating a list of subject URIs only for LCSH subjects

In [18]: for record in mods:
   ....:     for subject in record.subjects:
   ....:         if 'lcsh' == subject.authority:
   ....:             print(subject.uri)
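The same filter can be wrapped in a generator so the URIs can be collected or written out instead of printed. This is a sketch using only the subject.authority and subject.uri attributes shown above; lcsh_uris is a hypothetical helper, the SimpleNamespace objects stand in for parsed records, and the example.org URIs are placeholders rather than real LCSH identifiers. Unlike the loop above, it skips subjects that have no URI.

```python
from types import SimpleNamespace


def lcsh_uris(records):
    """Yield the URI of every LCSH subject that actually has one."""
    for record in records:
        for subject in record.subjects:
            if subject.authority == "lcsh" and subject.uri:
                yield subject.uri


# Stand-in records with placeholder URIs.
records = [
    SimpleNamespace(subjects=[
        SimpleNamespace(authority="lcsh", uri="http://example.org/subject/1"),
        SimpleNamespace(authority="local", uri="http://example.org/subject/2"),
    ]),
    SimpleNamespace(subjects=[
        SimpleNamespace(authority="lcsh", uri=None),
        SimpleNamespace(authority="lcsh", uri="http://example.org/subject/3"),
    ]),
]

uris = list(lcsh_uris(records))
print(uris)
# ['http://example.org/subject/1', 'http://example.org/subject/3']
```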

Get URLs for objects using a No Copyright US URI

In [23]: for record in mods:
   ....:     for rights_elem in record.rights:
   ....:         if rights_elem.uri == '':
   ....:             print(record.purl)
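A reusable version of this lookup might take the rights URI as a parameter. The sketch below assumes only the record.rights, rights_elem.uri, and record.purl attributes used above; purls_with_rights is a hypothetical helper, the SimpleNamespace objects stand in for parsed records, and every URI and PURL shown is a placeholder. The type of record.purl is treated as opaque here, since the source does not say whether it is a string or a list.

```python
from types import SimpleNamespace


def purls_with_rights(records, rights_uri):
    """Return the purl of each record carrying the given rights URI."""
    return [
        record.purl
        for record in records
        # any() reports a record once, even if several rights elements match.
        if any(rights_elem.uri == rights_uri for rights_elem in record.rights)
    ]


# Placeholder rights URI; substitute the statement URI you need.
NO_COPYRIGHT = "http://example.org/rights/no-copyright-us"

records = [
    SimpleNamespace(
        purl="http://example.org/purl/1",
        rights=[SimpleNamespace(uri=NO_COPYRIGHT)],
    ),
    SimpleNamespace(
        purl="http://example.org/purl/2",
        rights=[SimpleNamespace(uri="http://example.org/rights/other")],
    ),
]

print(purls_with_rights(records, NO_COPYRIGHT))
# ['http://example.org/purl/1']
```

The loop above prints a PURL once per matching rights element; using any() here is a deliberate difference so that each record appears at most once.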

Files for pymods, version 2.0.8

Filename                        Size      File type   Python version
pymods-2.0.8-py3-none-any.whl   19.5 kB   Wheel       3.6
pymods-2.0.8.tar.gz             15.6 kB   Source      None
