Parser for GIS metadata standards including ArcGIS, FGDC and ISO-19115
Project description
gis-metadata-parser
XML parsers for GIS metadata that are designed to read in, validate, update and output a core set of properties that have been mapped between the most common standards, currently:
- FGDC
- ISO-19139 (and ISO-19115)
- ArcGIS (tested with ArcGIS format 1.0).
This library is compatible with Python versions 2.7 and 3.4 through 3.6.
Installation
Install with pip install gis-metadata-parser
.
Usage
Parsers can be instantiated from files, XML strings or URLs. They can be converted from one standard to another as well.
from gis_metadata.arcgis_metadata_parser import ArcGISParser from gis_metadata.fgdc_metadata_parser import FgdcParser from gis_metadata.iso_metadata_parser import IsoParser from gis_metadata.metadata_parser import get_metadata_parser # From file objects with open(r'/path/to/metadata.xml') as metadata: fgdc_from_file = FgdcParser(metadata) with open(r'/path/to/metadata.xml') as metadata: iso_from_file = IsoParser(metadata) # Detect standard based on root element, metadata fgdc_from_string = get_metadata_parser( """ <?xml version='1.0' encoding='UTF-8'?> <metadata> <idinfo> </idinfo> </metadata> """ ) # Detect ArcGIS standard based on root element and its nodes iso_from_string = get_metadata_parser( """ <?xml version='1.0' encoding='UTF-8'?> <metadata> <dataIdInfo/></dataIdInfo> <distInfo/></distInfo> <dqInfo/></dqInfo> </metadata> """ ) # Detect ISO standard based on root element, MD_Metadata or MI_Metadata iso_from_string = get_metadata_parser( """ <?xml version='1.0' encoding='UTF-8'?> <MD_Metadata> <identificationInfo> </identificationInfo> </MD_Metadata> """ ) # Convert from one standard to another fgdc_converted = iso_from_file.convert_to(FgdcParser) iso_converted = fgdc_from_file.convert_to(IsoParser) arcgis_converted = iso_converted.convert_to(ArcGISParser) # Output supported properties as key value pairs (dict) fgdc_key_vals = fgdc_from_file.convert_to(dict) iso_key_vals = iso_from_file.convert_to(dict)
Finally, the properties of the parser can be updated, validated, applied and output:
with open(r'/path/to/metadata.xml') as metadata: fgdc_from_file = FgdcParser(metadata) # Example simple properties fgdc_from_file.title fgdc_from_file.abstract fgdc_from_file.place_keywords fgdc_from_file.thematic_keywords # :see: gis_metadata.utils.SUPPORTED_PROPS for list of all supported properties # Complex properties fgdc_from_file.attributes fgdc_from_file.bounding_box fgdc_from_file.contacts fgdc_from_file.dates fgdc_from_file.digital_forms fgdc_from_file.larger_works fgdc_from_file.process_steps fgdc_from_file.raster_info # :see: gis_metadata.utils.COMPLEX_DEFINITIONS for structure of all complex properties # Update properties fgdc_from_file.title = 'New Title' fgdc_from_file.dates = {'type': 'single' 'values': '1/1/2016'} # Apply updates fgdc_from_file.validate() # Ensure updated properties are valid fgdc_from_file.serialize() # Output updated XML as a string fgdc_from_file.write() # Output updated XML to existing file fgdc_from_file.write(out_file_or_path='/path/to/updated.xml') # Output updated XML to new file
Extending and Customizing
Tips
There are a few unwritten (until now) rules about the way the metadata parsers are wired to work:
- Properties are generally defined by XPATH in each
parser._data_map
- Simple parser properties accept only values of
string
andlist
's ofstring
's - XPATH's configured in the data map support references to element attributes:
'path/to/element/@attr'
- Complex parser properties are defined by custom parser/updater functions instead of by XPATH
- Complex parser properties accept values of type
dict
containing simple properties, or a list of saiddict
's - XPATH keys in the data map with leading underscores are parsed, but not validated or written out
- XPATH keys in the data map that "shadow" other properties but with a leading underscore serve as secondary values
- Secondary values are used in the absence of a primary value if primary location (element or attribute) is missing
- Additional underscores indicate further locations to check for missing values, i.e.
title
,_title
,__title
Some examples of existing secondary properties are as follows:
# In the ArcGIS parser for distribution contact phone: ARCGIS_TAG_FORMATS = frozendict({ ... 'dist_phone': 'distInfo/distributor/distorCont/rpCntInfo/cntPhone/voiceNum', '_dist_phone': 'distInfo/distributor/distorCont/rpCntInfo/voiceNum', # If not in cntPhone ... }) # In the FGDC parser for sub-properties in the contacts definition: FGDC_DEFINITIONS = dict({k: dict(v) for k, v in iteritems(COMPLEX_DEFINITIONS)}) FGDC_DEFINITIONS[CONTACTS].update({ '_name': '{_name}', '_organization': '{_organization}' }) ... class FgdcParser(MetadataParser): ... def _init_data_map(self): ... ct_format = FGDC_TAG_FORMATS[CONTACTS] fgdc_data_structures[CONTACTS] = format_xpaths( ... name=ct_format.format(ct_path='cntperp/cntper'), _name=ct_format.format(ct_path='cntorgp/cntper'), # If not in cntperp organization=ct_format.format(ct_path='cntperp/cntorg'), _organization=ct_format.format(ct_path='cntorgp/cntorg'), # If not in cntperp ) # Also see the ISO parser for secondary and tertiary sub-properties in the attributes definition: ISO_DEFINITIONS = dict({k: dict(v) for k, v in iteritems(COMPLEX_DEFINITIONS)}) ISO_DEFINITIONS[ATTRIBUTES].update({ '_definition_source': '{_definition_src}', '__definition_source': '{__definition_src}', '___definition_source': '{___definition_src}' })
Examples
Any of the supported parsers can be extended to include more of a standard's supported data. In this example we'll add two new properties to the IsoParser
:
metadata_language
: a simple string field describing the language of the metadata file itself (not the dataset)metadata_contacts
: a complex structure with contact info leveraging and enhancing the existing contact structure
This example will cover:
- Adding a new simple property
- Configuring a secondary location for a property value
- Referencing an element attribute in an XPATH
- Adding a new complex property
- Customizing the complex property to include a new sub-property
Also, this example is specifically covered by unit tests.
from gis_metadata.iso_metadata_parser import IsoParser from gis_metadata.utils import COMPLEX_DEFINITIONS, CONTACTS, format_xpaths, ParserProperty class CustomIsoParser(IsoParser): def _init_data_map(self): super(CustomIsoParser, self)._init_data_map() # 1. Basic property: text or list (with secondary location referencing `codeListValue` attribute) lang_prop = 'metadata_language' self._data_map[lang_prop] = 'language/CharacterString' # Parse from here if present self._data_map['_' + lang_prop] = 'language/LanguageCode/@codeListValue' # Otherwise, try from here # 2. Complex structure (reuse of contacts structure plus phone) # 2.1 Define some basic variables ct_prop = 'metadata_contacts' ct_xpath = 'contact/CI_ResponsibleParty/{ct_path}' ct_defintion = COMPLEX_DEFINITIONS[CONTACTS] ct_defintion['phone'] = '{phone}' # 2.2 Reuse CONTACT structure to specify locations per prop (adapted from parent to add `phone`) self._data_structures[ct_prop] = format_xpaths( ct_defintion, name=ct_xpath.format(ct_path='individualName/CharacterString'), organization=ct_xpath.format(ct_path='organisationName/CharacterString'), position=ct_xpath.format(ct_path='positionName/CharacterString'), phone=ct_xpath.format( ct_path='contactInfo/CI_Contact/phone/CI_Telephone/voice/CharacterString' ), email=ct_xpath.format( ct_path='contactInfo/CI_Contact/address/CI_Address/electronicMailAddress/CharacterString' ) ) # 2.3 Set the contact root to insert new elements at "contact" level given the defined path: # 'contact/CI_ResponsibleParty/...' # By default we would get multiple "CI_ResponsibleParty" elements under a single "contact" # This way we get multiple "contact" elements, each with its own single "CI_ResponsibleParty" self._data_map['_{prop}_root'.format(prop=ct_prop)] = 'contact' # 2.4 Leverage the default methods for parsing complex properties (or write your own parser/updater) self._data_map[ct_prop] = ParserProperty(self._parse_complex_list, self._update_complex_list) # 3. And finally, let the parent validation logic know about the two new custom properties self._metadata_props.add(lang_prop) self._metadata_props.add(ct_prop) with open(r'/path/to/metadata.xml') as metadata: iso_from_file = CustomIsoParser(metadata) iso_from_file.metadata_language iso_from_file.metadata_contacts
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for gis-metadata-parser-2.0.1.tar.gz
Algorithm | Hash digest | |
---|---|---|
SHA256 | 752cf78b3b51cfedf51e374e6d0fc5521fe74309094715dd31d82a7495e0c801 |
|
MD5 | 894021496e11da66f9b190c7ea3a0e5b |
|
BLAKE2-256 | 6bff48a3c2c6df270a2338e1d3adee718e9df436e76f2693ddbcb3d2379dd649 |
Hashes for gis_metadata_parser-2.0.1-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 98879b65e341b72d1bc47ca2eb1671128418aad4e2a8b30a6b8951f6dffdbfaa |
|
MD5 | 512ed3a2f5425cff8af970b7ddcb0079 |
|
BLAKE2-256 | b0073ce5d113df721ebf12d3fd5461431e6bd4bbb2a573f5114ed9197b999753 |