Project Description

Python script for converting MARC 21 Classification records (serialized as MARCXML) to SKOS concepts.

Developed to support the project “Felles terminologi for klassifikasjon med Dewey”, it has only been tested with Dewey Decimal Classification (DDC) records. Issues and suggestions for generalizations and improvements are welcome!

Installation

Using Pip:

$ pip install -U git+https://github.com/scriptotek/mc2skos.git
  • Works with both Python 2.x and 3.x. See Travis for details on tested Python versions.
  • If lxml fails to install on Windows, try the Windows installer from PyPI.
  • Make sure the Python scripts folder has been added to your PATH.

Usage example

mc2skos infile.xml outfile.ttl

Run mc2skos -h for options.

URIs

For records with 084 $a == "ddc", URIs are generated in the form http://dewey.info/{collection}/{object}/e{edition}/, where {collection} is “class”, “table” or “scheme”, and {edition} is taken from 084 $c (with language code stripped). For example:

<http://dewey.info/class/6--982/e21/> a skos:Concept ;
    skos:inScheme <http://dewey.info/scheme/edition/e21/>,
        <http://dewey.info/table/6/e21/> ;
    skos:notation "T6--982" ;
    skos:prefLabel "Chibchan and Paezan languages"@en .

To override this, use --uri to set a URI template for class and table records, --scheme to set a URI to be used with skos:inScheme for all records, and --table_scheme to set a URI template to be used with skos:inScheme for table records. Note that if --uri is specified without --scheme, no skos:inScheme is added. The same goes for --table_scheme.
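
The default URI pattern can be illustrated with a small Python sketch. This is not mc2skos's own code: the function names are hypothetical, the "21/eng" shape of an 084 $c value with a language code is an assumption, and so is the idea that a custom template uses the same placeholder names as the default.

```python
import re

# Default pattern as described in the text above.
DEFAULT_TEMPLATE = "http://dewey.info/{collection}/{object}/e{edition}/"


def strip_language(edition_field):
    """Keep only the leading edition number, e.g. '21/eng' -> '21'.

    The '21/eng' shape is an assumption about how 084 $c may look.
    """
    match = re.match(r"\d+", edition_field.strip())
    if match is None:
        raise ValueError("no edition number in %r" % edition_field)
    return match.group(0)


def build_uri(collection, obj, edition_field, template=DEFAULT_TEMPLATE):
    """Fill the URI template for one record (hypothetical helper)."""
    if collection not in ("class", "table", "scheme"):
        raise ValueError("unknown collection: %r" % collection)
    return template.format(
        collection=collection,
        object=obj,
        edition=strip_language(edition_field),
    )


print(build_uri("class", "6--982", "21"))
print(build_uri("table", "6", "21/eng"))
```

Passing a different template string plays the role that --uri plays on the command line, under the placeholder assumption stated above.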

Mapping schema

Only a small part of the MARC21 Classification data model is converted, and the conversion follows a rather pragmatic approach, exemplified by the mapping of the 7XX fields to skos:altLabel.

MARC21XML                                        RDF
153 $a, $c, $z  Classification number            skos:notation
153 $j          Caption                          skos:prefLabel
153 $e, $f, $z  Classification number hierarchy  skos:broader
253             Complex See Reference            skos:editorialNote
353             Complex See Also Reference       skos:editorialNote
680             Scope Note                       skos:scopeNote
683             Application Instruction Note     skos:editorialNote
685             History Note                     skos:historyNote
694             ??? Note                         skos:editorialNote
700             Index Term-Personal Name         skos:altLabel
710             Index Term-Corporate Name        skos:altLabel
711             Index Term-Meeting Name          skos:altLabel
730             Index Term-Uniform Title         skos:altLabel
748             Index Term-Chronological         skos:altLabel
750             Index Term-Topical               skos:altLabel
751             Index Term-Geographic Name       skos:altLabel
765             Synthesized Number Components    mads:componentList (see below)
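
A few rows of the mapping can be sketched with the Python standard library. This is only an illustration, not how mc2skos is implemented: the sample record is made up, the helper names are hypothetical, only subfield $a of the 7XX fields is read, and only 153 $e is used for the hierarchy.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# 7XX index term fields that the table maps to skos:altLabel.
INDEX_TERM_TAGS = ("700", "710", "711", "730", "748", "750", "751")

# Made-up record, for illustration only.
SAMPLE_RECORD = """\
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="153" ind1=" " ind2=" ">
    <subfield code="a">001.3</subfield>
    <subfield code="e">001</subfield>
    <subfield code="j">Humanities</subfield>
  </datafield>
  <datafield tag="750" ind1=" " ind2=" ">
    <subfield code="a">Liberal arts</subfield>
  </datafield>
</record>
"""


def subfields(record, tag, code):
    """Return the text of every matching subfield in the record."""
    path = "marc:datafield[@tag='%s']/marc:subfield[@code='%s']" % (tag, code)
    return [el.text for el in record.findall(path, MARC_NS) if el.text]


def extract_concept(record):
    """Collect a small subset of the mapping as a plain dict."""
    notations = subfields(record, "153", "a")
    captions = subfields(record, "153", "j")
    alt_labels = []
    for tag in INDEX_TERM_TAGS:
        alt_labels.extend(subfields(record, tag, "a"))
    return {
        "skos:notation": notations[0] if notations else None,
        "skos:prefLabel": captions[0] if captions else None,
        "skos:broader": subfields(record, "153", "e"),
        "skos:altLabel": alt_labels,
    }


concept = extract_concept(ET.fromstring(SAMPLE_RECORD))
print(concept)
```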

Synthesized number components

Components of synthesized numbers explicitly described in 765 fields are expressed using the mads:componentList property, and to preserve the order of the components, we use RDF lists. Example:

@prefix mads: <http://www.loc.gov/mads/rdf/v1#> .

<http://dewey.info/class/001.30973/e23/> a skos:Concept ;
    mads:componentList (
        <http://dewey.info/class/001.3/e23/>
        <http://dewey.info/class/T1--09/e23/>
        <http://dewey.info/class/T2--73/e23/>
    ) ;
    skos:notation "001.30973" .

Retrieving list members in order is surprisingly hard with SPARQL. Retrieving ordered pairs is the best solution I’ve come up with so far:

PREFIX mads: <http://www.loc.gov/mads/rdf/v1#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?c1_notation ?c1_label ?c2_notation ?c2_label
WHERE { GRAPH <http://localhost/ddc23no> {

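    # Start from the head of the RDF list attached to the concept.
    <http://dewey.info/class/001.30973/e23/> mads:componentList ?l .
    # ?sl matches the list itself and every tail of it.
    ?l rdf:rest* ?sl .
    # ?e1 and ?e2 are two adjacent list members.
    ?sl rdf:first ?e1 .
    ?sl rdf:rest ?sln .
    ?sln rdf:first ?e2 .

    ?e1 skos:notation ?c1_notation .
    ?e2 skos:notation ?c2_notation .

    OPTIONAL {
        ?e1 skos:prefLabel ?c1_label .
    }
    OPTIONAL {
        ?e2 skos:prefLabel ?c2_label .
    }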
}}

c1_notation  c1_label                                          c2_notation  c2_label
"001.3"      "Humaniora"@nb                                    "T1--09"     "Historie, geografisk behandling, biografier"@nb
"T1--09"     "Historie, geografisk behandling, biografier"@nb  "T2--73"     "USA"@nb
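
Because a SELECT without ORDER BY gives no guarantee about row order, the pairs may need to be chained back together on the client side. The following Python sketch shows one way to do that. The function name is hypothetical, and it assumes that no component occurs more than once in the list.

```python
def chain_pairs(pairs):
    """Rebuild the full ordered list from adjacent (first, second) pairs.

    Assumes each member appears only once in the list; a repeated
    member would make the successor lookup ambiguous.
    """
    if not pairs:
        return []
    successor = dict(pairs)
    seconds = set(successor.values())
    # The head is the only member that never occurs as a second element.
    heads = [first for first in successor if first not in seconds]
    if len(heads) != 1:
        raise ValueError("pairs do not form a single chain")
    ordered = [heads[0]]
    # At most len(pairs) steps, which also guards against cycles.
    for _ in range(len(pairs)):
        nxt = successor.get(ordered[-1])
        if nxt is None:
            break
        ordered.append(nxt)
    return ordered


# The two result rows from above, deliberately given out of order.
rows = [("T1--09", "T2--73"), ("001.3", "T1--09")]
print(chain_pairs(rows))
```

Note that the query returns no rows for a list with a single member, since there is no adjacent pair to match, so that case has to be handled separately.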

Additional processing for data from WebDewey

The script is supposed to work with any MARC21 classification data, but also supports the non-standard ess codes supplied in WebDewey data to differentiate between different types of notes.

MARC21XML                                 RDF
680 having $9 ess=ndf  Definition note    skos:definition
680 having $9 ess=nvn  Variant name note  wd:variantName for each subfield $t
680 having $9 ess=nch  Class here note    wd:classHere for each subfield $t
680 having $9 ess=nin  Including note     wd:including for each subfield $t
680 having $9 ess=nph  Former heading     wd:formerHeading for each subfield $t
685 having $9 ess=ndn  Deprecation note   owl:deprecated true
694 having $9 ess=nml  ???                skos:editorialNote
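
The table above amounts to a lookup keyed on field tag and ess code. A minimal Python sketch of that idea follows. The names are hypothetical, and falling back to the general mapping when an ess code has no special rule is an assumption rather than documented behaviour.

```python
# Special cases from the WebDewey table: (tag, ess code) -> property.
ESS_PROPERTIES = {
    ("680", "ndf"): "skos:definition",
    ("680", "nvn"): "wd:variantName",
    ("680", "nch"): "wd:classHere",
    ("680", "nin"): "wd:including",
    ("680", "nph"): "wd:formerHeading",
    ("685", "ndn"): "owl:deprecated",
    ("694", "nml"): "skos:editorialNote",
}

# General note mapping from the "Mapping schema" section.
GENERAL_PROPERTIES = {
    "253": "skos:editorialNote",
    "353": "skos:editorialNote",
    "680": "skos:scopeNote",
    "683": "skos:editorialNote",
    "685": "skos:historyNote",
    "694": "skos:editorialNote",
}


def ess_code(subfield_9_values):
    """Return the code from the first 'ess=xxx' value, or None."""
    for value in subfield_9_values:
        if value.startswith("ess="):
            return value[len("ess="):]
    return None


def property_for(tag, subfield_9_values=()):
    """Pick the RDF property for a note field (assumed fallback logic)."""
    special = ESS_PROPERTIES.get((tag, ess_code(subfield_9_values)))
    if special is not None:
        return special
    return GENERAL_PROPERTIES.get(tag)


print(property_for("680", ["ess=ndf"]))
print(property_for("685", ["ess=nrp"]))
```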

Notes that are currently not treated in any special way:

  • 253 having $9 ess=nsx: Do-not-use
  • 253 having $9 ess=nce: Class-elsewhere
  • 253 having $9 ess=ncw: Class-elsewhere-manual
  • 253 having $9 ess=nse: See
  • 253 having $9 ess=nsw: See-manual
  • 353 having $9 ess=nsa: See-also
  • 683 having $9 ess=nbu: Preference note
  • 683 having $9 ess=nop: Options note
  • 683 having $9 ess=non: Options note
  • 684 having $9 ess=nsm: Manual note
  • 685 having $9 ess=ndp: Discontinued partial
  • 685 having $9 ess=nrp: Relocation
  • 689 having $9 ess=nru: Last used in…
Release History

0.3.1

Download Files

File Name (Size)                   Version  File Type  Upload Date
mc2skos-0.3.1-py2.6.egg (19.8 kB)  2.6      Egg        Aug 15, 2016
mc2skos-0.3.1-py2.7.egg (19.7 kB)  2.7      Egg        Aug 15, 2016
mc2skos-0.3.1-py3.3.egg (20.2 kB)  3.3      Egg        Aug 15, 2016
mc2skos-0.3.1-py3.4.egg (19.9 kB)  3.4      Egg        Aug 15, 2016
mc2skos-0.3.1-py3.5.egg (19.8 kB)  3.5      Egg        Aug 15, 2016
mc2skos-0.3.1-py3.6.egg (19.6 kB)  3.6      Egg        Aug 15, 2016
mc2skos-0.3.1.tar.gz (11.6 kB)              Source     Aug 15, 2016
