
A data validation tool for MARC records


pydantic-marc

pydantic-marc is a library for validating data against the MARC21 Format for Bibliographic Data.

Installation

Use pip:

$ pip install pydantic-marc

Features

pydantic-marc uses pydantic, the popular data validation library, to define the valid components of a MARC record. The package expects users to read MARC records from binary files with pymarc.

Basic usage:

Validating a MARC record:

from pymarc import MARCReader
from rich import print  # optional: pretty-prints the output; the built-in print also works

from pydantic_marc import MarcRecord


with open("temp/valid.mrc", "rb") as fh:
    reader = MARCReader(fh)
    for record in reader:
        print(record)
        model = MarcRecord.model_validate(record, from_attributes=True)
        print(model.model_dump())
{
    "leader": "00536nam a22001985i 4500",
    "fields": [
        {"001": "123456789"},
        {"008": "201201s2020    nyua          000 1 eng d"},
        {"035": {"ind1": " ", "ind2": " ", "subfields": [{"a": "(OCoLC)1234567890"}]}},
        {"049": {"ind1": " ", "ind2": " ", "subfields": [{"a": "NYPP"}]}},
        {
            "245": {
                "ind1": "0",
                "ind2": "0",
                "subfields": [
                    {"a": "Fake :"},
                    {"b": "Marc Record"},
                ]
            }
        },
        {
            "264": {
                "ind1": " ",
                "ind2": "1",
                "subfields": [
                    {"a": "New York :"},
                    {"b": "NY,"},
                    {"c": "[2020]"}
                ]
            }
        },
        {
            "300": {
                "ind1": " ",
                "ind2": " ",
                "subfields": [{"a": "100 pages :"}, {"b": "color illustrations;"}, {"c": "30 cm"}]
            }
        },
        {"336": {"ind1": " ", "ind2": " ", "subfields": [{"a": "text"}, {"b": "txt"}, {"2": "rdacontent"}]}},
        {"337": {"ind1": " ", "ind2": " ", "subfields": [{"a": "unmediated"}, {"b": "n"}, {"2": "rdamedia"}]}},
        {"338": {"ind1": " ", "ind2": " ", "subfields": [{"a": "volume"}, {"b": "nc"}, {"2": "rdacarrier"}]}}
    ]
}
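The dumped model is plain Python data. Judging from the output above (this shape is inferred from the example, not taken from the library's documentation), each entry in "fields" is a one-key dictionary mapping a tag to either a string for control fields or a dictionary of indicators and subfields. A hypothetical helper, iter_rows (not part of pydantic-marc), could flatten that structure into rows for a spreadsheet or database. It uses only the standard library and a trimmed copy of the output above.

```python
from typing import Any, Iterator


# Hypothetical helper, not part of pydantic-marc. It assumes the dict shape
# shown in the model_dump() output above.
def iter_rows(dumped: dict[str, Any]) -> Iterator[tuple[str, str, str, str, str]]:
    """Yield (tag, ind1, ind2, code, value) rows.

    Control fields get empty indicators and an empty subfield code.
    """
    for field in dumped["fields"]:
        # Each field is a one-key dict: {tag: data}
        for tag, data in field.items():
            if isinstance(data, str):
                # Control fields (00X) hold a plain string
                yield (tag, "", "", "", data)
            else:
                for subfield in data["subfields"]:
                    # Each subfield is a one-key dict: {code: value}
                    for code, value in subfield.items():
                        yield (tag, data["ind1"], data["ind2"], code, value)


# Trimmed copy of the output above
sample = {
    "leader": "00536nam a22001985i 4500",
    "fields": [
        {"001": "123456789"},
        {"245": {"ind1": "0", "ind2": "0",
                 "subfields": [{"a": "Fake :"}, {"b": "Marc Record"}]}},
    ],
}

for row in iter_rows(sample):
    print(row)
```

With a real record, pass model.model_dump() instead of the sample dictionary.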

If a record is invalid, model_validate raises a pydantic ValidationError. Its errors can be retrieved as a list of dictionaries, as JSON, or in a human-readable format.

Dictionary and JSON Error Messages:

from pydantic import ValidationError
from pymarc import MARCReader

from pydantic_marc import MarcRecord

with open("temp/invalid.mrc", "rb") as fh:
    reader = MARCReader(fh)
    for record in reader:
        print(record)
        try:
            MarcRecord.model_validate(record, from_attributes=True)
        except ValidationError as e:
            # errors as a list of dictionaries (the output shown below)
            print(e.errors())

            # the same errors serialized as a JSON string
            print(e.json())
[
    {
        "type": "non_repeatable_field",
        "loc": ("fields", "001"),
        "msg": "001: Has been marked as a non-repeating field.",
        "input": "001",
        "ctx": {"input": "001"}
    },
    {
        "type": "missing_required_field",
        "loc": ("fields", "245"),
        "msg": "One 245 field must be present in a MARC21 record.",
        "input": "245",
        "ctx": {"input": "245"}
    },
    {
        "type": "multiple_1xx_fields",
        "loc": ("fields", "100", "110"),
        "msg": "1XX: Only one 1XX tag is allowed. Record contains: ['100', '110']",
        "input": ["100", "110"],
        "ctx": {"input": ["100", "110"]}
    },
    {
        "type": "control_field_length_invalid",
        "loc": ("fields", "006"),
        "msg": "006: Length appears to be invalid. Reported length is: 6. Expected length is: 18",
        "input": "p    |",
        "ctx": {"tag": "006", "valid": 18, "input": "p    |", "length": 6}
    },
    {
        "type": "invalid_indicator",
        "loc": ("fields", "035", "ind1"),
        "msg": "035 ind1: Invalid data (0). Indicator should be ['', ' '].",
        "input": "0",
        "ctx": {"loc": ("035", "ind1"), "input": "0", "valid": ["", " "], "tag": "035", "ind": "ind1"}
    },
    {
        "type": "non_repeatable_subfield",
        "loc": ("fields", "600", "a"),
        "msg": "600 $a: Subfield cannot repeat.",
        "input": [PydanticSubfield(code="a", value="Foo"), PydanticSubfield(code="a", value="Foo,")],
        "ctx": {
            "loc": ("600", "a"),
            "input": [PydanticSubfield(code="a", value="Foo"), PydanticSubfield(code="a", value="Foo,")],
            "tag": "600",
            "code": "a"
        }
    }
]
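Because e.errors() returns ordinary dictionaries, they are easy to post-process. The sketch below is a hypothetical helper (errors_by_tag is not part of pydantic-marc) that groups error messages by MARC tag, assuming every loc has the shape ("fields", tag, ...) seen in the output above. Errors that name several tags, such as the 1XX error, are filed under each tag they mention.

```python
from collections import defaultdict
from typing import Any


def errors_by_tag(errors: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Group error messages by the MARC tag(s) found in each error's loc.

    Hypothetical helper; assumes loc looks like ("fields", tag, ...).
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for err in errors:
        # Skip the leading "fields" element and keep three-digit tags only,
        # so parts such as "ind1" or a subfield code are ignored
        tags = [
            part for part in err["loc"][1:]
            if isinstance(part, str) and len(part) == 3 and part.isdigit()
        ]
        for tag in tags:
            grouped[tag].append(err["msg"])
    return dict(grouped)


# Three entries copied from the output above (other keys omitted)
sample_errors = [
    {"type": "non_repeatable_field", "loc": ("fields", "001"),
     "msg": "001: Has been marked as a non-repeating field."},
    {"type": "multiple_1xx_fields", "loc": ("fields", "100", "110"),
     "msg": "1XX: Only one 1XX tag is allowed. Record contains: ['100', '110']"},
    {"type": "invalid_indicator", "loc": ("fields", "035", "ind1"),
     "msg": "035 ind1: Invalid data (0). Indicator should be ['', ' ']."},
]

for tag, messages in sorted(errors_by_tag(sample_errors).items()):
    print(tag, messages)
```

Inside the except block, call errors_by_tag(e.errors()). The list produced by json.loads(e.json()) should work too, since the helper only slices loc and JSON turns the tuples into lists.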

Human-readable Error Message:

from pydantic import ValidationError
from pymarc import MARCReader

from pydantic_marc import MarcRecord

with open("temp/invalid.mrc", "rb") as fh:
    reader = MARCReader(fh)
    for record in reader:
        print(record)
        try:
            MarcRecord.model_validate(record, from_attributes=True)
        except ValidationError as e:
            # str(e) renders the errors in a human-readable format
            print(e)
6 validation errors for MarcRecord
fields.001
  001: Has been marked as a non-repeating field. [type=non_repeatable_field, input_value='001', input_type=str]
fields.245
  One 245 field must be present in a MARC21 record. [type=missing_required_field, input_value='245', input_type=str]
fields.100.110
  1XX: Only one 1XX tag is allowed. Record contains: ['100', '110'] [type=multiple_1xx_fields, input_value=['100', '110'], input_type=list]
fields.006
  006: Length appears to be invalid. Reported length is: 6. Expected length is: 18 [type=control_field_length_invalid, input_value='p    |', input_type=str]
fields.035.ind1
  035 ind1: Invalid data (0). Indicator should be ['', ' ']. [type=invalid_indicator, input_value='0', input_type=str]
fields.600.a
  600 $a: Subfield cannot repeat. [type=non_repeatable_subfield, input_value=[PydanticSubfield(code='a...code='a', value='Foo,')], input_type=list]
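The errors above come from rules that span the whole record (a required 245, a single 1XX, non-repeatable tags) and rules that apply to individual fields (control field lengths, indicators, subfields). As an illustration only, the standard-library sketch below implements four of those rules over a list of (tag, data) pairs and reuses the error type names from the output. The rule tables and the check_record function are hypothetical and cover a tiny subset of MARC21; they are not how pydantic-marc is implemented, since the library defines its rules with pydantic models.

```python
from collections import Counter

# Hypothetical, hand-picked subset of MARC21 rules for illustration only.
# pydantic-marc ships its own, far more complete, rule definitions.
NON_REPEATABLE = {"001", "003", "005", "008", "245"}
CONTROL_FIELD_LENGTHS = {"006": 18, "008": 40}


def check_record(fields: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return (error_type, detail) pairs for a list of (tag, data) fields.

    Only control field data is inspected (for its length); the data of
    variable fields is ignored by this sketch.
    """
    errors: list[tuple[str, str]] = []
    counts = Counter(tag for tag, _ in fields)

    # Record level: non-repeatable tags may occur at most once
    for tag in sorted(NON_REPEATABLE):
        if counts[tag] > 1:
            errors.append(("non_repeatable_field", tag))

    # Record level: a 245 must be present (duplicates are caught above)
    if counts["245"] == 0:
        errors.append(("missing_required_field", "245"))

    # Record level: at most one 1XX (main entry) field
    one_xx = [tag for tag, _ in fields if tag.startswith("1")]
    if len(one_xx) > 1:
        errors.append(("multiple_1xx_fields", ",".join(one_xx)))

    # Field level: fixed-length control fields must have the expected length
    for tag, data in fields:
        expected = CONTROL_FIELD_LENGTHS.get(tag)
        if expected is not None and len(data) != expected:
            errors.append(("control_field_length_invalid", tag))

    return errors


sample = [("001", "123"), ("001", "456"), ("006", "p    |"), ("100", "x"), ("110", "y")]
for error in check_record(sample):
    print(error)
```

Indicator and subfield checks (the invalid_indicator and non_repeatable_subfield errors) would need per-tag tables of valid values and are left out to keep the sketch short.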


Download files

Source Distribution

pydantic_marc-0.1.0.tar.gz (14.8 kB)

Uploaded Source

Built Distribution

pydantic_marc-0.1.0-py3-none-any.whl (16.2 kB)

Uploaded Python 3

File details

Details for the file pydantic_marc-0.1.0.tar.gz.

File metadata

  • Download URL: pydantic_marc-0.1.0.tar.gz
  • Upload date:
  • Size: 14.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.1 CPython/3.12.2 Darwin/23.6.0

File hashes

Hashes for pydantic_marc-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        70370caa75dfdbca4783c9521bfc075e92e04fd21e51d149ff49928c3803c6f9
MD5           44cfc52029c111c38235cb0b429c1417
BLAKE2b-256   72bea8583279fe3cdc2830de082872e348fd4b6b603c64b7df14d6fb6a3487e8

File details

Details for the file pydantic_marc-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: pydantic_marc-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 16.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.1 CPython/3.12.2 Darwin/23.6.0

File hashes

Hashes for pydantic_marc-0.1.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        49df39a74a0a6213eca705624a165298018dc378de185f6aa28f1e1ce84c2c56
MD5           5fe4b85b4f6d506cd62de50a15b2cfe6
BLAKE2b-256   e68b3e5d5337c2b54d4fd37dc7cdf60764b8e949f4c427a99ecda8d5fc702041

