A data validation tool for MARC records
pydantic-marc
pydantic-marc is a library for validating data against the MARC21 Format for Bibliographic Data.
Installation
Use pip:
$ pip install pydantic-marc
Features
pydantic-marc uses pydantic, the popular data validation library, to define the valid components of a MARC record. The package expects that users will read MARC records from binary files with pymarc.
Basic usage:
Validating a MARC record:
from pymarc import MARCReader
from rich import print

from pydantic_marc import MarcRecord

with open("temp/valid.mrc", "rb") as fh:
    reader = MARCReader(fh)
    for record in reader:
        print(record)
        # Validate the pymarc record by reading its attributes.
        model = MarcRecord.model_validate(record, from_attributes=True)
        print(model.model_dump())
{
    "leader": "00536nam a22001985i 4500",
    "fields": [
        {"001": "123456789"},
        {"008": "201201s2020 nyua 000 1 eng d"},
        {"035": {"ind1": " ", "ind2": " ", "subfields": [{"a": "(OCoLC)1234567890"}]}},
        {"049": {"ind1": " ", "ind2": " ", "subfields": [{"a": "NYPP"}]}},
        {
            "245": {
                "ind1": "0",
                "ind2": "0",
                "subfields": [
                    {"a": "Fake :"},
                    {"b": "Marc Record"},
                ]
            }
        },
        {
            "264": {
                "ind1": " ",
                "ind2": "1",
                "subfields": [
                    {"a": "New York :"},
                    {"b": "NY,"},
                    {"c": "[2020]"}
                ]
            }
        },
        {
            "300": {
                "ind1": " ",
                "ind2": " ",
                "subfields": [{"a": "100 pages :"}, {"b": "color illustrations;"}, {"c": "30 cm"}]
            }
        },
        {"336": {"ind1": " ", "ind2": " ", "subfields": [{"a": "text"}, {"b": "txt"}, {"2": "rdacontent"}]}},
        {"337": {"ind1": " ", "ind2": " ", "subfields": [{"a": "unmediated"}, {"b": "n"}, {"2": "rdamedia"}]}},
        {"338": {"ind1": " ", "ind2": " ", "subfields": [{"a": "volume"}, {"b": "nc"}, {"2": "rdacarrier"}]}}
    ]
}
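As a rough illustration of working with the dumped dictionary, the sketch below pulls subfield values out of it. It assumes the dump has the shape shown in the output above (a "fields" list of single-key dictionaries). The helper `get_subfields` is hypothetical and is not part of pydantic-marc.

```python
from typing import Any

# Trimmed copy of the dumped record shown above (only two fields kept).
dumped: dict[str, Any] = {
    "leader": "00536nam a22001985i 4500",
    "fields": [
        {"001": "123456789"},
        {
            "245": {
                "ind1": "0",
                "ind2": "0",
                "subfields": [{"a": "Fake :"}, {"b": "Marc Record"}],
            }
        },
    ],
}


def get_subfields(record: dict[str, Any], tag: str, code: str) -> list[str]:
    """Hypothetical helper: every value of subfield `code` in every `tag` field."""
    values: list[str] = []
    for field in record["fields"]:
        data = field.get(tag)
        # Control fields (e.g. 001) dump as plain strings and have no subfields.
        if not isinstance(data, dict):
            continue
        for subfield in data["subfields"]:
            if code in subfield:
                values.append(subfield[code])
    return values


title = " ".join(get_subfields(dumped, "245", "a") + get_subfields(dumped, "245", "b"))
print(title)  # Fake : Marc Record
```

Returning a list rather than a single value keeps the helper usable for repeatable fields and subfields.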
If the record is invalid, the errors can be returned as JSON, as a list of dictionaries, or in a human-readable format.
JSON Error Message:
from pydantic import ValidationError
from pymarc import MARCReader

from pydantic_marc import MarcRecord

with open("temp/invalid.mrc", "rb") as fh:
    reader = MARCReader(fh)
    for record in reader:
        print(record)
        try:
            MarcRecord.model_validate(record)
        except ValidationError as e:
            # errors as a list of dictionaries
            print(e.errors())

            # errors as json
            print(e.json())
[
    {
        "type": "non_repeatable_field",
        "loc": ("fields", "001"),
        "msg": "001: Has been marked as a non-repeating field.",
        "input": "001",
        "ctx": {"input": "001"}
    },
    {
        "type": "missing_required_field",
        "loc": ("fields", "245"),
        "msg": "One 245 field must be present in a MARC21 record.",
        "input": "245",
        "ctx": {"input": "245"}
    },
    {
        "type": "multiple_1xx_fields",
        "loc": ("fields", "100", "110"),
        "msg": "1XX: Only one 1XX tag is allowed. Record contains: ['100', '110']",
        "input": ["100", "110"],
        "ctx": {"input": ["100", "110"]}
    },
    {
        "type": "control_field_length_invalid",
        "loc": ("fields", "006"),
        "msg": "006: Length appears to be invalid. Reported length is: 6. Expected length is: 18",
        "input": "p |",
        "ctx": {"tag": "006", "valid": 18, "input": "p |", "length": 6}
    },
    {
        "type": "invalid_indicator",
        "loc": ("fields", "035", "ind1"),
        "msg": "035 ind1: Invalid data (0). Indicator should be ['', ' '].",
        "input": "0",
        "ctx": {"loc": ("035", "ind1"), "input": "0", "valid": ["", " "], "tag": "035", "ind": "ind1"}
    },
    {
        "type": "non_repeatable_subfield",
        "loc": ("fields", "600", "a"),
        "msg": "600 $a: Subfield cannot repeat.",
        "input": [PydanticSubfield(code="a", value="Foo"), PydanticSubfield(code="a", value="Foo,")],
        "ctx": {
            "loc": ("600", "a"),
            "input": [PydanticSubfield(code="a", value="Foo"), PydanticSubfield(code="a", value="Foo,")],
            "tag": "600",
            "code": "a"
        }
    }
]
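Because each error is a plain dictionary, the list can be post-processed with ordinary Python. The sketch below groups error types by MARC tag. It assumes that the second element of "loc" is the tag, as in the output above. The function `group_by_tag` is a hypothetical helper, not part of pydantic-marc, and the sample data is trimmed to the keys it uses.

```python
from collections import defaultdict
from typing import Any

# Three entries copied from the error list above, trimmed to "type" and "loc".
errors: list[dict[str, Any]] = [
    {"type": "non_repeatable_field", "loc": ("fields", "001")},
    {"type": "missing_required_field", "loc": ("fields", "245")},
    {"type": "invalid_indicator", "loc": ("fields", "035", "ind1")},
]


def group_by_tag(error_list: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Hypothetical helper: map each MARC tag to the error types raised for it."""
    grouped: defaultdict[str, list[str]] = defaultdict(list)
    for error in error_list:
        # loc[0] is "fields"; loc[1] is the MARC tag in the output shown above.
        tag = error["loc"][1]
        grouped[tag].append(error["type"])
    return dict(grouped)


for tag, types in sorted(group_by_tag(errors).items()):
    print(tag, types)
```

A grouping like this can make it easier to report problems field by field instead of in validation order.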
Human-readable Error Message:
from pydantic import ValidationError
from pymarc import MARCReader

from pydantic_marc import MarcRecord

with open("temp/invalid.mrc", "rb") as fh:
    reader = MARCReader(fh)
    for record in reader:
        print(record)
        try:
            MarcRecord.model_validate(record)
        except ValidationError as e:
            # errors in a human-readable format
            print(e)
6 validation errors for MarcRecord
fields.001
  001: Has been marked as a non-repeating field. [type=non_repeatable_field, input_value='001', input_type=str]
fields.245
  One 245 field must be present in a MARC21 record. [type=missing_required_field, input_value='245', input_type=str]
fields.100.110
  1XX: Only one 1XX tag is allowed. Record contains: ['100', '110'] [type=multiple_1xx_fields, input_value=['100', '110'], input_type=list]
fields.006
  006: Length appears to be invalid. Reported length is: 6. Expected length is: 18 [type=control_field_length_invalid, input_value='p |', input_type=str]
fields.035.ind1
  035 ind1: Invalid data (0). Indicator should be ['', ' ']. [type=invalid_indicator, input_value='0', input_type=str]
fields.600.a
  600 $a: Subfield cannot repeat. [type=non_repeatable_subfield, input_value=[PydanticSubfield(code='a...code='a', value='Foo,')], input_type=list]
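When a file mixes valid and invalid records, it can help to collect both groups in one pass rather than printing as you go. The following is a minimal sketch of that pattern, not something the library provides. `partition_records` and the stand-in validator `require_245` are hypothetical names. The validator is passed in as a callable so the example runs without pymarc or pydantic-marc installed. With the real library, one would presumably pass `MarcRecord.model_validate` instead. This relies on pydantic's `ValidationError` being a subclass of `ValueError`.

```python
from typing import Any, Callable, Iterable


def partition_records(
    records: Iterable[Any], validate: Callable[[Any], Any]
) -> tuple[list[Any], list[tuple[Any, ValueError]]]:
    """Hypothetical helper: split records into validated results and failures."""
    valid: list[Any] = []
    invalid: list[tuple[Any, ValueError]] = []
    for record in records:
        try:
            valid.append(validate(record))
        except ValueError as exc:  # pydantic's ValidationError is a ValueError
            invalid.append((record, exc))
    return valid, invalid


def require_245(record: dict[str, str]) -> dict[str, str]:
    """Stand-in validator for this demo only; not pydantic-marc's logic."""
    if "245" not in record:
        raise ValueError("One 245 field must be present in a MARC21 record.")
    return record


sample = [{"001": "1", "245": "Fake : Marc Record"}, {"001": "2"}]
valid, invalid = partition_records(sample, require_245)
print(len(valid), len(invalid))  # 1 1
```

Keeping the failing record next to its exception makes it possible to write the rejects to a separate file or report afterwards.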