Python port of MARC::Lint with tests
marc-lint
Python port of the Perl MARC::Lint module for validating MARC21 bibliographic records.
Features
- ✅ Comprehensive MARC21 validation
- ✅ ISBN/ISSN validation using industry-standard algorithms
- ✅ Language and geographic code validation
- ✅ Article/non-filing indicator validation
- ✅ Command-line tool for batch processing
- ✅ Python 3.10+ support
- ✅ Type hints and comprehensive test coverage
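The ISBN and ISSN checks listed above rest on standard check-digit arithmetic. The sketch below only illustrates that arithmetic; it is not the code marc-lint runs, and the function names are hypothetical.

```python
"""Check-digit arithmetic for ISBN-10, ISBN-13 and ISSN (illustrative only)."""

ASCII_DIGITS = "0123456789"


def _clean(value: str) -> str:
    # Drop hyphens and spaces, and normalise a trailing 'x' to 'X'.
    return value.replace("-", "").replace(" ", "").upper()


def _all_digits(text: str) -> bool:
    return bool(text) and all(ch in ASCII_DIGITS for ch in text)


def _check_value(ch: str) -> int:
    # 'X' stands for the value 10 in the final position of ISBN-10 and ISSN.
    return 10 if ch == "X" else int(ch)


def isbn10_is_valid(isbn: str) -> bool:
    s = _clean(isbn)
    if len(s) != 10 or not _all_digits(s[:9]) or s[9] not in ASCII_DIGITS + "X":
        return False
    values = [int(ch) for ch in s[:9]] + [_check_value(s[9])]
    # Weights run 10, 9, ..., 1; the weighted sum must be divisible by 11.
    return sum(w * v for w, v in zip(range(10, 0, -1), values)) % 11 == 0


def isbn13_is_valid(isbn: str) -> bool:
    s = _clean(isbn)
    if len(s) != 13 or not _all_digits(s):
        return False
    # Weights alternate 1, 3, 1, 3, ...; the weighted sum must be divisible by 10.
    total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(s))
    return total % 10 == 0


def issn_is_valid(issn: str) -> bool:
    s = _clean(issn)
    if len(s) != 8 or not _all_digits(s[:7]) or s[7] not in ASCII_DIGITS + "X":
        return False
    values = [int(ch) for ch in s[:7]] + [_check_value(s[7])]
    # Weights run 8, 7, ..., 1; the weighted sum must be divisible by 11.
    return sum(w * v for w, v in zip(range(8, 0, -1), values)) % 11 == 0
```

For example, `issn_is_valid("0378-5955")` holds, while `issn_is_valid("1234-5678")` does not, because the weighted sum of the latter is not a multiple of 11.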
Installation
pip install marc-lint
Or with uv:
uv pip install marc-lint
Quick Start
Command Line Interface
Lint a MARC file and display warnings:
marc-lint records.mrc
Example output:
--- Record 1 ---
020: Subfield a has bad checksum, 123456789X.
245: Must end with . (period).
--- Record 2 ---
022: Subfield a has bad checksum, 1234-5678.
============================================================
Processed 2 record(s)
Found 3 warning(s)
The CLI exits with status code 0 if no warnings are found, or 1 if warnings are present (useful for CI/CD).
# Use in CI/CD pipelines
marc-lint catalog.mrc && echo "All records valid!"
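The same exit-status convention can be used from a Python script. This is a minimal sketch, assuming the `marc-lint` executable is installed and on PATH; `run_lint` is a hypothetical helper, and the command is passed in as a parameter so the logic can be tried with any program.

```python
import subprocess
from typing import Sequence


def run_lint(command: Sequence[str]) -> bool:
    """Run a lint command; return True when it exits with status 0 (no warnings)."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # A non-zero status means warnings were found (or the tool failed),
        # so echo whatever it printed for the build log.
        print(result.stdout, end="")
        print(result.stderr, end="")
    return result.returncode == 0


# Typical use in a build script (not executed here):
#     if not run_lint(["marc-lint", "catalog.mrc"]):
#         raise SystemExit(1)
```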
Python Library - Basic Usage
from marc_lint import MarcLint
from pymarc import MARCReader

# Create a linter instance
linter = MarcLint()

# Process MARC records
with open('records.mrc', 'rb') as fh:
    reader = MARCReader(fh)
    for record in reader:
        linter.check_record(record)

        # Get warnings as strings (backward compatible)
        if linter.warnings():
            print(f"Record has {len(linter.warnings())} warnings:")
            for warning in linter.warnings():
                print(f"  - {warning}")
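When warnings need to be kept rather than printed, the loop above can be wrapped in a small collector. The sketch below is hedged: it assumes `warnings()` reflects only the most recent `check_record` call, as in the Perl module; if warnings accumulate across calls, it would over-report. `collect_warnings` and the `Linter` protocol are hypothetical names, and any object with these two methods (including a `MarcLint` instance) can be passed in.

```python
from typing import Iterable, Protocol


class Linter(Protocol):
    """The two methods this sketch relies on."""

    def check_record(self, record: object) -> None: ...

    def warnings(self) -> list[str]: ...


def collect_warnings(records: Iterable[object], linter: Linter) -> dict[int, list[str]]:
    """Map 1-based record numbers to the warnings reported for that record."""
    report: dict[int, list[str]] = {}
    for number, record in enumerate(records, start=1):
        if record is None:
            # pymarc's MARCReader yields None for records it cannot parse.
            continue
        linter.check_record(record)
        found = list(linter.warnings())
        if found:
            report[number] = found
    return report
```

Records without warnings are left out of the result, so an empty dictionary means the whole file passed.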
Python Library - Structured Warnings
For automation and API integration, use structured warnings:
from marc_lint import MarcLint
from pymarc import MARCReader

linter = MarcLint()

with open('records.mrc', 'rb') as fh:
    reader = MARCReader(fh)
    for record in reader:
        linter.check_record(record)

        # Get structured warning objects
        for warning in linter.warnings_structured():
            print(f"Field: {warning.field}")
            print(f"Message: {warning.message}")
            if warning.subfield:
                print(f"Subfield: {warning.subfield}")
            if warning.position is not None:
                print(f"Position: {warning.position + 1}")
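To make the shape of a structured warning concrete, here is a hypothetical stand-in with the four attributes used above. It is a sketch of the data layout only, not the class marc-lint defines; the name `WarningSketch` is an assumption.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class WarningSketch:
    """Hypothetical stand-in for the objects returned by warnings_structured()."""

    field: str                       # MARC tag, e.g. "245"
    message: str                     # human-readable description
    subfield: Optional[str] = None   # subfield code, when the warning is subfield-specific
    position: Optional[int] = None   # treated as 0-based; the example above prints position + 1

    def to_dict(self) -> dict:
        # Plain dict, ready for json.dumps().
        return asdict(self)


example = WarningSketch("245", "Must end with . (period).")
print(example.to_dict())
```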
Python Library - JSON Output
Export warnings as JSON for APIs:
import json
from marc_lint import MarcLint
from pymarc import MARCReader

linter = MarcLint()

with open('records.mrc', 'rb') as fh:
    reader = MARCReader(fh)
    for record in reader:
        linter.check_record(record)

        # Convert to JSON
        warnings_data = [w.to_dict() for w in linter.warnings_structured()]
        print(json.dumps(warnings_data, indent=2))
Example JSON output:
[
  {
    "field": "020",
    "message": "has bad checksum, 123456789X.",
    "subfield": "a",
    "position": null
  },
  {
    "field": "245",
    "message": "Must end with . (period).",
    "subfield": null,
    "position": null
  }
]
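A consumer of this JSON can turn each entry back into a one-line message. The formatting rule below is inferred by comparing the CLI example output with the JSON example, so treat it as an assumption; the library may build its strings differently, and `format_warning` is a hypothetical helper that ignores `position`.

```python
import json


def format_warning(entry: dict) -> str:
    """Render one JSON warning entry as a single text line."""
    parts = [f"{entry['field']}:"]
    if entry.get("subfield"):
        # Subfield-specific warnings carry the code separately from the message.
        parts.append(f"Subfield {entry['subfield']}")
    parts.append(entry["message"])
    return " ".join(parts)


payload = """
[
  {"field": "022", "message": "has bad checksum, 1234-5678.", "subfield": "a", "position": null},
  {"field": "245", "message": "Must end with . (period).", "subfield": null, "position": null}
]
"""

for entry in json.loads(payload):
    print(format_warning(entry))
```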
Python Library - Filtering Warnings
# Filter warnings by field
isbn_warnings = [
    w for w in linter.warnings_structured()
    if w.field == "020"
]

# Filter by subfield
subfield_a_warnings = [
    w for w in linter.warnings_structured()
    if w.subfield == "a"
]

# Group by field
from collections import defaultdict

warnings_by_field = defaultdict(list)
for warning in linter.warnings_structured():
    warnings_by_field[warning.field].append(warning)
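For reporting, a per-field tally is often enough. This small sketch works on plain dictionaries of the kind `to_dict()` produces; `count_by_field` is a hypothetical helper, not part of the library.

```python
from collections import Counter


def count_by_field(rows: list[dict]) -> list[tuple[str, int]]:
    """Return (field, count) pairs, most frequent first, ties broken by tag."""
    counts = Counter(row["field"] for row in rows)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


rows = [{"field": "245"}, {"field": "020"}, {"field": "245"}, {"field": "022"}]
for tag, total in count_by_field(rows):
    print(f"{tag}: {total} warning(s)")
```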
See STRUCTURED_WARNINGS.md for detailed documentation on structured warnings.
Validation Rules
Supported Fields
- 020: ISBN validation with checksum verification
- 022: ISSN validation with checksum verification
- 041: Language codes (ISO 639-2)
- 043: Geographic area codes (MARC Geographic Areas)
- 130, 240, 630, 730, 830: Non-filing indicator validation
- 245: Comprehensive title validation (punctuation, indicators, subfield order)
- 880: Alternate graphic representation validation
- Plus general field/subfield/indicator validation for all tags
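As an illustration of the non-filing indicator rule, the sketch below compares the indicator value with the length of a leading article. It is deliberately simplified: the article list is a tiny English-only sample, leading punctuation and known exceptions are ignored, and the function names and message wording are hypothetical rather than those used by marc-lint. The indicator positions follow MARC 21 (first indicator for 130, 630, 730; second indicator for 240, 245, 830).

```python
# Which indicator holds the non-filing character count for each tag.
NONFILING_INDICATOR = {"130": 1, "240": 2, "245": 2, "630": 1, "730": 1, "830": 2}

ARTICLES = ("a", "an", "the")  # deliberately tiny, English-only sample


def expected_nonfiling(title: str) -> int:
    """Characters to skip when filing: the leading article plus its trailing space."""
    lowered = title.lower()
    for article in ARTICLES:
        prefix = article + " "
        if lowered.startswith(prefix):
            return len(prefix)
    return 0


def check_nonfiling(tag: str, ind1: str, ind2: str, title: str) -> list[str]:
    """Return warning strings for a mismatched non-filing indicator."""
    which = NONFILING_INDICATOR.get(tag)
    if which is None:
        return []  # tag has no non-filing indicator
    value = ind1 if which == 1 else ind2
    if len(value) != 1 or value not in "0123456789":
        return [f"{tag}: Non-filing indicator must be a digit, found '{value}'."]
    expected = expected_nonfiling(title)
    if int(value) != expected:
        return [f"{tag}: Non-filing indicator is {value}, expected {expected}."]
    return []
```

With this sketch, `check_nonfiling("245", "1", "4", "The hobbit")` returns no warnings, while a second indicator of `0` for the same title is flagged.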
Architecture
Files
- marc_lint.linter.py – main MarcLint class, equivalent to and extending MARC::Lint.
- marc_lint.code_data.py – translation of the code tables from MARC::Lint::CodeData (language, geographic area, country codes).
- marc_lint.field_rules.py – equivalent to __DATA__ and _read_rules in MARC::Lint.
Key behavior mirrored from the Perl module:
- Per-record checks in MarcLint.check_record: 1XX count, required 245, field repeatability, indicator validity, subfield legality and repeatability, and control-field rules.
- Tag-specific methods check_020, check_041, check_043, check_245, plus _check_article for handling non-filing indicators.
- A tag-specific method check_022 has been added; others will follow in the future.
- Rules for tags are built in _read_rules from RULES_DATA, which is equivalent to the __DATA__ table in Lint.pm.
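To make the rule-table idea concrete, here is a minimal sketch of parsing a whitespace-separated rules block into per-tag rules and using it for a repeatability check. The sample text is modelled on the layout of the __DATA__ section in Lint.pm; the actual RULES_DATA format in marc-lint may differ, and TagRule, parse_rules, and repeated_subfields are hypothetical names.

```python
from dataclasses import dataclass, field

# Sample rules: one block per tag, blocks separated by a blank line (assumed layout).
SAMPLE_RULES = """\
020     R       INTERNATIONAL STANDARD BOOK NUMBER
ind1    blank   Undefined
ind2    blank   Undefined
a       NR      International Standard Book Number
z       R       Canceled/invalid ISBN

245     NR      TITLE STATEMENT
ind1    01      Title added entry
ind2    0-9     Nonfiling characters
a       NR      Title
b       NR      Remainder of title
n       R       Number of part/section of a work
"""


@dataclass
class TagRule:
    tag: str
    repeatable: bool
    description: str
    indicators: dict[str, str] = field(default_factory=dict)  # "ind1"/"ind2" -> allowed values
    subfields: dict[str, bool] = field(default_factory=dict)  # code -> repeatable?


def parse_rules(text: str) -> dict[str, TagRule]:
    """Build a tag -> TagRule mapping from the rules text."""
    rules: dict[str, TagRule] = {}
    for block in text.strip().split("\n\n"):
        lines = [line for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        # First line of a block: tag, R/NR, description.
        tag, repeat, description = lines[0].split(None, 2)
        rule = TagRule(tag, repeat == "R", description)
        for line in lines[1:]:
            key, value, _description = line.split(None, 2)
            if key in ("ind1", "ind2"):
                rule.indicators[key] = value
            else:
                rule.subfields[key] = value == "R"
        rules[tag] = rule
    return rules


def repeated_subfields(rule: TagRule, codes: list[str]) -> list[str]:
    """Return non-repeatable subfield codes that occur more than once."""
    return sorted(
        {c for c in codes if codes.count(c) > 1 and rule.subfields.get(c) is False}
    )
```

For instance, a 245 field carrying subfields `a, a, n, n, b` would be flagged only for `a`, since `n` is marked repeatable in the sample table.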
Development
Setup
# Clone the repository
git clone https://github.com/coliin8/marclint.git
cd marclint
# Install dependencies (including dev dependencies)
uv sync --all-groups
# Activate the virtual environment (optional, uv run handles this automatically)
source .venv/bin/activate
Common Development Tasks
# Run tests
uv run pytest
# Run tests with coverage
uv run pytest --cov=src/marc_lint --cov-report=term-missing
# Run tests across all Python versions (3.10-3.14)
uv run tox
# Run linting
uv run tox -e lint
# Auto-format code
uv run tox -e format
# Run specific test file
uv run pytest tests/test_warning.py -v
# Run specific test
uv run pytest tests/test_warning.py::test_warning_basic_creation -v
# Bump the project version (patch release)
uv run bump-my-version bump patch
Code Quality
This project uses:
- pytest for testing
- pytest-cov for coverage reporting
- ruff for linting and formatting (via tox)
- tox for testing across multiple Python versions
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for details.
License
MIT License - see LICENSE file for details.
This is a Python port of the original Perl MARC::Lint module.
Acknowledgments
- Bryan Baldus, Ed Summers, and Dan Lester for the original Perl MARC::Lint module (2001-2011)
- Library of Congress for MARC21 standards and documentation