Python port of MARC::Lint with tests

Maintainers

coliin8

Project description

marc-lint

Python port of the Perl MARC::Lint module for validating MARC21 bibliographic records.

This port also adds leader and control field validation, and it improves initial-article (non-filing indicator) validation by taking the record's language into account.

Features

✅ Comprehensive MARC21 validation
✅ Leader and control field (008) validation
✅ ISBN/ISSN validation using industry-standard algorithms
✅ Language and geographic code validation
✅ Article/non-filing indicator validation
✅ Batch processing with per-record identification
✅ Command-line tool with JSON output support
✅ Python 3.10+ support
✅ Type hints and comprehensive test coverage

Installation

pip install marc-lint

Or with uv:

uv pip install marc-lint

Quick Start

Command Line Interface

Lint a MARC file and display warnings:

marc-lint records.mrc

Example output:

--- Record ocm12345678 ---
  020: Subfield a has bad checksum, 123456789X.
  245: Must end with . (period).

--- Record ocm87654321 ---
  022: Subfield a has bad checksum, 1234-5678.

============================================================
Processed 2 record(s)
Found 3 warning(s) in 2 record(s)
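The `245` warning in this output comes from a terminal-punctuation rule. Below is a simplified, stand-alone sketch of the kind of checks listed for 245 (first subfield, terminal punctuation, `$c` preceded by ` /`). It works on plain `(code, value)` tuples rather than pymarc objects. The function name `check_245_sketch`, the accepted end characters and all message wording except the one in the sample output are assumptions, not the library's code.

```python
# Assumption: a 245 may end with a period, question mark or exclamation mark.
TERMINAL_PUNCTUATION = (".", "?", "!")


def check_245_sketch(subfields: list[tuple[str, str]]) -> list[str]:
    """Return warning strings for a 245 given as (subfield code, value) pairs."""
    if not subfields:
        return ["245: No subfields found."]
    warnings = []

    # The title proper ($a) is expected to be the first subfield.
    first_code = subfields[0][0]
    if first_code != "a":
        warnings.append(f"245: First subfield must be _a, but it is _{first_code}.")

    # The field as a whole must end with terminal punctuation.
    if not subfields[-1][1].rstrip().endswith(TERMINAL_PUNCTUATION):
        warnings.append("245: Must end with . (period).")

    # A statement of responsibility ($c) should follow " /" at the end of the previous subfield.
    for index in range(1, len(subfields)):
        code = subfields[index][0]
        previous_value = subfields[index - 1][1].rstrip()
        if code == "c" and not previous_value.endswith("/"):
            warnings.append("245: Subfield _c must be preceded by /.")
    return warnings
```

For example, `check_245_sketch([("a", "The hobbit /"), ("c", "J.R.R. Tolkien.")])` returns an empty list. Dropping the final period and the ` /` produces both the "Must end with . (period)." warning and the `$c` warning.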

CLI Options

# Output as JSON (useful for automation)
marc-lint records.mrc --format json

# Quiet mode (warnings only, no summary)
marc-lint records.mrc --quiet

# Use record index as ID when 001 field is missing
marc-lint records.mrc --use-index

# Combine options
marc-lint records.mrc -f json -q

JSON Output Format

marc-lint records.mrc --format json

[
  {
    "record_id": "ocm12345678",
    "is_valid": false,
    "warnings": [
      {
        "field": "020",
        "message": "has bad checksum, 123456789X.",
        "subfield": "a",
        "position": null,
        "record_id": "ocm12345678"
      }
    ]
  },
  {
    "record_id": "ocm87654321",
    "is_valid": true,
    "warnings": []
  }
]
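The JSON output is a plain list of per-record objects, so it can be post-processed with the standard library alone. The following minimal sketch assumes the structure shown above. The `summarize` helper is hypothetical. It counts warnings per field and lists the IDs of records that are not valid.

```python
import json
from collections import Counter

# The JSON output example above, as a string (normally read from a file).
SAMPLE_REPORT = """
[
  {"record_id": "ocm12345678", "is_valid": false,
   "warnings": [{"field": "020", "message": "has bad checksum, 123456789X.",
                 "subfield": "a", "position": null, "record_id": "ocm12345678"}]},
  {"record_id": "ocm87654321", "is_valid": true, "warnings": []}
]
"""


def summarize(report: list[dict]) -> tuple[Counter, list[str]]:
    """Count warnings per MARC tag and list the IDs of records that are not valid."""
    per_field = Counter(w["field"] for record in report for w in record["warnings"])
    invalid_ids = [record["record_id"] for record in report if not record["is_valid"]]
    return per_field, invalid_ids


per_field, invalid_ids = summarize(json.loads(SAMPLE_REPORT))
print(per_field.most_common())  # [('020', 1)]
print(invalid_ids)              # ['ocm12345678']
```

For real data, save the output of `marc-lint records.mrc --format json` to a file and load it with `json.load`. This assumes the file contains only the JSON array.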

Exit Codes

Code	Meaning
0	No warnings found
1	Warnings found
2	Error reading file

Use in CI/CD pipelines:

marc-lint catalog.mrc && echo "All records valid!"
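The `&&` form above treats any non-zero exit as failure. Sometimes a script needs to tell "warnings found" apart from "could not read the file", and in that case the three documented codes can be handled explicitly. Below is a minimal sketch using `subprocess`. `lint_file` and `describe_exit_code` are hypothetical helpers. The only CLI behavior they rely on is the exit-code table and the `--format json` option shown above.

```python
import subprocess

# Exit codes as documented in the table above.
EXIT_MEANINGS = {
    0: "no warnings found",
    1: "warnings found",
    2: "error reading file",
}


def describe_exit_code(code: int) -> str:
    """Map a marc-lint exit code to its documented meaning."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def lint_file(path: str) -> tuple[int, str]:
    """Run marc-lint on one file and return (exit code, captured stdout)."""
    completed = subprocess.run(
        ["marc-lint", path, "--format", "json"],
        capture_output=True,
        text=True,
        check=False,  # exit code 1 only means "warnings found", so do not raise
    )
    return completed.returncode, completed.stdout
```

For example, call `code, report = lint_file("catalog.mrc")` and then branch on `describe_exit_code(code)`. Passing `check=False` matters here: a run that finds warnings exits with 1, and `check=True` would turn that into an exception.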

Python Library - Single Record

from marc_lint import MarcLint
from pymarc import MARCReader

# Create a linter instance
linter = MarcLint()

# Process MARC records one at a time
with open('records.mrc', 'rb') as fh:
    reader = MARCReader(fh)
    for record in reader:
        warnings = linter.check_record(record)
        
        if warnings:
            print(f"Record has {len(warnings)} warnings:")
            for warning in warnings:
                print(f"  - {warning}")

Python Library - Batch Processing

For processing multiple records with per-record identification:

from marc_lint import MarcLint
from pymarc import MARCReader

linter = MarcLint()

with open('records.mrc', 'rb') as fh:
    reader = MARCReader(fh)
    records = list(reader)

# Process all records at once
results = linter.check_records(records, use_index_as_id=True)

for result in results:
    if not result.is_valid:
        print(f"Record {result.record_id}:")
        for warning in result.warnings:
            print(f"  - {warning}")

# Summary statistics
total_warnings = sum(len(r.warnings) for r in results)
invalid_records = sum(1 for r in results if not r.is_valid)
print(f"Found {total_warnings} warnings in {invalid_records} of {len(results)} records")

Python Library - Structured Warnings

For automation and API integration, use structured warnings:

from marc_lint import MarcLint
from pymarc import MARCReader

linter = MarcLint()

with open('records.mrc', 'rb') as fh:
    reader = MARCReader(fh)
    for record in reader:
        linter.check_record(record)
        
        # Get structured warning objects
        for warning in linter.warnings_structured():
            print(f"Record: {warning.record_id}")
            print(f"Field: {warning.field}")
            print(f"Message: {warning.message}")
            if warning.subfield:
                print(f"Subfield: {warning.subfield}")
            if warning.position is not None:
                print(f"Position: {warning.position + 1}")

Python Library - JSON Output

Export warnings as JSON for APIs:

import json
from marc_lint import MarcLint
from pymarc import MARCReader

linter = MarcLint()

with open('records.mrc', 'rb') as fh:
    reader = MARCReader(fh)
    records = list(reader)

results = linter.check_records(records)

# Convert to JSON
output = [
    {
        "record_id": r.record_id,
        "is_valid": r.is_valid,
        "warnings": [w.to_dict() for w in r.warnings]
    }
    for r in results
]
print(json.dumps(output, indent=2))

Each record's `warnings` list in that output is built from `to_dict()` entries, for example:

[
  {
    "field": "020",
    "message": "has bad checksum, 123456789X.",
    "subfield": "a",
    "position": null,
    "record_id": "ocm12345678"
  }
]

Python Library - Filtering Warnings

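from marc_lint import MarcLint
from pymarc import MARCReader

linter = MarcLint()

with open('records.mrc', 'rb') as fh:
    records = list(MARCReader(fh))

# Check a record before reading its structured warnings
linter.check_record(records[0])

# Filter warnings by field
isbn_warnings = [
    w for w in linter.warnings_structured()
    if w.field == "020"
]

# Filter by subfield
subfield_a_warnings = [
    w for w in linter.warnings_structured()
    if w.subfield == "a"
]

# Filter by record
results = linter.check_records(records)
for result in results:
    leader_warnings = [w for w in result.warnings if w.field == "LDR"]
    if leader_warnings:
        print(f"Record {result.record_id}: {len(leader_warnings)} leader warning(s)")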

See STRUCTURED_WARNINGS.md for detailed documentation on structured warnings.

Validation Rules

Leader Validation

Validates leader positions:

Position 05: Record status
Position 06: Type of record
Position 07: Bibliographic level
Position 08: Type of control
Position 09: Character coding scheme
Position 17: Encoding level
Position 18: Descriptive cataloging form
Position 19: Multipart resource record level
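The sketch below checks the leader position by position. Allowed values are taken from the MARC 21 Bibliographic leader definitions, with a blank written as a space. The function name and warning wording are hypothetical, and the library's own tables may differ. One example is how it treats OCLC-specific encoding levels such as `I` or `K`, which this table rejects.

```python
# (position, name, allowed values) per the MARC 21 Bibliographic leader; " " is a blank.
LEADER_RULES = [
    (5, "Record status", "acdnp"),
    (6, "Type of record", "acdefgijkmoprt"),
    (7, "Bibliographic level", "abcdims"),
    (8, "Type of control", " a"),
    (9, "Character coding scheme", " a"),
    (17, "Encoding level", " 1234578uz"),
    (18, "Descriptive cataloging form", " acinu"),
    (19, "Multipart resource record level", " abc"),
]

# New record (n), language material (a), monograph (m), Unicode (a), full level, ISBD (i).
EXAMPLE_LEADER = "00000nam a2200000 i 4500"


def check_leader(leader: str) -> list[str]:
    """Return warning strings for invalid values at the checked leader positions."""
    if len(leader) != 24:
        return [f"LDR: Leader must be 24 characters, found {len(leader)}."]
    warnings = []
    for position, name, allowed in LEADER_RULES:
        value = leader[position]
        if value not in allowed:
            warnings.append(f"LDR/{position:02d}: {name} has invalid value {value!r}.")
    return warnings
```

Positions are zero-based, matching the MARC convention used in the list above (leader/05 is the sixth character).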

Control Field 008 Validation

Validates:

Field length (40 characters)
Type of date (position 06)
Date 1 and Date 2 (positions 07-14)
Country code (positions 15-17)
Language code (positions 35-37)
Modified record indicator (position 38)
Cataloging source (position 39)
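The sketch below covers the same 008 checks in stand-alone form. The allowed values for positions 06, 38 and 39 follow the MARC 21 Bibliographic 008 definitions. The date rule (four digits or `u`, four blanks, or four fill characters) is a simplification. The country and language sets are tiny illustrative subsets. `check_008`, `EXAMPLE_008` and the messages are hypothetical.

```python
import re

# Allowed single-character values (MARC 21 Bibliographic 008); "|" is the fill character.
TYPE_OF_DATE = "bcdeikmnpqrstu|"   # position 06
MODIFIED_RECORD = " dorsx|"        # position 38
CATALOGING_SOURCE = " cdu|"        # position 39

# Tiny illustrative subsets; real validation needs the full MARC code lists.
COUNTRY_CODES = {"nyu", "enk", "xxu", "fr "}
LANGUAGE_CODES = {"eng", "fre", "ger", "spa"}

# Simplified date rule: four digits/u, four blanks, or four fill characters.
DATE_PATTERN = re.compile(r"[0-9u]{4}| {4}|\|{4}")


def check_008(value: str) -> list[str]:
    """Return warning strings for the 008 positions common to all material types."""
    if len(value) != 40:
        return [f"008: Length must be 40 characters, found {len(value)}."]

    warnings = []
    if value[6] not in TYPE_OF_DATE:
        warnings.append(f"008/06: Invalid type of date {value[6]!r}.")
    for label, start in (("Date 1", 7), ("Date 2", 11)):
        date = value[start:start + 4]
        if not DATE_PATTERN.fullmatch(date):
            warnings.append(f"008/{start:02d}-{start + 3:02d}: Invalid {label} {date!r}.")
    if value[15:18] not in COUNTRY_CODES:
        warnings.append(f"008/15-17: Invalid country code {value[15:18]!r}.")
    if value[35:38] not in LANGUAGE_CODES:
        warnings.append(f"008/35-37: Invalid language code {value[35:38]!r}.")
    if value[38] not in MODIFIED_RECORD:
        warnings.append(f"008/38: Invalid modified record value {value[38]!r}.")
    if value[39] not in CATALOGING_SOURCE:
        warnings.append(f"008/39: Invalid cataloging source {value[39]!r}.")
    return warnings


# Entered 850101, single date 1985, New York, English, cataloging source "d".
EXAMPLE_008 = "850101" + "s" + "1985" + "    " + "nyu" + " " * 17 + "eng" + " " + "d"
```

Positions 18-34 differ by material type (books, maps, music and so on), so the sketch leaves them unchecked.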

Supported Fields

020: ISBN validation with checksum verification
022: ISSN validation with checksum verification
041: Language codes (MARC Code List for Languages, largely aligned with ISO 639-2/B)
043: Geographic area codes (MARC Geographic Areas)
130, 240, 630, 730, 830: Non-filing indicator validation
245: Comprehensive title validation (punctuation, indicators, subfield order)
880: Alternate graphic representation validation
Plus general field/subfield/indicator validation for all tags
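The 020 and 022 checksum checks rest on standard check-digit arithmetic. ISBN-10 and ISSN use weighted sums modulo 11, where `X` stands for 10. ISBN-13 uses alternating weights 1 and 3 modulo 10. The stand-alone sketch below implements that arithmetic. The helper names are hypothetical, and only the first whitespace-separated token is examined, so qualifiers such as `(pbk.)` are ignored. The library itself may apply further checks beyond the check digit.

```python
import re


def _first_token(value: str) -> str:
    """Take the first whitespace-delimited token, drop hyphens, uppercase a trailing x."""
    tokens = value.split()
    return tokens[0].replace("-", "").upper() if tokens else ""


def isbn_is_valid(value: str) -> bool:
    """Verify the check digit of an ISBN-10 or ISBN-13."""
    digits = _first_token(value)
    if re.fullmatch(r"[0-9]{9}[0-9X]", digits):
        # ISBN-10: weights 10..1; the weighted sum must be divisible by 11.
        total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(digits))
        return total % 11 == 0
    if re.fullmatch(r"[0-9]{13}", digits):
        # ISBN-13: weights alternate 1, 3; the check digit rounds the sum up to a multiple of 10.
        total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(digits[:12]))
        return (10 - total % 10) % 10 == int(digits[12])
    return False


def issn_is_valid(value: str) -> bool:
    """Verify the check digit of an ISSN."""
    digits = _first_token(value)
    if not re.fullmatch(r"[0-9]{7}[0-9X]", digits):
        return False
    # ISSN: weights 8..2 on the first seven digits; check = (11 - sum % 11) % 11.
    total = sum((8 - i) * int(c) for i, c in enumerate(digits[:7]))
    check = (11 - total % 11) % 11
    return digits[7] == ("X" if check == 10 else str(check))
```

For instance, `issn_is_valid("1234-5678")` returns `False` (the check digit should be 9), which is consistent with the 022 warning in the sample CLI output.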

Architecture

Files

marc_lint/linter.py – main MarcLint class; equivalent to, and an extension of, MARC::Lint.
marc_lint/code_data.py – translation of the code tables from MARC::Lint::CodeData (language, geographic area, country codes).
marc_lint/field_rules.py – equivalent to __DATA__ and _read_rules in MARC::Lint.
marc_lint/warning.py – MarcWarning class for structured warnings.
marc_lint/cli.py – command-line interface.
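`code_data.py` holds the lookup tables that 041 and 043 values are checked against. The sketch below shows how such tables can be used. It assumes that a 041 subfield may pack several three-letter codes together and that a 043 code is exactly seven characters (for example `n-us---`). The code sets are a tiny illustrative subset, and the function names are hypothetical.

```python
# Tiny illustrative subsets; the full tables would come from code_data.py.
LANGUAGE_CODES = frozenset({"eng", "fre", "ger", "spa"})
GEOG_AREA_CODES = frozenset({"n-us---", "e-fr---", "e-gx---", "a-ja---"})


def check_041_subfield(value: str, valid_codes: frozenset[str] = LANGUAGE_CODES) -> list[str]:
    """Split a 041 subfield into three-letter codes and flag any unknown code."""
    if len(value) % 3 != 0:
        return [f"041: Subfield length must be a multiple of 3, got {value!r}."]
    codes = [value[i:i + 3] for i in range(0, len(value), 3)]
    return [f"041: Invalid language code {code!r}." for code in codes if code not in valid_codes]


def check_043_subfield(value: str, valid_codes: frozenset[str] = GEOG_AREA_CODES) -> list[str]:
    """Check that a 043 code is seven characters long and appears in the code table."""
    if len(value) != 7:
        return [f"043: Subfield must be exactly 7 characters, got {value!r}."]
    if value not in valid_codes:
        return [f"043: Invalid geographic area code {value!r}."]
    return []
```

Keeping the tables as immutable sets makes each lookup a constant-time membership test, which matters when a batch contains thousands of records.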

Key behavior mirrored from the Perl module:

Per-record checks in MarcLint.check_record: leader validation, the 1XX count (only one 1XX field is allowed), the required 245, field repeatability, indicator validity, subfield legality and repeatability, and control-field rules.
Tag-specific methods check_020, check_022, check_041, check_043, check_245, plus _check_article for handling non-filing indicators.
Rules for tags are built in _read_rules from RULES_DATA, which is the equivalent of the __DATA__ table in Lint.pm.
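A non-filing indicator tells filing software how many leading characters to skip, typically an initial article plus the following space. A language-aware check in the spirit of `_check_article` compares that indicator with the length of an article recognised for the record's language. The sketch below is simplified. The article table is a tiny illustrative subset, leading punctuation such as quotation marks is not handled, and `check_article_sketch` is a hypothetical name. The indicator positions per tag follow MARC 21.

```python
from __future__ import annotations

# Which indicator carries the non-filing character count for each tag (MARC 21).
NONFILING_INDICATOR = {"130": 1, "240": 2, "245": 2, "630": 1, "730": 1, "830": 2}

# Illustrative subset of initial articles by MARC language code (lowercase, with trailing space).
ARTICLES = {
    "eng": ("the ", "an ", "a "),
    "fre": ("les ", "le ", "la ", "l'", "une ", "un "),
    "ger": ("der ", "die ", "das ", "ein ", "eine "),
}


def expected_nonfiling(title: str, language: str) -> int:
    """Length of the leading article in the title, or 0 if none is recognised."""
    lowered = title.lower()
    for article in ARTICLES.get(language, ()):
        if lowered.startswith(article):
            return len(article)
    return 0


def check_article_sketch(tag: str, ind1: str, ind2: str, title: str, language: str) -> str | None:
    """Return a warning string if the non-filing indicator disagrees with the title."""
    indicator = ind1 if NONFILING_INDICATOR[tag] == 1 else ind2
    if not indicator.isdigit():
        return f"{tag}: Non-filing indicator must be a digit, found {indicator!r}."
    expected = expected_nonfiling(title, language)
    if int(indicator) != expected:
        return f"{tag}: Non-filing indicator is {indicator}, expected {expected} for {title!r}."
    return None
```

Language matters here because the same word can be an article in one language and an ordinary word in another. `Die Welt` needs a second indicator of 4 in a German record but 0 in an English one.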

Development

Setup

# Clone the repository
git clone https://github.com/coliin8/marclint.git
cd marclint

# Install dependencies (including dev dependencies)
uv sync --all-groups

# Activate the virtual environment (optional, uv run handles this automatically)
source .venv/bin/activate

Common Development Tasks

# Run tests
uv run pytest

# Run tests with coverage
uv run pytest --cov=src/marc_lint --cov-report=term-missing

# Run tests across all Python versions (3.10-3.14)
uv run tox

# Run linting
uv run tox -e lint

# Auto-format code
uv run tox -e format

# Run specific test file
uv run pytest tests/test_warning.py -v

# Run specific test
uv run pytest tests/test_warning.py::test_warning_basic_creation -v

# Bump the project's patch version
uv run bump-my-version bump patch

Code Quality

This project uses:

pytest for testing
pytest-cov for coverage reporting
ruff for linting and formatting (via tox)
tox for testing across multiple Python versions

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for details.

License

MIT License - see LICENSE file for details.

This is a Python port of the original Perl MARC::Lint module.

Acknowledgments

Bryan Baldus, Ed Summers, and Dan Lester for the original Perl MARC::Lint module (2001-2011)
Library of Congress for MARC21 standards and documentation

Release history

0.0.5

Jan 18, 2026

This version

0.0.4

Jan 18, 2026

0.0.3

Jan 18, 2026

0.0.2

Dec 30, 2025

Download files

Source Distribution

marc_lint-0.0.4.tar.gz (41.0 kB)

Uploaded Jan 18, 2026 Source

Built Distribution

marc_lint-0.0.4-py3-none-any.whl (42.5 kB)

Uploaded Jan 18, 2026 Python 3

File details

Details for the file marc_lint-0.0.4.tar.gz.

File metadata

Download URL: marc_lint-0.0.4.tar.gz
Upload date: Jan 18, 2026
Size: 41.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for marc_lint-0.0.4.tar.gz
Algorithm	Hash digest
SHA256	`6232b4ada6ee19aee150afdb48cf9207c408da3ab4e713fba9124c58b05e27f7`
MD5	`065a73925ac0a33ea0e1f967c842359e`
BLAKE2b-256	`8af3b8d635a96c36b459bd79d0a5d8fa9324cbded7b17c5eb759f765af6e148d`

Provenance

The following attestation bundles were made for marc_lint-0.0.4.tar.gz:

Publisher: publish.yml on coliin8/marclint

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: marc_lint-0.0.4.tar.gz
- Subject digest: 6232b4ada6ee19aee150afdb48cf9207c408da3ab4e713fba9124c58b05e27f7
- Sigstore transparency entry: 833771033
- Sigstore integration time: Jan 18, 2026
Source repository:
- Permalink: coliin8/marclint@091b385e801d59b20c0e0e0a9551613421cb66de
- Branch / Tag: refs/tags/v0.0.4
- Owner: https://github.com/coliin8
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@091b385e801d59b20c0e0e0a9551613421cb66de
- Trigger Event: release

File details

Details for the file marc_lint-0.0.4-py3-none-any.whl.

File metadata

Download URL: marc_lint-0.0.4-py3-none-any.whl
Upload date: Jan 18, 2026
Size: 42.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for marc_lint-0.0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9c86076b638c780540f61a08c0c116191497d10068219522de67f481bd02e49f`
MD5	`5b744bbeb266709db377973eb376157b`
BLAKE2b-256	`d1aecc1b89814d700eaef24fc2c482b7b776ef1423b921b5450178f51b93fdfc`

Provenance

The following attestation bundles were made for marc_lint-0.0.4-py3-none-any.whl:

Publisher: publish.yml on coliin8/marclint

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: marc_lint-0.0.4-py3-none-any.whl
- Subject digest: 9c86076b638c780540f61a08c0c116191497d10068219522de67f481bd02e49f
- Sigstore transparency entry: 833771035
- Sigstore integration time: Jan 18, 2026
Source repository:
- Permalink: coliin8/marclint@091b385e801d59b20c0e0e0a9551613421cb66de
- Branch / Tag: refs/tags/v0.0.4
- Owner: https://github.com/coliin8
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@091b385e801d59b20c0e0e0a9551613421cb66de
- Trigger Event: release
