
CSV reader and writer for MARC records - an extension for pymarc


pymarc_csv

CSV reader and writer for MARC records - an extension for pymarc. This is useful wherever there is value in editing MARC records as a spreadsheet or in manipulating them with tools like Pandas. I admit, however, that the CSV serialization implemented here, though far more readable than MARC21 itself, is still a bit of an eyesore.

Note that for processing MARC records as CSV or Parquet files there's also marctable. The main advantage of pymarc_csv is its integration with pymarc.

Overview

pymarc-csv extends the pymarc library to provide CSV reading and writing capabilities for MARC21 bibliographic records. This allows you to work with MARC data in a more accessible CSV format while maintaining full compatibility with pymarc's Record objects.

Features

  • CSVReader: Read MARC records from CSV files
  • CSVWriter: Write MARC records to CSV format
  • CSV serialization: Convert Record objects to/from CSV strings
  • Duplicate field handling: Automatically handles repeated MARC fields (e.g., multiple 650 fields become 650, 650_2, 650_3)
  • Field order preservation: Maintains original field order through a field_order column
  • Full pymarc compatibility: Works with existing pymarc Record objects
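
The duplicate-field naming described above can be illustrated with a short sketch. This is not pymarc_csv's own code; the helper name column_names is hypothetical and only mirrors the 650, 650_2, 650_3 pattern from the feature list.

```python
from collections import Counter


def column_names(tags):
    """Give each repeated tag a numbered suffix: 650, 650_2, 650_3, ...

    Hypothetical helper for illustration; pymarc_csv may do this differently.
    """
    seen = Counter()
    names = []
    for tag in tags:
        seen[tag] += 1
        # The first occurrence keeps the bare tag; later ones get _2, _3, ...
        names.append(tag if seen[tag] == 1 else f"{tag}_{seen[tag]}")
    return names


print(column_names(["245", "650", "650", "650"]))
# → ['245', '650', '650_2', '650_3']
```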

Installation

pip install pymarc-csv

Requirements

  • Python >= 3.10
  • pymarc >= 5.3.1

Quick Start

Reading CSV files

This is closely analogous to reading JSON and XML records in pymarc.

from pymarc_csv import CSVReader

# Read MARC records from CSV
with open('records.csv', 'r') as fh:
    reader = CSVReader(fh)
    for record in reader:
        print(record.title)
        print(record['245']['a'])

Writing CSV files

This is where things get a bit more complicated than with the other file formats in pymarc. The main difference is that all Record objects to be written should be collected in a list first.

from pymarc_csv import CSVWriter

# record1, record2 and record3 are existing pymarc Record objects
writer = CSVWriter(open('output.csv', 'wt'))
writer.write([record1, record2, record3])  # Write multiple at once
writer.close()

To add further records that introduce no new CSV headings (so no new fields and no unseen duplicate fields), write them before calling writer.close():

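from pymarc import Field, Indicators, Record, Subfield

# Build a record that only uses headings the writer has already seen
record = Record()
record.add_field(
    Field(
        tag='245',
        indicators=Indicators('1', '0'),
        subfields=[
            Subfield(code='a', value='Python Programming'),
            Subfield(code='c', value='Guido van Rossum')
        ]
    )
)

# Write to CSV
writer.write(record)
writer.close()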

To avoid having to store a large list of Records first, you could also use the add_tags method and then write records one by one using write_one. This is rather cumbersome, however, so you might be better off just using marctable at that point.
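
One way to see why the headings matter: a CSV header row is written once, so every column has to be known before the first row goes out. The sketch below uses only the standard library csv module and plain dicts standing in for records. It is not how pymarc_csv is implemented, and write_rows is a hypothetical name.

```python
import csv
import io


def write_rows(rows, fh):
    """Write dict rows to CSV, using the union of all keys as the header."""
    header = []
    for row in rows:  # first pass: discover every column that occurs
        for key in row:
            if key not in header:
                header.append(key)
    writer = csv.DictWriter(fh, fieldnames=header, restval="")
    writer.writeheader()
    writer.writerows(rows)  # second pass: emit rows, blank where a key is missing
    return header


# Plain dicts standing in for records (illustrative values only)
rows = [
    {"LDR": "leader-1", "245": "10$aFirst title"},
    {"LDR": "leader-2", "245": "10$aSecond title", "650": "\\0$aPerl"},
]
buf = io.StringIO()
header = write_rows(rows, buf)
print(header)
```

A row that arrives later with a key outside the header has nowhere to go without rewriting the file, which is presumably why new headings cannot be introduced after the first write.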

Converting records to/from CSV strings

from pymarc_csv import as_csv, parse_csv_to_dict

# Record to CSV string
csv_string = as_csv(record)

# CSV string back to dict
record_dict = parse_csv_to_dict(csv_string)

CSV Format

The CSV format used by pymarc-csv has the following structure:

  • LDR column: Contains the record leader
  • Field columns: One column per MARC field (e.g., 001, 245, 650)
  • Duplicate fields: Numbered with suffixes (e.g., 650, 650_2, 650_3)
  • field_order column: Preserves the original order of fields
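
As a rough sketch of how the field_order column can be used, the function below restores the original field sequence from one CSV row. It assumes that field_order holds whitespace-separated column names and that the row is a dict such as csv.DictReader would produce; both are assumptions, and ordered_fields is a hypothetical helper rather than part of pymarc_csv.

```python
def ordered_fields(row):
    """Return (tag, value) pairs in the order given by the field_order column.

    Assumes field_order is a whitespace-separated list of column names.
    """
    pairs = []
    for name in row["field_order"].split():
        tag = name.split("_")[0]  # strip the duplicate suffix: "630_2" -> "630"
        pairs.append((tag, row[name]))
    return pairs


# Columns deliberately listed out of order to show the effect
row = {
    "LDR": "00755cam  22002414a 4500",
    "630": "00$aActive server pages.",
    "001": "fol05731351",
    "630_2": "00$aActiveX.",
    "field_order": "001 630 630_2",
}
for tag, value in ordered_fields(row):
    print(tag, value)
```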

Example CSV output (showing one MARC record as a table for readability):

Field        Value
001          fol05731351
003          IMchF
005          20000613133448.0
008          000107s2000 nyua 001 0 eng
010          \\$a 00020737
020          \\$a0471383147 (paper/cd-rom : alk. paper)
040          \\$aDLC$cDLC$dDLC
042          \\$apcc
050          00$aQA76.73.P22$bM33 2000
082          00$a005.13/3$221
100          1\$aMartinsson, Tobias,$d1976-
245          10$aActivePerl with ASP and ADO /$cTobias Martinsson.
260          \\$aNew York :$bJohn Wiley & Sons,$c2000.
300          \\$axxi, 289 p. :$bill. ;$c23 cm. +$e1 computer laser disc (4 3/4 in.)
500          \\$a"Wiley Computer Publishing."
630          00$aActive server pages.
630_2        00$aActiveX.
650          \0$aPerl (Computer program language)
LDR          00755cam 22002414a 4500
field_order  001 003 005 008 010 020 040 042 050 082 100 245 260 300 500 630 630_2 650

An un-prettified version of this can be found in test/one.csv.
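
To make the cell syntax in the example easier to read, here is a minimal sketch that splits a data-field value into indicators and subfields. It assumes the layout visible above: two indicator characters (a backslash standing for a blank), then subfields introduced by $ and a one-character code. The function parse_data_field is hypothetical and not part of pymarc_csv, it does not apply to control fields such as 001 or 008, and it would misread values that contain a literal $.

```python
def parse_data_field(value):
    """Split e.g. '10$aTitle /$cAuthor.' into indicators and subfields."""
    # First two characters are the indicators; "\" is read as a blank
    indicators = tuple(" " if ch == "\\" else ch for ch in value[:2])
    subfields = []
    # Everything before the first "$" is empty, so skip that chunk
    for chunk in value[2:].split("$")[1:]:
        if chunk:
            subfields.append((chunk[0], chunk[1:]))  # (code, value)
    return indicators, subfields


ind, subs = parse_data_field("10$aActivePerl with ASP and ADO /$cTobias Martinsson.")
print(ind)   # → ('1', '0')
print(subs)  # → [('a', 'ActivePerl with ASP and ADO /'), ('c', 'Tobias Martinsson.')]
```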

Development

Running Tests

python -m unittest

License

BSD 2-Clause License (same as pymarc)

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Credits

Built as an extension to the pymarc library maintained by Ed Summers, Andrew Hankinson and contributors.
