CSV reader and writer for MARC records - an extension for pymarc
pymarc_csv
CSV reader and writer for MARC records - an extension for pymarc. This can be useful wherever it helps to make MARC records editable as a spreadsheet, or to manipulate them with tools like Pandas. I admit, however, that the CSV serialization implemented here, though far more readable than MARC21 itself, is still a bit of an eyesore.
Note that for processing MARC records as CSV or Parquet files there's also marctable. The main advantage of pymarc_csv is its integration with pymarc.
Overview
pymarc-csv extends the pymarc library to provide CSV reading and writing capabilities for MARC21 bibliographic records. This allows you to work with MARC data in a more accessible CSV format while maintaining full compatibility with pymarc's Record objects.
Features
- CSVReader: Read MARC records from CSV files
- CSVWriter: Write MARC records to CSV format
- CSV serialization: Convert Record objects to/from CSV strings
- Duplicate field handling: Automatically handles repeated MARC fields (e.g., multiple 650 fields become 650, 650_2, 650_3)
- Field order preservation: Maintains original field order through a field_order column
- Full pymarc compatibility: Works with existing pymarc Record objects
Installation
```
pip install pymarc-csv
```
Requirements
- Python >= 3.10
- pymarc >= 5.3.1
Quick Start
Reading CSV files
This is closely analogous to reading JSON and XML records in pymarc.
```python
from pymarc_csv import CSVReader

# Read MARC records from CSV
with open('records.csv', 'r') as fh:
    reader = CSVReader(fh)
    for record in reader:
        print(record.title)
        print(record['245']['a'])
```
Writing CSV files
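This is where things get a bit more complicated compared to the other file formats in pymarc. The main difference is that all Record objects to be written should first be collected in a list, since the CSV headings (one column per field, including numbered duplicates) are derived from the records passed to write().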
```python
from pymarc_csv import CSVWriter

# record1, record2, record3 are pymarc Record objects
fh = open('output.csv', 'wt')
writer = CSVWriter(fh)
writer.write([record1, record2, record3])  # Write multiple at once
writer.close()
```
If you then want to add further records without introducing any new CSV headings (i.e. no new fields and no previously unseen duplicates such as a first 650_2), do so before calling writer.close():
```python
from pymarc import Field, Indicators, Record, Subfield

record = Record()
record.add_field(
    Field(
        tag='245',
        indicators=Indicators('1', '0'),
        subfields=[
            Subfield(code='a', value='Python Programming'),
            Subfield(code='c', value='Guido van Rossum'),
        ],
    )
)

# Write to CSV
writer.write(record)
writer.close()
```
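Why the records need to be collected first can be illustrated in plain Python. In this sketch (an illustration of the general idea, not pymarc_csv's actual code; union_header and unseen_columns are hypothetical names), each record is reduced to its list of column names, the header is the union of all of them, and a record written later only fits if it needs no column outside that header:

```python
# Hypothetical sketch, not pymarc_csv's implementation: each record
# contributes its own column names (field tags, with numbered duplicates
# such as 650_2), and the header must cover all of them before row one.
def union_header(per_record_columns: list[list[str]]) -> list[str]:
    """Return the union of all column names, sorted for a stable order."""
    seen: set[str] = set()
    for columns in per_record_columns:
        seen.update(columns)
    return sorted(seen)


def unseen_columns(header: list[str], columns: list[str]) -> list[str]:
    """Columns a later record would need that the header does not have."""
    known = set(header)
    return [c for c in columns if c not in known]


records = [
    ["LDR", "001", "245", "650"],
    ["LDR", "001", "245", "650", "650_2"],
]
header = union_header(records)
print(header)  # ['001', '245', '650', '650_2', 'LDR']
print(unseen_columns(header, ["LDR", "001", "245", "500"]))  # ['500']
```

A later record with a 500 field would need a column the header lacks, which is exactly the case the writer cannot handle after the headings are written.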
To avoid having to store a large list of Records first, you could also use the add_tags method and then write records one by one using write_one. This is rather cumbersome, however, so you might be better off just using marctable at that point.
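For sources that can be read twice, such as a file on disk, the usual way to avoid holding every record in memory is a two-pass write: one pass to collect the column names, a second to stream the rows. The following is a generic sketch of that pattern with the standard library's csv.DictWriter, assuming each record has already been turned into a dict of column name to cell text; it does not use pymarc_csv's add_tags or write_one, and write_two_pass and rows are hypothetical names:

```python
import csv
import io
from typing import Callable, Iterable, Iterator, TextIO


# Hypothetical two-pass writer: make_rows must return a fresh iterable each
# time it is called (e.g. by re-opening a file), so no list is kept in memory.
def write_two_pass(make_rows: Callable[[], Iterable[dict[str, str]]], out: TextIO) -> None:
    # Pass 1: collect every column name used by any row.
    columns: set[str] = set()
    for row in make_rows():
        columns.update(row)
    # Pass 2: stream the rows; columns a row lacks become empty cells.
    writer = csv.DictWriter(out, fieldnames=sorted(columns), restval="")
    writer.writeheader()
    for row in make_rows():
        writer.writerow(row)


def rows() -> Iterator[dict[str, str]]:
    # Hypothetical stand-in for records already converted to row dicts.
    yield {"001": "a1", "245": "10$aFirst title"}
    yield {"001": "a2", "245": "10$aSecond title", "650": "\\0$aPerl"}


buf = io.StringIO()
write_two_pass(rows, buf)
print(buf.getvalue())
```

The restval="" argument is what lets records with different sets of fields share one header: any column a record does not use is simply left empty.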
Converting records to/from CSV strings
```python
from pymarc_csv import as_csv, parse_csv_to_dict

# Record to CSV string
csv_string = as_csv(record)

# CSV string back to dict
record_dict = parse_csv_to_dict(csv_string)
```
CSV Format
The CSV format used by pymarc-csv has the following structure:
- LDR column: Contains the record leader
- Field columns: One column per MARC field (e.g., 001, 245, 650)
- Duplicate fields: Numbered with suffixes (e.g., 650, 650_2, 650_3)
- field_order column: Preserves the original order of fields
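The duplicate-numbering and field_order rules can be sketched as follows. This is an assumption-based illustration reconstructed from the example output, not the package's implementation; columns_for is a hypothetical helper that takes a record's tags in their original order:

```python
# Hypothetical sketch of the naming scheme: the first occurrence of a tag
# keeps the bare tag, later repeats get _2, _3, ..., and field_order lists
# the resulting column names in the record's original sequence.
def columns_for(tags: list[str]) -> tuple[list[str], str]:
    counts: dict[str, int] = {}
    columns: list[str] = []
    for tag in tags:
        counts[tag] = counts.get(tag, 0) + 1
        n = counts[tag]
        columns.append(tag if n == 1 else f"{tag}_{n}")
    # Assumption: field_order is one space-separated cell, as in the example.
    return columns, " ".join(columns)


cols, order = columns_for(["001", "245", "650", "650", "650"])
print(cols)   # ['001', '245', '650', '650_2', '650_3']
print(order)  # 001 245 650 650_2 650_3
```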
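Example CSV output (one MARC record, transposed into a two-column table for readability; in the file itself, each Field entry is a column heading and the record forms a single row):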
| Field | Value |
|---|---|
| 001 | fol05731351 |
| 003 | IMchF |
| 005 | 20000613133448.0 |
| 008 | 000107s2000 nyua 001 0 eng |
| 010 | \$a 00020737 |
| 020 | \$a0471383147 (paper/cd-rom : alk. paper) |
| 040 | \$aDLC$cDLC$dDLC |
| 042 | \$apcc |
| 050 | 00$aQA76.73.P22$bM33 2000 |
| 082 | 00$a005.13/3$221 |
| 100 | 1\$aMartinsson, Tobias,$d1976- |
| 245 | 10$aActivePerl with ASP and ADO /$cTobias Martinsson. |
| 260 | \$aNew York :$bJohn Wiley & Sons,$c2000. |
| 300 | \$axxi, 289 p. :$bill. ;$c23 cm. +$e1 computer laser disc (4 3/4 in.) |
| 500 | \$a"Wiley Computer Publishing." |
| 630 | 00$aActive server pages. |
| 630_2 | 00$aActiveX. |
| 650 | \0$aPerl (Computer program language) |
| LDR | 00755cam 22002414a 4500 |
| field_order | 001 003 005 008 010 020 040 042 050 082 100 245 260 300 500 630 630_2 650 |
An un-prettified version of this can be found in test/one.csv.
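Judging from the example, a data field's cell holds its indicators followed by each subfield as $ plus code plus value. The sketch below encodes and decodes cells in that shape. It assumes every data-field cell starts with exactly two indicator characters, with a backslash standing for a blank (as in pymarc's own text output), and that $ never occurs inside a subfield value; encode_field and decode_field are hypothetical, not part of pymarc_csv:

```python
# Hypothetical helpers, not pymarc_csv's API: a cell is two indicator
# characters (a backslash for a blank) followed by "$" + code + value
# for each subfield. Control fields (001-009) are not handled here.
def encode_field(ind1: str, ind2: str, subfields: list[tuple[str, str]]) -> str:
    inds = "".join("\\" if i == " " else i for i in (ind1, ind2))
    return inds + "".join(f"${code}{value}" for code, value in subfields)


def decode_field(cell: str) -> tuple[str, str, list[tuple[str, str]]]:
    inds = [" " if c == "\\" else c for c in cell[:2]]
    # Naive split: assumes "$" never appears inside a subfield value.
    parts = cell[2:].split("$")[1:]
    return inds[0], inds[1], [(p[0], p[1:]) for p in parts]


print(encode_field("1", "0", [("a", "ActivePerl with ASP and ADO /"), ("c", "Tobias Martinsson.")]))
# 10$aActivePerl with ASP and ADO /$cTobias Martinsson.
print(decode_field("\\0$aPerl (Computer program language)"))
# (' ', '0', [('a', 'Perl (Computer program language)')])
```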
Development
Running Tests
```
python -m unittest
```
License
BSD 2-Clause License (same as pymarc)
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Credits
Built as an extension to the pymarc library maintained by Ed Summers, Andrew Hankinson and contributors.