A simple converter of MARCXML/PICAXML to CSV/TSV/parquet

marcxml2csv

A simple converter of (possibly gzipped) MARCXML/PICAXML to (possibly gzipped) CSV/TSV.

The resulting CSV/TSV is designed to be easy to use as a data table while retaining all ordering information from the original, for cases where it is needed. The format is as follows:

record_number,field_number,subfield_number,field_code,subfield_code,value

Here, record_number identifies the MARC/PICA+ record, while field_number and subfield_number can be used for more exact filtering or for reconstructing the original field order if needed.

For the MARC leader and control fields, subfield_number will be empty.

For MARC data fields, ind1 and ind2 values are reported as separate rows with the subfield_code being ind1 or ind2, but only when non-empty. They also have an empty subfield_number.
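
To illustrate how the row format described above can be consumed, here is a minimal Python sketch that collects the values of one field/subfield per record. The sample rows are invented for illustration and are not real converter output. The sketch also assumes the file has no header row; if the output does start with a header line, drop the fieldnames argument so that csv.DictReader reads the names from the file. The helper names read_rows and values_by_record are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Column order as documented in the format description.
COLUMNS = [
    "record_number", "field_number", "subfield_number",
    "field_code", "subfield_code", "value",
]

# Hypothetical sample rows, invented for illustration only.
SAMPLE = """\
1,1,,001,,12345
1,2,,245,ind1,1
1,2,1,245,a,A title
1,2,2,245,b,a subtitle
2,1,,001,,67890
2,2,1,245,a,Another title
"""


def read_rows(text_stream):
    """Parse rows into dicts keyed by the documented column names.

    Assumption: the stream has no header row.
    """
    return list(csv.DictReader(text_stream, fieldnames=COLUMNS))


def values_by_record(rows, field_code, subfield_code):
    """Map each record_number to the values of one field/subfield pair."""
    result = defaultdict(list)
    for row in rows:
        if row["field_code"] == field_code and row["subfield_code"] == subfield_code:
            result[row["record_number"]].append(row["value"])
    return dict(result)


rows = read_rows(io.StringIO(SAMPLE))
titles = values_by_record(rows, "245", "a")
print(titles)
```

Because indicator rows carry ind1 or ind2 in subfield_code and an empty subfield_number, they can be selected or skipped with the same kind of filter.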

Installation

Install from PyPI, e.g. with pipx install marcxml2csv.

Usage

Usage: marcxml2csv [OPTIONS] [INPUT]...

  Convert from MARCXML (gz) input files into (gzipped) CSV/TSV

Options:
  -o, --output TEXT  Output CSV/TSV (gz) file  [required]
  --help             Show this message and exit.

Usage: picaxml2csv [OPTIONS] [INPUT]...

  Convert from PICAXML (gz) input files into (gzipped) CSV/TSV

Options:
  -o, --output TEXT  Output CSV/TSV (gz) file  [required]
  --help             Show this message and exit.

Files will be read/written using gzip if the filename ends with .gz. TSV format will be used if the output filename contains .tsv, otherwise CSV will be used.
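
The same filename conventions can be mirrored when reading a converted file back in. The following is a minimal sketch of that idea, not the tool's own implementation: open_text, delimiter_for and read_table are hypothetical helper names, and UTF-8 encoding is an assumption.

```python
import csv
import gzip


def open_text(path, mode="rt"):
    """Open path as text, going through gzip when the name ends with .gz.

    Assumption: the data is UTF-8 encoded.
    """
    opener = gzip.open if path.endswith(".gz") else open
    return opener(path, mode, encoding="utf-8", newline="")


def delimiter_for(path):
    """Return a tab for names containing .tsv, and a comma otherwise."""
    return "\t" if ".tsv" in path else ","


def read_table(path):
    """Read every row of a converted file as a list of strings."""
    with open_text(path) as handle:
        return list(csv.reader(handle, delimiter=delimiter_for(path)))
```

With these helpers, read_table("records.tsv.gz") would decompress the file and split it on tabs, while read_table("records.csv") would read a plain comma-separated file; both file names are placeholders.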
