A simple converter of MARCXML/PICAXML to CSV/TSV/parquet

bibxml2

A simple converter of (possibly compressed) MARCXML/PICAXML to (possibly compressed) CSV/TSV/parquet.

The resulting CSV/TSV/parquet is designed to be easy to use as a data table while retaining all ordering information from the original for when it is needed. Each row has the following columns: record_number,field_number,subfield_number,field_code,subfield_code,value

Here, record_number identifies the MARC/PICA+ record, while field_number and subfield_number can be used for more precise filtering or for reconstructing the original field structure and order if needed.

For MARC data fields, ind1 and ind2 values are reported as separate rows with the subfield_code being i_1 or i_2, but only when non-empty.
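As an illustration of how the row layout can be used, the following minimal sketch reads a small hand-written sample with Python's standard csv module, filters it like a plain table, and regroups rows into fields using field_number and subfield_number. The sample values, the presence of a header row, numbering starting at 1, and the position of the indicator row are all assumptions made for the example, not documented behaviour of the tool; group_fields is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the six-column layout described above.
# Assumptions (not confirmed by the tool's documentation): a header row
# is present, numbering starts at 1, and the indicator row (i_1) is
# numbered before the ordinary subfields. All values are invented.
SAMPLE = """\
record_number,field_number,subfield_number,field_code,subfield_code,value
1,1,1,245,i_1,1
1,1,2,245,a,An example title
1,1,3,245,b,a subtitle
1,2,1,650,a,Libraries
2,1,1,245,a,Another title
"""


def group_fields(rows):
    """Regroup flat rows into
    {record_number: {field_number: (field_code, [(subfield_code, value), ...])}},
    preserving the original field and subfield order."""
    records = defaultdict(dict)
    ordered = sorted(
        rows,
        key=lambda r: (
            int(r["record_number"]),
            int(r["field_number"]),
            int(r["subfield_number"]),
        ),
    )
    for row in ordered:
        rec = int(row["record_number"])
        fld = int(row["field_number"])
        # Create the field entry on first sight, then append subfields in order.
        _code, subfields = records[rec].setdefault(fld, (row["field_code"], []))
        subfields.append((row["subfield_code"], row["value"]))
    return records


rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Table-style filtering: every 245$a value, in file order.
titles = [
    r["value"]
    for r in rows
    if r["field_code"] == "245" and r["subfield_code"] == "a"
]

# Structure-preserving view: fields and subfields per record.
records = group_fields(rows)
```

Filtering needs only field_code and subfield_code, while the numbering columns matter when repeated fields or subfield order have to be kept apart.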

Installation

Install from PyPI with, for example, pipx install bibxml2.

Usage

Usage: marcxml2 [OPTIONS] [INPUT]...

  Convert from MARCXML (compressed) input files into (compressed) CSV/TSV/parquet

Options:
  -o, --output TEXT  Output CSV/TSV (compressed) / parquet file  [required]
  --help             Show this message and exit.

Usage: picaxml2csv [OPTIONS] [INPUT]...

  Convert from PICAXML (compressed) input files into (compressed) CSV/TSV/parquet

Options:
  -o, --output TEXT  Output CSV/TSV (compressed) / parquet file  [required]
  --help             Show this message and exit.

If the output file extension is .parquet, the output is written in parquet format, compressed with zstd, with column types chosen for maximum compatibility with the common R and Python ecosystems. Otherwise, compressed files are read and written whenever the filename ends with a compression extension recognised by fsspec. TSV is used if the output filename contains .tsv; otherwise CSV is used.
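The filename rules above can be restated as a small decision function. This is only a sketch of the rules as described here, not the tool's actual code; guess_output_format is a hypothetical name, and compression handling (delegated to fsspec by the tool) is deliberately left out.

```python
def guess_output_format(filename: str) -> str:
    """Return "parquet", "tsv" or "csv" following the rules stated above.

    Hypothetical helper for illustration only; it does not inspect or
    apply compression.
    """
    if filename.endswith(".parquet"):
        return "parquet"
    if ".tsv" in filename:
        # ".tsv" may be followed by a compression suffix, e.g. ".tsv.gz".
        return "tsv"
    return "csv"
```

Under these rules, a name such as out.tsv.gz selects TSV, while any name that neither ends in .parquet nor contains .tsv falls back to CSV.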
