A simple converter of MARC/MARCXML/PICAXML to CSV/TSV/parquet

bibxml2

A simple converter of (possibly compressed) MARCXML/PICAXML to (possibly compressed) CSV/TSV/parquet.

The resulting CSV/TSV/parquet is designed to be easy to use as a data table while retaining all ordering information from the original for when it is needed. Each row has the following columns: record_number,field_number,subfield_number,field_code,subfield_code,value

Here, record_number identifies the MARC/PICA+ record, while field_number and subfield_number can be used for more exact filtering / reconstructing the original field structure/order if needed.
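As a rough illustration of how these columns can be used, the following sketch filters rows by field and subfield code and regroups rows into their original field order. It is a minimal example under stated assumptions: the sample rows are invented for illustration (they are not real converter output), the file is assumed to have no header line, and the helper names load_rows and rebuild_records are hypothetical rather than part of bibxml2.

```python
import csv
import io
from collections import defaultdict

# Column order as documented above.
COLUMNS = ["record_number", "field_number", "subfield_number",
           "field_code", "subfield_code", "value"]

# Invented sample rows for illustration only (assumes no header line).
SAMPLE = """\
1,1,1,100,a,Example Author
1,2,1,245,a,Example title
1,2,2,245,b,example subtitle
2,1,1,245,a,Another title
"""


def load_rows(handle):
    """Read rows into dicts, converting the three numbering columns to int."""
    rows = []
    for row in csv.DictReader(handle, fieldnames=COLUMNS):
        for key in COLUMNS[:3]:
            row[key] = int(row[key])
        rows.append(row)
    return rows


def rebuild_records(rows):
    """Group rows as {record_number: [(field_code, [(subfield_code, value)])]}."""
    ordered = sorted(rows, key=lambda r: (r["record_number"],
                                          r["field_number"],
                                          r["subfield_number"]))
    records = defaultdict(list)
    last_field = None
    for row in ordered:
        field_key = (row["record_number"], row["field_number"])
        if field_key != last_field:
            # A new field_number starts a new field within the record.
            records[row["record_number"]].append((row["field_code"], []))
            last_field = field_key
        records[row["record_number"]][-1][1].append(
            (row["subfield_code"], row["value"]))
    return dict(records)


rows = load_rows(io.StringIO(SAMPLE))

# Simple table-style filtering: all 245$a values.
titles = [r["value"] for r in rows
          if r["field_code"] == "245" and r["subfield_code"] == "a"]

# Order-preserving reconstruction of each record.
records = rebuild_records(rows)
```

Filtering needs only field_code and subfield_code, while field_number is what keeps repeated fields with the same code apart during reconstruction.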

For MARC data fields, ind1 and ind2 values are reported as separate rows with the subfield_code being Y or Z, but only when non-empty (MARC requires subfield codes to be lowercase, so this should be relatively safe). The MARC leader is output with field code LDR.
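The indicator rows can be separated from the ordinary subfields of a field with a few lines of code. The sketch below assumes that Y carries ind1 and Z carries ind2 (the text above lists them in that order but does not state the mapping explicitly), and that a missing row means a blank indicator. The function name split_indicators is hypothetical.

```python
# Assumed mapping: Y -> ind1, Z -> ind2 (not confirmed by the description).
INDICATOR_CODES = {"Y": "ind1", "Z": "ind2"}


def split_indicators(field_rows):
    """Split one field's (subfield_code, value) pairs into indicators and subfields.

    Indicator rows are only present when non-empty, so both indicators
    default to a blank (a single space) here.
    """
    indicators = {"ind1": " ", "ind2": " "}
    subfields = []
    for code, value in field_rows:
        if code in INDICATOR_CODES:
            indicators[INDICATOR_CODES[code]] = value
        else:
            subfields.append((code, value))
    return indicators, subfields


# Invented example: a field with a first indicator and one subfield.
indicators, subfields = split_indicators([("Y", "1"), ("a", "Example title")])
```

Because the check is on the uppercase codes only, lowercase and numeric subfield codes pass through untouched.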

Installation

Install from PyPI with, e.g., pipx install bibxml2.

Usage

Usage: marcxml2 [OPTIONS] [INPUT]...

  Convert from MARCXML (compressed) input files into (compressed) CSV/TSV/parquet

Options:
  -o, --output TEXT  Output CSV/TSV (compressed) / parquet file  [required]
  --help             Show this message and exit.

Usage: picaxml2csv [OPTIONS] [INPUT]...

  Convert from PICAXML (compressed) input files into (compressed) CSV/TSV/parquet

Options:
  -o, --output TEXT  Output CSV/TSV (compressed) / parquet file  [required]
  --help             Show this message and exit.

If the output file extension is .parquet, the output is written in parquet format, compressed with zstd, with column types chosen for maximal compatibility with the common R and Python ecosystems. Otherwise, files are read and written compressed when the filename ends with an extension recognised by fsspec. TSV format is used if the output filename contains .tsv; otherwise CSV is used.
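The filename rules above can be summarised as a small decision function. This is only a sketch of the described behaviour, not the tool's actual code: describe_output is a hypothetical name, and ASSUMED_COMPRESSIONS is an assumed subset of the suffixes fsspec recognises, so the real list may differ.

```python
# Assumed subset of compression suffixes; fsspec's real table may differ.
ASSUMED_COMPRESSIONS = {
    ".gz": "gzip",
    ".bz2": "bz2",
    ".xz": "xz",
    ".zst": "zstd",
}


def describe_output(filename):
    """Return (table_format, compression) implied by an output filename."""
    # Rule 1: a .parquet extension selects parquet with zstd compression.
    if filename.endswith(".parquet"):
        return ("parquet", "zstd")
    # Rule 2: otherwise compression follows the final filename suffix.
    compression = None
    for suffix, codec in ASSUMED_COMPRESSIONS.items():
        if filename.endswith(suffix):
            compression = codec
    # Rule 3: TSV if the name contains .tsv anywhere, else CSV.
    table_format = "tsv" if ".tsv" in filename else "csv"
    return (table_format, compression)


examples = {name: describe_output(name)
            for name in ["out.parquet", "out.tsv.gz", "out.csv", "records.tsv"]}
```

Note that the TSV rule is a substring check, so a name such as out.tsv.gz selects TSV even though .tsv is not the final suffix.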

Download files

Source Distribution

bib2-1.7.2.tar.gz (134.8 kB)

Uploaded Source

Built Distribution

bib2-1.7.2-py3-none-any.whl (6.1 kB)

Uploaded Python 3

File details

Details for the file bib2-1.7.2.tar.gz.

File metadata

  • Download URL: bib2-1.7.2.tar.gz
  • Upload date:
  • Size: 134.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.21

File hashes

Hashes for bib2-1.7.2.tar.gz
Algorithm Hash digest
SHA256 8288b36b48f9fcc1646d6095e419222924e6e2c5de14912ad835a4bbff6da693
MD5 3b9346330d4b7a911a411f5dc4f57ca4
BLAKE2b-256 b02ee2f40c7751042fdcad82d72cdc9bdd58f18e577a1f8ab1c951f52c2fff82

File details

Details for the file bib2-1.7.2-py3-none-any.whl.

File metadata

  • Download URL: bib2-1.7.2-py3-none-any.whl
  • Upload date:
  • Size: 6.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.21

File hashes

Hashes for bib2-1.7.2-py3-none-any.whl
Algorithm Hash digest
SHA256 c84c797759e4f5a5408e34742bc4813b2ecdb3c57160338db3010edf93f0e6a4
MD5 40610d9d48c5e25afdf7e7b7d0d4b1d7
BLAKE2b-256 dbd327f7a172e544d3dc874d1b7c55c3cdc31aae0ec655dfd15c863ecb720453
