A simple converter of MARCXML/PICAXML to CSV/TSV/parquet
bibxml2
A simple converter of (possibly compressed) MARCXML/PICAXML to (possibly compressed) CSV/TSV/parquet.
The resulting CSV/TSV/parquet is designed to be easy to use as a data table while retaining all ordering information from the original, should it be needed. The format is as follows:
record_number,field_number,subfield_number,field_code,subfield_code,value
Here, record_number identifies the MARC/PICA+ record, while field_number and subfield_number allow more exact filtering and reconstruction of the original field structure and order if needed.
For MARC data fields, ind1 and ind2 values are reported as separate rows with the subfield_code being i_1 or i_2, but only when non-empty.
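As a minimal sketch of how this layout can be consumed, the following Python groups rows back into fields and filters on a field/subfield pair. The sample rows are hypothetical, not real converter output: the presence of a header row, the numbering starting at 1, and the subfield_number given to the indicator row are all assumptions made for illustration. The helper name rebuild_fields is likewise invented here.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the column layout described above (not real output).
# Assumptions: a header row is present, numbering starts at 1, and the
# indicator row carries its own subfield_number.
SAMPLE = """record_number,field_number,subfield_number,field_code,subfield_code,value
1,1,1,245,i_1,1
1,1,2,245,a,An example title
1,1,3,245,b,a subtitle
1,2,1,650,a,Example subject
2,1,1,245,a,Another title
"""


def rebuild_fields(rows):
    """Group rows by (record_number, field_number), keeping subfield order."""
    fields = defaultdict(lambda: {"code": None, "indicators": {}, "subfields": []})
    for row in rows:
        key = (int(row["record_number"]), int(row["field_number"]))
        field = fields[key]
        field["code"] = row["field_code"]
        if row["subfield_code"] in ("i_1", "i_2"):
            # Indicator rows are stored separately from ordinary subfields.
            field["indicators"][row["subfield_code"]] = row["value"]
        else:
            # Fall back to 0 if subfield_number is empty.
            number = int(row["subfield_number"] or 0)
            field["subfields"].append((number, row["subfield_code"], row["value"]))
    for field in fields.values():
        field["subfields"].sort()  # restore original subfield order
    return dict(sorted(fields.items()))


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
fields = rebuild_fields(rows)

# Plain table-style filtering: every 245$a value across all records.
titles = [r["value"] for r in rows
          if r["field_code"] == "245" and r["subfield_code"] == "a"]
print(titles)
```

For a real file, replace io.StringIO(SAMPLE) with an opened output file; for TSV output, pass delimiter="\t" to csv.DictReader.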
Installation
Install from PyPI, e.g. with pipx install bibxml2.
Usage
Usage: marcxml2 [OPTIONS] [INPUT]...

  Convert from MARCXML (compressed) input files into (compressed) CSV/TSV/parquet

Options:
  -o, --output TEXT  Output CSV/TSV (compressed) / parquet file  [required]
  --help             Show this message and exit.

Usage: picaxml2csv [OPTIONS] [INPUT]...

  Convert from PICAXML (compressed) input files into (compressed) CSV/TSV/parquet

Options:
  -o, --output TEXT  Output CSV/TSV (compressed) / parquet file  [required]
  --help             Show this message and exit.

If the output file extension is .parquet, the output is written in parquet format, compressed with zstd, with column types chosen for maximal compatibility with the common R and Python ecosystems. For all other files, compressed data is read or written when the filename ends with a compression suffix recognised by fsspec. TSV is used if the output filename contains .tsv; otherwise CSV is used.
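The filename rules above can be sketched as a small Python function. This restates the rules as described, not the package's actual code; the function name guess_output_format is hypothetical, and compression handling is left out because it is delegated to fsspec.

```python
def guess_output_format(filename: str) -> str:
    """Return 'parquet', 'tsv' or 'csv' following the rules described above."""
    if filename.endswith(".parquet"):
        return "parquet"  # parquet takes priority and is checked on the extension
    if ".tsv" in filename:
        return "tsv"      # matches anywhere in the name, e.g. out.tsv.gz
    return "csv"          # default for everything else


for name in ("out.parquet", "out.tsv.gz", "out.csv.zst", "out.txt"):
    print(name, "->", guess_output_format(name))
```

Under these rules, a name such as out.tsv.gz yields TSV, with the .gz suffix left for the compression layer to interpret.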