XML-to-CSV

A Python script to extract XML fields to columns in CSV file(s). The script works in a streaming fashion and also has features to resolve 1:n relationships.
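To illustrate what streaming extraction means here, the following is a minimal sketch, not the script's actual implementation. It uses the standard library's `xml.etree.ElementTree.iterparse` as a stand-in for lxml, and the element names (`record`, `name`), the `id` attribute, and the `;` separator for repeated values are all assumptions made for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET


def stream_records(xml_source, record_tag, fields):
    """Yield one dict per record element without loading the whole file."""
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != record_tag:
            continue
        row = {"id": elem.get("id")}
        for field in fields:
            # A field may occur several times per record (a 1:n relationship).
            row[field] = [child.text or "" for child in elem.findall(field)]
        yield row
        elem.clear()  # drop the processed subtree so memory use stays low


def write_csv(rows, fields, out):
    """Write the rows to CSV, joining repeated values with ';' (an assumption)."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", *fields])
    for row in rows:
        writer.writerow([row["id"], *(";".join(row[f]) for f in fields)])


# Hypothetical input, held in memory only to keep the example self-contained.
sample = (
    b"<records>"
    b"<record id='1'><name>A</name><name>B</name></record>"
    b"<record id='2'><name>C</name></record>"
    b"</records>"
)
buffer = io.StringIO()
write_csv(stream_records(io.BytesIO(sample), "record", ["name"]), ["name"], buffer)
print(buffer.getvalue())
```

Because records are yielded one at a time and cleared afterwards, memory use depends on the size of a single record rather than on the size of the input file.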

Usage via the command line

Create and activate a Python virtual environment

# Create a new Python virtual environment
python3 -m venv py-xml-to-csv-env

# Activate the virtual environment
source py-xml-to-csv-env/bin/activate

# Install dependencies
pip install -r requirements.txt

Afterwards, the script can be run from the command line:

python -m xml_to_csv.xml_to_csv \
  -c config-example.json \
  -d date-mapping.json \
  -p "my-data" \
  -o "my-data.csv" \
  -l "my-data.log" \
  "my-input.xml"

For the config example, the following files will be created:

  • my-data.csv
  • my-data-name.csv
  • my-data-alternateNames.csv
  • my-data-pseudonyms.csv
  • my-data-birthDate.csv
  • my-data-deathDate.csv
  • my-data-isni.csv
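The file names above appear to combine the prefix passed with `-p` and the name of each output column. The sketch below reproduces that naming pattern; the function `output_files` is hypothetical and the naming rule is inferred from the listing, not taken from the script's source.

```python
def output_files(general_output, prefix, columns):
    """Return the general output file followed by one file per column.

    Assumption: per-column files are named "<prefix>-<column>.csv".
    """
    return [general_output] + [f"{prefix}-{column}.csv" for column in columns]


columns = ["name", "alternateNames", "pseudonyms", "birthDate", "deathDate", "isni"]
for filename in output_files("my-data.csv", "my-data", columns):
    print(filename)
```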

The file my-data.csv is the general output file: every column except the identifier column is an array that holds the possible 1:n relationships. Each of the other files contains the 1:n relationships between the records and the values of a single output column.
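The difference between the two kinds of output can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: the helper names, the sample record, and the choice of JSON to serialize arrays are all hypothetical.

```python
import json


def general_row(identifier, record, columns):
    """Row for the general file: one array-valued cell per column.

    Assumption: arrays are serialized as JSON lists.
    """
    return [identifier] + [json.dumps(record.get(column, [])) for column in columns]


def relation_rows(identifier, record, column):
    """Rows for a per-column file: one (identifier, value) pair per value."""
    return [(identifier, value) for value in record.get(column, [])]


# Hypothetical record with two names and no ISNI.
record = {"name": ["Ann", "A."]}
print(general_row("p1", record, ["name", "isni"]))
print(relation_rows("p1", record, "name"))
```

With this layout, the general file keeps one row per record, while a per-column file grows by one row for every value, which makes it easier to join or count individual values.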

LICENSE

This script makes use of the following other software libraries.

Library | Description | License
lxml | A library used to iterate fast (in a streaming fashion) over the input XML files | BSD
tqdm | A library to provide a user-friendly progress bar, used to show the progress of file extraction | MIT

Contact

Sven Lieber - Sven.Lieber@kbr.be - Royal Library of Belgium (KBR) - https://www.kbr.be/en/
