XML-to-CSV

A Python script to extract XML fields to columns in CSV file(s). The script works in a streaming fashion and also has features to resolve 1:n relationships.
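To illustrate what "streaming" means here, the following is a minimal sketch and not the script's actual code. It swaps in the standard library's `xml.etree.ElementTree.iterparse` so that it runs without extra dependencies (the script itself relies on lxml). The element names `records`, `record`, `id`, and `name` and the helper `stream_records` are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def stream_records(source, record_tag):
    """Yield one dict per <record_tag> element without loading the whole file."""
    # Listen for 'end' events only: an element is complete once its end tag is read.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            # Collect the text of each direct child; repeated tags become lists.
            row = {}
            for child in elem:
                row.setdefault(child.tag, []).append((child.text or "").strip())
            yield row
            # Release the memory held by the record that was just processed.
            elem.clear()


# Hypothetical input; the element names are made up for illustration.
SAMPLE = b"""<records>
  <record><id>1</id><name>Ada</name><name>A. Lovelace</name></record>
  <record><id>2</id><name>Grace</name></record>
</records>"""

rows = list(stream_records(io.BytesIO(SAMPLE), "record"))
for row in rows:
    print(row)
```

Because each record is cleared after it is yielded, memory use stays roughly proportional to a single record rather than to the whole input file.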

Usage via the command line

Create and activate a Python virtual environment, then install the dependencies:

# Create a new Python virtual environment
python3 -m venv py-xml-to-csv-env

# Activate the virtual environment
source py-xml-to-csv-env/bin/activate

# Install dependencies
pip install -r requirements.txt

Afterwards, the script can be executed via the command line:

python -m xml_to_csv.xml_to_csv \
  -c config-example.json \
  -d date-mapping.json \
  -p "my-data" \
  -o "my-data.csv" \
  -l "my-data.log" \
  "my-input.xml"
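If the script needs to be launched from another Python program, the same invocation can be assembled as an argument list. This is only a sketch: the helper `build_command` and its parameter names are hypothetical, while the flags and file names are copied from the command above. It prints the command and leaves the actual `subprocess` call commented out, since running it requires the package and the input files.

```python
import shlex
import sys


def build_command(input_xml, config, date_mapping, prefix, output_csv, log_file):
    """Return the argument list for the module invocation shown above."""
    return [
        sys.executable, "-m", "xml_to_csv.xml_to_csv",
        "-c", config,
        "-d", date_mapping,
        "-p", prefix,
        "-o", output_csv,
        "-l", log_file,
        input_xml,
    ]


cmd = build_command(
    "my-input.xml", "config-example.json", "date-mapping.json",
    "my-data", "my-data.csv", "my-data.log",
)
# Show the command as a single shell-quoted string.
print(shlex.join(cmd))

# To actually run it (requires the installed package and the input files):
# import subprocess
# subprocess.run(cmd, check=True)
```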

With the example configuration, the following files are created:

  • my-data.csv
  • my-data-name.csv
  • my-data-alternateNames.csv
  • my-data-pseudonyms.csv
  • my-data-birthDate.csv
  • my-data-deathDate.csv
  • my-data-isni.csv

The file my-data.csv is the general output file: every column except the identifier column is an array that holds the values of a possible 1:n relationship. Each of the other files resolves the 1:n relationship between the records and the values of a single output column.
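The difference between the general file and the per-column files can be sketched as follows. This is an illustration under assumptions, not the script's implementation: the sample records, the `id` header, and the `;` separator used to write an array into a single cell are all hypothetical, and the files are written to in-memory buffers instead of disk.

```python
import csv
import io

# Hypothetical records: an identifier plus multi-valued columns.
records = [
    {"id": "1", "name": ["Ada"], "pseudonyms": ["A.A.L.", "AAL"]},
    {"id": "2", "name": ["Grace"], "pseudonyms": []},
]
columns = ["name", "pseudonyms"]
prefix = "my-data"

# In-memory stand-ins for the output files, keyed by file name.
outputs = {f"{prefix}.csv": io.StringIO()}
outputs.update({f"{prefix}-{col}.csv": io.StringIO() for col in columns})

main = csv.writer(outputs[f"{prefix}.csv"], lineterminator="\n")
main.writerow(["id"] + columns)
per_col = {}
for col in columns:
    per_col[col] = csv.writer(outputs[f"{prefix}-{col}.csv"], lineterminator="\n")
    per_col[col].writerow(["id", col])

for rec in records:
    # General file: one row per record, all values of a column in one cell.
    # The ';' separator is an assumption made for this sketch.
    main.writerow([rec["id"]] + [";".join(rec[col]) for col in columns])
    # Per-column files: one row per (record, value) pair.
    for col in columns:
        for value in rec[col]:
            per_col[col].writerow([rec["id"], value])

for filename, buffer in outputs.items():
    print(f"--- {filename}")
    print(buffer.getvalue(), end="")
```

In this sketch, a record with two pseudonyms produces a single row in the general file and two rows in the pseudonyms file, which is the shape a relational import or a join typically expects.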

LICENSE

This script makes use of the following other software libraries.

Library | Description | License
--- | --- | ---
lxml | A library used to iterate fast (in a streaming fashion) over the input XML files | BSD
tqdm | A library to provide a user-friendly progress bar, used to show the progress of file extraction | MIT

Contact

Sven Lieber - Sven.Lieber@kbr.be - Royal Library of Belgium (KBR) - https://www.kbr.be/en/
