XML-to-CSV

A Python script to extract XML fields to columns in CSV file(s). The script works in a streaming fashion and also has features to resolve 1:n relationships.
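
The streaming idea can be illustrated with a minimal sketch. It is not the script's actual code: it uses the standard library's xml.etree.ElementTree.iterparse instead of lxml so that it runs without extra dependencies, and the function name stream_records, the record tag and the sample fields are hypothetical. Each record is handled as soon as its closing tag is read, and repeated child elements are collected into a list, which is one way to represent a 1:n relationship.

```python
import io
import xml.etree.ElementTree as ET


def stream_records(source, record_tag):
    """Yield one dict per record element, mapping child tag -> list of texts."""
    # "end" events fire once an element has been read completely.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != record_tag:
            continue
        fields = {}
        for child in elem:
            # Repeated child tags accumulate in a list (1:n relationship).
            fields.setdefault(child.tag, []).append((child.text or "").strip())
        yield fields
        # Drop the record's children so memory use stays small.
        elem.clear()


# Hypothetical sample input; real input would be an XML file on disk.
sample = io.BytesIO(
    b"<records>"
    b"<record><id>1</id><name>Ann</name><name>A. Smith</name></record>"
    b"<record><id>2</id><name>Bob</name></record>"
    b"</records>"
)

rows = list(stream_records(sample, "record"))
```

In this sketch the first record ends up with two values under "name" and the second with one. lxml provides a similar iterparse function, which is usually the faster choice for large files.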

Usage via the command line

Create and activate a Python virtual environment

# Create a new Python virtual environment
python3 -m venv py-xml-to-csv-env

# Activate the virtual environment
source py-xml-to-csv-env/bin/activate

# Install dependencies
pip install -r requirements.txt

Afterwards, the script can be executed from the command line:

python -m xml_to_csv.xml_to_csv \
  -c config-example.json \
  -d date-mapping.json \
  -p "my-data" \
  -o "my-data.csv" \
  -l "my-data.log" \
  "my-input.xml"

With the example configuration (config-example.json), the following files are created:

  • my-data.csv
  • my-data-name.csv
  • my-data-alternateNames.csv
  • my-data-pseudonyms.csv
  • my-data-birthDate.csv
  • my-data-deathDate.csv
  • my-data-isni.csv

The file my-data.csv is the general output file: every column except the identifier column is an array that holds all values of a possible 1:n relationship. Each of the other files contains the 1:n relationship between the records and the values of a single output column.
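
The difference between the two kinds of output file can be sketched as follows. This is an illustration under assumptions, not the script's real output format: the column names, the ";" separator used to serialize arrays, and the helper names write_general and write_one_to_n are all hypothetical.

```python
import csv
import io

# Hypothetical extracted records: one identifier, several names per record.
records = [
    {"id": "1", "name": ["Ann", "A. Smith"]},
    {"id": "2", "name": ["Bob"]},
]


def write_general(records, out, column):
    """One row per record; all values of the column joined into one cell."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", column])
    for rec in records:
        # Assumption: arrays are serialized with ";" as separator.
        writer.writerow([rec["id"], ";".join(rec[column])])


def write_one_to_n(records, out, column):
    """One row per (record, value) pair, resolving the 1:n relationship."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", column])
    for rec in records:
        for value in rec[column]:
            writer.writerow([rec["id"], value])


# In-memory buffers stand in for my-data.csv and my-data-name.csv.
general, names = io.StringIO(), io.StringIO()
write_general(records, general, "name")
write_one_to_n(records, names, "name")
```

Here the general buffer holds one row per record, while the per-column buffer holds one row per value, so a record with two names appears twice. The second layout is easier to join or load into a relational table.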

LICENSE

This script makes use of the following other software libraries.

Library | Description | License
lxml | A library used to iterate fast (in a streaming fashion) over the input XML files. | BSD
tqdm | A library to provide a user-friendly progress bar, used to show the progress of file extraction. | MIT

Contact

Sven Lieber - Sven.Lieber@kbr.be - Royal Library of Belgium (KBR) - https://www.kbr.be/en/
