XML-to-CSV

A Python script that extracts XML fields into columns of one or more CSV files. The script processes its input in a streaming fashion and can also resolve 1:n relationships.
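
To illustrate what streaming extraction means here, the following is a minimal sketch and not the script's actual implementation. It uses the standard library's `xml.etree.ElementTree.iterparse` instead of lxml so that it runs without extra dependencies. The function names, the `record` tag, and the field names are hypothetical and chosen only for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET


def stream_records(xml_source, record_tag, fields):
    """Yield one dict per <record_tag> element without building the full tree first."""
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != record_tag:
            continue
        # Take the text of each requested child element (empty string if missing).
        yield {name: (elem.findtext(name) or "") for name in fields}
        # Drop the children of the processed record to keep memory usage low.
        elem.clear()


def xml_to_csv(xml_source, csv_target, record_tag, fields):
    """Write one CSV row per record and return the number of records written."""
    writer = csv.DictWriter(csv_target, fieldnames=fields)
    writer.writeheader()
    count = 0
    for row in stream_records(xml_source, record_tag, fields):
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical in-memory input; a real run would pass a file path instead.
    sample = io.BytesIO(
        b"<records>"
        b"<record><id>1</id><name>Ada</name></record>"
        b"<record><id>2</id><name>Grace</name></record>"
        b"</records>"
    )
    out = io.StringIO()
    xml_to_csv(sample, out, "record", ["id", "name"])
    print(out.getvalue())
```

Because each record is written and cleared as soon as its closing tag is seen, memory use stays roughly proportional to a single record rather than to the whole input. lxml offers a similar `iterparse` interface, which is presumably what the script relies on.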

Usage via the command line

Create and activate a Python virtual environment

# Create a new Python virtual environment
python3 -m venv py-xml-to-csv-env

# Activate the virtual environment
source py-xml-to-csv-env/bin/activate

# Install dependencies
pip install -r requirements.txt

Afterwards, the script can be executed from the command line:

python -m xml_to_csv.xml_to_csv \
  -c config-example.json \
  -d date-mapping.json \
  -p "my-data" \
  -o "my-data.csv" \
  -l "my-data.log" \
  "my-input.xml"

With the example config, the following files will be created:

  • my-data.csv
  • my-data-name.csv
  • my-data-alternateNames.csv
  • my-data-pseudonyms.csv
  • my-data-birthDate.csv
  • my-data-deathDate.csv
  • my-data-isni.csv

The file my-data.csv is the general output file, in which every column besides the identifier column is an array that can hold the multiple values of a 1:n relationship. Each of the other files contains the 1:n relationship between the records and the values of a single output column.
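
The two output shapes described above can be sketched as follows. This is an illustration under stated assumptions, not the script's code: the sample records, the helper names, and the `;` separator used to serialize arrays are all hypothetical, since the source does not say how arrays are written. Only the file naming pattern (`my-data.csv`, `my-data-name.csv`, and so on) is taken from the list above.

```python
import csv
import io

# Hypothetical records: every column except the identifier may hold several values.
records = [
    {"id": "a1", "name": ["Ada Lovelace"], "pseudonyms": ["A.A.L.", "AAL"]},
    {"id": "b2", "name": ["Grace Hopper"], "pseudonyms": []},
]


def write_general(records, columns, target, sep=";"):
    """General file: one row per record, multiple values joined into one cell."""
    writer = csv.writer(target)
    writer.writerow(["id"] + columns)
    for rec in records:
        writer.writerow([rec["id"]] + [sep.join(rec[col]) for col in columns])


def write_one_to_many(records, column, target):
    """Per-column file: one row per (identifier, value) pair."""
    writer = csv.writer(target)
    writer.writerow(["id", column])
    for rec in records:
        for value in rec[column]:
            writer.writerow([rec["id"], value])


def output_name(prefix, column=None):
    """Assumed naming pattern: prefix.csv for the general file, prefix-column.csv otherwise."""
    return f"{prefix}.csv" if column is None else f"{prefix}-{column}.csv"


if __name__ == "__main__":
    general = io.StringIO()
    write_general(records, ["name", "pseudonyms"], general)
    print(output_name("my-data"))
    print(general.getvalue())

    per_column = io.StringIO()
    write_one_to_many(records, "pseudonyms", per_column)
    print(output_name("my-data", "pseudonyms"))
    print(per_column.getvalue())
```

The per-column files are convenient for joins and counting, because each value sits on its own row next to the record identifier, while the general file keeps one row per record.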

LICENSE

This script makes use of the following other software libraries.

Library | Description | License
lxml | A library used to iterate quickly (in a streaming fashion) over the input XML files | BSD
tqdm | A library that provides a user-friendly progress bar, used to show the progress of file extraction | MIT

Contact

Sven Lieber - Sven.Lieber@kbr.be - Royal Library of Belgium (KBR) - https://www.kbr.be/en/
