XML-to-CSV
A Python script to extract XML fields to columns in CSV file(s). The script works in a streaming fashion and also has features to resolve 1:n relationships.
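To illustrate what "streaming" means here, the following minimal sketch reads records one at a time instead of loading the whole document. It is not the script's own code: it uses the standard library's `xml.etree.ElementTree.iterparse` as a stand-in for lxml, and the function name `iter_records`, the tag names, and the sample data are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, record_tag):
    """Yield one dict (child tag -> list of texts) per record element."""
    # "end" events fire once an element has been read completely.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != record_tag:
            continue
        fields = {}
        for child in elem:
            # Repeated child tags accumulate, which is where 1:n values come from.
            fields.setdefault(child.tag, []).append((child.text or "").strip())
        yield fields
        # Drop the processed record's children so memory use stays roughly flat.
        elem.clear()


# Hypothetical input; a real run would pass a path or an open file instead.
SAMPLE = b"""<records>
  <record><id>1</id><name>Ada</name><name>A. Lovelace</name></record>
  <record><id>2</id><name>Alan</name></record>
</records>"""

records = list(iter_records(io.BytesIO(SAMPLE), "record"))
```

Because each record is cleared after it has been yielded, the approach should scale to inputs that do not fit in memory, assuming individual records are small.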
Usage via the commandline
Create and activate a Python virtual environment

    # Create a new Python virtual environment
    python3 -m venv py-xml-to-csv-env

    # Activate the virtual environment
    source py-xml-to-csv-env/bin/activate

    # Install dependencies
    pip install -r requirements.txt

Afterwards the script can be executed via the commandline:

    python -m xml_to_csv.xml_to_csv \
      -c config-example.json \
      -d date-mapping.json \
      -p "my-data" \
      -o "my-data.csv" \
      -l "my-data.log" \
      "my-input.xml"

For the config example the following files will be created:

- my-data.csv
- my-data-name.csv
- my-data-alternateNames.csv
- my-data-pseudonyms.csv
- my-data-birthDate.csv
- my-data-deathDate.csv
- my-data-isni.csv

The file my-data.csv is the general output file in which every column besides the identifier column is an array containing possible 1:n relationships.
The other files contain 1:n relationships between each record and the values of a single column of the output.
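
The relationship between the general file and the per-column files can be sketched as follows. This is only an illustration under stated assumptions, not the script's implementation: the function `resolve_one_to_many`, the JSON encoding of arrays in the general file, the two-column layout of the per-column files, and the sample records are all hypothetical. Only the `<prefix>-<column>.csv` naming pattern is taken from the file list above.

```python
import csv
import io
import json


def resolve_one_to_many(records, id_field, columns, prefix):
    """Return {filename: CSV text} for one general file and one file per column."""
    general = io.StringIO()
    general_writer = csv.writer(general, lineterminator="\n")
    general_writer.writerow([id_field, *columns])

    buffers = {col: io.StringIO() for col in columns}
    writers = {col: csv.writer(buf, lineterminator="\n") for col, buf in buffers.items()}
    for col in columns:
        writers[col].writerow([id_field, col])

    for record in records:
        record_id = record[id_field]
        # General file: each non-identifier column holds the whole array
        # (assumption: serialized as a JSON list).
        general_writer.writerow(
            [record_id, *(json.dumps(record.get(col, [])) for col in columns)]
        )
        # Per-column files: one row per (identifier, value) pair.
        for col in columns:
            for value in record.get(col, []):
                writers[col].writerow([record_id, value])

    files = {f"{prefix}.csv": general.getvalue()}
    files.update({f"{prefix}-{col}.csv": buf.getvalue() for col, buf in buffers.items()})
    return files


# Hypothetical records; the identifiers and values are made up.
records = [
    {"id": "1", "name": ["Ada", "A. Lovelace"], "isni": ["0000000000000001"]},
    {"id": "2", "name": ["Alan"], "isni": []},
]
files = resolve_one_to_many(records, "id", ["name", "isni"], "my-data")
```

In this sketch a record with two names contributes one row to my-data.csv and two rows to my-data-name.csv, while a record without an ISNI contributes no row to my-data-isni.csv.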
Tests
To execute the tests use the command python -m unittest discover test/
LICENSE
This script makes use of the following other software libraries.
| Library | Description | License |
|---|---|---|
| lxml | A library used to iterate quickly over the input XML files in a streaming fashion. | BSD |
| tqdm | A library to provide a user-friendly progress bar, used to show the progress of file extraction. | MIT |
Contact
Sven Lieber - Sven.Lieber@kbr.be - Royal Library of Belgium (KBR) - https://www.kbr.be/en/