This package can convert an XML files to a list of records
Project description
XML Records
xmlrecords
is a user-friendly wrapper of lxml
package for extraction of tabular data from XML files.
This data provider sends all his data in... XML. You know nothing about XML, except that it looks kind of weird and you would definitely never use it for tabular data. How could you just transform all this XML nightmare into a sensible tabular format, like a DataFrame? Don't worry: you are in the right place!
Installation
pip install xmlrecords
The package requires python 3.7+
and one external dependency lxml
.
Usage
Basic example
Usually, you only need to specify path to table rows; optionally, you can specify paths to any extra data you'd like to add to your table:
# XML object
xml_bytes = b"""\
<?xml version="1.0" encoding="utf-8"?>
<Catalog>
<Library>
<Name>Virtual Shore</Name>
</Library>
<Shelf>
<Timestamp>2020-02-02T05:12:22</Timestamp>
<Book>
<Title>Sunny Night</Title>
<Author alive="no" name="Mysterious Mark"/>
<Year>2017</Year>
<Price>112.34</Price>
</Book>
<Book>
<Title>Babel-17</Title>
<Author alive="yes" name="Samuel R. Delany"/>
<Year>1963</Year>
<Price>10</Price>
</Book>
</Shelf>
</Catalog>
"""
# Transform XML to records (= a list of key-value pairs)
import xmlrecords
records = xmlrecords.parse(
xml=xml_bytes,
records_path=['Shelf', 'Book'], # The rows are XML nodes with the repeating tag <Book>
meta_paths=[['Library', 'Name'], ['Shelf', 'Timestamp']], # Add additional "meta" nodes
)
for r in records:
print(r)
# Output:
# {'Name': 'Virtual Shore', 'Timestamp': '2020-02-02T05:12:22', 'Title': 'Sunny Night', 'alive': 'no', 'name': 'Mysterious Mark', 'Year': '2017', 'Price': '112.34'}
# {'Name': 'Virtual Shore', 'Timestamp': '2020-02-02T05:12:22', 'Title': 'Babel-17', 'alive': 'yes', 'name': 'Samuel R. Delany', 'Year': '1963', 'Price': '10'}
# Validate record keys
xmlrecords.validate(
records,
expected_keys=['Name', 'Timestamp', 'Title', 'alive', 'name', 'Year', 'Price'],
)
With Pandas
You can easily transform records to a pandas DataFrame:
import pandas as pd
df = pd.DataFrame(records)
With SQL
You can use records directly with INSERT statements if your SQL database is PEP 249 compliant. Most SQL databases are.
SQLite is an exception. There, you'll have to transform records (= a list of dictionaries) into a list of lists:
import sqlite3
with sqlite3.connect('maindev.db') as conn:
c = conn.cursor()
c.execute("""\
CREATE TABLE BOOKS (
LIBRARY_NAME TEXT,
SHELF_TIMESTAMP TEXT,
TITLE TEXT,
AUTHOR_ALIVE TEXT,
AUTHOR_NAME TEXT,
YEAR INT,
PRICE FLOAT,
PRIMARY KEY (TITLE, AUTHOR_NAME)
)
"""
)
c.executemany(
"""INSERT INTO BOOKS VALUES (?,?,?,?,?,?,?)""",
[list(x.values()) for x in records],
)
conn.commit()
FAQ
-
Why not
xmltodict
?xmltodict
can convert arbitrary XML to a python dict. However, it is 2-3 times slower thanxmlrecords
and does not support some features specific for tablular data. -
Why not
xml
orlxml
?xmlrecords
useslxml
under the hood. Usingxml
orlxml
directly is a viable option too - in case this package doesn't cover your particular use case.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file xmlrecords-0.3.1.tar.gz
.
File metadata
- Download URL: xmlrecords-0.3.1.tar.gz
- Upload date:
- Size: 13.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.17
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | dc00f49c811e3b87d663fedea79aa95213afac48283443e4f93e7b8378910ffc |
|
MD5 | a16ed2b609c24d057a0ca1b91fd9293b |
|
BLAKE2b-256 | a83ba4c758b0f3e10e358989223bdb14b84b83b99f242c8de93c802dbccead3e |
File details
Details for the file xmlrecords-0.3.1-py3-none-any.whl
.
File metadata
- Download URL: xmlrecords-0.3.1-py3-none-any.whl
- Upload date:
- Size: 11.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.17
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | c5d48ff035bd076b11105372fc2aa1a00a12aa39ea7306ca73fd9a0478baaddc |
|
MD5 | 8a29925384248bad03d13b9e3e21d69f |
|
BLAKE2b-256 | 1a44d3a79c7b5b6992567706e9ebaefb274525a9ab0784bc25ee20b40984c235 |