Skip to main content

This package can convert an XML files to a list of records

Project description

XML Records

xmlrecords is a user-friendly wrapper of lxml package for extraction of tabular data from XML files.

This data provider sends all his data in... XML. You know nothing about XML, except that it looks kind of weird and you would definitely never use it for tabular data. How could you just transform all this XML nightmare into a sensible tabular format, like a DataFrame? Don't worry: you are in the right place!

Installation

pip install xmlrecords

The package requires python 3.7+ and one external dependency lxml.

Usage

Basic example

Usually, you only need to specify path to table rows; optionally, you can specify paths to any extra data you'd like to add to your table:

# XML object
xml_bytes = b"""\
<?xml version="1.0" encoding="utf-8"?>
<Catalog>
    <Library>
        <Name>Virtual Shore</Name>
    </Library>
    <Shelf>
        <Timestamp>2020-02-02T05:12:22</Timestamp>
        <Book>
            <Title>Sunny Night</Title>
            <Author alive="no" name="Mysterious Mark"/>
            <Year>2017</Year>
            <Price>112.34</Price>
        </Book>
        <Book>
            <Title>Babel-17</Title>
            <Author alive="yes" name="Samuel R. Delany"/>
            <Year>1963</Year>
            <Price>10</Price>
        </Book>
    </Shelf>
</Catalog>
"""

# Transform XML to records (= a list of key-value pairs)
import xmlrecords
records = xmlrecords.parse(
    xml=xml_bytes, 
    records_path=['Shelf', 'Book'],  # The rows are XML nodes with the repeating tag <Book>
    meta_paths=[['Library', 'Name'], ['Shelf', 'Timestamp']],  # Add additional "meta" nodes
)
for r in records:
    print(r)

# Output:
# {'Name': 'Virtual Shore', 'Timestamp': '2020-02-02T05:12:22', 'Title': 'Sunny Night', 'alive': 'no', 'name': 'Mysterious Mark', 'Year': '2017', 'Price': '112.34'}
# {'Name': 'Virtual Shore', 'Timestamp': '2020-02-02T05:12:22', 'Title': 'Babel-17', 'alive': 'yes', 'name': 'Samuel R. Delany', 'Year': '1963', 'Price': '10'}

# Validate record keys
xmlrecords.validate(
    records, 
    expected_keys=['Name', 'Timestamp', 'Title', 'alive', 'name', 'Year', 'Price'],
)

With Pandas

You can easily transform records to a pandas DataFrame:

import pandas as pd
df = pd.DataFrame(records)

With SQL

You can use records directly with INSERT statements if your SQL database is PEP 249 compliant. Most SQL databases are.

SQLite is an exception. There, you'll have to transform records (= a list of dictionaries) into a list of lists:

import sqlite3
with sqlite3.connect('maindev.db') as conn:
    c = conn.cursor()
    c.execute("""\
        CREATE TABLE BOOKS (
           LIBRARY_NAME TEXT,
           SHELF_TIMESTAMP TEXT,
           TITLE TEXT,
           AUTHOR_ALIVE TEXT,
           AUTHOR_NAME TEXT,
           YEAR INT,
           PRICE FLOAT,
           PRIMARY KEY (TITLE, AUTHOR_NAME)
        )
        """
    )
    c.executemany(
        """INSERT INTO BOOKS VALUES (?,?,?,?,?,?,?)""",
        [list(x.values()) for x in records],
    )
    conn.commit()

FAQ

  1. Why not xmltodict? xmltodict can convert arbitrary XML to a python dict. However, it is 2-3 times slower than xmlrecords and does not support some features specific for tablular data.

  2. Why not xml or lxml? xmlrecords uses lxml under the hood. Using xml or lxml directly is a viable option too - in case this package doesn't cover your particular use case.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

xmlrecords-0.3.1.tar.gz (13.5 kB view details)

Uploaded Source

Built Distribution

xmlrecords-0.3.1-py3-none-any.whl (11.9 kB view details)

Uploaded Python 3

File details

Details for the file xmlrecords-0.3.1.tar.gz.

File metadata

  • Download URL: xmlrecords-0.3.1.tar.gz
  • Upload date:
  • Size: 13.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.17

File hashes

Hashes for xmlrecords-0.3.1.tar.gz
Algorithm Hash digest
SHA256 dc00f49c811e3b87d663fedea79aa95213afac48283443e4f93e7b8378910ffc
MD5 a16ed2b609c24d057a0ca1b91fd9293b
BLAKE2b-256 a83ba4c758b0f3e10e358989223bdb14b84b83b99f242c8de93c802dbccead3e

See more details on using hashes here.

File details

Details for the file xmlrecords-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: xmlrecords-0.3.1-py3-none-any.whl
  • Upload date:
  • Size: 11.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.17

File hashes

Hashes for xmlrecords-0.3.1-py3-none-any.whl
Algorithm Hash digest
SHA256 c5d48ff035bd076b11105372fc2aa1a00a12aa39ea7306ca73fd9a0478baaddc
MD5 8a29925384248bad03d13b9e3e21d69f
BLAKE2b-256 1a44d3a79c7b5b6992567706e9ebaefb274525a9ab0784bc25ee20b40984c235

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page