
Import complex XML files to a relational database

Project description

Xml2db

xml2db is a Python package for loading XML data into a relational database. It is designed to handle, without any custom code, complex schemas that cannot easily be denormalized into a flat table.
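To see why such schemas resist a flat table, consider an element that repeats inside another. The stdlib-only sketch below is a conceptual illustration, not part of xml2db: the sample document, its element names and the `flatten` helper are all hypothetical. Flattening produces one row per innermost element, so every parent field is copied onto each row, and each additional repeated child would multiply the rows further.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: one order containing several repeated <line> elements
SAMPLE = """<order id="A1" customer="ACME">
  <line sku="X" qty="2"/>
  <line sku="Y" qty="5"/>
  <line sku="Z" qty="1"/>
</order>"""


def flatten(xml_text: str) -> list[dict]:
    """Denormalize an <order> into one flat row per <line> (hypothetical helper)."""
    order = ET.fromstring(xml_text)
    rows = []
    for line in order.findall("line"):
        # The parent's attributes are duplicated on every row
        row = dict(order.attrib)
        row.update({"sku": line.get("sku"), "qty": int(line.get("qty"))})
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in flatten(SAMPLE):
        print(row)
```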

It builds a data model (i.e. a set of database tables linked with foreign key relationships) based on an XSD schema, parses XML files and loads them into the database, and can convert the data back to XML if needed.
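The general idea of tables "linked with foreign key relationships", and of rebuilding XML from them, can be sketched with the standard library alone. This is only a conceptual illustration under stated assumptions, not xml2db's actual algorithm or table layout: the `order`/`line` tables, the surrogate key `pk` and the foreign key column `fk_order` are hypothetical names. Parent fields are stored once, and each child row points back to its parent.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document with a repeated child element
SAMPLE = """<order id="A1" customer="ACME">
  <line sku="X" qty="2"/>
  <line sku="Y" qty="5"/>
</order>"""


def normalize(xml_text: str) -> dict[str, list[dict]]:
    """Split an <order> into two hypothetical tables linked by a foreign key."""
    root = ET.fromstring(xml_text)
    order_pk = 1  # surrogate primary key of the single parent row
    orders = [{"pk": order_pk, **root.attrib}]
    lines = [
        # Each child row stores its parent's key in the fk_order column
        {"pk": i, "fk_order": order_pk, "sku": el.get("sku"), "qty": int(el.get("qty"))}
        for i, el in enumerate(root.findall("line"), start=1)
    ]
    return {"order": orders, "line": lines}


def to_xml(tables: dict[str, list[dict]]) -> ET.Element:
    """Rebuild the XML by joining child rows back to their parent on the foreign key."""
    order = tables["order"][0]
    root = ET.Element("order", {k: v for k, v in order.items() if k != "pk"})
    for row in tables["line"]:
        if row["fk_order"] == order["pk"]:
            ET.SubElement(root, "line", sku=row["sku"], qty=str(row["qty"]))
    return root


if __name__ == "__main__":
    tables = normalize(SAMPLE)
    print(tables)
    print(ET.tostring(to_xml(tables), encoding="unicode"))
```

Keeping a separate table per repeated element is what lets the parent data be stored only once, at the cost of a join when reading the data back.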

It is as simple as:

from xml2db import DataModel

# Create a data model of tables with relations based on the XSD file
data_model = DataModel(
    xsd_file="path/to/file.xsd", 
    connection_string="postgresql+psycopg2://testuser:testuser@localhost:5432/testdb",
)
# Parse an XML file based on this XSD
document = data_model.parse_xml(
    xml_file="path/to/file.xml"
)
# Insert the document content into the database
document.insert_into_target_tables()

The data model will adhere closely to the XSD schema, but xml2db will perform simplifications aimed at limiting the complexity of the resulting data model and the storage footprint.

The raw data loaded into the database can then be processed using dbt, SQL views or other tools aimed at extracting, correcting and formatting the data into more user-friendly tables.

xml2db is developed and used at the French energy regulation authority (CRE) to process XML data.

This package uses sqlalchemy to interact with the database, so it should work with different database backends. Automated integration tests run against PostgreSQL, MySQL and MS SQL Server. xml2db does not work with SQLite. You may have to install additional packages to connect to your database (e.g. psycopg2 for PostgreSQL, pymysql for MySQL or pyodbc for MS SQL Server).

Please read the package documentation website for all the details!

Installation

The package can be installed, preferably in a virtual environment, using pip:

pip install xml2db

Testing

Running the tests requires cloning the repository and installing the additional development dependencies with:

pip install -e ".[tests,docs]"

Run all tests with the following command:

python -m pytest

Integration tests require write access to a PostgreSQL, MySQL or MS SQL Server database, whose connection string is provided through the DB_STRING environment variable. To run only the conversion tests, which do not require a database, use:

pytest -m "not dbtest"

Contributing

Contributions and bug reports are more than welcome; the project's issue page is the place to start.


Download files


Source Distribution

xml2db-0.10.0.tar.gz (32.9 kB)

Uploaded Source

Built Distribution

xml2db-0.10.0-py3-none-any.whl (36.2 kB)

Uploaded Python 3

File details

Details for the file xml2db-0.10.0.tar.gz.

File metadata

  • Download URL: xml2db-0.10.0.tar.gz
  • Upload date:
  • Size: 32.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.3

File hashes

Hashes for xml2db-0.10.0.tar.gz
Algorithm Hash digest
SHA256 d1bec12c5f3f338df169a493b89eb505a50ae141ae7432fac8fa125f4abc1cd1
MD5 d59b05462edcf58942b58d382829524a
BLAKE2b-256 a55f58bbb3a8e967ee228e216a840d62ff773c6acb3d5fb8b97e5f77d35674d2


File details

Details for the file xml2db-0.10.0-py3-none-any.whl.

File metadata

  • Download URL: xml2db-0.10.0-py3-none-any.whl
  • Upload date:
  • Size: 36.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.3

File hashes

Hashes for xml2db-0.10.0-py3-none-any.whl
Algorithm Hash digest
SHA256 f3dc196c2ebeb05b6b40e8b3aca95c17b3f9d0d3f7247d05ac84f864e59bae0b
MD5 c29ed51eb86d5d6076acd7f96f3d4e56
BLAKE2b-256 33a26f7678553676a275cf598724db2e45563748a6f72e12a24672e2d824c495

