Import complex XML files to a relational database
Project description
Xml2db
xml2db is a Python package for loading XML data into a relational database. It is designed to handle, without any custom code, complex schemas that cannot easily be denormalized into a flat table.
It builds a data model (i.e. a set of database tables linked by foreign key relationships) from an XSD schema, then parses XML files and loads them into the database. The data can also be converted back to XML if needed.
It is as simple as:
from xml2db import DataModel

# Create a data model of tables with relations based on the XSD file
data_model = DataModel(
    xsd_file="path/to/file.xsd",
    connection_string="postgresql+psycopg2://testuser:testuser@localhost:5432/testdb",
)

# Parse an XML file based on this XSD
document = data_model.parse_xml(
    xml_file="path/to/file.xml"
)

# Insert the document content into the database
document.insert_into_target_tables()
The data model adheres closely to the XSD schema, but xml2db applies simplifications that limit the complexity of the resulting data model and its storage footprint.
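To make the idea of "tables linked with foreign keys" concrete, here is a minimal, purely conceptual sketch using only the standard library. It is not how xml2db works internally: the XML document, the element names and the column names (`pk_order`, `fk_order`, ...) are all hypothetical, and no XSD is involved.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document; element and attribute names are illustrative only.
XML = """
<orders>
  <order id="A1">
    <customer>Alice</customer>
    <item sku="X" qty="2"/>
    <item sku="Y" qty="1"/>
  </order>
  <order id="B2">
    <customer>Bob</customer>
    <item sku="Z" qty="5"/>
  </order>
</orders>
"""


def flatten(xml_text):
    """Split nested <order>/<item> elements into two row lists linked by a key."""
    root = ET.fromstring(xml_text)
    orders, items = [], []
    for order_pk, order in enumerate(root.findall("order"), start=1):
        orders.append(
            {
                "pk_order": order_pk,
                "id": order.get("id"),
                "customer": order.findtext("customer"),
            }
        )
        for item in order.findall("item"):
            items.append(
                {
                    "pk_item": len(items) + 1,
                    "fk_order": order_pk,  # foreign key to the parent row
                    "sku": item.get("sku"),
                    "qty": int(item.get("qty")),
                }
            )
    return orders, items


orders, items = flatten(XML)
```

Each repeated child element becomes a row in its own list, and `fk_order` plays the role of the foreign key back to the parent row. A real schema has many more levels, which is why deriving the tables from the XSD is preferable to writing this by hand.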
The raw data loaded into the database can then be processed using DBT, SQL views or other tools aimed at extracting, correcting and formatting the data into more user-friendly tables.
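As an illustration of the "SQL views" option, the sketch below joins a parent table and a child table into a flat, easier-to-query view. The table and column names are hypothetical, and the in-memory `sqlite3` database is used only so the example runs without a server; as noted in this README, xml2db itself does not currently work with SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical raw tables, standing in for tables loaded from XML.
    CREATE TABLE orders (pk_order INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE item (
        pk_item INTEGER PRIMARY KEY,
        fk_order INTEGER REFERENCES orders (pk_order),
        sku TEXT,
        qty INTEGER
    );
    INSERT INTO orders VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO item VALUES (1, 1, 'X', 2), (2, 1, 'Y', 1), (3, 2, 'Z', 5);

    -- A flat view that hides the parent/child join from downstream users.
    CREATE VIEW order_lines AS
    SELECT o.customer, i.sku, i.qty
    FROM item AS i
    JOIN orders AS o ON o.pk_order = i.fk_order;
    """
)

rows = conn.execute(
    "SELECT customer, sku, qty FROM order_lines ORDER BY customer, sku"
).fetchall()
```

The same `CREATE VIEW` statement would need only minor dialect adjustments on PostgreSQL or MS SQL Server.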
xml2db is developed and used at the French energy regulation authority (CRE) to process XML data.
This package uses sqlalchemy to interact with the database, so it should work with different database backends. It has been tested against PostgreSQL and MS SQL Server. It currently does not work with SQLite. You may have to install additional packages to connect to your database (e.g. pyodbc, which is the default connector for MS SQL Server, or psycopg2 for PostgreSQL).
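Connection strings follow the sqlalchemy URL format, in which special characters in the user name or password must be percent-encoded. The helper below is a minimal sketch, not part of xml2db: `build_connection_string` is a hypothetical name, and the credentials, ports and ODBC driver name are assumptions to adapt to your own setup.

```python
from urllib.parse import quote_plus


def build_connection_string(dialect_driver, user, password, host, port, database, query=None):
    """Assemble a sqlalchemy-style URL, percent-encoding the credentials."""
    url = (
        f"{dialect_driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )
    if query:
        # Optional driver-specific parameters, appended as a query string.
        url += "?" + "&".join(f"{key}={quote_plus(value)}" for key, value in query.items())
    return url


# PostgreSQL through psycopg2; the password contains characters that need encoding.
pg_url = build_connection_string(
    "postgresql+psycopg2", "testuser", "p@ss/word", "localhost", 5432, "testdb"
)

# MS SQL Server through pyodbc; the driver name is an assumption, check what is installed.
mssql_url = build_connection_string(
    "mssql+pyodbc", "testuser", "testuser", "localhost", 1433, "testdb",
    query={"driver": "ODBC Driver 17 for SQL Server"},
)
```

The resulting string can be passed as the `connection_string` argument shown in the example above.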
Please read the package documentation website for all the details!
Installation
The package can be installed, preferably in a virtual environment, using pip:
pip install xml2db
Testing
To run the tests, clone the repo, then install the additional development dependencies with:
pip install -e ".[tests,docs]"
Run all tests with the following command:
python -m pytest
Integration tests require write access to a PostgreSQL or MS SQL Server database; the connection string is provided through the environment variable DB_STRING. To run only the conversion tests, which do not require a database, use:
pytest -m "not dbtest"
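The choice between the two commands above can be automated. The following is a small sketch, not part of the project: `pytest_command` is a hypothetical helper that picks the invocation depending on whether `DB_STRING` is set.

```python
import os


def pytest_command(environ=None):
    """Return the pytest invocation suited to the given environment."""
    environ = os.environ if environ is None else environ
    command = ["python", "-m", "pytest"]
    if not environ.get("DB_STRING"):
        # No database configured: deselect the tests marked "dbtest".
        command += ["-m", "not dbtest"]
    return command
```

The returned list can be handed to `subprocess.run(pytest_command())` from a script or a CI step.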
Contributing
Contributions and bug reports are more than welcome; the project's issue page is the place to start.
File details
Details for the file xml2db-0.9.4.tar.gz.
File metadata
- Download URL: xml2db-0.9.4.tar.gz
- Upload date:
- Size: 32.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 76c3f385895f5dbb99db70d14d43ed901062c60b5c0ef5172e2cb25717ec2f36
MD5 | 482f50a5e6496aa52d60e7fe7e673355
BLAKE2b-256 | 6763aa1d0584a1d3ca62afcdd99930a83a192056cd33a6fefc92412b8c4b569c
File details
Details for the file xml2db-0.9.4-py3-none-any.whl.
File metadata
- Download URL: xml2db-0.9.4-py3-none-any.whl
- Upload date:
- Size: 36.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0031b810ffa210b8d395c74a2ecabedab68bd80f8ebeedc5fe3c12d51f77a235
MD5 | ee10209f03334e24c2f2f8f9aed59a0d
BLAKE2b-256 | 522202a7a4152a4012ea99e7fb2b2c52f852b8ab7f8612c31ef962f798f53a67
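A downloaded file can be checked against the SHA256 digests listed above. This is a minimal sketch using only the standard library; `sha256_of` and `matches` are hypothetical helper names, and the file path is whatever location you downloaded the archive to.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Return True when the file's digest equals the published one."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For instance, `matches("xml2db-0.9.4.tar.gz", "76c3f385895f5dbb99db70d14d43ed901062c60b5c0ef5172e2cb25717ec2f36")` compares a local copy of the source distribution with the digest published for it.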