SSB Altinn Python

Features

This is a work-in-progress Python package for working with XML data from Altinn3. Here are some examples of how it can be used:

Transform to ISEE-Dynarev format

To transform an Altinn3 XML file into a Pandas DataFrame in the same form as the ISEE Dynarev database in our on-prem environment, use the isee_transform function.

from altinn import isee_transform

file = "gs://ssb-prod-dapla-felles-data-delt/altinn3/RA-0595/2023/2/6/810409282_460784f978a2_ebc7af7e-4ebe-4883-b844-66ee6292a93a/form_460784f978a2.xml"

isee_transform(file)

To recode (map) the names in the FELTNAVN column, create a dictionary with the original names from the XML as keys and the new names as values, then pass the dictionary as an argument when calling isee_transform(file, mapping).

from altinn import isee_transform

file = "gs://ssb-prod-dapla-felles-data-delt/altinn3/RA-0595/2023/2/6/810409282_460784f978a2_ebc7af7e-4ebe-4883-b844-66ee6292a93a/form_460784f978a2.xml"

mapping = {
    "kontAmbulForeDispJaNei": "ISEE_VAR1",
    "kontAmbulForeDispAnt": "ISEE_VAR2",
    "kontAmbulForeDriftAnt": "ISEE_VAR3",
}

isee_transform(file, mapping)
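To illustrate what the mapping does to the FELTNAVN column, here is a minimal stdlib sketch that recodes names in a list of long-format rows. The rows and the name kontaktPersonNavn are hypothetical, and keeping names that are not in the dictionary unchanged is an assumption about isee_transform, not documented behaviour. On a pandas DataFrame the same recoding can be written as `df["FELTNAVN"] = df["FELTNAVN"].replace(mapping)`.

```python
# Hypothetical long-format rows shaped like the FELTNAVN/FELTVERDI columns.
rows = [
    {"FELTNAVN": "kontAmbulForeDispJaNei", "FELTVERDI": "1"},
    {"FELTNAVN": "kontAmbulForeDispAnt", "FELTVERDI": "3"},
    {"FELTNAVN": "kontaktPersonNavn", "FELTVERDI": "Ola"},  # hypothetical field
]

mapping = {
    "kontAmbulForeDispJaNei": "ISEE_VAR1",
    "kontAmbulForeDispAnt": "ISEE_VAR2",
    "kontAmbulForeDriftAnt": "ISEE_VAR3",
}


def recode_feltnavn(rows, mapping):
    """Return new rows with FELTNAVN recoded through mapping.

    Assumption: names missing from the mapping are kept as they are.
    The input rows are not modified.
    """
    return [
        {**row, "FELTNAVN": mapping.get(row["FELTNAVN"], row["FELTNAVN"])}
        for row in rows
    ]


recoded = recode_feltnavn(rows, mapping)
print([row["FELTNAVN"] for row in recoded])
# ['ISEE_VAR1', 'ISEE_VAR2', 'kontaktPersonNavn']
```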

The function handles flat structures and 'tables' in the XML. If the XML contains repeating values, a suffix containing a number is added to the end of the name in the FELTNAVN column. If the XML contains more complex structures, such as a 'table in a table', the function gives a warning listing the FELTNAVN values that need further processing before they can be used in ISEE.
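The numbering of repeating values can be pictured with a small sketch. It assumes the suffix is a plain running number appended directly to the name (radNavn1, radNavn2, ...); the exact suffix format isee_transform uses is not stated here, and the field names radNavn, radAnt and sumAnt are hypothetical.

```python
from collections import Counter


def add_repeat_suffix(names):
    """Append a running number to every name that occurs more than once.

    Hypothetical sketch: names that occur only once are left unchanged,
    repeated names get 1, 2, 3, ... in order of appearance.
    """
    totals = Counter(names)
    seen = Counter()
    result = []
    for name in names:
        if totals[name] > 1:
            seen[name] += 1
            result.append(f"{name}{seen[name]}")
        else:
            result.append(name)
    return result


# Two rows of a hypothetical table followed by one flat field
names = ["radNavn", "radAnt", "radNavn", "radAnt", "sumAnt"]
print(add_repeat_suffix(names))
# ['radNavn1', 'radAnt1', 'radNavn2', 'radAnt2', 'sumAnt']
```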

The XML must contain certain fields in the 'InternInfo' block. The required fields are:

  • 'enhetsIdent'
  • 'enhetsType'
  • 'delregNr'

If one or more of these fields are missing from the XML, processing stops with a message listing the missing fields.
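A check like this can be sketched with the standard library. The sketch assumes an XML document without namespaces and a made-up root element named melding; real Altinn3 forms may use namespaces and other element names, so treat it as an illustration only. The package lists defusedxml as a dependency, and for untrusted input `defusedxml.ElementTree.fromstring` can be used in place of `ET.fromstring`.

```python
import xml.etree.ElementTree as ET

REQUIRED_FIELDS = ["enhetsIdent", "enhetsType", "delregNr"]

# Hypothetical minimal form: the root name and enhetsType value are made up,
# and delregNr is deliberately left out.
xml_text = """
<melding>
  <InternInfo>
    <enhetsIdent>810409282</enhetsIdent>
    <enhetsType>BEDR</enhetsType>
  </InternInfo>
  <SkjemaData/>
</melding>
"""


def missing_intern_info(xml_text, required=REQUIRED_FIELDS):
    """Return the required fields that are not present in InternInfo."""
    root = ET.fromstring(xml_text)
    intern_info = root.find(".//InternInfo")
    if intern_info is None:
        # No InternInfo block at all: every required field is missing
        return list(required)
    return [field for field in required if intern_info.find(field) is None]


missing = missing_intern_info(xml_text)
if missing:
    print(f"Missing required fields in InternInfo: {missing}")
# Missing required fields in InternInfo: ['delregNr']
```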

The resulting object is a Pandas Dataframe with the following columns:

  • SKJEMA_ID
  • DELREG_NR
  • IDENT_NR
  • ENHETS_TYPE
  • FELTNAVN
  • FELTVERDI
  • VERSION_NR

This dataframe can be written to csv and uploaded to the ISEE Dynarev database.
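With pandas, writing the result is typically `df.to_csv(path, index=False)`. The stdlib sketch below shows the same long format: each field becomes one row that repeats the form-level columns. The helper to_isee_rows, all values in it, and the comma delimiter are assumptions for illustration; the delimiter and encoding ISEE Dynarev expects are not stated here.

```python
import csv
import io

ISEE_COLUMNS = [
    "SKJEMA_ID", "DELREG_NR", "IDENT_NR", "ENHETS_TYPE",
    "FELTNAVN", "FELTVERDI", "VERSION_NR",
]


def to_isee_rows(fields, skjema_id, delreg_nr, ident_nr, enhets_type, version_nr):
    """Hypothetical helper: turn {FELTNAVN: FELTVERDI} into long-format rows."""
    return [
        {
            "SKJEMA_ID": skjema_id,
            "DELREG_NR": delreg_nr,
            "IDENT_NR": ident_nr,
            "ENHETS_TYPE": enhets_type,
            "FELTNAVN": name,
            "FELTVERDI": value,
            "VERSION_NR": version_nr,
        }
        for name, value in fields.items()
    ]


# All values below are made up for illustration
rows = to_isee_rows(
    {"ISEE_VAR1": "1", "ISEE_VAR2": "3"},
    skjema_id="RA-0595A3", delreg_nr="123", ident_nr="810409282",
    enhets_type="BEDR", version_nr="1",
)

# An in-memory buffer keeps the sketch self-contained; use
# open(path, "w", newline="", encoding="utf-8") to write a real file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=ISEE_COLUMNS)
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```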

Transform all XML-data to a pd.DataFrame

To transform an Altinn3 XML file into a Pandas DataFrame without the extra ISEE information, keeping all of the information (not just 'SkjemaData'), use the xml_transform function.

from altinn import xml_transform

file = "gs://ssb-prod-dapla-felles-data-delt/altinn3/RA-0595/2023/2/6/810409282_460784f978a2_ebc7af7e-4ebe-4883-b844-66ee6292a93a/form_460784f978a2.xml"

xml_transform(file)

The resulting object is a Pandas Dataframe with the following columns:

  • FELTNAVN: the names of the XML tags at each level in the XML, concatenated together.
  • FELTVERDI: the value of the XML tag.
  • LEVEL: a list with information about the concatenation level. If one or more of the values is greater than 1, there are repeating values in the tag.
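The idea behind FELTNAVN and LEVEL can be sketched as a recursive walk over the XML tree. This is an illustration, not xml_transform's implementation: the "_" separator, leaving out the root tag, and recording a 1-based occurrence number per level are all assumptions, and the XML is a made-up example.

```python
import xml.etree.ElementTree as ET


def flatten(element, prefix="", levels=()):
    """Yield (FELTNAVN, FELTVERDI, LEVEL) for every leaf below element.

    LEVEL holds, for each step in the path, the 1-based occurrence number of
    that tag among its siblings, so any value above 1 marks a repetition.
    """
    counts = {}
    for child in element:
        counts[child.tag] = counts.get(child.tag, 0) + 1
        name = f"{prefix}_{child.tag}" if prefix else child.tag
        level = [*levels, counts[child.tag]]
        if len(child):  # has sub-elements: keep descending
            yield from flatten(child, name, level)
        else:
            yield name, (child.text or "").strip(), level


# Hypothetical form with one flat field and a repeating table row
xml_text = """
<melding>
  <InternInfo><delregNr>123</delregNr></InternInfo>
  <SkjemaData>
    <rad><navn>A</navn></rad>
    <rad><navn>B</navn></rad>
  </SkjemaData>
</melding>
"""

rows = list(flatten(ET.fromstring(xml_text)))
for row in rows:
    print(row)
# ('InternInfo_delregNr', '123', [1, 1])
# ('SkjemaData_rad_navn', 'A', [1, 1, 1])
# ('SkjemaData_rad_navn', 'B', [1, 2, 1])
```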

Create filename for use in ISEE

If you need to transfer ISEE data to the on-prem platform, the .csv filename needs a specific format. The function create_isee_filename creates this filename from the file path and the contents of an XML file.

from altinn import create_isee_filename

file = "gs://ssb-prod-dapla-felles-data-delt/altinn3/RA-0595/2023/2/6/810409282_460784f978a2_ebc7af7e-4ebe-4883-b844-66ee6292a93a/form_460784f978a2.xml"

create_isee_filename(file)

In the example above the output will be RA-0595A3_460784f978a2.csv. This can be used to build a new filepath to where you need to store the result after the XML is transformed to ISEE-format.
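The pattern in the example can be reproduced from the path alone with a small sketch. create_isee_filename also reads the contents of the XML, which this sketch does not; the helper name isee_filename_from_path is hypothetical, and taking the form number from the folder after 'altinn3' and the id from the text after 'form_' are assumptions based only on the example path.

```python
def isee_filename_from_path(path):
    """Hypothetical helper: build '<form number>A3_<id>.csv' from a bucket path."""
    parts = path.split("/")
    # Assumption: the form number (e.g. RA-0595) is the folder after 'altinn3'
    skjema = parts[parts.index("altinn3") + 1]
    # Assumption: the id is whatever follows 'form_' in the file name
    form_id = parts[-1].removesuffix(".xml").removeprefix("form_")
    return f"{skjema}A3_{form_id}.csv"


file = "gs://ssb-prod-dapla-felles-data-delt/altinn3/RA-0595/2023/2/6/810409282_460784f978a2_ebc7af7e-4ebe-4883-b844-66ee6292a93a/form_460784f978a2.xml"
print(isee_filename_from_path(file))
# RA-0595A3_460784f978a2.csv
```

Note that `str.removeprefix` and `str.removesuffix` require Python 3.9 or newer.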

Get information about a file

from altinn import FileInfo

file = "gs://ssb-prod-dapla-felles-data-delt/altinn3/RA-0595/2023/2/6/810409282_460784f978a2_ebc7af7e-4ebe-4883-b844-66ee6292a93a/form_460784f978a2.xml"

# Create an instance of FileInfo
form = FileInfo(file)

# Get the filename without the '.xml' suffix
form.filename()
# Returns: 'form_460784f978a2'

# Print an unformatted version of the file. This does not require the file to be parseable by an XML library, which makes it useful for inspecting invalid XML files.
form.print()

# Print a nicely formatted version of the file
form.pretty_print()

# Check whether the XML file is valid. Useful for inspecting XML files with formal errors in the XML schema.
form.validate()
# Returns True or False
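For local files, two of these checks can be approximated with the standard library. The sketch is not FileInfo's implementation: the helper names are hypothetical, it works on local paths only (not gs:// paths), and it only tests whether the file is well-formed XML; whether validate() does more than that is not stated here.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def filename_without_xml(path):
    """Hypothetical helper: the file name with the '.xml' suffix removed."""
    return Path(path).name.removesuffix(".xml")


def is_well_formed(path):
    """Hypothetical helper: True if the file parses as XML, otherwise False."""
    try:
        ET.parse(path)
    except ET.ParseError:
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "form_460784f978a2.xml"
    bad = Path(tmp) / "form_broken.xml"  # made-up broken file
    good.write_text("<melding><SkjemaData/></melding>", encoding="utf-8")
    bad.write_text("<melding><SkjemaData></melding>", encoding="utf-8")
    name = filename_without_xml(good)
    good_ok = is_well_formed(good)
    bad_ok = is_well_formed(bad)

print(name, good_ok, bad_ok)
# form_460784f978a2 True False
```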

Parse xml-file

If you want to transform an Altinn3 xml-file to a Pandas Dataframe, you can use the ParseSingleXml-class.

from altinn import ParseSingleXml

file = "gs://ssb-prod-dapla-felles-data-delt/altinn3/RA-0595/2023/2/6/810409282_460784f978a2_ebc7af7e-4ebe-4883-b844-66ee6292a93a/form_460784f978a2.xml"

form_content = ParseSingleXml(file)

# Get a Pandas Dataframe representation of the contents of the file
df = form_content.to_dataframe()

df.head()

Requirements

  • dapla-toolbelt >=1.6.2
  • defusedxml >=0.7.1
  • xmltodict >=0.13.0
  • pandas >=2.2.0

Installation

You can install SSB Altinn Python via poetry from PyPI:

poetry add ssb-altinn-python

To install the package in the Jupyter environment on Dapla, where it is meant to be used, you need to install it in a virtual environment. The recommended approach is to use an ssb-project, where the preferred tool is poetry.

Usage

Please see the Reference Guide for details.

Contributing

Contributions are very welcome. To learn more, see the Contributor Guide.

License

Distributed under the terms of the MIT license, SSB Altinn Python is free and open source software.

Issues

If you encounter any problems, please file an issue along with a detailed description.

Credits

This project was generated from Statistics Norway's SSB PyPI Template.

Download files

Source Distribution

ssb_altinn_python-0.4.7.tar.gz (13.6 kB view details)

Uploaded Source

Built Distribution

ssb_altinn_python-0.4.7-py3-none-any.whl (12.5 kB view details)

Uploaded Python 3

File details

Details for the file ssb_altinn_python-0.4.7.tar.gz.

File metadata

  • Download URL: ssb_altinn_python-0.4.7.tar.gz
  • Size: 13.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.2 CPython/3.11.8

File hashes

Hashes for ssb_altinn_python-0.4.7.tar.gz
Algorithm Hash digest
SHA256 6fd557aaf70906c1223ca449993b80c57d3dd6914227218fd2083ef4c64f8ef9
MD5 65edbcea0a7dc3015f6b34dedd902ec8
BLAKE2b-256 da65b399f9379d5fe5b389cff667e7a323614cf86b212ebecc93e6fc199adf37

File details

Details for the file ssb_altinn_python-0.4.7-py3-none-any.whl.

File hashes

Hashes for ssb_altinn_python-0.4.7-py3-none-any.whl
Algorithm Hash digest
SHA256 1e1d06a60d12bace36a69015b688d1babd9864e2028d8ea2aee7e7b40cfc9e2c
MD5 493f23c1b085afbc4da4f877f7014746
BLAKE2b-256 6c59f11fc3f70fed003e569384f7dc85d534c7923c92209c1f7a051d87840358
