
Handles data transfer Statbank <-> Dapla for Statistics Norway


dapla-statbank-client

Used internally by SSB (Statistics Norway). Validates and transfers data from Dapla to Statbank. Gets data from public and internal statbank.

Installing from PyPI with Poetry

If your project was set up with ssb-project create, navigate into the project folder in the terminal (cd project-name), then install the package:

poetry add dapla-statbank-client
ssb-project build

Create a notebook with the project's kernel and run this code to verify that you can "log in":

from statbank import StatbankClient
stat_client = StatbankClient(loaduser = "LASTEBRUKER")
# Change LASTEBRUKER to your load-statbank-username
# Fill out password
# Default publishing-date is TOMORROW
print(stat_client)

Be aware that from the Dapla staging environment you will be sending to the statbank TEMP database, so your changes will not be published. If you are in the main dapla-jupyterlab (prod), however, you WILL publish to statbanken, in the PROD database. Pay extra attention to the publishing date when working in dapla-main-prod-jupyterlab.

Building datasets

You can inspect the "filbeskrivelse" (file description) returned by stat_client.get_description(). It is held in its own local class, StatbankUttrekksBeskrivelse:

description_06339 = stat_client.get_description(tableid="06339")
print(description_06339)

This should contain all the information you are used to reading from the old "Filbeskrivelse", and it describes how you should construct your data.

# Interesting attributes
description_06339.subtables
description_06339.variables
description_06339.codelists
description_06339.suppression

Your data must be placed in a data structure: a dict of pandas DataFrames. To see what the dict should look like, run:

description_06339.transferdata_template()

This both prints the dict and returns it, so you can either read the template or keep the returned dict, insert your own DataFrames into it, and send it to .transfer().

Once you have started constructing your data, you can validate it against the Uttrekksbeskrivelse with the validate method, without starting a transfer:

stat_client.validate(df_06339, tableid="06339")

Validation happens by default on the user side, in Python. It checks the number of tables, the number of columns, code usage in categorical columns, code usage in "suppression columns" (prikkekolonner), and time formats (both length and characters used).
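To make two of these checks concrete, here is a minimal standalone sketch of a time-format check and a codelist check. It is not the library's implementation: the function names, the sample values and the codelist are all hypothetical, and the real validation works on DataFrames and the Uttrekksbeskrivelse.

```python
import string


def check_time_values(values, expected_length, allowed_chars):
    """Return the values that have the wrong length or use unexpected characters."""
    allowed = set(allowed_chars)
    return [
        v for v in values
        if len(v) != expected_length or not set(v) <= allowed
    ]


def check_codes(values, valid_codes):
    """Return the sorted values that do not appear in the codelist."""
    return sorted(set(values) - set(valid_codes))


# Hypothetical column contents: a yearly time column and a small region codelist.
bad_times = check_time_values(["2021", "2022", "22", "202X"], 4, string.digits)
bad_codes = check_codes(["0301", "1103", "9999"], {"0301", "1103", "4601"})
print(bad_times)
print(bad_codes)
```

Running checks like these before calling .transfer() catches malformed periods and unknown codes without contacting Statbank.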

Transferring

stat_client.transfer({"deltabellfilnavn.dat" : df_06339}, "06339")

The simplest form of usage is transferring directly with the transfer method on the client class. The Statbank table expects named "deltabeller" in a dictionary, see transferdata_template() above.

Getting apidata

df_06339 = stat_client.apidata_all("06339", include_id=True)

apidata_all does not need a specified query. It builds its own query, trying to get all the data from the table. This might be too much data, resulting in an error.

The include_id parameter is a bit magical. It fetches both the code columns and the value columns for categorical columns and tries to place them next to each other. It also checks whether the content of the two is the same, in which case the content is not included twice.
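The include_id behaviour described above can be pictured with a small pure-Python sketch. This is an illustration only, not the library's code: the function name, the "_id" suffix and the sample columns are assumptions, and the real method operates on pandas DataFrames.

```python
def merge_codes_and_labels(codes, labels):
    """Place code and label columns side by side, skipping identical pairs."""
    merged = {}
    for name, label_values in labels.items():
        code_values = codes[name]
        if code_values != label_values:
            # Hypothetical naming: the code column gets an "_id" suffix.
            merged[f"{name}_id"] = code_values
        merged[name] = label_values
    return merged


# Hypothetical columns: region has distinct codes and labels, tid does not.
codes = {"region": ["0", "0"], "tid": ["2021", "2022"]}
labels = {"region": ["Hele landet", "Hele landet"], "tid": ["2021", "2022"]}
merged = merge_codes_and_labels(codes, labels)
print(list(merged))
```

Here the time column appears once, because its codes and labels are identical, while the region column appears as a code and label pair.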

If you want to specify a query to limit the response, use the method apidata instead.
Here we request an "internal table", which only people at SSB have access to, with a specified URL and query.

query = {'query': [{'code': 'Region', 'selection': {'filter': 'vs:Landet', 'values': ['0']}}, {'code': 'Alder', 'selection': {'filter': 'vs:AldGrupp19', 'values': ['000', '001', '002', '003', '004', '005', '006', '007', '008', '009', '010', '011', '012', '013', '014', '015', '016', '017', '018', '019', '020', '021', '022', '023', '024', '025', '026', '027', '028', '029', '030', '031', '032', '033', '034', '035', '036', '037', '038', '039', '040', '041', '042', '043', '044', '045', '046', '047', '048', '049', '050', '051', '052', '053', '054', '055', '056', '057', '058', '059', '060', '061', '062', '063', '064', '065', '066', '067', '068', '069', '070', '071', '072', '073', '074', '075', '076', '077', '078', '079', '080', '081', '082', '083', '084', '085', '086', '087', '088', '089', '090', '091', '092', '093', '094', '095', '096', '097', '098', '099', '100', '101', '102', '103', '104', '105', '106', '107', '108', '109', '110', '111', '112', '113', '114', '115', '116', '117', '118', '119+']}}, {'code': 'Statsbrgskap', 'selection': {'filter': 'vs:Statsborgerskap', 'values': ['000']}}, {'code': 'Tid', 'selection': {'filter': 'item', 'values': ['2022']}}], 'response': {'format': 'json-stat2'}}

df_folkemengde = stat_client.apidata("https://i.ssb.no/pxwebi/api/v0/no/prod_24v_intern/START/be/be01/folkemengde/Rd0002Aa",
                                     query,
                                     include_id = True
                                    )
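The long list of age values in the query above does not have to be typed by hand. As a sketch, the same dictionary can be built programmatically. The helper name selection is hypothetical, while the codes, filters and values are copied from the query above.

```python
# Ages "000" to "118", zero-padded to three digits, plus the open-ended top group.
ages = [f"{age:03d}" for age in range(119)] + ["119+"]


def selection(code, filter_name, values):
    """Build one entry of the "query" list (hypothetical helper)."""
    return {"code": code, "selection": {"filter": filter_name, "values": values}}


query = {
    "query": [
        selection("Region", "vs:Landet", ["0"]),
        selection("Alder", "vs:AldGrupp19", ages),
        selection("Statsbrgskap", "vs:Statsborgerskap", ["000"]),
        selection("Tid", "item", ["2022"]),
    ],
    "response": {"format": "json-stat2"},
}
```

Changing the year or narrowing the age range then becomes a one-line edit instead of a rewrite of the literal.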

apidata_rotate is a thin wrapper around the pandas pivot_table, borrowed from: https://github.com/sehyoun/SSB_API_helper/blob/master/src/ssb_api_helper.py

df_folkemengde_rotert = stat_client.rotate(df_folkemengde, 'tidskolonne', "verdikolonne")

To use the apidata functions outside the client (no password needed), import them like this:

from statbank.apidata import apidata_all, apidata, apidata_rotate

Saving and restoring Uttrekksbeskrivelser and Transfers as json

From stat_client.transfer() you will receive a StatbankTransfer object, and from stat_client.get_description() a StatbankUttrekksBeskrivelse object. Both can be serialized and saved to disk, and later restored.

filbesk_06339 = stat_client.get_description("06339")
filbesk_06339.to_json("path.json")
# Later the file can be restored with
filbesk_06339_new = stat_client.read_description_json("path.json")

Some deeper data structures, like the DataFrames in the transfer, are not serialized and stored in the transfer object's json.
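The idea of leaving the bulky data out of the json can be sketched with a small stand-in class. TransferRecord and its fields are hypothetical and only illustrate the pattern. They are not the library's StatbankTransfer class or its json layout.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TransferRecord:
    """Hypothetical stand-in for a transfer object, not the library's class."""

    tableid: str
    loaduser: str
    data: dict = field(default_factory=dict)  # bulky payload, kept out of the json

    def to_json(self):
        payload = asdict(self)
        payload.pop("data")  # skip the deep data structure when serializing
        return json.dumps(payload)

    @classmethod
    def from_json(cls, text):
        return cls(**json.loads(text))


original = TransferRecord("06339", "LASTEBRUKER", data={"deltabellfilnavn.dat": [[1, 2]]})
restored = TransferRecord.from_json(original.to_json())
print(restored)
```

The restored object carries the metadata but an empty data field, so the data itself has to be saved separately if it is needed later.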


Version history

  • 0.0.10 Statbank people wanted a user-agent request header to differentiate test from prod
  • 0.0.9 After further user-testing and requests
  • 0.0.5 Still some parameter issues
  • 0.0.4 More test coverage, some bugs fixed in rounding checks and parameter-passing
  • 0.0.3 Removed batches, stripping uttrekk from transfer, rounding function on uttrekk. Data is now required as a dict of DataFrames, with "deltabell-navn". Transfers can now be addressed by tableid, not only by "hovedtabellnavn"
  • 0.0.2 Starting alpha, fine-tuning release to PyPI on github-release
  • 0.0.1 Client, transfer, description, apidata. Quite a lot of work done already. Pre-alpha.

