Skip to main content

Handles data transfer Statbank <-> Dapla for Statistics Norway

Project description

dapla-statbank-client

Used internally by SSB (Statistics Norway). Validates and transfers data from Dapla to Statbank. Gets data from public and internal statbank.

Transferring

from statbank import StatbankClient
stat_client = StatbankClient(loaduser = "LASTEBRUKER")
# Fill out password
stat_client.transfer(df_06399, tabellid="06339")

The simplest form of usage, is directly-transferring using the transfer-method under the client-class. If the statbanktable expects multiple "deltabeller", dataframes must be passed in a list, in the correct order.

Building datasets

You can validate the data using the validate-method, without starting a transfer, like this: Validation will happen by default on user-side, in Python, using the "UttrekksBeskrivelse" (filbeskrivelse). Validation happens on the number of tables, number of columns, code usage in categorical columns, code usage in "suppression-columns" (prikkekolonner), and on timeformats (both length and characters used).

stat_client.validate(df_06339, tableid="06339")

You can also look at the "filbeskrivelse" which is returned in its own local class: StatbankUttrekksBeskrivelse

description_06339 = stat_client.get_description(tableid="06339")
print(description_06339)
# Interesting attributes
description_06339.deltabelltitler
description_06339.variabler
description_06339.kodelister
description_06339.prikking

Getting apidata

df_06339 = stat_client.apidata_all("06339", include_id=True)

apidata_all, does not need a specified query, it will build its own query, trying to get all the data from the table. This might be too much, resulting in an error.

The include_id-parameter is a bit magical, it gets both codes and value-columns for categorical columns, and tries to merge these next to each other, it also makes a check if the content is the same, then it will not include the content twice.

If you want to specify a query, to limit the response, use the method apidata instead.
Here we are requesting an "internal table" which only people at SSB have access to, with a specified URL and query.

query = {'query': [{'code': 'Region', 'selection': {'filter': 'vs:Landet', 'values': ['0']}}, {'code': 'Alder', 'selection': {'filter': 'vs:AldGrupp19', 'values': ['000', '001', '002', '003', '004', '005', '006', '007', '008', '009', '010', '011', '012', '013', '014', '015', '016', '017', '018', '019', '020', '021', '022', '023', '024', '025', '026', '027', '028', '029', '030', '031', '032', '033', '034', '035', '036', '037', '038', '039', '040', '041', '042', '043', '044', '045', '046', '047', '048', '049', '050', '051', '052', '053', '054', '055', '056', '057', '058', '059', '060', '061', '062', '063', '064', '065', '066', '067', '068', '069', '070', '071', '072', '073', '074', '075', '076', '077', '078', '079', '080', '081', '082', '083', '084', '085', '086', '087', '088', '089', '090', '091', '092', '093', '094', '095', '096', '097', '098', '099', '100', '101', '102', '103', '104', '105', '106', '107', '108', '109', '110', '111', '112', '113', '114', '115', '116', '117', '118', '119+']}}, {'code': 'Statsbrgskap', 'selection': {'filter': 'vs:Statsborgerskap', 'values': ['000']}}, {'code': 'Tid', 'selection': {'filter': 'item', 'values': ['2022']}}], 'response': {'format': 'json-stat2'}}

df_folkemengde = stat_client.apidata("https://i.ssb.no/pxwebi/api/v0/no/prod_24v_intern/START/be/be01/folkemengde/Rd0002Aa",
                                     query,
                                     include_id = True
                                    )

apidata_rotate is a thin wrapper around pivot_table. Stolen from: https://github.com/sehyoun/SSB_API_helper/blob/master/src/ssb_api_helper.py

df_folkemengde_rotert = stat_client.rotate(df_folkemengde, 'tidskolonne', "verdikolonne")

To import the apidata-functions outside the client (no need for password) do the imports like this:

from statbank.apidata import apidata_all, apidata, apidata_rotate

Batches

For the non-apidata-methods, there are "twin" batch-methods. For .transfer there is .transfer_batch and so on. Alternatively you can just run the methods above multiple times...

To transfer many tables at the same time.

transfers = stat_client.transfer_batch({"06339": df_06399,
                                        "05300": df_05300})
print(transfers["05300"])

To validate many tables at the same time.

stat_client.validate_batch({"06339": df_06399,
                            "05300": df_05300})

To get many descriptions at once send a list of tableids.

descriptions = stat_client.validate_batch(["06339","05300"])
print(descriptions["06339"])

Saving and restoring Uttrekksbeskrivelser and Transfers as json

*Might still be in development

From stat_client.transfer() you will recieve a StatbankTransfer object, from stat_client.get_description a StatbankUttrekksBeskrivelse-object. These can be serialized and saved to disk, and later be restored.

filbesk_06339 = stat_client.get_description("06339")
filbesk_06339.to_json("path.json")
# Later the file can be restored with
filbesk_06339_new = stat_client.read_description_json("path.json")

Some deeper data-structures, like the dataframes in the transfer will not be serialized and stored with the transfer-object in its json.


Version history

  • 0.0.1 Client, transfer, description, apidata. Quite a lot of work done already. Pre-alpha.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dapla_statbank_client-0.0.2.tar.gz (21.7 kB view details)

Uploaded Source

Built Distribution

dapla_statbank_client-0.0.2-py3-none-any.whl (22.5 kB view details)

Uploaded Python 3

File details

Details for the file dapla_statbank_client-0.0.2.tar.gz.

File metadata

  • Download URL: dapla_statbank_client-0.0.2.tar.gz
  • Upload date:
  • Size: 21.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.15

File hashes

Hashes for dapla_statbank_client-0.0.2.tar.gz
Algorithm Hash digest
SHA256 3ae675c3a318ea959d4d671315c4783c8cf3c324730c13ecd45a47b77153dbeb
MD5 19e45c409a438c0b8e32adb4159b89ae
BLAKE2b-256 773b994738264116ac4fc5f08ce557f6e24d0bc117f3e99148fce79dc5fe764e

See more details on using hashes here.

File details

Details for the file dapla_statbank_client-0.0.2-py3-none-any.whl.

File metadata

File hashes

Hashes for dapla_statbank_client-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 89afa4a6ac698c38f98c6b29583652247d637f7f2524faeccecf282967c2f2e9
MD5 a6a309446721ec9d6d0727e25eb96545
BLAKE2b-256 14b0481cae0e65354a81ec9d6020635c0e81493bf2027c59f61f785ab5d270e9

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page