Download data from the Australian Bureau of Statistics (ABS) using its SDMX API

sdmxabs

sdmxabs is a small Python package for downloading data from the Australian Bureau of Statistics (ABS) using its SDMX API. SDMX stands for Statistical Data and Metadata eXchange. The package is designed to be used interactively within a Jupyter notebook.

Usage

import sdmxabs as sa
from sdmxabs import MatchType as Mt

Before you fetch data from the ABS, you need to know three things:

  • the flow identifier (flow_id) for the data you want. These are short strings, like "CPI" for the Consumer Price Index. You can find them using the data_flows() function.
  • the dimensions for this flow_id, which are used to select a specific data series. If no dimensions are set, the fetch() function will return all data series for a flow identifier. The dimensions can be found using the data_dimensions() function.
  • the codes the ABS uses to specify selected data series against these dimensions. The codes can be found in the relevant code lists using the code_lists() function. The code list names are part of the information provided with the data dimensions.

Note: it is much, much faster to fetch one or two series using the data dimensions and code lists than to fetch every data series associated with a flow identifier and then search through the metadata for the data you want.
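The three lookups above can be chained into one helper. This is a minimal sketch rather than part of the package: codes_for is a hypothetical name, and the sdmxabs module is passed in as the argument sa so that the sketch only relies on the three metadata functions documented under Key functions.

```python
def codes_for(sa, flow_id, dim_name):
    """Return the code list for one dimension of one ABS data flow.

    `sa` is the imported sdmxabs module (import sdmxabs as sa).
    `codes_for` itself is a hypothetical helper, not part of sdmxabs.
    """
    # Step 1: confirm the flow identifier exists.
    flows = sa.data_flows()
    if flow_id not in flows:
        raise KeyError(f"unknown flow_id: {flow_id!r}")

    # Step 2: confirm the dimension belongs to that flow.
    dimensions = sa.data_dimensions(flow_id)
    if dim_name not in dimensions:
        raise KeyError(f"{dim_name!r} is not a dimension of {flow_id!r}")

    # Step 3: look up the codes that can be used against that dimension.
    return sa.code_list_for_dim(flow_id, dim_name)
```

In a notebook this might be called as codes_for(sa, "CPI", "TSEST"), with the result passed to sa.frame() for easier reading.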

Key functions

Metadata

data_flows(flow_id: str = 'all', **kwargs: Unpack[GetFileKwargs]) -> dict[str, dict[str, str]] - returns the ABS data flows. The data is returned in a dictionary with the flow identifier as the key and the attributes of that flow in a dictionary of name-value pairs. You can turn the returned value from data_flows() into a pandas DataFrame with the following: frame(data_flows())

data_dimensions(flow_id: str, **kwargs: Unpack[GetFileKwargs]) -> dict[str, dict[str, str]] - returns the data dimensions and attributes associated with a specific ABS dataflow. The data is returned in a dictionary of dimension/attribute names, each with its associated information in a dictionary. You can turn the returned value from data_dimensions() into a pandas DataFrame with the following: frame(data_dimensions(flow_id))

code_lists(cl_id: str, **kwargs: Unpack[GetFileKwargs]) -> dict[str, dict[str, str]] - returns the codes in a code list. The data is returned in a dictionary of codes and their associated information. The code list identifiers (cl_id) can be found in the data dimensions (see previous). You can turn the returned value from code_lists() into a pandas DataFrame with the following: frame(code_lists(cl_id))

code_list_for_dim(flow_id: str, dim_name: str, **kwargs: Unpack[GetFileKwargs]) -> dict[str, dict[str, str]] - provides a quick method for getting the code list associated with a particular dimension in a dataflow.

frame(f: dict[str, dict[str, str]]) -> pd.DataFrame - a utility function to convert the output from the metadata functions above to a more human-readable pandas DataFrame.

The ABS data

Once you know what data you want, you can specify that information in a fetch() request.

fetch(flow_id: str, dims: dict[str, str] | None, parameters: dict[str, str] | None = None, validate: bool = False, **kwargs: Unpack[GetFileKwargs]) -> tuple[pd.DataFrame, pd.DataFrame] - this function returns two DataFrames: the first holds the data and the second holds the associated metadata. The column names in the data DataFrame will match the row names in the metadata DataFrame. The dims argument is a dictionary, where the key is a dimension and the value is one or more codes from the relevant code list. Multiple values are concatenated with the "+" symbol. For example, the key-value pair for extracting Seasonally Adjusted and Trend data is typically {"TSEST": "20+30"}, where "TSEST" is the data dimension. The validate argument reports whether there were any issues translating your dimensions dictionary into the SDMX key.
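To make the translation from a dims dictionary to an SDMX key concrete, here is a minimal sketch. It assumes the usual SDMX REST convention in which the key lists one slot per dimension in the flow's dimension order, slots are separated by ".", multiple codes in a slot are joined with "+", and an empty slot means "all codes". build_key and the dimension order shown are illustrative assumptions, not the package's internals.

```python
def build_key(dims, dimension_order):
    """Build an SDMX-style key string from a dims dictionary.

    Hypothetical helper for illustration; sdmxabs does this for you
    inside fetch(). `dimension_order` is the ordered list of dimension
    names for the flow (an assumed input here).
    """
    unknown = set(dims) - set(dimension_order)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")

    slots = []
    for name in dimension_order:
        codes = dims.get(name, "")          # empty slot selects every code
        if not isinstance(codes, str):      # allow a list of codes as well
            codes = "+".join(codes)
        slots.append(codes)
    return ".".join(slots)


# Illustrative dimension order only; check data_dimensions() for the real one.
order = ["MEASURE", "INDEX", "TSEST", "REGION", "FREQ"]
key = build_key({"TSEST": "20+30", "FREQ": "Q"}, order)
```

A misspelled dimension name raises ValueError in this sketch, which is the kind of problem the validate argument is described as reporting.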

fetch_multi(wanted: pd.DataFrame, parameters: dict[str, str] | None = None, validate: bool = False, **kwargs: Unpack[GetFileKwargs]) -> tuple[pd.DataFrame, pd.DataFrame] - allows multiple selections to be fetched and returned together. Each selection is a row in the wanted DataFrame. The column names are the data dimensions and the flow_id. The function returns two DataFrames, the first for data and the second for metadata.

fetch_selection(flow_id: str, criteria: MatchCriteria, parameters: dict[str, str] | None = None, validate: bool = False, **kwargs: Unpack[GetFileKwargs]) -> tuple[pd.DataFrame, pd.DataFrame] - fetches ABS data by matching text strings against the code names used by the ABS. It allows for a more human-readable and intuitive selection of ABS data. The function returns two DataFrames, the first for data and the second for metadata.

measure_names(meta: pd.DataFrame) -> pd.Series: a convenience function to convert a metadata DataFrame into a series of y-axis labels.

recalibrate(data: pd.DataFrame, units: pd.Series, as_a_whole: bool = False) -> tuple[pd.DataFrame, pd.Series] - a convenience function to recalibrate a DataFrame returned from a fetch function so that the absolute maximum value is between 1 and 1000. The labels (from measure_names()) are also adjusted.

recalibrate_series(series: pd.Series, label: str) -> tuple[pd.Series, str] - similar to recalibrate, for a single series.
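The idea behind recalibration can be sketched in plain Python. This is not the package's implementation: rescale, the label wording, and the choice to leave values below 1 untouched are all assumptions made for illustration, and it works on a list rather than a pandas Series.

```python
import math

# Assumed label suffixes for each power of 1000 (illustrative only).
_SUFFIXES = {1: "thousands", 2: "millions", 3: "billions", 4: "trillions"}


def rescale(values, label):
    """Scale values by a power of 1000 so the largest magnitude is in [1, 1000).

    Hypothetical stand-in for recalibrate_series(); values whose largest
    magnitude is already below 1000 (including below 1) are returned as-is.
    """
    biggest = max((abs(v) for v in values), default=0)
    if biggest == 0:
        return list(values), label

    # Number of whole powers of 1000 in the largest magnitude, clamped
    # to the range of suffixes this sketch knows about.
    step = math.floor(math.log10(biggest) / 3)
    step = max(0, min(step, max(_SUFFIXES)))
    if step == 0:
        return list(values), label

    factor = 1000 ** step
    return [v / factor for v in values], f"{label} ({_SUFFIXES[step]})"


scaled, new_label = rescale([1_500_000, -2_300_000], "Persons")
```

Here "Persons" is a made-up unit label; in practice the labels would come from measure_names().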

Other

FlowMetaDict is a useful type alias for dict[str, dict[str, str]], the type returned by all of the metadata functions.

make_wanted(flow_id: str, criteria: MatchCriteria) -> pd.DataFrame - converts selection criteria into a one-row DataFrame that can be used as the wanted argument in fetch_multi().

match_item(pattern: str, dimension: str, match_type: MatchType = MatchType.PARTIAL) -> MatchItem - creates a MatchItem from the arguments.

GetFileKwargs is a TypedDict. It specifies the possible arguments for data retrieval from the ABS:

  • verbose: bool - provide step-by-step information about the data retrieval process.
  • modality: str - one of "prefer-cache" or "prefer-url". By default, calls to the metadata functions [data_flows(), data_dimensions(), and code_lists()] are set to "prefer-cache". The fetch functions default to "prefer-url", which means they get the latest data from the ABS.
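One plausible reading of the two modality values is "try the preferred source first and fall back to the other one". The sketch below illustrates that reading only; whether sdmxabs falls back in this way is an assumption, and get_file, fetch_url and read_cache are hypothetical names supplied by the caller.

```python
def get_file(url, modality, fetch_url, read_cache):
    """Retrieve `url` from the preferred source, falling back to the other.

    Hypothetical sketch: `fetch_url` and `read_cache` are caller-supplied
    callables that return the content or raise OSError / KeyError.
    """
    if modality == "prefer-cache":
        sources = [read_cache, fetch_url]
    elif modality == "prefer-url":
        sources = [fetch_url, read_cache]
    else:
        raise ValueError(f"unknown modality: {modality!r}")

    last_error = None
    for source in sources:
        try:
            return source(url)
        except (OSError, KeyError) as error:
            last_error = error      # remember why this source failed
    raise last_error                # both sources failed
```

Under this reading, "prefer-cache" suits metadata that rarely changes, while "prefer-url" suits data series that are updated with each release.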

MatchType is an Enum for specifying the type of text-matching to be used in fetch_selection().

  • MatchType.EXACT - for exact matches.
  • MatchType.PARTIAL - for partial (case-insensitive) matches, and
  • MatchType.REGEX - for regular expression matches.

MatchItem: tuple[str, str, MatchType] is a tuple used to select codes from a code list. It has three elements: the pattern to match against a code name (from a code list), the dimension being matched, and the MatchType.

MatchCriteria: Sequence[MatchItem] is a sequence of MatchItem used by select_items() to build a one-row DataFrame that can be used as the wanted argument to fetch_multi().
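The three match types can be illustrated with a small stand-alone sketch. MatchKind and matching_codes are hypothetical stand-ins written for this example, not the package's MatchType or its selection logic, and the "name" key and the sample code list are assumptions about the shape of a code list.

```python
import re
from enum import Enum


class MatchKind(Enum):
    """Stand-in for sdmxabs.MatchType, for illustration only."""
    EXACT = 1
    PARTIAL = 2
    REGEX = 3


def matching_codes(code_list, pattern, kind=MatchKind.PARTIAL):
    """Return the codes whose name matches `pattern` under `kind`."""
    hits = []
    for code, info in code_list.items():
        name = info.get("name", "")      # "name" key is an assumption
        if kind is MatchKind.EXACT:
            matched = name == pattern
        elif kind is MatchKind.PARTIAL:
            matched = pattern.lower() in name.lower()   # case-insensitive
        else:
            matched = re.search(pattern, name) is not None
        if matched:
            hits.append(code)
    return hits


# Made-up code list in the dict-of-dicts shape the metadata functions return.
tsest = {
    "10": {"name": "Original"},
    "20": {"name": "Seasonally Adjusted"},
    "30": {"name": "Trend"},
}
codes = matching_codes(tsest, "^(Seasonally Adjusted|Trend)$", MatchKind.REGEX)
dims_value = "+".join(codes)   # the form used as a value in a dims dictionary
```

With the real package, the equivalent selection would be expressed as MatchItem tuples (for example via match_item()) and passed to fetch_selection().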
