Get ABS timeseries data in pandas DataFrames

readabs

readabs is an open-source Python package for downloading and working with timeseries data from the Australian Bureau of Statistics (ABS), using pandas DataFrames.


Usage:

Standard import arrangements

import readabs as ra
from readabs import metacol  # short column names for meta data DataFrames

Print a list of available catalogue identifiers from the ABS. You may need this to get the catalogue identifier/number for the data you want to download.

ra.print_abs_catalogue()

Get the ABS catalogue map as a pandas DataFrame.

cat_map = ra.catalogue_map()

Get all of the data tables associated with a particular catalogue identifier. The catalogue identifier is a string with the standard ABS identifier. For example, the catalogue identifier for the monthly labour force survey is "6202.0". Returns a tuple. The first element of the tuple is a dictionary of DataFrames. The dictionary is indexed by table names (which can be found in the meta data). The second element is a DataFrame for the meta data. Note: with some ABS catalogues, a specific series may be repeated in more than one table.

abs_dict, meta = ra.read_abs_cat(cat="id")
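
As a rough illustration of the return value described above, the sketch below fetches the monthly labour force catalogue ("6202.0") and summarises the dictionary of tables. The helper summarise_tables is hypothetical (it is not part of readabs) and assumes only that each table exposes a .shape attribute, as a pandas DataFrame does. The download sits inside a function, so nothing is fetched until you call it.

```python
from typing import Any, Mapping


def summarise_tables(abs_dict: Mapping[str, Any]) -> dict[str, tuple[int, ...]]:
    """Map each table name to its (rows, columns) shape.

    Hypothetical helper for illustration; not part of readabs.
    """
    return {name: tuple(table.shape) for name, table in abs_dict.items()}


def fetch_labour_force() -> dict[str, tuple[int, ...]]:
    """Download catalogue 6202.0 and summarise its tables (needs network access)."""
    import readabs as ra  # imported lazily so the helper above works without it

    # read_abs_cat returns (dictionary of DataFrames, meta data DataFrame)
    abs_dict, meta = ra.read_abs_cat(cat="6202.0")
    return summarise_tables(abs_dict)
```

Calling fetch_labour_force() should return one entry per downloaded table, keyed by table name.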

Get two DataFrames in a tuple, the first containing the actual data, and the second containing the meta data for one or more specified ABS series identifiers.

data, meta = ra.read_abs_series(cat="id", series="id1")
data, meta = ra.read_abs_series(cat="id", series=("id1", "id2", ...))

Additional utility functions

While not necessary for working with ABS data, the package includes some useful functions for manipulating ABS data:

Calculate percentage change over n_periods.

change_data = percentage_change(data, n_periods)
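
For readers who want to see the arithmetic, here is a minimal plain-Python sketch of a percentage change over n_periods. It assumes the usual definition (current value divided by the value n_periods earlier, minus one, times 100); the package's own implementation works on pandas objects and may differ in detail. The function name percentage_change_sketch is hypothetical.

```python
from typing import Optional


def percentage_change_sketch(
    values: list[float], n_periods: int
) -> list[Optional[float]]:
    """Percentage change of each value relative to the value n_periods earlier.

    The first n_periods results are None because no earlier value exists.
    """
    if n_periods < 1:
        raise ValueError("n_periods must be a positive integer")
    changes: list[Optional[float]] = []
    for i, current in enumerate(values):
        if i < n_periods:
            changes.append(None)
        else:
            previous = values[i - n_periods]
            changes.append((current / previous - 1.0) * 100.0)
    return changes


# Through-the-year growth for a quarterly index uses n_periods=4
quarterly_index = [100.0, 102.0, 104.04, 110.0, 125.0]
through_the_year = percentage_change_sketch(quarterly_index, 4)
```

With quarterly data, n_periods=1 gives quarter-on-quarter growth and n_periods=4 gives growth over the year.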

Annualise monthly or quarterly percentage rates.

annualised = annualise_percentages(data, periods_per_year)
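
A hedged sketch of annualisation follows. It assumes compounding, that is, the periodic rate is applied periods_per_year times; whether the package uses exactly this formula is an assumption. The function name annualise_sketch is hypothetical.

```python
def annualise_sketch(rate_pct: float, periods_per_year: int) -> float:
    """Compound a periodic percentage rate into an annual percentage rate.

    Assumed formula: ((1 + r/100) ** periods_per_year - 1) * 100
    """
    return ((1.0 + rate_pct / 100.0) ** periods_per_year - 1.0) * 100.0


# 1 per cent per quarter compounds to a little over 4 per cent per year
annual_from_quarterly = annualise_sketch(1.0, 4)

# 0.5 per cent per month compounds to a little over 6 per cent per year
annual_from_monthly = annualise_sketch(0.5, 12)
```

Compounding gives a slightly higher figure than simply multiplying the periodic rate by the number of periods.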

Convert a pandas timeseries with a Quarterly PeriodIndex to a timeseries with a Monthly PeriodIndex.

monthly_data = qtly_to_monthly(
    quarterly_data,
    interpolate,  # default is True
    limit,  # default is 2, only used if interpolate is True
    dropna,  # default is True
)
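
To make the conversion concrete, the following plain-Python sketch shows one plausible reading of it: each quarterly value is placed on the last month of its quarter, and the two months in between are filled by linear interpolation. This is an assumption about the behaviour, not the package's code. It uses dictionaries keyed by (year, quarter) and (year, month) rather than a pandas PeriodIndex, and it simplifies limit by only filling gaps of exactly two months. The name qtly_to_monthly_sketch is hypothetical.

```python
from typing import Optional


def qtly_to_monthly_sketch(
    quarterly: dict[tuple[int, int], float],
    interpolate: bool = True,
    dropna: bool = True,
) -> dict[tuple[int, int], Optional[float]]:
    """Spread calendar-quarter values, keyed by (year, quarter), over months."""
    if not quarterly:
        return {}
    # Running month count (year * 12 + month - 1) of each quarter's last month
    anchors = sorted(
        (year * 12 + quarter * 3 - 1, value)
        for (year, quarter), value in quarterly.items()
    )
    by_index: dict[int, Optional[float]] = {}
    for (i0, v0), (i1, v1) in zip(anchors, anchors[1:]):
        by_index[i0] = v0
        for idx in range(i0 + 1, i1):
            if interpolate and i1 - i0 == 3:
                # Straight line between neighbouring quarter-end values
                by_index[idx] = v0 + (v1 - v0) * (idx - i0) / (i1 - i0)
            else:
                by_index[idx] = None  # left missing
    last_index, last_value = anchors[-1]
    by_index[last_index] = last_value
    # Convert the running month count back to (year, month)
    return {
        (idx // 12, idx % 12 + 1): value
        for idx, value in by_index.items()
        if not (dropna and value is None)
    }


monthly = qtly_to_monthly_sketch({(2023, 1): 100.0, (2023, 2): 103.0})
```

Here the March and June values are the original quarterly figures, and April and May are interpolated between them.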

Convert monthly data to quarterly data by taking the mean or sum of the three months in each quarter. Quarters with fewer than three months of data are ignored, and NA items are dropped.

quarterly_data = monthly_to_qtly(
    monthly_data,
    q_ending,  # default is "DEC"
    f,  # the function to apply ("sum" or "mean"); the default is "mean"
)
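
The aggregation rule described above can be sketched in plain Python as follows. The sketch assumes calendar quarters (the equivalent of q_ending="DEC") and uses a dictionary keyed by (year, month) rather than a pandas PeriodIndex; the name monthly_to_qtly_sketch is hypothetical and the package's implementation may differ.

```python
from typing import Optional


def monthly_to_qtly_sketch(
    monthly: dict[tuple[int, int], Optional[float]],
    f: str = "mean",
) -> dict[tuple[int, int], float]:
    """Aggregate (year, month) values into (year, quarter) values."""
    if f not in ("mean", "sum"):
        raise ValueError('f must be "mean" or "sum"')
    buckets: dict[tuple[int, int], list[float]] = {}
    for (year, month), value in monthly.items():
        if value is None:
            continue  # drop NA items
        quarter = (month - 1) // 3 + 1
        buckets.setdefault((year, quarter), []).append(value)
    result: dict[tuple[int, int], float] = {}
    for key in sorted(buckets):
        values = buckets[key]
        if len(values) < 3:
            continue  # ignore quarters with fewer than three months of data
        total = sum(values)
        result[key] = total if f == "sum" else total / 3
    return result


data = {
    (2023, 1): 1.0, (2023, 2): 2.0, (2023, 3): 3.0,  # complete quarter
    (2023, 4): 4.0, (2023, 5): 5.0,                  # incomplete quarter
}
quarterly_mean = monthly_to_qtly_sketch(data)
quarterly_sum = monthly_to_qtly_sketch(data, f="sum")
```

The second quarter is omitted from both results because only two of its three months are present.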

Notes:

  • This package does not manipulate the ABS data. The data is returned as it was downloaded. This includes any NA-only (i.e. empty) columns where they occur.
  • This package only downloads timeseries data tables. Other data tables (for example, pivot tables) are ignored.
  • The index for all of the downloaded tables should be a pandas PeriodIndex, with an appropriately selected frequency.
  • In the process of data retrieval, ABS zip and excel files are downloaded and stored in a local cache. By default, the cache directory is "./.readabs_cache/". You can change the default directory name by setting the environment variable "READABS_CACHE_DIR" to the name of the preferred directory.
  • The "read" functions have a number of standard keyword arguments (with default settings as follows):
    • history="" - provide a month-year string to extract historical ABS data.
      For example, you can set history="dec-2023" to get the ABS data for a catalogue identifier that was originally published in respect of Q4 of 2023. Note: not all ABS data sources are structured so that this technique works in every case; but most are.
    • verbose=False - Do not print detailed information on the data retrieval process. Setting this to True may help diagnose why something might be going wrong with the data retrieval process.
    • ignore_errors=False - Cease downloading when an error is encountered. However, the ABS website sometimes has malformed links, in which case you may need to set this to True. (Note: if you drop a message to the ABS, they will usually fix broken links within a business day).
    • get_zip=True - Download the excel files in .zip files.
    • get_excel_if_no_zip=True - Only try to download .xlsx files if there are no zip files available to be downloaded.
    • get_excel=False - Do not automatically download .xlsx files. Note: at least one of get_zip, get_excel_if_no_zip, or get_excel must be true. For most ABS catalogue items, it is sufficient to just download the one zip file. But note, some catalogue items do not have a zip file. Others have quite a number of zip files.
    • single_excel_only="" - if this argument is set to a table name (without the .xlsx extension), only that excel file will be downloaded. If set, and only a limited subset of available data is needed, this can speed up download times significantly. Note: overrides get_zip, get_excel_if_no_zip, get_excel and single_zip_only.
    • single_zip_only="" - if this argument is set to a zip file name (without the .zip extension), only that zip file will be downloaded. If set, and only a limited subset of available data is needed, this can speed up download times significantly. Note: overrides get_zip, get_excel_if_no_zip, and get_excel.
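
The cache setting and keyword arguments listed above might be combined as in the sketch below. Several things here are assumptions: the directory name "abs_cache" is only an example, setting the environment variable before readabs is imported is the cautious choice rather than a documented requirement, and download_flags_ok is a hypothetical helper (not part of readabs) that mirrors the rule that at least one of get_zip, get_excel_if_no_zip, or get_excel must be true.

```python
import os
from typing import Any, Mapping

# Assumption: set the cache directory before readabs is imported or used
os.environ["READABS_CACHE_DIR"] = "abs_cache"  # example directory name

# Keyword arguments taken from the list above, with two changed from default
READ_KWARGS: dict[str, Any] = {
    "history": "dec-2023",  # data as originally published for Q4 of 2023
    "verbose": True,  # print details of the retrieval process
    "ignore_errors": False,
    "get_zip": True,
    "get_excel_if_no_zip": True,
    "get_excel": False,
}


def download_flags_ok(kwargs: Mapping[str, Any]) -> bool:
    """Check that at least one download flag is true (hypothetical helper)."""
    defaults = {"get_zip": True, "get_excel_if_no_zip": True, "get_excel": False}
    return any(kwargs.get(name, default) for name, default in defaults.items())


def fetch_historical(cat: str = "6202.0"):
    """Download a catalogue with the settings above (needs network access)."""
    import readabs as ra  # imported after READABS_CACHE_DIR has been set

    if not download_flags_ok(READ_KWARGS):
        raise ValueError("at least one download flag must be true")
    return ra.read_abs_cat(cat=cat, **READ_KWARGS)
```

Nothing is downloaded until fetch_historical() is called, so the settings can be adjusted first.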

Download files

Source Distribution

readabs-0.0.4.tar.gz (22.5 kB)

Built Distribution

readabs-0.0.4-py3-none-any.whl (22.4 kB)
