
Get ABS timeseries data in pandas DataFrames


readabs

A Python package for downloading and working with timeseries data from the Australian Bureau of Statistics (ABS) and Reserve Bank of Australia (RBA).

Overview

readabs automates the retrieval of ABS and RBA Excel spreadsheets from their websites, caches them locally, and provides a clean pandas DataFrame interface for analysis. Instead of manually downloading spreadsheets, navigating complex Excel files, and writing parsing code, readabs handles all of this automatically.

The ABS publishes timeseries data as Excel files with a specific structure: "Data" sheets contain the actual values, while "Index" sheets contain metadata describing each series. readabs parses both, giving you clean DataFrames with proper time indices and full metadata for every series.
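
The following minimal pandas-only sketch illustrates the kind of reshaping readabs does for you. It uses a hypothetical, simplified sheet layout: metadata rows labelled in the first column, a "Series ID" row, then dated observations. The layout, the "Description" column name and all variable names are assumptions for illustration, not readabs internals. The values are placeholders.

```python
import pandas as pd

# Hypothetical, simplified stand-in for an ABS "Data" sheet read with header=None.
# Assumed layout: row 0 holds series descriptions, the following rows hold
# metadata labelled in column 0, the last metadata row is "Series ID", and
# dated observations follow. Real ABS sheets carry more metadata rows than this.
raw = pd.DataFrame(
    [
        [None, "Unemployment rate ;  Persons ;"],
        ["Unit", "Percent"],
        ["Series Type", "Seasonally Adjusted"],
        ["Series ID", "A84423050A"],
        [pd.Timestamp("2024-01-01"), 1.0],  # placeholder values
        [pd.Timestamp("2024-02-01"), 2.0],
    ]
)

# Locate the row that carries the Series IDs; everything above it is metadata.
sid_row = raw.index[raw[0] == "Series ID"][0]
series_ids = raw.iloc[sid_row, 1:].tolist()

# Metadata: one row per series, one column per metadata label.
meta = raw.iloc[1 : sid_row + 1, 1:].T
meta.columns = raw.iloc[1 : sid_row + 1, 0].tolist()
meta.insert(0, "Description", raw.iloc[0, 1:].tolist())
meta = meta.reset_index(drop=True)

# Data: observations below the Series ID row, indexed by monthly Period.
body = raw.iloc[sid_row + 1 :]
data = pd.DataFrame(
    body.iloc[:, 1:].to_numpy(dtype=float),
    index=pd.DatetimeIndex(body[0]).to_period("M"),
    columns=series_ids,
)
print(data)
print(meta)
```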

Key Features

  • Automatic downloading: Fetches Excel/ZIP files directly from ABS and RBA websites
  • Smart caching: Caches downloaded files locally and only re-downloads when data is updated
  • Clean DataFrame output: Returns timeseries as pandas DataFrames with a proper PeriodIndex
  • Metadata preservation: Retains full ABS/RBA metadata (series descriptions, units, frequency)
  • Flexible retrieval: Get entire catalogues, specific series by ID, or search by description
  • Time series utilities: Built-in functions for frequency conversion, percentage changes, and unit scaling

Installation

pip install readabs

Or using uv:

uv add readabs

Quick Start

import readabs as ra
from readabs import metacol as mc  # ABS metadata column names

# Download the complete Labour Force Survey (ABS 6202.0)
data, meta = ra.read_abs_cat("6202.0")

# data is a dict of DataFrames (one per table)
# meta is a DataFrame containing all series metadata

# Access a specific table
labour_force = data["6202001"]
print(labour_force.head())
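
To pull out one series when you know its ID but not its table, you can join the metadata to the data dict. The helper below is hypothetical and not part of readabs. It assumes each table's columns are Series IDs and that the metadata has "Series ID" and "Table" columns whose table values match the dict keys. With real output you would pass id_col=mc.id and table_col=mc.table. The stand-in data uses placeholder values so it runs offline.

```python
import pandas as pd


def get_series(
    data: dict[str, pd.DataFrame],
    meta: pd.DataFrame,
    series_id: str,
    id_col: str = "Series ID",
    table_col: str = "Table",
) -> pd.Series:
    """Hypothetical helper: find which table holds series_id, then return that column."""
    rows = meta[meta[id_col] == series_id]
    if rows.empty:
        raise KeyError(f"{series_id} not found in metadata")
    table = rows[table_col].iloc[0]
    return data[table][series_id]


# Small stand-ins shaped like read_abs_cat() output (assumed shapes, placeholder values).
idx = pd.period_range("2024-01", periods=3, freq="M")
data = {"6202001": pd.DataFrame({"A84423050A": [1.0, 2.0, 3.0]}, index=idx)}
meta = pd.DataFrame({"Series ID": ["A84423050A"], "Table": ["6202001"]})

print(get_series(data, meta, "A84423050A"))
```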

Usage Examples

Browse Available Data

# List all available ABS catalogues
ra.print_abs_catalogue()

# List all RBA tables
ra.print_rba_catalogue()

Note: The ABS catalogue includes discontinued series marked as "CEASED". These may still be accessible using the url parameter (see "Historical and Archived Data" below).

Get Specific Series by ID

# Get unemployment rate series by its Series ID
unemployment, meta = ra.read_abs_series(
    cat="6202.0",
    series_id="A84423050A"
)

Search for Data by Description

# Find and retrieve series by searching metadata
search_terms = {
    "Unemployment rate": mc.did,       # Data Item Description
    "Persons": mc.did,
    "Seasonally Adjusted": mc.stype,   # Series Type
}
results = ra.search_abs_meta(meta, search_terms)

# Or retrieve directly using read_abs_by_desc()
wanted = {
    "Unemployment Rate": {
        "cat": "6202.0",
        "did": "Unemployment rate ;  Persons ;",
        "stype": "Seasonally Adjusted",
    },
}
series_dict, meta = ra.read_abs_by_desc(wanted)
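
search_abs_meta() takes a dict that maps each search text to the metadata column it should be looked for in. Because the search text is the key, one column can be searched for several phrases. The matching logic can be pictured with plain pandas. This sketch assumes every term must match, using case-insensitive substring matching. Those rules, the function name and the column labels are assumptions, and readabs' actual matching may differ.

```python
import pandas as pd


def search_meta(meta: pd.DataFrame, terms: dict[str, str]) -> pd.DataFrame:
    """Hypothetical sketch: keep rows where every search text appears
    (case-insensitively) in its paired column. Not readabs' implementation."""
    mask = pd.Series(True, index=meta.index)
    for text, column in terms.items():
        mask &= meta[column].str.contains(text, case=False, regex=False)
    return meta[mask]


# Illustrative metadata; the column names mirror ABS labels but are assumptions here.
meta = pd.DataFrame(
    {
        "Data Item Description": [
            "Unemployment rate ;  Persons ;",
            "Unemployment rate ;  Males ;",
            "Employed total ;  Persons ;",
        ],
        "Series Type": ["Seasonally Adjusted", "Seasonally Adjusted", "Trend"],
    }
)
terms = {
    "Unemployment rate": "Data Item Description",
    "Persons": "Data Item Description",
    "Seasonally Adjusted": "Series Type",
}
print(search_meta(meta, terms))
```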

RBA Data

# Get the Official Cash Rate
ocr = ra.read_rba_ocr(monthly=True)

# Read any RBA table
rba_data, rba_meta = ra.read_rba_table("A1")

# Historical RBA tables are prefixed with "Z:"
hist_data, hist_meta = ra.read_rba_table("Z:A1")

Use print_rba_catalogue() to see all available tables, including historical ones.
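
read_rba_ocr(monthly=True) returns a monthly series. We have not checked exactly how readabs builds it. The sketch below shows one common way to turn policy-rate decisions into a monthly series: keep the last decision in each month, then forward-fill onto a complete monthly index. It uses plain pandas with made-up dates and rates and is an assumption, not readabs' method.

```python
import pandas as pd

# Hypothetical decision dates and target rates (made-up values for illustration).
changes = pd.Series(
    [1.00, 1.25, 1.50],
    index=pd.to_datetime(["2024-01-15", "2024-04-02", "2024-05-07"]).to_period("M"),
)

# If two decisions fell in one month, keep the later one.
changes = changes.groupby(level=0).last()

# A policy rate stays in force until the next change, so forward-fill
# onto a complete monthly index.
monthly_idx = pd.period_range(changes.index.min(), "2024-07", freq="M")
monthly = changes.reindex(monthly_idx).ffill()
print(monthly)
```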

Historical and Archived Data

For older data no longer in the current ABS catalogue, you can work with local ZIP files or specific URLs:

# Parse a previously downloaded ABS ZIP file
data_dict = ra.grab_abs_zip("/path/to/downloaded/abs_data.zip")

# Fetch data from a specific ABS URL (useful for archived pages)
data_dict = ra.grab_abs_url(url="https://www.abs.gov.au/some/archived/page")

# Access historical releases using the history parameter
data, meta = ra.read_abs_cat("6202.0", history="dec-2023")

# Or use a direct URL with read_abs_cat
data, meta = ra.read_abs_cat(url="https://www.abs.gov.au/statistics/...")

grab_abs_zip() and grab_abs_url() return a dictionary of DataFrames (one per Excel sheet), while read_abs_cat() returns its usual (data, meta) tuple. Either way, you can work with data that may have been removed from the main ABS catalogue.
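
A downloaded ZIP may bundle several workbooks. To see what an archive contains before handing it to grab_abs_zip(), the standard library is enough. The helper name excel_members is hypothetical, and the demo builds a tiny in-memory archive with placeholder contents so it runs without a real ABS file.

```python
import io
import zipfile


def excel_members(zip_bytes: bytes) -> list[str]:
    """List spreadsheet files inside a ZIP archive held in memory.
    A hypothetical helper, not part of readabs."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return sorted(
            name
            for name in zf.namelist()
            if name.lower().endswith((".xlsx", ".xls", ".xlsb"))
        )


# Build a tiny in-memory archive to demonstrate (file contents are placeholders).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("6202001.xlsx", b"placeholder")
    zf.writestr("readme.txt", b"not a spreadsheet")

print(excel_members(buf.getvalue()))  # ['6202001.xlsx']
```

For a file on disk, read it first with open(path, "rb").read(), or pass the path straight to zipfile.ZipFile.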

Advanced Options

The read_abs_cat() function accepts several optional parameters for fine-tuning:

data, meta = ra.read_abs_cat(
    "6202.0",
    single_excel_only="6202001",  # Only download one specific table (faster)
    cache_only=True,              # Use cached data only (offline mode)
    verbose=True,                 # Print diagnostic messages
    ignore_errors=True,           # Continue if some files fail to download
    keep_non_ts=True,             # Include non-timeseries tables in output
)

# Or download a chosen subset of tables (skips the full-catalogue zip):
data, meta = ra.read_abs_cat(
    "6202.0",
    selected_excel=("62020001", "62020017", "62020X28"),
)

| Parameter | Description |
|---|---|
| single_excel_only | Download only the specified Excel file (e.g., "6202001") |
| selected_excel | Tuple of Excel file names to download (e.g., ("62020001", "62020017")). Must be a tuple, not a list. |
| single_zip_only | Download only the specified ZIP file |
| cache_only | Only use locally cached files, don't download |
| verbose | Print progress and diagnostic information |
| ignore_errors | Continue processing if some downloads fail |
| keep_non_ts | Include non-timeseries tables in the output |
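
The documentation does not say why selected_excel must be a tuple. A common reason in Python is that cached functions (functools.lru_cache and similar) need hashable arguments, and lists are not hashable. The sketch below reflects that assumption about the motivation, not readabs internals. fetch_tables and as_tuple are hypothetical names. It also shows the simple fix for a list you already have: wrap it in tuple().

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fetch_tables(selected: tuple[str, ...]) -> tuple[str, ...]:
    """Stand-in for a cached download step (hypothetical)."""
    return tuple(f"{name}.xlsx" for name in selected)


def as_tuple(names) -> tuple[str, ...]:
    """Hypothetical helper: accept a list or tuple and return a tuple."""
    return tuple(names)


print(fetch_tables(as_tuple(["62020001", "62020017"])))

try:
    fetch_tables(["62020001"])  # lists are unhashable, so the cache rejects them
except TypeError as err:
    print("TypeError:", err)
```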

Time Series Utilities

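import pandas as pd
import readabs as ra

# Illustrative inputs with a PeriodIndex (values are made up)
quarterly_data = pd.Series([100.0, 101.0, 102.0, 103.0, 105.0],
                           index=pd.period_range("2023Q1", periods=5, freq="Q"))
monthly_data = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                         index=pd.period_range("2024-01", periods=6, freq="M"))

# Calculate percentage change (4 periods = annual growth for quarterly data)
annual_growth = ra.percent_change(quarterly_data, n_periods=4)

# Convert quarterly to monthly (with interpolation)
monthly = ra.qtly_to_monthly(quarterly_data, interpolate=True)

# Convert monthly to quarterly (quarters ending in December, averaging months)
quarterly = ra.monthly_to_qtly(monthly_data, q_ending="DEC", f="mean")

# Scale large numbers and adjust unit labels
large_data = pd.Series([1_500_000.0, 2_500_000.0], index=quarterly_data.index[:2])
scaled_data, new_units = ra.recalibrate(large_data, "Number")
# e.g., 1,500,000 "Number" becomes 1.5 "Million"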
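
To check what two of these utilities compute, here are plain-pandas equivalents. They assume percent_change() is a simple n-period percentage change and that monthly_to_qtly(..., f="mean") averages the months in each calendar quarter. readabs may treat incomplete quarters or missing values differently.

```python
import pandas as pd

# Illustrative quarterly and monthly series with a PeriodIndex (made-up values).
quarterly = pd.Series(
    [100.0, 101.0, 102.0, 103.0, 105.0],
    index=pd.period_range("2023Q1", periods=5, freq="Q"),
)
monthly = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    index=pd.period_range("2024-01", periods=6, freq="M"),
)

# Percentage change over 4 quarters (through-the-year growth), in percent.
annual_growth = quarterly.pct_change(periods=4) * 100

# Monthly to quarterly by averaging the months in each calendar quarter
# (quarters ending in December, i.e. pandas' "Q-DEC").
qtly_mean = monthly.groupby(monthly.index.asfreq("Q")).mean()

print(annual_growth.dropna())
print(qtly_mean)
```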

API Reference

ABS Functions

| Function | Description |
|---|---|
| read_abs_cat(cat) | Download all tables for an ABS catalogue number as a dict of DataFrames, plus metadata |
| read_abs_series(cat, series_id) | Get specific series by Series ID |
| read_abs_by_desc(wanted) | Get series by searching descriptions |
| abs_catalogue() | Get DataFrame of all ABS catalogue numbers |
| print_abs_catalogue() | Print formatted table of ABS catalogues |
| search_abs_meta(meta, terms) | Search metadata for matching series |
| find_abs_id(meta, terms) | Find the unique series matching the search terms |
| grab_abs_url(url) | Fetch data from a specific ABS URL |
| grab_abs_zip(zip_path) | Parse a local ABS ZIP file |

RBA Functions

| Function | Description |
|---|---|
| read_rba_table(table) | Read RBA table, returns data + metadata |
| read_rba_ocr(monthly=True) | Get the Official Cash Rate as a Series |
| rba_catalogue() | Get DataFrame of RBA table numbers |
| print_rba_catalogue() | Print formatted table of RBA catalogues |

Utility Functions

| Function | Description |
|---|---|
| percent_change(data, n) | Calculate percentage change over n periods |
| annualise_rates(data, periods) | Convert rates to annualised values |
| annualise_percentages(data, periods) | Convert percentages to annualised values |
| qtly_to_monthly(data) | Convert quarterly to monthly frequency |
| monthly_to_qtly(data) | Convert monthly to quarterly frequency |
| recalibrate(data, units) | Scale values and adjust unit labels |
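
Annualising a per-period growth rate is usually done by compounding: (1 + r)^k - 1, where k is the number of periods per year. The sketch below applies that standard formula in plain Python. It assumes "rates" are fractions (0.01 for 1%) and "percentages" are already in percent (1.0 for 1%). Whether readabs uses exactly this convention is an assumption, so check its docs before relying on it.

```python
def annualise_rate(rate: float, periods_per_year: int) -> float:
    """Compound a per-period growth rate given as a fraction (0.01 = 1%)
    into an annual percentage."""
    return ((1 + rate) ** periods_per_year - 1) * 100


def annualise_percentage(pct: float, periods_per_year: int) -> float:
    """Same, but the input is already a percentage (1.0 = 1%)."""
    return annualise_rate(pct / 100, periods_per_year)


print(round(annualise_percentage(1.0, 12), 2))  # 1% a month is about 12.68% a year
print(round(annualise_percentage(0.5, 4), 2))   # 0.5% a quarter is about 2.02% a year
```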

Metadata Constants

from readabs import metacol as mc   # ABS metadata columns
from readabs import rba_metacol as rm  # RBA metadata columns

# ABS metadata columns include:
# mc.did   - Data Item Description
# mc.id    - Series ID
# mc.unit  - Unit (e.g., "Percent", "Number")
# mc.freq  - Frequency
# mc.stype - Series Type (e.g., "Seasonally Adjusted")
# mc.table - Table name

Caching

Downloaded files are cached locally to avoid repeated downloads. The cache location can be configured:

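# Default: ./.readabs_cache/ in the current directory
# Override with an environment variable. To be safe, set it before
# importing readabs so it is in place whenever the cache path is read.
import os
os.environ["READABS_CACHE_DIR"] = "/path/to/cache"

import readabs as ra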

The cache respects HTTP Last-Modified headers, so data is only re-downloaded when the source files have been updated.

Return Types

Most ABS functions return a tuple:

  • read_abs_cat(): tuple[dict[str, DataFrame], DataFrame] - dict of data tables + metadata
  • read_abs_series(): tuple[DataFrame, DataFrame] - data + metadata
  • read_abs_by_desc(): tuple[dict[str, Series], DataFrame] - named series + metadata

DataFrames use a pandas PeriodIndex with the appropriate frequency (monthly, quarterly or yearly).
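
PeriodIndex values are periods, not instants. When a plotting or merging tool needs timestamps, pandas' to_timestamp() converts them, and you choose whether each period maps to its start or its end. This sketch uses plain pandas on a made-up quarterly series.

```python
import pandas as pd

# Illustrative quarterly series with a PeriodIndex (made-up values).
s = pd.Series([1.0, 2.0], index=pd.period_range("2024Q1", periods=2, freq="Q"))

print(s.index.freqstr)                            # calendar quarters ending in December
print(s.to_timestamp().index[0].date())           # first day of the first quarter
print(s.to_timestamp(how="end").index[0].date())  # last day of the first quarter
```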

Documentation

Full API documentation is available in the ./docs directory. Generate updated documentation with:

pdoc ./src/readabs -o ./docs

Or view the generated HTML documentation in your browser.

Requirements

  • Python 3.11+
  • pandas, numpy, requests, beautifulsoup4, lxml, openpyxl, pyxlsb

License

This project is open source. See the repository for license details.



Download files

Download the file for your platform.

Source Distribution

readabs-0.1.9.tar.gz (6.5 MB)

Uploaded Source

Built Distribution


readabs-0.1.9-py3-none-any.whl (6.1 MB)

Uploaded Python 3

File details

Details for the file readabs-0.1.9.tar.gz.

File metadata

  • Download URL: readabs-0.1.9.tar.gz
  • Size: 6.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.22

File hashes

Hashes for readabs-0.1.9.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | dc082852ff5b9576d5673544726de4e2c16cb4371e8c76d5dfa2bd9b74792a53 |
| MD5 | a7e88db246c24e3c1eb4bac94780a8a7 |
| BLAKE2b-256 | 09110f0cfc64a8700c18477ce5963b475839529ffc23ba795c8df08adaa96b64 |


File details

Details for the file readabs-0.1.9-py3-none-any.whl.

File metadata

  • Download URL: readabs-0.1.9-py3-none-any.whl
  • Size: 6.1 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.22

File hashes

Hashes for readabs-0.1.9-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 536ff054d5235a2fe2fd44d88439a50ecce53bbe09ffab05b567933dc34a990a |
| MD5 | 174df86bd74c0ad499ac354ed7acec9a |
| BLAKE2b-256 | e0f37c41f0917192a87a9f280005c6e0ca366b0ebb6a3669f0073bc2758a9b6e |

