
Download CSO Ireland datasets and data catalogue as Pandas dataframes.


CSO Ireland Data


Easily download data from the CSO PxStat API as Pandas datasets.

Uses requests-cache for super fast access to cached requests and easy persistence with multiple storage backends.

Installation

To install, just use pip:

pip install cso-ireland-data

Usage

Getting started

First, set up a CSODataSession.

  • By default, this is really simple.

    from cso_ireland_data import CSODataSession
    cso = CSODataSession()
    
  • If you want to add caching, no problem! All the functionality of the requests-cache package is available through cached_session_params.

    from datetime import timedelta
    from cso_ireland_data import CSODataSession
    
    cso = CSODataSession(
        cached_session_params={
            "use_cache_dir": True,  # Save files in the default user cache dir
            "cache_control": True,  # Use Cache-Control response headers for expiration, if available
            "expire_after": timedelta(days=1),  # Otherwise expire responses after one day
        }
    )
    
  • Stuck behind a corporate firewall that causes SSL certificate issues? Also no problem! All the functionality of the requests get() method is available through request_params.

    from cso_ireland_data import CSODataSession
    
    # Tell requests.get() it's ok not to verify SSL certificates when getting data.
    # !!! Only do this if you're absolutely sure it's what you need !!!
    cso = CSODataSession(request_params={"verify": False})
    

Getting the data catalogue

To get a catalogue (Table of Contents) of all the datasets that are available through the API, use get_toc().

NB: Requests for the ToC sometimes time out on the CSO API. If this happens, try again!

cso.get_toc()
table_id | table_name | last_updated | copyright | exceptional | frequency | earliest | latest | variables
A0101 | 1996 Population and Percentage Change 1991 and 1996 | 2020-05-01 11:00:00+00:00 | Central Statistics Office, Ireland | False | CensusYear | 1996 | 1996 | ['Province County or City']
A0102 | Population at Each Census Since 1841 | 2020-05-01 11:00:00+00:00 | Central Statistics Office, Ireland | False | CensusYear | 1841 | 1996 | ['Province or County', 'Sex']
A0103 | Population | 2020-05-01 11:00:00+00:00 | Central Statistics Office, Ireland | False | CensusYear | 1996 | 1996 | ['Province County or City', 'Sex', 'Aggregate Town or Rural Area']
A0104 | Population | 2020-06-03 11:00:00+00:00 | Central Statistics Office, Ireland | False | CensusYear | 1996 | 1996 | ['Sex', 'Regional Authority']
A0105 | 1996 Population and Percentage Change 1996 and 2002 | 2021-07-19 11:00:00+00:00 | Central Statistics Office, Ireland | False | CensusYear | 1996 | 1996 | ['Towns by Electoral Division']
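
Because ToC requests can time out, it may be convenient to wrap the call in a small retry helper. The sketch below is only an illustration: call_with_retries is a hypothetical helper, not part of cso_ireland_data, and since this page does not say which exception a timeout raises, it catches a caller-supplied tuple of exception types (any Exception by default).

```python
import time


def call_with_retries(func, attempts=3, delay_seconds=2.0, exceptions=(Exception,)):
    """Call func() until it succeeds or the number of attempts is exhausted.

    Hypothetical helper for illustration; not part of cso_ireland_data.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions as error:
            last_error = error
            # Pause before the next attempt, but not after the final one.
            if attempt < attempts:
                time.sleep(delay_seconds)
    # Every attempt failed: re-raise the most recent error.
    raise last_error


# Assumed usage with the session created earlier:
# toc = call_with_retries(cso.get_toc, attempts=3, delay_seconds=5)
```

If caching is enabled through cached_session_params, a successful ToC response should then be served from the cache until it expires, so the retries only matter for the first request.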

Getting a table using its ID code

To get the whole contents of a particular table hosted on the PxStat API, use get_table().

You just need to know the ID code of the table, which you can look up using get_toc().

wpm29 = cso.get_table("WPM29")
wpm29.head()
Wholesale Price Index (Excl VAT) for Energy Products
('Autodiesel', '2015M01') 96.7
('Autodiesel', '2015M02') 102
('Autodiesel', '2015M03') 103
('Autodiesel', '2015M04') 102.9
('Autodiesel', '2015M05') 104.6
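
The month labels in the output above look like '2015M01' (year, the letter M, then a two-digit month). If you need real dates, for example for plotting, one possible approach is to parse the labels with the standard library. This is a minimal sketch that assumes every label follows that pattern; parse_month_label is a hypothetical helper, not part of cso_ireland_data.

```python
from datetime import datetime


def parse_month_label(label):
    """Convert a label such as '2015M01' to the first day of that month."""
    # The "M" in the format string is matched literally.
    return datetime.strptime(label, "%YM%m").date()


labels = ["2015M01", "2015M02", "2015M03"]
months = [parse_month_label(label) for label in labels]
```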

Getting some common tables quickly

The CSODataSession class includes some useful methods to get data from commonly accessed tables quickly.

Monthly Consumer Price Index (CPI)

By default, the monthly_cpi() method returns a single column corresponding to the 'All items' headline CPI in the source table.

Also by default, this index is re-normalized to the most recent month - you can toggle this by setting normalize_to_most_recent to False.

simple_cpi = cso.monthly_cpi()
simple_cpi.tail()
Month | All items
2022-04-01 00:00:00 | 0.9725
2022-05-01 00:00:00 | 0.981
2022-06-01 00:00:00 | 0.9937
2022-07-01 00:00:00 | 0.9986
2022-08-01 00:00:00 | 1
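
To make the re-normalization concrete: judging by the output above, where the latest month equals 1, each value appears to be divided by the most recent value. The sketch below shows that arithmetic on made-up index levels. It is an assumption about what the option does, not the package's own code, and both the helper name and the raw numbers are hypothetical.

```python
def normalize_to_most_recent(index_values):
    """Rescale a chronologically ordered series so that its last value is 1."""
    latest = index_values[-1]
    return [round(value / latest, 4) for value in index_values]


# Hypothetical raw index levels, oldest first (not real CSO figures).
raw_index = [101.0, 102.5, 104.0]
normalized = normalize_to_most_recent(raw_index)
```

With this scaling, each earlier value reads directly as a fraction of the latest price level.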

It's also possible to pass a list of commodity groups:

commodity_group_cpi = cso.monthly_cpi(
    commodity_groups=[
        "All items",
        "Alcoholic beverages and tobacco",
        "Health",
        "Recreation and culture",
    ]
)
commodity_group_cpi.tail()
Month | All items | Alcoholic beverages and tobacco | Health | Recreation and culture
2022-04-01 00:00:00 | 0.9725 | 0.9738 | 0.9828 | 0.9945
2022-05-01 00:00:00 | 0.981 | 0.9937 | 0.9851 | 0.9954
2022-06-01 00:00:00 | 0.9937 | 0.9958 | 0.9874 | 0.9973
2022-07-01 00:00:00 | 0.9986 | 0.9969 | 0.9874 | 0.9991
2022-08-01 00:00:00 | 1 | 1 | 1 | 1

Live Register

Use the live_register() method to get Live Register numbers (optionally broken down by Age Group and Sex) by month. This is a long data series, starting in April 1967 and still continuing every month, so it may be convenient to specify a start and/or end date for the data returned.

The Live Register data series is based on a monthly point-in-time count of people who have active Jobseeker claims with the Department of Social Protection (DSP), and these counts are extracted from DSP's administrative computer systems on a particular day every month.

Because of this, live_register() returns three possibly useful dates for each month:

  1. 'Month' is the index of the data frame. It's just the last day of each calendar month.
  2. 'reference_date' is the date of the point-in-time count of people with active Jobseeker claims. It's the last Friday of each month before May 2015, and the last Thursday of the month from then on.
  3. 'extract_date' is the date on which the source administrative data was actually extracted. It's always the Sunday after the reference_date.

from datetime import datetime

live_register = cso.live_register(start=datetime(2010, 1, 1))
    | Month | Age Group | Sex | Persons on the Live Register | Persons on the Live Register (Seasonally Adjusted) | reference_date | extract_date
516 | 2010-04-30 00:00:00 | All ages | Both sexes | 432657 | 440800 | 2010-04-30 00:00:00 | 2010-05-02 00:00:00
517 | 2010-08-31 00:00:00 | All ages | Both sexes | 466923 | 444000 | 2010-08-27 00:00:00 | 2010-08-29 00:00:00
518 | 2010-12-31 00:00:00 | All ages | Both sexes | 437079 | 446000 | 2010-12-31 00:00:00 | 2011-01-02 00:00:00
519 | 2010-02-28 00:00:00 | All ages | Both sexes | 436956 | 439000 | 2010-02-26 00:00:00 | 2010-02-28 00:00:00
520 | 2010-01-31 00:00:00 | All ages | Both sexes | 436936 | 439400 | 2010-01-29 00:00:00 | 2010-01-31 00:00:00
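
The three date rules described above can be reproduced with the standard library, which may help when checking the returned values. This is a sketch of the rules as stated on this page, not the package's own implementation. The helper names are hypothetical, and it assumes that May 2015 is the first month to use the Thursday rule.

```python
import calendar
from datetime import date, timedelta


def month_end(year, month):
    """Last calendar day of the month (the 'Month' value)."""
    return date(year, month, calendar.monthrange(year, month)[1])


def last_weekday(year, month, weekday):
    """Last occurrence of the given weekday (Monday=0) in the month."""
    end = month_end(year, month)
    return end - timedelta(days=(end.weekday() - weekday) % 7)


def reference_date(year, month):
    """Last Friday before May 2015, last Thursday from May 2015 onwards."""
    # Assumption: May 2015 itself already follows the Thursday rule.
    weekday = calendar.FRIDAY if (year, month) < (2015, 5) else calendar.THURSDAY
    return last_weekday(year, month, weekday)


def extract_date(ref):
    """First Sunday strictly after the reference date."""
    days_ahead = (calendar.SUNDAY - ref.weekday()) % 7 or 7
    return ref + timedelta(days=days_ahead)


ref = reference_date(2010, 8)
print(month_end(2010, 8), ref, extract_date(ref))
# prints: 2010-08-31 2010-08-27 2010-08-29
```

The printed dates match the August 2010 row in the output above.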

Life Tables

The life_table() method by default returns a complete life table for the most recent source data vintage.

life_table = cso.life_table()
life_table.head()
Ix dx px qx Lx Tx e0x
('Male', 101) 851 616 0.724494 0.275506 543 1544 1.82
('Male', 102) 616 440 0.714258 0.285742 396 1002 1.63
('Male', 103) 440 307 0.69799 0.30201 287 605 1.38
('Male', 104) 307 221 0.71911 0.28089 197 319 1.04
('Male', 105) 221 198 0.896101 0.103899 122 122 0.55

