A Python script which takes the census data pack from the ABS and instantly merges data across the many separate files. Allows you to specify files, fields and spatial aggregation against customizable settings in a config csv file.
CensusWrangler
About
CensusWrangler is a fast Python API to your local copy of the Australian Census data.
The use-case
If you have any interest or career involving Australian data, you've likely had to deal with the challenging Census data structures.
In the government's interest of maintaining privacy for all involved, they provide data as a series of hundreds of tables each at different levels of geographic aggregation.
My experience has been that getting the right slice of data out of this structure can be tedious and time-consuming. It drags out what would otherwise be a quick piece of ad hoc analysis.
If you're lucky - your organisation already has a paid API, or database of the census data. But if not?
You can speed up the process with the CensusWrangler library. With quick & templatable configurations it helps you efficiently pull data out of the downloadable census datapacks.
Features
- Configuration templates: Deploy and quickly customise configuration csvs - then let CensusWrangler instantly find and merge the data
- Re-use configs across geographies: Change a single argument to re-pull data from a different geography without any additional work
- Validation & Checks: Your config is checked against census metadata, with detailed errors letting you know what went wrong
- Built-in Grouping: Make it easier for yourself down the line by setting groups and your own column naming directly in the config file
- Customisable output: Select from several convenient output methods, accessing the output quickly in your desired format
- On your local machine: Once you have downloaded the datapack, the library requires no access to anything other than what is on your local machine
Set-up
Download a census datapack
- Visit the ABS Census DataPacks page & download a datapack .zip
- Extract the .zip into a single folder, containing nothing else
When you are ready, your datapack folder should look something like this, with the name of the downloaded pack:
/2021_GCP_all_for_AUS_short-header
├── 2021 Census GCP All Geographies for AUS/
├── Metadata/
├── Readme/
CensusWrangler is fully tested on the 2021 census datapacks, and basic testing shows it working with previous years as well.
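Before pointing CensusWrangler at the folder, a quick layout check can catch a bad extraction early. The sketch below is not part of CensusWrangler: `check_datapack_layout` is a hypothetical helper, and it assumes the extracted datapack contains `Metadata` and `Readme` folders plus one folder whose name contains "All Geographies", as in the listing above.

```python
from pathlib import Path
from typing import List


def check_datapack_layout(datapack_path: str) -> List[str]:
    """Return a list of problems found in the datapack folder (empty if none)."""
    root = Path(datapack_path)
    if not root.is_dir():
        return [f"Not a folder: {root}"]

    # Names of the immediate sub-folders of the datapack folder
    subfolders = [p.name for p in root.iterdir() if p.is_dir()]

    problems = []
    for expected in ("Metadata", "Readme"):
        if expected not in subfolders:
            problems.append(f"Missing folder: {expected}")

    # Assumption: the data folder name contains "All Geographies"
    if not any("All Geographies" in name for name in subfolders):
        problems.append("Missing data folder (name containing 'All Geographies')")
    return problems
```

An empty list means the folder matches the assumed layout; anything else lists what appears to be missing.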
Install
pip install censuswrangler
Import
import censuswrangler as cw
Usage
Preparing a Config Template
We tell CensusWrangler what data to grab by preparing a config csv:
Generate a config template
You can quickly generate a template to get started:
cw.create_config_template(r"C:/Config Folder/")
Config fields
Below is a sample of the config csv.
The first 3 fields come straight from the metadata:
- SHORT: Short descriptor
- LONG: Long descriptor
- DATAPACKFILE: A code indicating which datapack file contains the field
The other 2 are used to customise, group & simplify the names of fields in the CensusWrangler output:
- CUSTOM_DESCRIPTION: Describes the data subset represented, like 'Male' & 'Female'
- CUSTOM_GROUP: Describes the group the subsets belong to, like 'Gender'
Put whatever you need in the custom fields, just make sure that each row is unique.
| SHORT | LONG | DATAPACKFILE | CUSTOM_DESCRIPTION | CUSTOM_GROUP |
|---|---|---|---|---|
| Tot_P_M | Total_Persons_Males | G01 | Male | Gender |
| Tot_P_F | Total_Persons_Females | G01 | Female | Gender |
| P_Tot_Marrd_reg_marrge | PERSONS_Total_Married_in_a_registered_marriage | G06 | Married | Relationship Type |
| P_Tot_Married_de_facto | PERSONS_Total_Married_in_a_de_facto_marriage | G06 | Couple | Relationship Type |
| P_Tot_Not_married | PERSONS_Total_Not_married | G06 | No relationship | Relationship Type |
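A config like the one above can also be written from Python rather than by hand. This is a minimal sketch using only the standard library and the column names from the table; `write_config` and `find_duplicate_rows` are hypothetical helpers, not CensusWrangler functions, and comparing whole rows is only one plausible reading of "each row is unique".

```python
import csv
from typing import Dict, List

CONFIG_COLUMNS = ["SHORT", "LONG", "DATAPACKFILE", "CUSTOM_DESCRIPTION", "CUSTOM_GROUP"]

# Two rows taken from the sample table above
SAMPLE_ROWS = [
    {"SHORT": "Tot_P_M", "LONG": "Total_Persons_Males", "DATAPACKFILE": "G01",
     "CUSTOM_DESCRIPTION": "Male", "CUSTOM_GROUP": "Gender"},
    {"SHORT": "Tot_P_F", "LONG": "Total_Persons_Females", "DATAPACKFILE": "G01",
     "CUSTOM_DESCRIPTION": "Female", "CUSTOM_GROUP": "Gender"},
]


def find_duplicate_rows(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return every row that repeats an earlier row exactly."""
    seen = set()
    duplicates = []
    for row in rows:
        key = tuple(row[col] for col in CONFIG_COLUMNS)
        if key in seen:
            duplicates.append(row)
        seen.add(key)
    return duplicates


def write_config(rows: List[Dict[str, str]], path: str) -> None:
    """Write the rows to a config csv, refusing to write duplicated rows."""
    duplicates = find_duplicate_rows(rows)
    if duplicates:
        raise ValueError(f"Duplicate config rows: {duplicates}")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=CONFIG_COLUMNS)
        writer.writeheader()
        writer.writerows(rows)


# Example: write_config(SAMPLE_ROWS, "config.csv")
```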
Referencing Metadata
It's super easy to copy what you want out of the metadata file that comes with each datapack.
- In the datapack folder, look in the /Metadata/ folder for a file like Metadata_2021_GCP_DataPack_R1_R2.xlsx
- Go to the Cell Descriptors Information sheet
- Browse the fields, and copy the SHORT, LONG & DATAPACKFILE columns for the fields you want into the config file
- Fill in the custom fields in the remaining columns of the config file
The metadata file is also used to validate your config selections, so try to avoid changing it as you go.
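To illustrate the kind of check this makes possible, here is a rough sketch of comparing config rows against metadata rows. It is not CensusWrangler's actual validation code: `find_unknown_fields` is a hypothetical helper, and it assumes both the config and the Cell Descriptors Information sheet have already been loaded as lists of dicts keyed by SHORT, LONG and DATAPACKFILE.

```python
from typing import Dict, List

KEY_COLUMNS = ("SHORT", "LONG", "DATAPACKFILE")


def find_unknown_fields(
    config_rows: List[Dict[str, str]],
    metadata_rows: List[Dict[str, str]],
) -> List[Dict[str, str]]:
    """Return config rows whose SHORT/LONG/DATAPACKFILE combination is not in the metadata."""
    known = {tuple(row[col] for col in KEY_COLUMNS) for row in metadata_rows}
    return [
        row for row in config_rows
        if tuple(row[col] for col in KEY_COLUMNS) not in known
    ]


# Illustrative data only: one metadata row and two config rows
metadata = [
    {"SHORT": "Tot_P_M", "LONG": "Total_Persons_Males", "DATAPACKFILE": "G01"},
]
config = [
    {"SHORT": "Tot_P_M", "LONG": "Total_Persons_Males", "DATAPACKFILE": "G01"},
    {"SHORT": "Tot_P_M", "LONG": "Total_Persons_Males", "DATAPACKFILE": "G06"},  # wrong file code
]
unknown = find_unknown_fields(config, metadata)
```

Here `unknown` holds the second config row, because its DATAPACKFILE code does not match the metadata.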
Select a Census Geography
Visit the ABS Census Geography Glossary.
Determine the shortcode for the geography you are after. For example, 'Statistical Area Level 1' has code 'SA1'.
This is also reflected by the folder names in the datapack - look inside the folder with a name like \2021 Census GCP All Geographies for AUS\.
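If you would rather list the available codes than browse for them, something like the following may help. It is a sketch under an assumption worth checking against your own datapack: that each geography has its own sub-folder, named by its shortcode, inside the "All Geographies" folder. `list_geography_codes` is a hypothetical helper, not a CensusWrangler function.

```python
from pathlib import Path
from typing import List


def list_geography_codes(datapack_path: str) -> List[str]:
    """List sub-folder names found inside any 'All Geographies' folder of the datapack."""
    root = Path(datapack_path)
    codes = set()
    for data_folder in root.iterdir():
        # Assumption: the data folder name contains "All Geographies"
        if data_folder.is_dir() and "All Geographies" in data_folder.name:
            codes.update(p.name for p in data_folder.iterdir() if p.is_dir())
    return sorted(codes)


# Example: list_geography_codes(r"E:/Data/2021_GCP_all_for_AUS_short-header/")
```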
Usage
Once the config file is ready, you can run CensusWrangler with just a few lines of code:
import censuswrangler as cw

# Initialise the Census object
census = cw.Census(
    datapack_path=r"E:/Data/2021_GCP_all_for_AUS_short-header/",  # Datapack folder path
    config_path=r"censuswrangler/config_template.csv",  # Config file path
    geo_type="LGA",  # The geotype code to pull the data for
    year=2021,  # The census year
)

# Gather and prepare the data from the datapack
census.wrangle("all")  # "merge" | "pivot" | "all"

# Access the output dataframes in the desired format
print(census.merged_df)
print(census.pivoted_df)

# Or output directly to csv
census.to_csv(
    "all",  # "merge" | "pivot" | "all"
    r"F:/Github/censuswrangler/test_output",  # Output folder
)
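Because the geography is a single argument, the same config can be re-used across several geographies in a loop. The sketch below relies only on the `Census` arguments, `wrangle` and `merged_df` shown above; `wrangle_geographies` and `make_census` are hypothetical names used for illustration, and the paths in the usage comment are placeholders.

```python
from typing import Any, Callable, Dict, Iterable


def wrangle_geographies(
    make_census: Callable[[str], Any],
    geo_types: Iterable[str],
) -> Dict[str, Any]:
    """Build one Census object per geography code and collect the merged output."""
    merged = {}
    for geo_type in geo_types:
        census = make_census(geo_type)  # only geo_type changes between runs
        census.wrangle("all")
        merged[geo_type] = census.merged_df
    return merged


# Example usage with CensusWrangler:
# import censuswrangler as cw
# outputs = wrangle_geographies(
#     lambda geo: cw.Census(
#         datapack_path=r"E:/Data/2021_GCP_all_for_AUS_short-header/",
#         config_path=r"censuswrangler/config_template.csv",
#         geo_type=geo,
#         year=2021,
#     ),
#     ["LGA", "SA2"],
# )
```

Passing a factory function keeps the datapack and config paths in one place, so only the geography code varies between runs.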
More details are available in the documentation.
Example Output
You can see example output over in the repository's sample folder.
Good luck - and don't forget to give the repository a star if this helped you out (it all helps!).