Skip to main content

Tool for reading ABS TableBuilder datasets easily in Python.

Project description

Table Builder IO

This tool provides a standard way to read/ clean data extracted from ABS TableBuilder table builder. It serves to avoid/ replace bespoke ways of preparing table builder data e.g.

  • Cleaning the header and footer data manually
  • Trying to be clever with magic arguments to pandas read_csv skipheader and skipfooter that may or may not need to be adjusted every time
  • Realising your magic arguments to skipheader and skipfooter are only part of the problem when you have defined wafers and resort to manually cleaning CSVs
  • Hacky flattening of row level index labels and column labels into a single set of column headers that definitely works every time

table_builder_io defines a minimal API for reading CSVs downloaded from ABS TableBuilder without manual editing of the raw data.

Installation

placeholder

Dependencies

Besides python itself, the only dependency for table_builder_io is pandas. It has been tested on 1.3.x but does not use any special functionality, so will likely work on any release >=1.0.0. Hope to test this more explicitly in future.

table_builder_io requires python >=3.6 as it uses f-strings and standard library type hints. It has been explicitly tested on Python 3.8

Example

Lets say you have a table builder file that looks something like this

Australian Bureau of Statistics

"2016 Census - Counting Persons, Place of Enumeration (MB)"
"SEXP Sex and FMGF - 1 Digit Level by STATE"
"Counting: Persons Location on Census Night"

Filters:
"Default Summation","Persons Location on Census Night"

,"STATE","New South Wales","Victoria","Queensland","South Australia","Western Australia","Tasmania","Northern Territory","Australian Capital Territory","Other Territories","Total",
"SEXP Sex","FMGF - 1 Digit Level",
"Male","Couple family with grandchildren",20710,12307,14166,4066,7151,1435,1926,702,10,62463,
,"Lone grandparent",10617,6127,6351,2085,3369,671,1486,302,13,31019,
,"Not applicable",3692904,2892405,2362975,817562,1250871,244515,132578,196526,2853,11593188,
"Female","Couple family with grandchildren",19712,11688,13790,3723,7000,1364,1820,723,10,59827,
,"Lone grandparent",15730,9441,9534,3135,5042,961,1827,462,13,46152,
,"Not applicable",3805273,3014087,2437722,844224,1244420,255233,119476,201935,2410,11924766,
"Total","Couple family with grandchildren",40422,23999,27950,7780,14154,2793,3742,1423,21,122290,
,"Lone grandparent",26351,15572,15892,5219,8409,1629,3317,761,27,77165,
,"Not applicable",7498170,5906487,4800703,1661786,2495294,499744,252053,398458,5265,23517955,

"Data Source: Census of Population and Housing, 2016, TableBuilder"

"INFO","Cells in this table have been randomly adjusted to avoid the release of confidential data. No reliance should be placed on small cells."


"Copyright Commonwealth of Australia, 2018, see abs.gov.au/copyright"
"ABS data licensed under Creative Commons, see abs.gov.au/ccby"

table_builder_io (for now) defines a single public class TableBuilderReader which is used like so

In[1]: from table_builder_io import TableBuilderReader
In[2]: reader = TableBuilderReader.from_file("test/mini_testfile.csv")
In[3]: df = reader.read(as_index=True)
In[4]: df.iloc[:, :4].head()
OOut[4]:
STATE                                      New South Wales  Victoria  Queensland  South Australia
SEXP Sex FMGF - 1 Digit Level
Male     Couple family with grandchildren            20710     12307       14166             4066
         Lone grandparent                            10617      6127        6351             2085
         Not applicable                            3692904   2892405     2362975           817562
Female   Couple family with grandchildren            19712     11688       13790             3723
         Lone grandparent                            15730      9441        9534             3135
# Or alternatively as a flat dataframe
In[5]: df2 = reader.read(as_index=False)
In[6]: df2.iloc[:, :6].head()
Out[6]:
  SEXP Sex              FMGF - 1 Digit Level  New South Wales  Victoria  Queensland  South Australia
0     Male  Couple family with grandchildren            20710     12307       14166             4066
1     Male                  Lone grandparent            10617      6127        6351             2085
2     Male                    Not applicable          3692904   2892405     2362975           817562
3   Female  Couple family with grandchildren            19712     11688       13790             3723
4   Female                  Lone grandparent            15730      9441        9534             3135

For more examples, see examples.ipynb

Supported Formats

Currently should support

  • CSVs with multilevel / hierarchical row headers (as in the example above)
  • CSVs with multilevel / hierarchical row headers (e.g. the transpose of the above data)
  • Wafers: TableBuilderReader.read returns a Dict[str, pd.DataFrame] where the keys are the wafer names if wafers are found
  • Currently only intending to support CSV format from Table Builder

In theory easy to add

  • support for NVS TableBuilder headers/ footers
  • extraction of header/ footer metadata in a retrievable way
  • Standard utils after loading the table into memory

Performance

  • Not a super optimised implementation, need to be able to read everything into memory

  • File is scanned twice - once to look for header/ footer/ wafers and then to read the csvs

  • First scan is python, second scan is pandas csv reader (c engine)

  • So maybe not the best if you have data sizes near the cell limit

  • Internals are still messy because I haven't cleaned them up yet, waiting since I expect stuff to break

Acknowledgements

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

table_builder_io-0.0.1.tar.gz (55.3 kB view details)

Uploaded Source

Built Distribution

table_builder_io-0.0.1-py3-none-any.whl (9.2 kB view details)

Uploaded Python 3

File details

Details for the file table_builder_io-0.0.1.tar.gz.

File metadata

  • Download URL: table_builder_io-0.0.1.tar.gz
  • Upload date:
  • Size: 55.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-requests/2.25.1

File hashes

Hashes for table_builder_io-0.0.1.tar.gz
Algorithm Hash digest
SHA256 2d17f0086bf7d3201960c51076f3e74a92f0b9fba2b953aadafa100a78cedaba
MD5 400505992c4f78b83ded31820e084b24
BLAKE2b-256 e7a0273573775f7e2bc1bc2eaf42380ba7fc8b35bf5e4a1027290b73b31c7635

See more details on using hashes here.

File details

Details for the file table_builder_io-0.0.1-py3-none-any.whl.

File metadata

File hashes

Hashes for table_builder_io-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 dd9a072e4f5594b50cc229107cd675f81fd721f5d9ef396d54a0c7473faae0b0
MD5 169012ddc13c549bab5d5f50d63edad7
BLAKE2b-256 6a9385ad0fcb55327b92a68c276b1d2e0182f93cf3e376af6d928119e7e79a3c

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page