
Population table manipulation.


estime2



This is a Python package for manipulating and correcting the end-of-period population of a given table based on the component method. The program aims to “distribute” values of components to other records so that no end-of-period population estimate is negative. Moreover, it enforces sum constraints across regional levels (provincial and subprovincial) so that, after correction, the total end-of-period population is the same as in the original population table.
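The component method rests on a demographic balancing equation: the end-of-period population equals the initial population plus the components that add people (births, immigrants) minus the components that remove people (deaths, emigrants). The sketch below is a minimal illustration under assumptions: `end_of_period` is a hypothetical helper, and the split into `inflows` and `outflows` is assumed. estime2 defines its own component columns (such as BTH or DTH) and their signs.

```python
def end_of_period(initial, inflows, outflows):
    """Balancing equation: P_end = P_init + sum(inflows) - sum(outflows).

    Hypothetical helper: how real estime2 components map onto the
    `inflows` and `outflows` lists is an assumption, not package API.
    """
    return initial + sum(inflows) - sum(outflows)


# Toy record: 2 people at the start, 1 added, 4 + 3 removed.
print(end_of_period(2, [1], [4, 3]))  # -4: a negative estimate to be corrected
```

A large outflow on a small initial count is exactly how a record can end up negative, which is the situation estime2 is designed to correct.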

Public version: https://gitlab.com/joon3216/estime2 (private repository)
StatCan version: https://f3eaipitcap01.statcan.ca/junkpar/estime2 (not available to public)

Refer to the documentation for details.

Installation

From the command line, type:

pip install estime2

To update to the latest version, type:

pip install estime2 --upgrade

To install from source, first clone the whole repository with git clone. Then change your working directory to that repository and type:

python setup.py install --user

Example

Suppose tbl is a pandas.DataFrame that qualifies to become an estime2.ProvPopTable. You can create an instance of ProvPopTable as follows:

import estime2
poptbl = estime2.ProvPopTable(tbl)
print(poptbl)
#>      Sex   Age  Initial Population  BTH  ...  NPR, 2019-07-01  IMM  IIM  RAI
#> 0      1    -1                   0  473  ...                0    0    5    2
#> 1      1     0                 455    0  ...                0    0   12    2
#> 2      1     1                 449    0  ...                0    0   10    2
#> 3      1     2                 446    0  ...                0    0   10    2
#> 4      1     3                 435    0  ...                0    0   11    2
#> ..   ...   ...                 ...  ...  ...              ...  ...  ...  ...
#> 97     1    96                   0    0  ...                0    0    0    1
#> 98     1    97                   0    0  ...                0    0    0    2
#> 99     1    98                   1    0  ...                0    0    0    2
#> 100    1    99                   0    0  ...                0    0    0    2
#> 101    1  100+                   1    0  ...                0    0    0    2
#> 
#> [102 rows x 15 columns]

See the source code for more information about the arguments of ProvPopTable.

ProvPopTable.calculate_pop() is the method that computes the end-of-period population:

calculated_poptbl = poptbl.calculate_pop()
print(calculated_poptbl)
#>      Sex   Age  Postcensal Population
#> 0      1     0                    461
#> 1      1     1                    449
#> 2      1     2                    446
#> 3      1     3                    442
#> 4      1     4                    435
#> ..   ...   ...                    ...
#> 96     1    96                      1
#> 97     1    97                     -4
#> 98     1    98                      1
#> 99     1    99                      2
#> 100    1  100+                      2
#> 
#> [101 rows x 3 columns]
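The two printouts hint at how calculate_pop() arranges its result, although this README does not state it. The input has 102 rows (ages -1 to 100+) and the output has 101 rows (ages 0 to 100+). The age -1 row, which carries the births (BTH), appears to feed age 0. The sketch below shows that assumed cohort shift using a hypothetical helper, `shift_cohorts`, in which the integer 100 stands for the 100+ group. It is not estime2's implementation.

```python
def shift_cohorts(start_ages, values, top_age=100):
    """Map start-of-period records to end-of-period ages (assumed behaviour).

    A record at start age `a` is counted at end age `a + 1`, so the age -1
    record (births) lands at age 0. Every record whose shifted age reaches
    `top_age` is pooled into the open-ended top group (e.g. 99 and 100+).
    """
    result = {}
    for age, value in zip(start_ages, values):
        end_age = min(age + 1, top_age)
        result[end_age] = result.get(end_age, 0) + value
    return result


ages = list(range(-1, 101))  # -1, 0, ..., 99, 100 (100 stands for 100+)
end = shift_cohorts(ages, [1] * len(ages))
print(len(ages), len(end))   # 102 101
print(end[100])              # 2: ages 99 and 100+ pooled
```

Pooling the top two records is what reduces 102 input rows to 101 output rows under this assumption.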

Note that the total end-of-period population of poptbl before applying the corrections is:

print(calculated_poptbl[estime2.options.pop.end].sum())
#> 20023

estime2.options holds the global options the package relies on; for example, estime2.options.pop.end, used above, gives the name of the end-of-period population column. See the source code for details.

ProvPopTable.fix_issues() returns a corrected version of the original ProvPopTable in which no end-of-period population is negative:

poptbl_fixed_tbl = poptbl.fix_issues()
print(poptbl_fixed_tbl)
#>      Sex   Age  Initial Population  BTH  ...  NPR, 2019-07-01  IMM  IIM  RAI
#> 0      1    -1                   0  473  ...                0    0    5    2
#> 1      1     0                 455    0  ...                0    0   12    2
#> 2      1     1                 449    0  ...                0    0   10    2
#> 3      1     2                 446    0  ...                0    0   10    2
#> 4      1     3                 435    0  ...                0    0   11    2
#> ..   ...   ...                 ...  ...  ...              ...  ...  ...  ...
#> 97     1    96                   0    0  ...                0    0    0    1
#> 98     1    97                   0    0  ...                0    0    0    2
#> 99     1    98                   1    0  ...                0    0    0    2
#> 100    1    99                   0    0  ...                0    0    0    2
#> 101    1  100+                   1    0  ...                0    0    0    2
#> 
#> [102 rows x 15 columns]

Any negative end-of-period population is brought up to 0, and counter-modifications are applied to records of neighbouring ages:

calculated_poptbl_fixed = poptbl_fixed_tbl.calculate_pop()
print(calculated_poptbl_fixed)
#>      Sex   Age  Postcensal Population
#> 0      1     0                    461
#> 1      1     1                    449
#> 2      1     2                    446
#> 3      1     3                    442
#> 4      1     4                    435
#> ..   ...   ...                    ...
#> 96     1    96                      1
#> 97     1    97                      0
#> 98     1    98                      1
#> 99     1    99                      2
#> 100    1  100+                      2
#> 
#> [101 rows x 3 columns]
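To illustrate the idea of lifting a negative value to 0 and offsetting it against neighbouring ages, here is a deliberately simplified sketch. It works directly on a list of end-of-period values ordered by age. estime2 itself adjusts components such as DTH rather than the totals, and its actual rules for choosing neighbours are not described here. The function `redistribute_negatives` is hypothetical.

```python
def redistribute_negatives(values):
    """Toy correction: lift each negative entry to 0 and take the deficit
    from the nearest positions (i-1, i+1, i-2, i+2, ...) that are positive.

    Illustrative stand-in only, not estime2's algorithm. The total is
    preserved, and no entry ends up negative.
    """
    fixed = list(values)
    n = len(fixed)
    if sum(fixed) < 0:
        raise ValueError("total is negative; negatives cannot all be removed")
    for i in range(n):
        deficit = -fixed[i]
        if deficit <= 0:
            continue
        fixed[i] = 0
        distance = 1
        # Terminates: because the total is >= 0, the positive entries
        # elsewhere always sum to at least `deficit`.
        while deficit > 0:
            for j in (i - distance, i + distance):
                if 0 <= j < n and fixed[j] > 0 and deficit > 0:
                    taken = min(fixed[j], deficit)
                    fixed[j] -= taken
                    deficit -= taken
            distance += 1
    return fixed


print(redistribute_negatives([5, 3, -4, 1, 2]))  # [5, 0, 0, 0, 2]
```

Taking from the closest ages first mirrors the README's description of counter-modifications applied to neighbouring ages. A real implementation might instead spread the offset proportionally or respect other constraints.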

ProvPopTable.fix_issues() preserves the total end-of-period population of the original table:

print(calculated_poptbl_fixed[estime2.options.pop.end].sum())
#> 20023
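The two guarantees of fix_issues() (no negative end-of-period population, same total as before) can be checked mechanically. The sketch below uses a hypothetical helper, `check_correction`. It compares totals with `math.isclose` in case the values are floats.

```python
import math


def check_correction(before, after, rel_tol=1e-9):
    """Return True if `after` contains no negative value and has the same
    total as `before` (within `rel_tol` for floating-point data)."""
    after = list(after)
    no_negatives = all(value >= 0 for value in after)
    same_total = math.isclose(sum(before), sum(after), rel_tol=rel_tol)
    return no_negatives and same_total


print(check_correction([5, 3, -4, 1, 2], [5, 0, 0, 0, 2]))  # True
print(check_correction([1, -1], [1, 0]))                    # False: total changed
```

With the objects from this README, passing calculated_poptbl[estime2.options.pop.end] and calculated_poptbl_fixed[estime2.options.pop.end] should work the same way, since iterating over and summing a pandas Series yields its values.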

If you set return_all_mods to True in ProvPopTable.fix_issues(), you get a wrapper object that lets you compute relevant metrics:

poptbl_fixed = poptbl.fix_issues(return_all_mods = True)

For example, you may compute the standard deviation of all the corrections applied to poptbl as follows:

poptbl_sd = poptbl_fixed.get_metric_sd()
print(poptbl_sd)
#>    Sex Age Component        sd
#> 0    1  97       DTH  2.236068
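The README does not define exactly which amounts get_metric_sd() summarizes, or whether it uses the sample or the population standard deviation. As a rough sketch, assume it groups correction amounts by (Sex, Age, Component) and reports one standard deviation per group. The helper `sd_by_record` and the toy amounts below are hypothetical.

```python
import statistics
from collections import defaultdict


def sd_by_record(corrections):
    """Sample standard deviation of correction amounts per
    (sex, age, component) key. Keys with fewer than two amounts are
    skipped, because statistics.stdev needs at least two data points."""
    groups = defaultdict(list)
    for sex, age, component, amount in corrections:
        groups[(sex, age, component)].append(amount)
    return {
        key: statistics.stdev(amounts)
        for key, amounts in groups.items()
        if len(amounts) >= 2
    }


# Hypothetical correction amounts recorded under a single key.
toy = [(1, 97, "DTH", 4), (1, 97, "DTH", -3), (1, 97, "DTH", -1)]
print(sd_by_record(toy))
```

Swapping statistics.stdev for statistics.pstdev gives the population standard deviation instead. Which convention estime2 follows would need to be confirmed in its source.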

The wrapper object also comes with some visualization tools. For example, you can visualize pre- and post-correction end-of-period populations as follows:

poptbl_fixed.plot_pop(age_range = [87, 97])


Source Distribution

estime2-0.1.2.tar.gz (59.3 kB)
