A Python implementation of Anderson's (2008) inverse-covariance weighted index, validated against Stata's swindex package.
Inverse-Covariance Weighted Index for Python
A Python implementation of the Inverse-Covariance Weighted (ICW) Index introduced by Anderson (2008) and implemented in Stata's swindex by Schwab et al. (2020). I validated this implementation against Stata's swindex, and it produces effectively identical results.
Quick Start
import numpy as np
import pandas as pd
from icw import icw_index # or just copy-paste this function
# Example using numpy arrays
x1 = np.random.rand(100)
x2 = np.random.rand(100)
index = icw_index([x1, x2])
# Example using Pandas dataframes
df = pd.DataFrame({'var1': np.random.rand(100),
                   'var2': np.random.rand(100),
                   'treat': np.random.randint(0, 2, size=100)})
# Full sample normalization, no reference group. Entire index is distributed M=0, SD=1
df['icw'] = icw_index([df['var1'].values, df['var2'].values])
# User-specified reference group normalization. Control group is distributed M=0, SD=1
# and treatment group is in effect size units relative to control group.
ref_mask = (df['treat'] == 0).values
df['icw_control_reference'] = icw_index([df['var1'].values, df['var2'].values],
                                        reference_mask=ref_mask)
What is the ICW Index?
TL;DR: The ICW index is a weighted average of variables, where each variable's weight is derived from the inverse of the variables' covariance matrix.
Anderson (2008) proposed combining multiple outcomes into a single measure, using the inverse of their covariance matrix as weights. Why would you do this? First, people use indices all the time to avoid a multiple comparisons problem. Usually, though, the component variables are simply averaged so that each counts equally. That can be suboptimal when several variables are highly correlated with one another, because they largely repeat the same information; you may want to up-weight the variables that contribute unique information. The ICW index does this: it down-weights highly correlated outcomes and up-weights less correlated ones. In addition, among weighted averages whose weights sum to one, inverse-covariance weights minimize the variance of the resulting index.
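To make the weighting intuition concrete, here is a minimal sketch using a made-up 3x3 correlation matrix (the matrix and all variable names are illustrative assumptions, not part of the package). Two variables are highly correlated and one is independent; the sketch compares inverse-covariance weights with equal weights.

```python
import numpy as np

# Hypothetical correlation matrix: variables 0 and 1 are highly correlated,
# variable 2 is uncorrelated with both.
sigma = np.array([[1.0, 0.9, 0.0],
                  [0.9, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])

ones = np.ones(3)

# Solving sigma @ w = 1 gives the row sums of inv(sigma) without forming the inverse.
raw_weights = np.linalg.solve(sigma, ones)
icw_weights = raw_weights / raw_weights.sum()   # rescale so the weights sum to one
equal_weights = ones / 3

# Variance of a weighted average w'y when Cov(y) = sigma is w' sigma w.
var_icw = icw_weights @ sigma @ icw_weights
var_equal = equal_weights @ sigma @ equal_weights

print("ICW weights:  ", icw_weights.round(4))
print("Var (ICW):    ", round(float(var_icw), 4))
print("Var (equal):  ", round(float(var_equal), 4))
```

In this toy case the two correlated variables share a smaller weight each, the independent variable receives the largest weight, and the ICW-weighted average has lower variance than the equally weighted one.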
Implementation Details
The implementation follows the procedure explained by Schwab et al. (2020). I'll quote their steps here for clarity...
We can calculate the standardized weighted index $\tilde{s}$ for each observation $i$ as follows:
- Select $k$ indicators relevant for outcome $j$.
- Adjust sign: For all $k$ indicators, ensure the positive direction always indicates a "better outcome".
- Normalize indicators: Demean all $k$ indicators by subtracting the mean of the indicator in the reference group (the full sample is the default reference group). Then, convert them to effect sizes, $\tilde{y}_k$, by dividing each indicator by its reference group standard deviation.
- Construct weights: Create weights using $\Sigma^{-1}$, the inverse of the covariance matrix of the normalized indicators. Specifically, set the weight $\tilde{w}_k$ on each indicator equal to the sum of its row entries in $\Sigma^{-1}$. With this rule, highly correlated indicators are assigned small or offsetting weights, while less correlated outcomes receive larger weights.
- Construct index: Calculate the weighted average of $\tilde{y}_k$ for observation $i$. Formally, the weighted average $\overline{s}_i$ is calculated using $\overline{s}_i = (\mathbf{1}'\Sigma^{-1}\mathbf{1})^{-1}(\mathbf{1}'\Sigma^{-1}\tilde{y}_i)$, where $\mathbf{1}$ is a column vector of 1s and $\tilde{y}_i$ is a column vector of all outcomes for observation $i$. This is an efficient GLS estimator.
- Normalize index: Demean index $\overline{s}_i$ by subtracting the mean of the index in the reference group, and convert it to effect sizes by dividing it by its reference group standard deviation. This normalization results in an index distributed with mean zero and standard deviation one in the reference group.
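The quoted steps can be sketched in a few lines of NumPy. This is an illustrative re-implementation under stated assumptions, not the package's actual code: the function name `icw_index_sketch` is hypothetical, the covariance matrix is assumed to be estimated over the full sample, standard deviations are assumed to use `ddof=1`, and sign adjustment is assumed to have been done by the caller.

```python
import numpy as np

def icw_index_sketch(columns, reference_mask=None):
    """Hypothetical sketch of the quoted procedure; inputs must contain no NaN."""
    y = np.column_stack([np.asarray(c, dtype=float) for c in columns])
    n, k = y.shape
    if reference_mask is None:
        ref = np.ones(n, dtype=bool)          # full sample is the default reference group
    else:
        ref = np.asarray(reference_mask, dtype=bool)

    # Normalize indicators: demean and scale by reference-group mean and SD.
    y_tilde = (y - y[ref].mean(axis=0)) / y[ref].std(axis=0, ddof=1)

    # Construct weights: row sums of the inverse covariance matrix.
    # Assumption: covariance is estimated over all observations.
    sigma = np.atleast_2d(np.cov(y_tilde, rowvar=False))
    weights = np.linalg.solve(sigma, np.ones(k))

    # Construct index: weighted average with weights rescaled to sum to one.
    s_bar = y_tilde @ weights / weights.sum()

    # Normalize index: mean 0 and SD 1 within the reference group.
    return (s_bar - s_bar[ref].mean()) / s_bar[ref].std(ddof=1)

rng = np.random.default_rng(0)
x1 = rng.standard_normal(200)
x2 = 0.5 * x1 + rng.standard_normal(200)
treat = rng.integers(0, 2, size=200)
control = treat == 0

index = icw_index_sketch([x1, x2], reference_mask=control)
print(index[control].mean(), index[control].std(ddof=1))
```

With a reference mask, only the reference group is guaranteed to have mean zero and standard deviation one; the remaining observations are expressed in reference-group effect size units.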
Validation
I validated this implementation against Stata's swindex (version 14) using 100 synthetic datasets:
- Datasets: 100 datasets with 5 variables each
- Sample sizes: 500-2000 observations per dataset
- Total observations: 122,444
- Variables: Standard normal distribution, no missing data
Results
Results are identical (within floating-point tolerance) to Stata's swindex implementation. Here are the two options I tested.
- Default settings (full sample as reference group)
  - Correlation: 0.999999999999996
  - Differences > 1e-06: 0
  - Max absolute difference: 3.08e-07
  - Median absolute difference: 3.01e-08
  - Mean absolute difference: 3.88e-08
- User-specified reference group (using the control group as reference)
  - Correlation: 1.000000000000000
  - Differences > 1e-06: 0
  - Max absolute difference: 3.31e-07
  - Median absolute difference: 2.94e-08
  - Mean absolute difference: 3.77e-08
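For readers who want to run a similar check, summary statistics like those above could be computed roughly as follows. This is a sketch only: the helper name `compare_indices` is hypothetical, the validation script itself is not shown here, and the "Stata" array below is a simulated placeholder rather than real swindex output.

```python
import numpy as np

def compare_indices(python_index, stata_index, tol=1e-6):
    """Summarize agreement between two index vectors of equal length."""
    a = np.asarray(python_index, dtype=float)
    b = np.asarray(stata_index, dtype=float)
    abs_diff = np.abs(a - b)
    return {
        "correlation": float(np.corrcoef(a, b)[0, 1]),
        "n_diff_above_tol": int((abs_diff > tol).sum()),
        "max_abs_diff": float(abs_diff.max()),
        "median_abs_diff": float(np.median(abs_diff)),
        "mean_abs_diff": float(abs_diff.mean()),
    }

rng = np.random.default_rng(0)
python_index = rng.standard_normal(1000)
# Placeholder standing in for exported Stata results: same values plus tiny noise.
stata_index = python_index + rng.uniform(-1e-7, 1e-7, size=1000)

summary = compare_indices(python_index, stata_index)
for name, value in summary.items():
    print(f"{name}: {value}")
```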
Limitations
This implementation is simpler than swindex and has the following restrictions:
- No missing data: Input arrays must not contain NaN values
- User handles sign orientation: Assumes input data is already oriented so higher values indicate better outcomes
- Report bugs: I imagine I missed some edge cases. Feel free to report bugs.
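One possible way to meet the first two restrictions before building the index is sketched below. The helper `prepare_indicators` and the example variables are hypothetical. Listwise deletion (dropping any row with a missing value) is a simple stand-in chosen for illustration and is not necessarily how swindex treats missing data.

```python
import numpy as np

def prepare_indicators(columns, higher_is_better):
    """Flip signs so higher is better, then drop rows with any NaN.

    Returns the cleaned columns and the boolean mask of rows that were kept.
    """
    y = np.column_stack([np.asarray(c, dtype=float) for c in columns])
    # +1 keeps an indicator as-is, -1 reverses it.
    signs = np.where(np.asarray(higher_is_better, dtype=bool), 1.0, -1.0)
    y = y * signs
    # Listwise deletion: keep only rows with no missing values (an assumption).
    keep = ~np.isnan(y).any(axis=1)
    return [y[keep, j] for j in range(y.shape[1])], keep

# Made-up data: income (higher is better) and debt (lower is better).
income = np.array([10.0, 12.0, np.nan, 9.0, 15.0])
debt = np.array([3.0, 1.0, 2.0, np.nan, 0.5])

cleaned, keep = prepare_indicators([income, debt], higher_is_better=[True, False])
print(keep)
print(cleaned[0])
print(cleaned[1])
```

The returned `keep` mask can be used to align the resulting index with the original rows, for example by assigning the index only to the rows that were kept.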
System I Ran Tests On
I tested with Python 3.13, the packages in dev_requirements.txt, macOS, and Stata 19.5.
References
- Schwab, B., Janzen, S., Magnan, N. P., & Thompson, W. M. (2020). Constructing a summary index using the standardized inverse-covariance weighted average of indicators. The Stata Journal, 20(4), 952–964.
- Anderson, M. L. (2008). Multiple Inference and Gender Differences in the Effects of Early Intervention: A Reevaluation of the Abecedarian, Perry Preschool, and Early Training Projects. Journal of the American Statistical Association, 103(484), 1481–1495.
Citation
If you use this implementation in your work, please cite:
@misc{icw_index,
  author = {Joshua Ashkinaze},
  title = {Inverse-Covariance Weighted Index for Python},
  year = {2025},
  url = {https://github.com/josh-ashkinaze/inverse-covariance-weighted-index}
}
Issues
Please open an issue if you find any bugs or edge cases.
ToDos
- Add option for user-specified reference group as in Schwab et al. (2020) [DONE]
- Add handling for missing data as in Schwab et al. (2020)