Network Scale-Up Models for Aggregated Relational Data

Fitting Network Scale-up Models

Overview

This package fits several network scale-up models (NSUM) to Aggregated Relational Data (ARD). ARD consists of survey responses to "How many X's do you know?" questions, i.e. how many people each respondent knows in various subpopulations. Specifically, if N_i respondents are asked how many people they know in each of N_k subpopulations, then the ARD form an N_i by N_k matrix whose (i, j) element is the number of people respondent i reports knowing in subpopulation j. NSUM leverages these responses to estimate the unknown size of hard-to-reach populations. See Laga et al. (2021) for more details.
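For concreteness, here is a minimal sketch (with made-up numbers) of what an ARD matrix looks like as a NumPy array:

```python
import numpy as np

# Toy ARD matrix: 4 respondents asked about 3 subpopulations.
# ard[i, j] = number of people respondent i reports knowing in
# subpopulation j, i.e. the answer to "How many X's do you know?".
ard = np.array([
    [2, 0, 5],
    [1, 3, 4],
    [0, 1, 2],
    [3, 2, 6],
])

n_respondents, n_subpops = ard.shape  # 4 respondents, 3 subpopulations
```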

In this package, we provide functions to estimate the subpopulation sizes and accompanying parameters (e.g., respondent degrees) from four papers:

  • Killworth, P. D., Johnsen, E. C., McCarty, C., Shelley, G. A., and Bernard, H. R. (1998) plug-in MLE
  • Killworth, P. D., McCarty, C., Bernard, H. R., Shelley, G. A., and Johnsen, E. C. (1998) MLE
  • Zheng, T., Salganik, M. J., and Gelman, A. (2006) overdispersed model
  • Laga, I., Bao, L., and Niu, X. (2021) uncorrelated, correlated, and covariate models

Requirements

This package requires the following Python libraries:

  • numpy >= 1.24
  • pandas >= 2.1
  • scipy >= 1.11
  • cmdstanpy >= 1.1

PIMLE

The plug-in MLE estimator from Killworth, P. D., Johnsen, E. C., McCarty, C., Shelley, G. A., and Bernard, H. R. (1998) is a two-stage estimator that first estimates each respondent's degree d_i by maximizing the following likelihood:

$$L(d_i;y,{N_k}) = \prod_{k=1}^{L} {d_i \choose y_{ik}} \left(\frac{N_k}{N}\right)^{y_{ik}}\left(1-\frac{N_k}{N}\right)^{d_i-y_{ik}}$$

where L is the number of subpopulations with known N_k. For the second stage, the model plugs in the estimated d_i into the equation

$$\frac{y_{ik}}{d_i} = \frac{N_k}{N}$$

and solves for the unknown N_k separately for each respondent. These per-respondent values are then averaged to obtain a single estimate of N_k.

To summarize, stage 1 estimates \hat{d}_i by

$$\hat{d}_i = N \frac{\sum_{k=1}^{L} y_{ik}}{\sum_{k=1}^{L} N_k}$$

and these estimates are used in stage 2 to estimate the unknown \hat{N}_k by

$$\hat{N}_{k}^{PIMLE} = \frac{N}{N_i} \sum_{i=1}^{N_i} \frac{y_{ik}}{\hat{d}_i}$$
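The two stages can also be computed directly with NumPy. The sketch below uses made-up numbers (N, the known sizes, and the ARD responses) purely for illustration; it is not the package's internal code:

```python
import numpy as np

N = 10_000                               # assumed total population size
known_sizes = np.array([500, 800, 300])  # hypothetical known N_k
ard_known = np.array([[3, 4, 1],         # toy responses for the known groups
                      [2, 6, 0],
                      [5, 3, 2]])
y_unknown = np.array([4, 2, 7])          # toy responses for one unknown group

# Stage 1: closed-form MLE of each respondent's degree
d_hat = N * ard_known.sum(axis=1) / known_sizes.sum()

# Stage 2: plug d_hat into y_ik / d_i = N_k / N, solve per respondent, then
# average. Respondents with d_hat == 0 are dropped, since they would put a
# zero in the denominator.
keep = d_hat > 0
N_u_pimle = N * np.mean(y_unknown[keep] / d_hat[keep])
```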

The following demonstrates how to use the killworth function to compute PIMLE estimates of unknown subpopulation sizes:

pimle_est = killworth(ard,
                      known_sizes = sizes[[0, 1, 3]],
                      known_ind = [0,1,3],
                      N = N,
                      model = "PIMLE")

Note that the function may warn that at least one \hat{d}_i was 0. This occurs when a respondent does not report knowing anyone in the known subpopulations, and it is a problem for the PIMLE because \hat{d}_i appears in the denominator of \hat{N}_k^{PIMLE}. Responses from respondents with \hat{d}_i = 0 are therefore ignored.

MLE

The MLE estimator from Killworth, P. D., McCarty, C., Bernard, H. R., Shelley, G. A., and Johnsen, E. C. (1998) is also a two-stage model with an identical first stage, i.e.,

$$\hat{d}_i = N \frac{\sum_{k=1}^{L} y_{ik}}{\sum_{k=1}^{L} N_k}$$

However, the second stage estimates \hat{N}_k by maximizing the binomial likelihood with respect to N_k, fixing d_i at the estimated \hat{d}_i. Thus, the estimate for the unknown subpopulation size is given by

$$\hat{N}_k^{MLE} = N \frac{\sum_{i=1}^{N_i} y_{ik}}{\sum_{i=1}^{N_i} \hat{d}_i}$$
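Using the same made-up numbers as in the PIMLE sketch, the MLE's second stage reduces to a single ratio rather than a per-respondent average:

```python
import numpy as np

N = 10_000                               # assumed total population size
known_sizes = np.array([500, 800, 300])  # hypothetical known N_k
ard_known = np.array([[3, 4, 1],         # toy responses for the known groups
                      [2, 6, 0],
                      [5, 3, 2]])
y_unknown = np.array([4, 2, 7])          # toy responses for one unknown group

# Stage 1: identical degree estimates as the PIMLE
d_hat = N * ard_known.sum(axis=1) / known_sizes.sum()

# Stage 2: one ratio of summed responses to summed degrees, so a single
# zero d_hat cannot make the denominator zero
N_u_mle = N * y_unknown.sum() / d_hat.sum()
```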

The following demonstrates how to use the killworth function to compute MLE estimates of unknown subpopulation sizes:

mle_est = killworth(ard,
                    known_sizes = np.ravel(sizes)[[0,1,3]],
                    known_ind = [0,1,3],
                    N = N,
                    model = "MLE")

Note that there is no zero-degree warning here, since the denominator is the sum of the \hat{d}_i over all respondents.

Bayesian Models

Now we introduce the two Bayesian estimators implemented in this package.

Overdispersed Model

The overdispersed model proposed in Zheng et al. (2006) assumes the following likelihood:

$$y_{ik} \sim \text{Negative-Binomial}(\text{mean}=e^{a_i+b_k},\text{overdispersion}=\omega_k)$$

Please see the original manuscript for more details on the model structure and priors.
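To build intuition for this likelihood, the sketch below simulates one ARD matrix from it with NumPy. The parameter scales are made up, and the mapping from (mean, ω_k) to NumPy's (n, p) negative binomial parameterization assumes Var(y) = ω_k · mean with ω_k > 1:

```python
import numpy as np

rng = np.random.default_rng(42)
n_resp, n_subpop = 5, 3

a = rng.normal(5.0, 0.5, size=n_resp)     # log-degrees a_i (assumed scale)
b = rng.normal(-4.0, 0.5, size=n_subpop)  # log-prevalences b_k (assumed scale)
omega = np.array([1.5, 2.0, 3.0])         # overdispersion omega_k per subpopulation

mu = np.exp(a[:, None] + b[None, :])      # mean = e^{a_i + b_k}, shape (5, 3)

# Convert (mean, omega) to NumPy's (n, p) parameterization:
# mean = n(1-p)/p and var = mean/p, so var = omega * mean implies p = 1/omega.
p = 1.0 / omega
n = mu * p / (1.0 - p)

y = rng.negative_binomial(n, p)           # one simulated ARD matrix
```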

This package fits the overdispersed model either via the Gibbs-Metropolis algorithm provided in the original manuscript (overdispersed) or via Stan (overdispersedStan). We suggest using the Stan version, since convergence and effective sample sizes are more satisfactory and it does not require tuning the jumping scales of the Metropolis updates.

In order to identify the a_i and b_k as log-degrees and log-prevalences, respectively, the overdispersed model requires scaling the parameters. To scale them, the user must supply at least one subpopulation with known size and the column index corresponding to that known size. Additionally, two secondary groups may be supplied to adjust for differences in gender or other binary group classifications. More details on the scaling procedure can be found in the original manuscript.

The following demonstrates how to use the overdispersed and overdispersedStan functions to compute estimates of unknown subpopulation sizes using the Gibbs-Metropolis and Stan implementations of the overdispersed model. Note that in practice, both warmup and iter should be set to higher values:

overdisp_gibbs_metrop_est = overdispersed(
                         ard, 
                         known_sizes = sizes[[0,1,3]], 
                         known_ind = [0,1,3], 
                         G1_ind = 0,
                         G2_ind = 1,
                         B2_ind = 3,
                         N=N,
                         warmup = 500,
                         iter = 1000,
                         verbose = True,
                         init = "MLE")

overdisp_stan = overdispersedStan(
                         ard, 
                         known_sizes = sizes[[0,1,3]], 
                         known_ind = [0,1,3],
                         G1_ind = 0,
                         G2_ind = 1,
                         B2_ind = 3,
                         N = N,
                         chains = 2,
                         cores = 2,
                         warmup = 250,
                         iter = 500)

Correlated Models

The correlated model proposed in Laga et al. (2023) assumes the following likelihood (omitting the optional covariate terms):

$$y_{ik} \sim \text{Poisson}\left(e^{\delta_i + \rho_k + b_{ik}}\right)$$

where, critically, the vector $\textbf{b}_i = (b_{i1}, \ldots, b_{iK})$ satisfies

$$\textbf{b}_i \sim \mathcal{N}_K(\mu, \Sigma)$$

so that the responses for each respondent are correlated across subpopulations. Again, \delta_i and \rho_k need to be scaled, and they can either be scaled using the same procedure as for the overdispersed model (providing indices corresponding to different groups), by using all known subpopulation sizes, or by weighting groups according to their correlation with other groups. More details about these scaling procedures are provided in Laga et al. (2023).
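The sketch below simulates responses with this correlated structure, assuming a Poisson likelihood whose log-mean is δ_i + ρ_k + b_ik (covariate terms omitted) and made-up parameter scales and covariance:

```python
import numpy as np

rng = np.random.default_rng(7)
n_resp, K = 4, 3

delta = rng.normal(5.0, 0.3, size=n_resp)  # log-degrees delta_i (assumed scale)
rho = rng.normal(-4.0, 0.3, size=K)        # log-prevalences rho_k (assumed scale)

# b_i ~ N_K(mu, Sigma) with nonzero off-diagonal covariance, so each
# respondent's random effects are correlated across the K subpopulations
mu = np.zeros(K)
Sigma = 0.25 * np.eye(K) + 0.10 * (np.ones((K, K)) - np.eye(K))
b = rng.multivariate_normal(mu, Sigma, size=n_resp)  # shape (4, 3)

# Poisson responses sharing the correlated b_i within each row (respondent)
y = rng.poisson(np.exp(delta[:, None] + rho[None, :] + b))
```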

In this package, model parameters are estimated via Stan. Note that while the full model likelihood depends on X, Z_{global}, and Z_{subpop}, any combination of these covariates can be provided. Additionally, \Sigma can be assumed diagonal (i.e. no correlation) by setting the argument model = "uncorrelated" in the correlatedStan function.

The following demonstrates how to use the correlatedStan function to compute estimates of unknown subpopulation sizes using the Stan implementation of the correlated and uncorrelated models. Note that in practice, both warmup and iter should be set to higher values:

correlated_cov_stan = correlatedStan(
    ard,
    known_sizes = sizes[[0,1,3]],
    known_ind = [0,1,3],
    model = "correlated",
    scaling = "weighted",
    x = x,
    z_subpop = z_subpop,
    z_global = z_global,
    N = N,
    chains = 2,
    cores = 2,
    warmup = 250,
    iter = 500,
)

correlated_nocov_stan = correlatedStan(
    ard,
    known_sizes = sizes[[0,1,3]],
    known_ind = [0,1,3],
    model = "correlated",
    scaling = "all",
    N = N,
    chains = 2,
    cores = 2,
    warmup = 250,
    iter = 500,
)

uncorrelated_cov_stan = correlatedStan(
    ard,
    known_sizes = sizes[[0,1,3]],
    known_ind = [0,1,3],
    model = "uncorrelated",
    scaling = "all",
    x = x,
    z_subpop = z_subpop,
    z_global = z_global,
    N = N,
    chains = 2,
    cores = 2,
    warmup = 250,
    iter = 500,
)

uncorrelated_x_stan = correlatedStan(
    ard,
    known_sizes = sizes[[0,1,3]],
    known_ind = [0,1,3],
    model = "uncorrelated",
    scaling = "all",
    x = x,
    N = N,
    chains = 2,
    cores = 2,
    warmup = 250,
    iter = 500,
)
