
Adversarial estimator for structural models on graphs

Adversarial estimation on graphs

The adversarial estimator for graph structural models extends the theoretical framework proposed by Kaji et al. (2023) to structural models defined on graphs (strategic communication games, peer-effect models, etc.). Graph data pose a distinctive challenge. In tabular datasets, an arbitrary row is usually regarded as a single realization from the joint distribution of exogenous and outcome variables; with graph data, the graph itself is essentially a single random realization. We therefore need to create the variability that allows us to train the discriminative classifier. Our current approach is to sample subgraphs from the ground-truth and synthetic datasets and to label each one according to its origin. An intuitive justification for this strategy is something akin to an ergodic theorem for signal transmission on networks: if an individual generates a signal that is transmitted to his peers, its effects eventually dissipate after a sufficient number of steps from the point of origin, so sufficiently distant subgraphs can be treated as approximately separate realizations. The currently implemented experiments thus rely on $k$-hop ego sampling from the ground-truth and synthetic data. Further challenges are posed by multiple equilibria of dynamic network models, the lack of closed-form asymptotics, and the need to tailor the discriminator architecture to different classes of structural models.
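The $k$-hop ego sampling and labeling step can be sketched with NumPy alone. This is a minimal illustration, not the package's implementation: the function names are hypothetical, the adjacency matrix is assumed to be a dense symmetric 0/1 array, and the convention label 1 = ground truth, label 0 = synthetic is an assumption. PyTorch Geometric offers the same neighborhood extraction on `edge_index` tensors through `torch_geometric.utils.k_hop_subgraph`.

```python
import numpy as np
from collections import deque

def k_hop_ego_nodes(A, center, k):
    """Sorted indices of all nodes within k hops of `center`.

    Breadth-first search over a dense, symmetric 0/1 adjacency matrix A.
    """
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand beyond the k-th hop
        for v in map(int, np.flatnonzero(A[u])):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return np.array(sorted(dist), dtype=int)

def sample_ego_subgraphs(X, Y, A, k, num_samples, label, rng):
    """Draw ego subgraphs around random centers and tag each with its origin.

    label=1 marks ground-truth samples and label=0 synthetic ones (assumed convention).
    """
    samples = []
    for c in rng.integers(0, A.shape[0], size=num_samples):
        nodes = k_hop_ego_nodes(A, int(c), k)
        A_sub = A[np.ix_(nodes, nodes)]  # induced subgraph on the ego nodes
        samples.append((X[nodes], Y[nodes], A_sub, label))
    return samples
```

Calling `sample_ego_subgraphs` once on $(X, Y, A)$ with label 1 and once on $(X, Y', A)$ with label 0 yields a labeled pool for the discriminator; since $X$ and $A$ are shared, only the outcomes differ between the two pools.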

Below I briefly formalize the estimation problem:

Consider a ground-truth graph dataset $G = (X, Y, N, A)$, where:

  • $X$ is an $n \times k$ matrix of exogenous node characteristics, i.e. each node is associated with a $k$-dimensional feature vector
  • $Y$ is an $n \times l$ matrix of endogenous node outcomes, i.e. each node is associated with an $l$-dimensional outcome vector
  • $N = \{0, \dots, n-1\}$ is the set of node indices
  • $A \in \{0,1\}^{n \times n}$ is the symmetric adjacency matrix
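As a concrete reading of these definitions, the container below checks the stated shapes and the symmetric binary adjacency matrix. It is a hypothetical sketch (the class name `GraphDataset` is illustrative, not part of the package) that assumes dense NumPy arrays and 0-based node indices.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class GraphDataset:
    """Container for G = (X, Y, N, A); the name is illustrative only."""
    X: np.ndarray  # n x k exogenous node characteristics
    Y: np.ndarray  # n x l endogenous node outcomes
    A: np.ndarray  # n x n symmetric 0/1 adjacency matrix
    N: np.ndarray = field(init=False)  # node indices 0, ..., n-1, derived from A

    def __post_init__(self):
        if self.A.ndim != 2 or self.A.shape[0] != self.A.shape[1]:
            raise ValueError("A must be a square n x n matrix")
        n = self.A.shape[0]
        if not np.array_equal(self.A, self.A.T):
            raise ValueError("A must be symmetric")
        if not np.isin(self.A, (0, 1)).all():
            raise ValueError("A must contain only 0s and 1s")
        if self.X.ndim != 2 or self.X.shape[0] != n:
            raise ValueError("X must be an n x k matrix")
        if self.Y.ndim != 2 or self.Y.shape[0] != n:
            raise ValueError("Y must be an n x l matrix")
        self.N = np.arange(n)
```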

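Structural model $m_{\theta}: \mathbb{R}^{n \times k} \times \{0,1\}^{n \times n} \to \mathbb{R}^{n \times l}$ maps the exogenous characteristics and the network to outcomes; $m$ is parametrized by an unknown vector $\theta$.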

Synthetic dataset $G'(\theta) = (X, Y', N, A)$, where $Y' = m_{\theta}(X, A)$. It shares $X$, $N$ and $A$ with the ground truth and replaces only the outcomes.
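For illustration, a synthetic dataset can be produced from a simple two-parameter linear-in-means specification, $y = \beta W y + \gamma x$ with $W$ the row-normalized adjacency matrix, solved for its equilibrium. This specification, the parameter names and the helper functions are assumptions made for the sketch; they are not necessarily the model used in the package's notebook.

```python
import numpy as np

def row_normalize(A):
    """W = D^{-1} A; isolated nodes keep an all-zero row instead of dividing by zero."""
    deg = A.sum(axis=1, keepdims=True).astype(float)
    return np.divide(A, deg, out=np.zeros(A.shape, dtype=float), where=deg > 0)

def linear_in_means(X, A, theta):
    """Equilibrium outcomes of y = beta * W y + gamma * x (hypothetical spec).

    theta = (beta, gamma). With |beta| < 1, I - beta W is invertible because
    every row of W sums to at most 1, so its spectral radius is at most 1.
    """
    beta, gamma = theta
    W = row_normalize(A)
    return np.linalg.solve(np.eye(A.shape[0]) - beta * W, gamma * X)

def synthetic_dataset(X, A, theta):
    """G'(theta) = (X, Y', N, A): copies of X and A, outcomes from the model."""
    Y_synth = linear_in_means(X, A, theta)
    return X.copy(), Y_synth, np.arange(A.shape[0]), A.copy()
```

On a regular graph with constant $x = 1$, every node has the same equilibrium outcome $\gamma / (1 - \beta)$, which is a quick sanity check for the solver.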

GNN discriminator $D: g_i \mapsto D(g_i) \in [0,1]$, where $g_i$ is a subgraph sampled from $G$ or $G'(\theta)$. The discriminator is a binary classifier that predicts whether a given sampled subgraph comes from the ground-truth or the synthetic data.

We search for $\theta^*$ such that:

$$\theta^* \in \arg\min_{\theta} \max_{D} L_D\big(G'(\theta), G\big)$$

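where the loss $L_D$ is a classification-quality metric evaluated on a held-out test set (e.g. accuracy or negative cross-entropy). The inner maximization over $D$ yields the optimal classifier $D^*$, and $\theta^*$ minimizes the classification quality that $D^*$ attains, i.e. it makes even the best discriminator least able to tell synthetic subgraphs from ground-truth ones.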
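One evaluation of $L_D(G'(\theta), G)$ can be sketched end to end. To keep the example runnable with NumPy alone, a logistic regression on hand-crafted subgraph summaries stands in for the GNN discriminator; this is a swapped-in technique, not the package's discriminator. Further assumptions: single-column $X$ and $Y$ ($k = l = 1$), 1-hop ego subgraphs, a 50/50 train/test split, and test accuracy as $L_D$. All function names are hypothetical.

```python
import numpy as np
from collections import deque

def ego_nodes(A, center, k):
    """Indices of nodes within k hops of `center` (BFS on a dense adjacency matrix)."""
    dist, queue = {center: 0}, deque([center])
    while queue:
        u = queue.popleft()
        if dist[u] < k:
            for v in map(int, np.flatnonzero(A[u])):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
    return np.array(sorted(dist))

def subgraph_features(X, Y, A, nodes):
    """Hand-crafted summary of one ego subgraph; a stand-in for a GNN readout."""
    x, y = X[nodes, 0], Y[nodes, 0]
    deg = A[np.ix_(nodes, nodes)].sum(axis=1)
    return np.array([y.mean(), y.std(), (x * y).mean(), deg.mean()])

def sample_features(X, Y, A, k, m, rng):
    centers = rng.integers(0, A.shape[0], size=m)
    return np.stack([subgraph_features(X, Y, A, ego_nodes(A, int(c), k)) for c in centers])

def fit_logistic(F, labels, lr=0.1, steps=500):
    """Plain gradient descent on the mean logistic (cross-entropy) loss."""
    w = np.zeros(F.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-np.clip(F @ w, -30, 30)))
        w -= lr * F.T @ (p - labels) / len(labels)
    return w

def discriminator_accuracy(X, Y_true, Y_synth, A, k=1, m=200, seed=0):
    """Held-out accuracy of the stand-in discriminator, i.e. one evaluation of L_D."""
    rng = np.random.default_rng(seed)
    F = np.vstack([sample_features(X, Y_true, A, k, m, rng),
                   sample_features(X, Y_synth, A, k, m, rng)])
    labels = np.concatenate([np.ones(m), np.zeros(m)])  # 1 = ground truth, 0 = synthetic
    perm = rng.permutation(2 * m)
    F, labels = F[perm], labels[perm]
    train, test = slice(0, m), slice(m, 2 * m)  # 50/50 train/test split
    mu, sd = F[train].mean(axis=0), F[train].std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant features
    Fz = np.hstack([np.ones((2 * m, 1)), (F - mu) / sd])  # prepend intercept column
    w = fit_logistic(Fz[train], labels[train])  # approximate inner max over D
    pred = (Fz[test] @ w > 0).astype(float)
    return float((pred == labels[test]).mean())
```

In the outer problem, `Y_synth` would be regenerated for each candidate $\theta$ and the returned accuracy minimized. Keeping `seed` fixed reuses the same sampled centers and split across candidates, which removes one source of noise from the objective.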

Reference:

Kaji, T., Manresa, E., & Pouliot, G. (2023). An adversarial approach to structural estimation. Econometrica, 91(6), 2041-2063.

Practical implementation

To build and train the GNN discriminator I use components from PyTorch Geometric, a library for deep learning on graphs. Generators share a base class that provides a unified sampling interface, and both ground-truth and synthetic data are handled by generators. The ground-truth generator is essentially a sampling manager, while the synthetic generator additionally requires a mapping function that defines the structural model and an instance of the ground-truth generator, which ensures that the exogenous characteristics of the synthetic data are an exact copy of those in the ground-truth dataset. The synthetic generator also implements a generate_outcomes method that produces counterfactual outcome values for the supplied structural parameters. The util functions mostly encapsulate discriminator training and testing inside the outer minimization objective. The default minimization method is Bayesian optimization with surrogate models, since it handles complex black-box objectives without requiring analytical derivatives and combines the benefits of global search with those of local refinement.
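The generator design described above can be sketched as a small class hierarchy. This is a hypothetical outline, not the package's API: the class names, the `sample` method and the exact signature of `generate_outcomes` are assumptions; only the roles (a shared sampling base class, a ground-truth sampling manager, and a synthetic generator that copies exogenous data and calls a structural model) follow the description.

```python
import numpy as np
from collections import deque

class BaseGenerator:
    """Shared sampling interface (hypothetical sketch)."""

    def __init__(self, X, Y, A):
        self.X, self.Y, self.A = X, Y, A

    def sample(self, k, num_samples, rng):
        """Return (X_sub, Y_sub, A_sub) triples for random k-hop ego subgraphs."""
        if self.Y is None:
            raise RuntimeError("outcomes not generated yet")
        out = []
        for c in rng.integers(0, self.A.shape[0], size=num_samples):
            nodes = self._ego_nodes(int(c), k)
            out.append((self.X[nodes], self.Y[nodes], self.A[np.ix_(nodes, nodes)]))
        return out

    def _ego_nodes(self, center, k):
        """Breadth-first search for nodes within k hops of `center`."""
        dist, queue = {center: 0}, deque([center])
        while queue:
            u = queue.popleft()
            if dist[u] < k:
                for v in map(int, np.flatnonzero(self.A[u])):
                    if v not in dist:
                        dist[v] = dist[u] + 1
                        queue.append(v)
        return np.array(sorted(dist))

class GroundTruthGenerator(BaseGenerator):
    """Essentially a sampling manager over the observed graph."""

class SyntheticGenerator(BaseGenerator):
    """Reuses X and A from a ground-truth generator; outcomes come from the model."""

    def __init__(self, ground_truth, structural_model):
        # Exogenous characteristics and the graph are exact copies of the ground truth.
        super().__init__(ground_truth.X.copy(), None, ground_truth.A.copy())
        self.structural_model = structural_model  # callable (X, A, theta) -> Y'

    def generate_outcomes(self, theta):
        """Counterfactual outcomes Y' = m_theta(X, A) for the supplied parameters."""
        self.Y = self.structural_model(self.X, self.A, theta)
        return self.Y
```

Copying rather than referencing $X$ and $A$ guarantees that the synthetic sample cannot drift from the ground truth if either array is later modified in place.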

linear_in_means_model.ipynb

A test notebook showcasing estimation in a two-parameter case, where the objective and the optimization progress can be visualized.
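In a two-parameter case the objective surface can be inspected directly by evaluating it on a grid, for example before running the optimizer or to check where it converged. The helper below is a hypothetical sketch and a brute-force alternative, not the Bayesian optimization the package uses by default; `objective` is assumed to be any callable mapping $\theta = (\theta_0, \theta_1)$ to a scalar, such as a function that simulates $G'(\theta)$ and returns the discriminator's test accuracy.

```python
import numpy as np

def objective_surface(objective, grid0, grid1):
    """Evaluate a 2-parameter objective on a Cartesian grid.

    Returns Z with Z[i, j] = objective((grid0[i], grid1[j])) and the grid
    point with the smallest value.
    """
    Z = np.array([[objective((a, b)) for b in grid1] for a in grid0])
    i, j = np.unravel_index(np.argmin(Z), Z.shape)
    return Z, (grid0[i], grid1[j])
```

The resulting `Z` can be passed to a contour plot; since a finite-sample accuracy objective is piecewise constant in $\theta$, the grid view also shows how flat the surface is near its minimum.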

Notes

  • For now, the utils are specific to the discriminator used in the linear-in-means experiment, but they should be generalized.
  • The GNN architecture for the experiment is chosen ad hoc, since identification is strong.
  • The linear experiment uses accuracy as the minimization objective; more complex models require more sensitive metrics.
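A minimal illustration of why accuracy can be too coarse: accuracy only records on which side of 0.5 each prediction falls, while cross-entropy also reflects how confident the discriminator is. The two hypothetical discriminators below have identical accuracy but very different cross-entropy; the predicted probabilities are made up for the example.

```python
import numpy as np

def accuracy(p, labels):
    """Share of predictions on the correct side of 0.5."""
    return float(((p > 0.5) == (labels == 1)).mean())

def cross_entropy(p, labels, eps=1e-12):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    p = np.clip(p, eps, 1 - eps)
    return float(-(labels * np.log(p) + (1 - labels) * np.log(1 - p)).mean())

labels = np.array([1, 1, 0, 0])
p_weak = np.array([0.51, 0.51, 0.49, 0.49])    # barely separates the classes
p_strong = np.array([0.99, 0.99, 0.01, 0.01])  # separates them confidently

print(accuracy(p_weak, labels), accuracy(p_strong, labels))  # 1.0 1.0
print(cross_entropy(p_weak, labels))    # about 0.673
print(cross_entropy(p_strong, labels))  # about 0.010
```

Near the optimum, where the discriminator can barely tell the datasets apart, a metric that registers such differences in confidence gives the outer optimizer more signal than accuracy alone.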

Download files


Source Distribution

adversarial_nets_lib_econ-0.1.1.tar.gz (11.0 kB)

Built Distribution


adversarial_nets_lib_econ-0.1.1-py3-none-any.whl (11.0 kB)

File hashes

Hashes for adversarial_nets_lib_econ-0.1.1.tar.gz:

  • SHA256: a9544f158bc60448f4b3b7afefd61ae8c93144347fe983cd7766ee9fa51744db
  • MD5: 43783ac0c0776b160e791f83cfdd3596
  • BLAKE2b-256: de501768d16dc4cf821f35d8e72d8fbb8ba69cdc56f72053acc28d69647e667c

Hashes for adversarial_nets_lib_econ-0.1.1-py3-none-any.whl:

  • SHA256: 293a59b7fcfb030b5c2a030b9558d3b3bb0c58e43fea646ee03259d68cff2c02
  • MD5: 56fb4b35284a7139d2bb8439a584917f
  • BLAKE2b-256: a6b5ddf0a294f64d501d6abff78fd4729e76955e8404c84ddc62024c85f286b9
