Microsynthesis using quasirandom sampling and/or IPF

humanleague

Introduction

Please note that ongoing development is for the python version only. The R version is currently maintenance-only due to resource constraints.

humanleague is a python and R package for microsynthesising populations from marginal and (optionally) seed data. The package is implemented in C++ for performance.

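The package contains algorithms that use a number of different microsynthesis techniques:

  • Iterative Proportional Fitting (IPF)
  • Quasirandom Integer Sampling (QIS), which requires no seed population
  • Quasirandom Integer Sampling of IPF (QISI), a combination of the two in which the integral population is sampled (without replacement) from a distribution constructed from a dynamic IPF solution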

The latter provides a bridge between deterministic reweighting and combinatorial optimisation, offering advantages of both techniques:

  • generates high-entropy integral populations
  • can be used to generate multiple populations for sensitivity analysis
  • goes some way towards addressing the 'empty cells' issue that can occur with plain IPF
  • relatively fast computation time
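IPF (iterative proportional fitting), named in the title, is the deterministic-reweighting side of that bridge: it repeatedly rescales a seed array until its sums match each marginal. The following is a minimal, real-valued 2-D sketch in numpy for illustration only. It is not the package's C++ implementation, and the function name `ipf_2d` is hypothetical.

```python
import numpy as np


def ipf_2d(seed, row_targets, col_targets, max_iter=100, tol=1e-10):
    """Minimal 2-D iterative proportional fitting (illustrative sketch).

    Alternately rescales the rows and columns of `seed` so that its sums
    match `row_targets` and `col_targets`. Assumes a strictly positive seed
    and consistent targets (both sum to the same total).
    """
    x = np.asarray(seed, dtype=float).copy()
    rows = np.asarray(row_targets, dtype=float)
    cols = np.asarray(col_targets, dtype=float)
    for _ in range(max_iter):
        # scale each row so it sums to its target
        x *= (rows / x.sum(axis=1))[:, np.newaxis]
        # scale each column so it sums to its target
        x *= (cols / x.sum(axis=0))[np.newaxis, :]
        # columns are now exact; stop once the rows are within tolerance too
        if np.abs(x.sum(axis=1) - rows).max() < tol:
            break
    return x


fitted = ipf_2d(np.ones((2, 3)), [4, 6], [2, 3, 5])
print(fitted)
```

With a uniform seed, IPF converges to the outer product of the marginals divided by the total, so the fitted cells are generally fractional rather than integral.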

The algorithms:

  • support arbitrary dimensionality for both the marginals and the seed.
  • produce statistical data to ascertain the likelihood/degeneracy of the population (where appropriate).
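One common way to express the degeneracy of an integer population is the number of distinct ways of assigning its N individuals to cells with counts n_i, i.e. the multinomial coefficient N! / ∏ n_i!. The sketch below computes its logarithm with `math.lgamma` to avoid overflow for realistic population sizes. The function name is hypothetical, and the statistics humanleague actually reports may be defined differently.

```python
import math

import numpy as np


def log_degeneracy(population):
    """Natural log of N! / prod(n_i!) for an integer population array.

    This counts the distinct ways of assigning N labelled individuals to
    the cells so that cell i receives exactly n_i of them.
    """
    counts = np.asarray(population, dtype=np.int64).ravel()
    total = int(counts.sum())
    return math.lgamma(total + 1) - sum(math.lgamma(int(n) + 1) for n in counts)


# 4 individuals split 2/1/1 across three cells: 4! / (2! 1! 1!) = 12 arrangements
print(round(math.exp(log_degeneracy([2, 1, 1]))))
```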

The package also contains the following utilities:

  • a Sobol sequence generator (implemented as a generator class in python)
  • a function to construct a closest integer population from a discrete univariate probability distribution.
  • an algorithm for sampling an integer population from a discrete multivariate probability distribution, constrained to the marginal sums in every dimension (see below).
  • utility functions to convert a population represented as a multidimensional state array into tables of either counts (indexed by state) or individuals.
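The conversion from a state array to a table of individuals can be sketched in a few lines of numpy: each cell's index tuple is repeated as many times as that cell's count. This is an illustration with a hypothetical function name, not the package's own utility functions, whose names and output formats are not reproduced here.

```python
import numpy as np


def to_individuals(counts):
    """Expand an n-d array of integer counts into one row per individual.

    Each output row holds the category index in every dimension for one
    individual, so a cell with count k contributes k identical rows.
    """
    counts = np.asarray(counts)
    # index tuple of every cell, one column per dimension, in row-major order
    cells = np.argwhere(np.ones(counts.shape, dtype=bool))
    # repeat each cell's index tuple according to its count
    return np.repeat(cells, counts.ravel(), axis=0)


pop = np.array([[1, 0],
                [2, 1]])
print(to_individuals(pop))
```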

Version 1.0.1 reflects the work described in the Quasirandom Integer Sampling (QIS) paper.
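Quasirandom sampling of this kind relies on low-discrepancy sequences such as the Sobol sequence provided by the package. To illustrate what such a sequence looks like, the sketch below generates the first dimension of a Sobol sequence in Gray-code (Antonov-Saleev) order, where the direction numbers for that dimension are successive powers of one half. It is standalone Python with a hypothetical function name, not humanleague's generator; whether the package's sequence starts at 0 or skips initial points is not assumed here.

```python
def sobol_dim1(n, bits=32):
    """First n points of dimension 1 of a Sobol sequence, in Gray-code order.

    Point i+1 is obtained from point i by XOR-ing in the direction number
    2**-(c+1), where c is the index of the rightmost zero bit of i.
    Integers are kept as `bits`-bit fixed-point fractions of [0, 1).
    """
    scale = 1 << bits
    x = 0
    points = []
    for i in range(n):
        points.append(x / scale)
        # find the rightmost zero bit of i
        c = 0
        while (i >> c) & 1:
            c += 1
        # flip the bit worth 2**-(c+1)
        x ^= 1 << (bits - 1 - c)
    return points


print(sobol_dim1(8))
```

The first 2^m points occupy each of the 2^m equal-width subintervals of [0, 1) exactly once, which is the even coverage that pseudorandom draws do not guarantee.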

Installation

Python

Requires Python 3.12 or newer. The package can be installed using pip, e.g.

pip install humanleague

Development

uv is highly recommended for managing environments.

uv sync --dev
uv build
uv run pytest

Install the pre-commit hooks using uv run pre-commit install.

R

Official release:

> install.packages("humanleague")

For the development version:

> devtools::install_github("virgesmith/humanleague")

Or, for the legacy version:

> devtools::install_github("virgesmith/humanleague@1.0.1")

Documentation and Examples

R

Consult the package documentation, e.g.

> library(humanleague)
> ?humanleague

Python

The package now contains type annotations, which your IDE should display automatically.

NB: type stubs are generated using the pybind11-stubgen package, with some manual corrections.

Multidimensional integerisation

Building on the one-dimensional integerise function (which, given a discrete probability distribution and a count, returns the closest integer population to the distribution that sums to the count), a multidimensional equivalent of integerise has been introduced. In one dimension, for example, this:

>>> import humanleague
>>> p = [0.1, 0.2, 0.3, 0.4]
>>> result, stats = humanleague.integerise(p, 11)
>>> result
array([1, 2, 3, 5], dtype=int32)
>>> stats
{'rmse': 0.3535533905932736}

produces the optimal (i.e. closest possible) integer population to the discrete distribution.
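One standard way to obtain the closest integer population in one dimension is the largest-remainder method: floor each expected count, then give the leftover units to the cells with the largest fractional parts. The pure-Python sketch below (hypothetical function name `integerise_1d`) reproduces the result and RMSE shown above, but it is not necessarily how humanleague computes it internally.

```python
import math


def integerise_1d(p, total):
    """Closest integer population to distribution p that sums to total.

    Largest-remainder method: floor each expected count, then add one to
    the cells with the largest fractional remainders until the sum matches.
    Assumes p sums to 1.
    """
    expected = [pi * total for pi in p]
    result = [math.floor(e) for e in expected]
    shortfall = total - sum(result)
    # cell indices ordered by descending fractional remainder
    order = sorted(range(len(p)), key=lambda i: expected[i] - result[i], reverse=True)
    for i in order[:shortfall]:
        result[i] += 1
    # root-mean-square deviation from the expected (real-valued) counts
    rmse = math.sqrt(sum((r - e) ** 2 for r, e in zip(result, expected)) / len(p))
    return result, rmse


print(integerise_1d([0.1, 0.2, 0.3, 0.4], 11))
```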

The multidimensional integerise generalises this problem to higher dimensions: given an n-dimensional array of real numbers whose 1-d marginal sums in every dimension are integral (and hence so is the total population), it attempts to find an integer array with the same marginal sums.
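The precondition on the input can be checked before calling integerise. The numpy sketch below (both helper names are hypothetical) collects the 1-d marginal of each dimension by summing over all the other axes, then tests that every marginal is integral within an absolute tolerance.

```python
import numpy as np


def marginals(a):
    """1-d marginal sums of an n-d array.

    Element i is indexed by the categories of dimension i, obtained by
    summing over every other axis.
    """
    a = np.asarray(a)
    return [a.sum(axis=tuple(j for j in range(a.ndim) if j != i)) for i in range(a.ndim)]


def has_integral_marginals(a, tol=1e-9):
    """True if every 1-d marginal sum is an integer to within tol."""
    return all(np.allclose(m, np.round(m), rtol=0, atol=tol) for m in marginals(a))


a = np.full((2, 2, 2), 0.5)            # each 1-d marginal is [2., 2.]
print(has_integral_marginals(a))        # True
print(has_integral_marginals(a * 0.9))  # False: marginals are [1.8, 1.8]
```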

The QISI algorithm is repurposed to this end. Because it is a sampling algorithm, it cannot guarantee that a solution will be found or, if one is found, that it is optimal. Conversely, a failure does not prove that no solution exists for the given input.

>>> import numpy as np
>>> a = np.array([[ 0.3,  1.2,  2. ,  1.5],
...               [ 0.6,  2.4,  4. ,  3. ],
...               [ 1.5,  6. , 10. ,  7.5],
...               [ 0.6,  2.4,  4. ,  3. ]])
>>> # marginal sums
>>> a.sum(axis=0)
array([ 3., 12., 20., 15.])
>>> a.sum(axis=1)
array([ 5., 10., 25., 10.])
>>> # perform integerisation
>>> result, stats = humanleague.integerise(a)
>>> stats
{'conv': True, 'rmse': 0.5766281297335398}
>>> result
array([[ 0,  2,  2,  1],
       [ 0,  3,  4,  3],
       [ 2,  6, 10,  7],
       [ 1,  1,  4,  4]])
>>> # check marginals are preserved
>>> (result.sum(axis=0) == a.sum(axis=0)).all()
True
>>> (result.sum(axis=1) == a.sum(axis=1)).all()
True
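For contrast, rounding each cell of the same array independently, here with numpy's `np.rint`, does not preserve the marginals, which is why a joint integerisation is needed. This sketch only illustrates the failure mode; it uses numpy alone, not humanleague.

```python
import numpy as np

a = np.array([[0.3, 1.2, 2.0, 1.5],
              [0.6, 2.4, 4.0, 3.0],
              [1.5, 6.0, 10.0, 7.5],
              [0.6, 2.4, 4.0, 3.0]])

# np.rint rounds each cell independently (halves go to the nearest even integer)
naive = np.rint(a).astype(int)

# column sums become 4, 11, 20, 16 instead of 3, 12, 20, 15
print(naive.sum(axis=0), a.sum(axis=0))
# row sums become 5, 10, 26, 10 instead of 5, 10, 25, 10
print(naive.sum(axis=1), a.sum(axis=1))
```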

Download files

Source Distribution

humanleague-2.4.4.tar.gz (68.7 kB)

File details

Details for the file humanleague-2.4.4.tar.gz.

File metadata

  • Download URL: humanleague-2.4.4.tar.gz
  • Size: 68.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.6

File hashes

Hashes for humanleague-2.4.4.tar.gz
Algorithm Hash digest
SHA256 fe1e1a4258a240b385ef487de1662794b14e758120c85867320f9f755e084635
MD5 d3aca8d2e6c8048c1e201f4e106ed64d
BLAKE2b-256 0c028578619dc4e86864d6a190c5f99f83cc172bed16cd2442d12818b2259501
