
HIVAE: Handling incomplete heterogeneous data using VAEs (doi: 10.1016/j.patcog.2020.107501, 2020), used in The Lancet, 2021 (doi: 10.1016/S0140-6736(21)01638-X). An extension of the implementation as an easy-to-use Python library for TensorFlow 2, including imputation objects for the sklearn API (a.karwath).

Project description

hivae2

This repository contains a modular reimplementation of the "Heterogeneous Incomplete Variational Autoencoder" (HI-VAE) model by Alfredo Nazabal (anazabal@turing.ac.uk) et al., written in Python.

The details of this model are described in this paper. Please cite it if you use this code/library in your own research. This is an extension of the original implementation as an easy-to-use Python library, upgraded for TensorFlow 2.

Examples

See examples directory for usage

Files description

  • (outdated) HIVAE.py: The main script of the library. It needs to be imported to work with the library and is connected to all the other scripts.
  • loglik_models_missing_normalize.py: Contains the likelihood models for the different variable types considered (real, positive, count, categorical and ordinal).
  • model_HIVAE_inputDropout.py: Contains the HI-VAE with input dropout encoder model.
  • model_HIVAE_factorized.py: Contains the HI-VAE with factorized encoder model.
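
As a rough illustration of what per-type likelihood models look like, the sketch below evaluates a log-likelihood for a single value of several variable types and returns zero for a missing entry, so that it does not contribute to the objective. The distribution choices (Gaussian for real, log-normal for positive, Poisson for count, softmax for categorical) follow the HI-VAE paper. The function names and signatures are hypothetical and are not taken from the likelihood file listed above; ordinal variables are left out for brevity.

```python
import math

def loglik_real(x, mean, var):
    """Gaussian log-density of a real-valued observation."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def loglik_pos(x, mean, var):
    """Log-normal log-density: Gaussian on log(x) plus the Jacobian term -log(x)."""
    return loglik_real(math.log(x), mean, var) - math.log(x)

def loglik_count(x, rate):
    """Poisson log-probability of a non-negative integer count."""
    return x * math.log(rate) - rate - math.lgamma(x + 1)

def loglik_cat(x, logits):
    """Log-probability of class index x under a softmax over the logits."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
    return logits[x] - log_norm

def masked_loglik(loglik_fn, x, observed, *params):
    """Return the log-likelihood if the entry is observed, else 0.0."""
    return loglik_fn(x, *params) if observed else 0.0

# One hypothetical row with four heterogeneous columns; the last one is missing.
total = (
    masked_loglik(loglik_real, 0.3, True, 0.0, 1.0)
    + masked_loglik(loglik_pos, 2.5, True, 1.0, 0.5)
    + masked_loglik(loglik_count, 4, True, 3.0)
    + masked_loglik(loglik_cat, None, False, [0.1, 0.2, 0.7])
)
print(total)
```

In the actual model the distribution parameters come from the decoder network; here they are fixed numbers so the example stays self-contained.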

Contact

Comments

This version requires TensorFlow 2 (tf2). For Apple silicon users, please follow: https://developer.apple.com/metal/tensorflow-plugin/
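
A quick way to check the TensorFlow requirement before installing is to look at the installed package metadata, which avoids importing TensorFlow itself. This is a minimal sketch: the list of distribution names is an assumption about how TensorFlow may have been installed, and the helper names are hypothetical.

```python
from importlib import metadata

# Distribution names under which TensorFlow is commonly published on PyPI
# (an assumption; adjust to match your environment).
CANDIDATE_DISTRIBUTIONS = ("tensorflow", "tensorflow-cpu", "tensorflow-macos")

def major_version(version):
    """Return the leading integer of a version string such as '2.15.0'."""
    return int(version.split(".")[0])

def installed_tensorflow_version():
    """Return the installed TensorFlow version string, or None if not found."""
    for name in CANDIDATE_DISTRIBUTIONS:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None

version = installed_tensorflow_version()
if version is None:
    print("TensorFlow is not installed")
elif major_version(version) < 2:
    print(f"TensorFlow {version} found, but version 2 or later is needed")
else:
    print(f"TensorFlow {version} found")
```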

Comments on general_example.py (might be outdated!)

main_directory: the path to the project folder

dataset_name: the name of the dataset (optional)

types_list_d: a dictionary keyed by dataset name; each value is a list of tuples giving the column name, type, number of dimensions and number of classes for each column

types:

• count: real values

• cat: categorical 0 or 1

• pos: positive real values

• ordinal: ordinal number

number of dimensions:

• the number of possible values for categorical variables, or 1 for numerical variables

number of classes:

• the number of options (same as the number of dimensions for categorical variables)

dataset_path: the folder containing the CSV files
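
The description above can be sketched as a small dictionary plus a consistency check. The tuple layout (column name, type, number of dimensions, number of classes) is inferred from this description only, and the dataset name, column names, the use of None for the class count of numerical columns, and the check_types_list helper are all hypothetical. See the examples directory for the format the library actually expects.

```python
# Types treated as numerical in this sketch (one dimension each).
NUMERICAL_TYPES = {"count", "pos"}

# Assumed tuple layout: (column name, type, number of dimensions, number of classes).
types_list_d = {
    "my_dataset": [
        ("visits", "count", 1, None),   # numerical: one dimension
        ("income", "pos", 1, None),     # positive real value: one dimension
        ("smoker", "cat", 2, 2),        # categorical with two options
    ]
}

def check_types_list(types_list):
    """Check the rules stated above: categorical dims == classes, numerical dims == 1."""
    for name, var_type, n_dims, n_classes in types_list:
        if var_type == "cat" and n_dims != n_classes:
            raise ValueError(f"{name}: dimensions and classes differ")
        if var_type in NUMERICAL_TYPES and n_dims != 1:
            raise ValueError(f"{name}: numerical columns use one dimension")
    return True

dataset_name = "my_dataset"
types_list = types_list_d[dataset_name]  # the entry for the dataset in use
check_types_list(types_list)
```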

results_path: the output folder for results

network_path: where the models are going to be stored

types_list: the specific types list for the dataset you are going to use

data_file: the full dataset

train_file/test_file: the training and test files, if the dataset has already been split

train_data/test_data: pandas dataframes
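
When only a full dataset is available, the training and test dataframes can be produced with pandas. The following is a minimal sketch, not the code from general_example.py: the CSV content, column names, split_train_test helper and split fraction are hypothetical, and an in-memory string stands in for a file under dataset_path. Missing cells are kept as NaN rather than dropped, since handling incomplete data is the purpose of the model.

```python
import io
import pandas as pd

# Stand-in for a CSV file; a real run would pass a file path to pd.read_csv.
csv_text = (
    "age,income,smoker\n"
    "34,52000.0,0\n"
    "51,,1\n"
    "29,31000.5,0\n"
    ",47000.0,1\n"
    "45,60500.0,\n"
)
data = pd.read_csv(io.StringIO(csv_text))  # empty cells are read as NaN

def split_train_test(df, test_fraction=0.2, seed=0):
    """Randomly hold out a fraction of the rows as test data."""
    test_data = df.sample(frac=test_fraction, random_state=seed)
    train_data = df.drop(test_data.index)
    return train_data, test_data

train_data, test_data = split_train_test(data, test_fraction=0.4)
print(len(train_data), len(test_data))
```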

dim_y: the depth of the network

dim_s/dim_z: dimensions of the embedding


Download files


Source Distributions

No source distribution files available for this release.

Built Distribution

hivae2-0.13-py3-none-any.whl (31.1 kB)

Uploaded Python 3

File details

Details for the file hivae2-0.13-py3-none-any.whl.

File metadata

  • Download URL: hivae2-0.13-py3-none-any.whl
  • Upload date:
  • Size: 31.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.2

File hashes

Hashes for hivae2-0.13-py3-none-any.whl
Algorithm Hash digest
SHA256 bfcba0ac61f997a81bd4c6790173032f9063682cda47d1f4cb656dad996e990d
MD5 94a667a7aef7fa75851099d5ea3991a5
BLAKE2b-256 3a44154c3fe8c85530764ce98e988e51c9c522d97bbf6884ea4f6884f33a8997
