POME: Partially observed mixed-type data embeddings
POME: Learning partially observed mixed-type data embeddings
POME is a graph-based representation-learning method for heterogeneous datasets that incorporates missingness structure into the computation of low-dimensional sample and variable embeddings. It applies to any tabular dataset that contains both numeric and categorical features and in which missing-data patterns should be taken into account.
Installation
POME is implemented as a Python package and can be installed from this repository by running
pip install -e .
Input format
POME expects the input data as a pandas dataframe in which rows represent variables (features) and columns represent samples. Missing data must be encoded by a unique numerical value, i.e. one that does not occur as a genuine observation. In addition, POME expects one column that stores the datatype of each variable. An example dataset could have the following structure, with the value -99 encoding missing data:
| | Sample1 | Sample2 | Sample3 | Type |
|---|---|---|---|---|
| VariableA | 0 | 1 | -99 | cat |
| VariableB | 3.14 | -0.1 | 2.5 | numerical |
| VariableC | 0.3 | 1.2 | -99 | numerical |
| VariableD | 1 | 0 | 2 | cat |
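Many datasets are stored the other way round, with one row per sample and `NaN` for missing entries. The following is a minimal sketch of converting such a frame into the layout described above. The frame `raw`, the dictionary `types`, and the column name `Type` are illustrative assumptions taken from the example table, not requirements confirmed by POME itself.

```python
import numpy as np
import pandas as pd

NA_ENCODING = -99.0  # assumed sentinel, matching the example table

# Hypothetical raw data in the common layout: one row per sample.
raw = pd.DataFrame(
    {
        "VariableA": [0, 1, np.nan],
        "VariableB": [3.14, -0.1, 2.5],
        "VariableC": [0.3, 1.2, np.nan],
        "VariableD": [1, 0, 2],
    },
    index=["Sample1", "Sample2", "Sample3"],
)

# Datatype label per variable, using the labels from the example table.
types = {
    "VariableA": "cat",
    "VariableB": "numerical",
    "VariableC": "numerical",
    "VariableD": "cat",
}

# The sentinel must not collide with a genuinely observed value.
assert not (raw == NA_ENCODING).any().any()

# Replace NaN with the sentinel, then transpose so that rows are variables.
pome_df = raw.fillna(NA_ENCODING).T

# Append the datatype column; the Series is aligned on the variable names.
pome_df["Type"] = pd.Series(types)

print(pome_df)
```

Checking for a sentinel collision before filling is cheap and avoids silently treating an observed value as missing.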
Minimal working example
POME's core functionality is provided by its Embedder class, which handles input transformation, training and output generation. A typical workflow looks as follows:

    import pandas as pd
    from pome import Embedder

    if __name__ == "__main__":
        # Load data and set parameters.
        example_df = pd.read_csv("example.csv", index_col=0)
        NA_ENCODING = -99.0
        DIMENSION = 16
        DEVICE = "cpu"

        # Initialize embedding object with parameters.
        embedder = Embedder(
            epochs=100,
            na_encoding=NA_ENCODING,
            embedding_dimension=DIMENSION,
            device=DEVICE,
            enable_imputation=True,
        )

        # Fit embedding object to dataset.
        embedder.fit(example_df)

        # Output stores low-dimensional embeddings for samples and variables.
        sample_embeddings, variable_embeddings, _, _ = embedder.get_embeddings()
        print("Computed sample embeddings: \n", sample_embeddings)

        # Impute all missing values (requires enable_imputation=True).
        imputed_df = embedder.impute_all(na_value=NA_ENCODING)
        print("Imputed data: \n", imputed_df)

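The workflow above reads a file named `example.csv`. If no such file is at hand, the following sketch writes the table from the input format section to disk and reads it back the same way, using only pandas. The file contents are the illustrative values from that table, not real data.

```python
from pathlib import Path

import pandas as pd

NA_ENCODING = -99.0

# The example table as CSV text; the empty first header cell marks the index column.
csv_text = """,Sample1,Sample2,Sample3,Type
VariableA,0,1,-99,cat
VariableB,3.14,-0.1,2.5,numerical
VariableC,0.3,1.2,-99,numerical
VariableD,1,0,2,cat
"""

Path("example.csv").write_text(csv_text)

# Same call as in the workflow: the first column becomes the row index.
example_df = pd.read_csv("example.csv", index_col=0)

# Count the cells that carry the missing-data sentinel.
n_missing = int((example_df.drop(columns="Type") == NA_ENCODING).sum().sum())

print(example_df)
print("Missing entries:", n_missing)
```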
Parameters
POME's Embedder class allows for the specification of the following parameters:
- `embedding_dimension : int = 32`: Specifies the number of dimensions of the sample and variable embeddings learned by POME.
- `epochs : int = 500`: Sets the number of epochs for which POME is trained.
- `device : str = "cpu"`: Specifies whether to train on CPU (`"cpu"`) or GPU (`"cuda"`).
- `na_encoding : float = -99.0`: The float value that encodes missing data.
- `enable_imputation : bool = False`: Set this to `True` if you want to use POME for imputation after training.
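As a small illustration of how these parameters fit together, the sketch below writes the documented defaults out explicitly and derives a shorter configuration from them. The names `DEFAULTS` and `quick_run` are hypothetical, and the import is guarded so the snippet also runs where POME is not installed.

```python
# Documented defaults of the Embedder class, written out explicitly.
DEFAULTS = {
    "embedding_dimension": 32,
    "epochs": 500,
    "device": "cpu",
    "na_encoding": -99.0,
    "enable_imputation": False,
}

# Hypothetical overrides for a shorter run that also enables imputation.
quick_run = {
    **DEFAULTS,
    "epochs": 100,
    "embedding_dimension": 16,
    "enable_imputation": True,
}

try:
    from pome import Embedder
except ImportError:
    Embedder = None  # POME is not installed; the dictionaries still document the settings

if Embedder is not None:
    # Keyword names are the ones listed in the parameter description.
    embedder = Embedder(**quick_run)

print(quick_run)
```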
Functions
After initializing the Embedder object, the three main methods for using POME are:
- `fit(self, X, y=None)`: Trains POME on the given input dataframe, which must follow the input format specified above.
- `get_embeddings(self, format='pandas')`: Returns the computed embeddings of samples and variables in dataframe format. The output is a four-tuple with the sample embeddings at position 0 and the variable embeddings at position 1.
- `impute_all(self, na_value : float)`: Imputes all missing values, as specified by `na_value`, in the input dataset and directly returns the imputed dataframe.
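After imputation it can be useful to check that only the cells marked as missing were changed. The helper below is a sketch of such a check using pandas alone. It assumes that the imputed dataframe keeps the row and column labels of the input, which is not confirmed here; `summarize_imputation` and the stand-in frame `imputed` are hypothetical and only mimic what `impute_all` might return.

```python
import pandas as pd


def summarize_imputation(original, imputed, na_value, type_column="Type"):
    """Count missing cells before and after imputation, and changed observed cells."""
    before = original.drop(columns=type_column)
    # The imputed frame may or may not carry the datatype column.
    after = imputed.drop(columns=type_column, errors="ignore")
    # Align labels so that the element-wise comparisons below are well defined.
    after = after.reindex(index=before.index, columns=before.columns)

    was_missing = before == na_value
    still_missing = was_missing & (after == na_value)
    observed_changed = ~was_missing & (before != after)

    return {
        "missing_before": int(was_missing.sum().sum()),
        "still_missing": int(still_missing.sum().sum()),
        "observed_changed": int(observed_changed.sum().sum()),
    }


# Input in the documented layout, using the values of the example table.
original = pd.DataFrame(
    {
        "Sample1": [0.0, 3.14, 0.3, 1.0],
        "Sample2": [1.0, -0.1, 1.2, 0.0],
        "Sample3": [-99.0, 2.5, -99.0, 2.0],
        "Type": ["cat", "numerical", "numerical", "cat"],
    },
    index=["VariableA", "VariableB", "VariableC", "VariableD"],
)

# Stand-in for an imputed frame; the filled-in values are made up.
imputed = original.copy()
imputed.loc["VariableA", "Sample3"] = 1.0
imputed.loc["VariableC", "Sample3"] = 0.75

summary = summarize_imputation(original, imputed, na_value=-99.0)
print(summary)
```

A non-zero `observed_changed` count would indicate that values outside the missing cells differ between the two frames and deserve a closer look.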