A unified latent variable modeling framework for analyzing large multimodal and multilingual datasets

DeepLatent

DeepLatent is a unified latent variable modeling framework for analyzing large multimodal and multilingual datasets. It relies on variational inference using deep neural networks for estimation.

The package currently supports:

- Generic latent factor models
- Topic models: the latent variables are a mixture of topics within documents.
- Ideal point models: the latent variables are interpreted as ideological dimensions.

🌟 Key Features

- Multilingual and multimodal support:
  - Learn topics or ideal points across multiple modalities (e.g., texts and images, texts and votes)
  - Learn the weight of each modality in determining the latent variables per observation
- Flexible metadata handling:
  - `prevalence`: covariates that influence the latent variables
  - `content`: covariates that influence the response variables conditional on the latent variables (e.g., topic-word distributions)
  - `labels`: outcomes for classification or regression tasks
  - `prediction`: additional predictors for the labels
- Flexible input/output representations:
  - Document embeddings (for texts, images, audio-visual data)
  - Word frequencies (BoW)
  - Raw images
  - Discrete choice data
  - Voting records

📦 Models

`GTM` (Generalized Topic Model)

- Learns topics on the simplex
- Supports `dirichlet` or `logistic_normal` priors (optionally conditioned on covariates)
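
To make "topics on the simplex" concrete, here is a minimal sketch (not DeepLatent's implementation) of how a logistic-normal draw produces topic proportions: sample a Gaussian vector, then pass it through a softmax. The helper names `softmax` and `sample_logistic_normal` are illustrative, and using the Gaussian mean as the place where covariates could enter is an assumption about what "conditioned on covariates" might look like.

```python
import math
import random


def softmax(values):
    """Map real-valued scores to positive weights that sum to 1."""
    shift = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sample_logistic_normal(mean, std=1.0, rng=None):
    """Draw one point on the simplex: Gaussian noise around `mean`, then softmax."""
    rng = rng or random.Random()
    gaussian_draw = [rng.gauss(m, std) for m in mean]
    return softmax(gaussian_draw)


# Three topics; a larger mean makes a topic more prevalent on average.
theta = sample_logistic_normal(mean=[0.0, 1.0, -1.0], rng=random.Random(0))
print(theta)
```

Every draw is a vector of positive proportions that sums to one, which is what lets it be read as a mixture of topics within a document.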

`IdealPointNN`

- Learns unconstrained latent variables (ℝⁿ) for ideal point modeling
- Designed for political texts, images, audio and video recordings, surveys, and votes
- Uses a `gaussian` prior (optionally conditioned on covariates)
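
When both the prior and the variational posterior are Gaussian, the term that pulls the posterior toward the prior has a closed form. The sketch below computes the KL divergence between two diagonal Gaussians. It is the textbook formula shown for illustration only; `gaussian_kl` is a hypothetical helper, and this README does not state that the package computes the term this way.

```python
import math


def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL(q || p) for diagonal Gaussians, summed over latent dimensions."""
    total = 0.0
    for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p):
        total += 0.5 * (math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0)
    return total


# A one-dimensional ideal point whose posterior mean sits one unit from the prior mean.
print(gaussian_kl(mu_q=[1.0], var_q=[1.0], mu_p=[0.0], var_p=[1.0]))
```

If covariates shift the prior mean, the same formula applies with `mu_p` computed per observation.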

Installation

From PyPI (Recommended)

pip install deeplatent
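
After installing, one way to confirm which version ended up in the environment is to query the package metadata from Python. This is a small sketch using only the standard library; `installed_version` is an illustrative helper name, and the distribution name is taken from the `pip install` command above.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


print(installed_version("deeplatent"))
```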

From Source

git clone https://github.com/PinchOfData/DeepLatent.git
cd DeepLatent
pip install -e .

Development Installation

git clone https://github.com/PinchOfData/DeepLatent.git
cd DeepLatent
python setup_dev.py

🚀 Getting Started

1. Prepare Your Data with `Corpus()`

Supports text, embeddings, votes, and survey questions:

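import sys
sys.path.append('../src/')  # source checkout: make the src/ directory importable

from sklearn.feature_extraction.text import CountVectorizer

from corpus import Corpus

# `df` is assumed to be a pandas DataFrame with one row per document and the
# "doc_clean" and "image_path" columns referenced below.
# `my_image_embedder` is a placeholder for a user-supplied embedding function.
modalities = {
    "text": {
        "column": "doc_clean",
        "views": {
            "bow": {
                "type": "bow",
                "vectorizer": CountVectorizer()
            }
        }
    },
    "image": {
        "column": "image_path",
        "views": {
            "embedding": {
                "type": "embedding",
                "embed_fn": my_image_embedder
            }
        }
    }
}

my_dataset = Corpus(df, modalities=modalities)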
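
For intuition about what a `"bow"` view holds, the sketch below builds a small document-term count matrix with the standard library. It only illustrates the representation: in the snippet above the counting is delegated to `CountVectorizer`, whose tokenization rules differ, and `build_bow` is a hypothetical helper rather than part of DeepLatent.

```python
from collections import Counter


def build_bow(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts vocabulary[j] in docs[i]."""
    tokenized = [doc.lower().split() for doc in docs]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


vocab, bow = build_bow(["tax cuts now", "tax the rich", "cuts cuts cuts"])
print(vocab)
print(bow)
```

Each row is one document and each column one vocabulary entry, so word order is discarded and only frequencies remain.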

Optionally include metadata: `prevalence`, `content`, `labels`, `prediction`.

2. Train a Model

For Topic Models:

from models import GTM

model = GTM(
    n_topics=20,                        # number of topics to learn
    doc_topic_prior="logistic_normal",  # prior over document-topic proportions
    ae_type="wae"                       # Wasserstein autoencoder
)

For Ideal Point Models:

from models import IdealPointNN

model = IdealPointNN(
    n_ideal_points=1, # one-dimensional ideal point model
    ae_type="vae"
)

🔧 Common Options

Argument	Description
`ae_type`	`"wae"` (Wasserstein autoencoder), `"vae"` (variational autoencoder), or `"ae"` (plain autoencoder)
`fusion`	`"poe"` (Product of Experts), `"moe_gating"` (Mixture of Experts), or `"moe_average"` (simple averaging across modalities)
`update_prior`	Learn a structured prior conditioned on `prevalence` covariates
`w_prior`	Strength of prior alignment for `wae`
`w_pred_loss`	Weight of the supervised loss for predicting `labels`
`kl_annealing_*`	Strength of prior alignment for `vae` (KL annealing). Helps prevent posterior collapse.
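
The individual `kl_annealing_*` arguments are not spelled out in this README, so the following is only a sketch of one common scheme: a linear warm-up of the KL weight. The names `warmup_steps`, `start`, and `end` are hypothetical and do not correspond to documented DeepLatent parameters.

```python
def linear_kl_weight(step, warmup_steps, start=0.0, end=1.0):
    """Linearly increase the KL weight from `start` to `end` over `warmup_steps`."""
    if warmup_steps <= 0:
        return end
    progress = min(max(step / warmup_steps, 0.0), 1.0)
    return start + (end - start) * progress


for step in (0, 250, 500, 1000):
    print(step, linear_kl_weight(step, warmup_steps=500))
```

Starting with a small weight lets the encoder make use of the latent variables before the pull toward the prior reaches full strength, which is the usual rationale for annealing as a guard against posterior collapse.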

🔍 Analysis and Utilities

📚 Topic Models (`GTM`)

- `get_topic_words()`: top words per topic
- `get_covariate_words()`: word shifts by content covariates
- `get_top_docs()`: representative documents
- `get_topic_word_distribution()`: topic-word matrix
- `get_covariate_word_distribution()`: word shift matrix
- `plot_topic_word_distribution()`: word clouds / bar plots
- `visualize_docs()`: document embeddings (UMAP, t-SNE, PCA)
- `visualize_words()`: word embeddings
- `visualize_topics()`: topic embeddings
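
To show how "top words per topic" relates to a topic-word matrix, here is a small standalone sketch. It does not call DeepLatent: `top_words` is a hypothetical helper, and the matrix and vocabulary are made-up toy values used only to demonstrate the ranking step.

```python
def top_words(topic_word_matrix, vocabulary, k=3):
    """For each topic (row), return the k words with the largest weights."""
    result = []
    for weights in topic_word_matrix:
        ranked = sorted(range(len(vocabulary)), key=lambda j: weights[j], reverse=True)
        result.append([vocabulary[j] for j in ranked[:k]])
    return result


vocabulary = ["tax", "budget", "vote", "health", "care"]
topic_word = [
    [0.40, 0.35, 0.10, 0.10, 0.05],  # toy weights for a fiscal-looking topic
    [0.05, 0.05, 0.10, 0.45, 0.35],  # toy weights for a health-looking topic
]
print(top_words(topic_word, vocabulary, k=2))
```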

👤 Ideal Point Models (`IdealPointNN`)

- `get_ideal_points()`: ℝⁿ latent space
- `get_predictions()`: supervised output
- `get_modality_weights()`: fusion weights (PoE or gating)
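
As background on what fusion weights can mean under a Product of Experts, the sketch below fuses one-dimensional Gaussian "experts" (one per modality) by adding their precisions. This is the standard Gaussian PoE identity, not DeepLatent's code: `product_of_experts` is a hypothetical helper, and how `get_modality_weights()` actually defines its weights, or whether a prior expert is included, is not stated in this README.

```python
def product_of_experts(means, variances):
    """Fuse 1-D Gaussian experts; return (fused_mean, fused_variance, weights)."""
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    weights = [p / total_precision for p in precisions]  # share of each modality
    fused_mean = sum(w * m for w, m in zip(weights, means))
    return fused_mean, 1.0 / total_precision, weights


# A confident text expert (variance 0.5) and a noisier vote expert (variance 2.0).
mean, variance, weights = product_of_experts(means=[1.0, -1.0], variances=[0.5, 2.0])
print(mean, variance, weights)
```

The more certain expert dominates the fused estimate, which is why precision shares are a natural way to read per-observation modality weights in this setting.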

📁 Tutorials

Check out the example notebooks to get started.

Download sample data to run some notebooks: Congressional Speeches CSV

📖 References

- Deep Latent Variable Models for Unstructured Data, Germain Gauthier, Philine Widmer, Elliott Ash (2025)
- The Neural Ideal Point Model, Germain Gauthier, Hugo Subtil, Philine Widmer (2025)

⚠️ Disclaimer

This package is under active development 🚧. Feedback and contributions are welcome!

Release history

- 0.1.3: Jan 23, 2026
- 0.1.2: Nov 27, 2025
- 0.1.1: Nov 20, 2025
- 0.1.0: Oct 30, 2025 (this version)

Download files

Source Distribution

deeplatent-0.1.0.tar.gz (2.6 MB), uploaded Oct 30, 2025 (Source)

Built Distribution

deeplatent-0.1.0-py3-none-any.whl (45.8 kB), uploaded Oct 30, 2025 (Python 3)

File details

Details for the file deeplatent-0.1.0.tar.gz.

File metadata

Download URL: deeplatent-0.1.0.tar.gz
Upload date: Oct 30, 2025
Size: 2.6 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.24

File hashes

Hashes for deeplatent-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`107aeb87ce5d210cbee51391112af512d914a597f203732127c45c889bb113a6`
MD5	`1e1ed6c4ea2b0abb24ec1e1f084af9cb`
BLAKE2b-256	`163c09cdbec71ff9f58f95f144252c8ba2c71fa09ebca3d4de8bda78127d21b5`

File details

Details for the file deeplatent-0.1.0-py3-none-any.whl.

File metadata

Download URL: deeplatent-0.1.0-py3-none-any.whl
Upload date: Oct 30, 2025
Size: 45.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.24

File hashes

Hashes for deeplatent-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7ac30041947d63646c8e0f0e2250aa418cea36e178d99540351b854f76d772ff`
MD5	`cd03ddad94ad36f337196bb8aed77df7`
BLAKE2b-256	`eaff130ec424ca390b6b5c5bb361376bd4ee14b1e96b336a75e00e1b90c4bb14`
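
The digests listed above can be checked against a downloaded file before installing it. The following is a minimal sketch using Python's `hashlib`; the helper names `sha256_of_file` and `matches_published_digest` are illustrative, and the local file path is whatever location you downloaded the archive to.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file from disk and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare the local file's digest with a published one, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, calling `matches_published_digest` with the path to a downloaded `deeplatent-0.1.0.tar.gz` and the SHA256 value from its table should return `True` if the file is intact.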
