
Scikit-learn-style geometric embeddings for event sequences.


Event2Vector (event2vec)

A Geometric Approach to Learning Composable Representations of Event Sequences

License: MIT | Python 3.6+ | arXiv:2509.12188

Overview

Event2Vector is a framework for learning representations of discrete event sequences. Inspired by the geometric structures found in neural representations, this model uses a simple, additive recurrent structure to create composable and interpretable embeddings.

Key Concepts

  • Linear Additive Hypothesis: The core idea behind Event2Vector is that the representation of an event sequence can be modeled as the vector sum of the embeddings of its individual events. This allows for intuitive vector arithmetic, enabling the composition and decomposition of event trajectories.
  • Euclidean and Hyperbolic Models: Event2Vector is offered in two geometric variants:
    • Euclidean model: Uses standard vector addition, providing a straightforward, flat geometry for event trajectories.
    • Hyperbolic model: Employs Möbius addition, which is better suited for hierarchical data structures, as it can embed tree-like patterns with less distortion.
  • Estimator API: A scikit-learn style Event2Vec estimator exposes fit, fit_transform, and transform, enabling drop-in use inside pipelines while keeping the compositional recurrent loss from the paper.
  • Padded batching: Optional padding allows entire minibatches of variable-length sequences to be processed in parallel, significantly accelerating training on large corpora without changing model behavior.
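To make the two composition rules concrete, here is a minimal NumPy sketch. It is not the package's implementation: the event names, the random 4-dimensional embeddings, and the helpers euclidean_encode, mobius_add, and hyperbolic_encode are hypothetical. The Möbius addition is the standard formula for a Poincaré ball of curvature -c, and we assume a sequence is encoded by folding its events left to right, starting from the origin.

```python
import numpy as np

# Hypothetical toy embeddings; in Event2Vec these would be learned.
rng = np.random.default_rng(0)
DIM = 4
event_emb = {e: rng.normal(scale=0.1, size=DIM) for e in ["login", "browse", "buy"]}


def euclidean_encode(sequence):
    """Euclidean variant: the sequence vector is the plain sum of event vectors."""
    return np.sum([event_emb[e] for e in sequence], axis=0)


def mobius_add(x, y, c=1.0):
    """Standard Möbius addition x (+)_c y on the Poincaré ball of curvature -c."""
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c ** 2 * x2 * y2
    return num / den


def hyperbolic_encode(sequence, c=1.0):
    """Hyperbolic variant (assumed): fold events left to right with Möbius addition."""
    h = np.zeros(DIM)
    for e in sequence:
        h = mobius_add(h, event_emb[e], c)
    return h


# Composition and decomposition: remove the prefix to recover the last event.
full = euclidean_encode(["login", "browse", "buy"])
prefix = euclidean_encode(["login", "browse"])
print(np.allclose(full - prefix, event_emb["buy"]))  # True

h_full = hyperbolic_encode(["login", "browse", "buy"])
h_prefix = hyperbolic_encode(["login", "browse"])
print(np.allclose(mobius_add(-h_prefix, h_full), event_emb["buy"]))  # True
```

In the Euclidean case, subtracting a prefix recovers the last event directly. Möbius addition is neither commutative nor associative, but it satisfies the left-cancellation law (-a) ⊕ (a ⊕ b) = b, which is what the hyperbolic check relies on.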

For more details, see Sulc, A., Event2Vector: A Geometric Approach to Learning Composable Representations of Event Sequences (arXiv:2509.12188).


Installation

Install the package directly from PyPI:

pip install event2vector

Or install from source:

git clone https://github.com/sulcantonin/event2vec_public.git
cd event2vec_public
pip install .

Estimator API

The Event2Vec class mirrors scikit-learn transformers so it can slot into existing NLP pipelines:

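from event2vector import Event2Vec

# vocab and train_sequences stand for your own event vocabulary and
# training sequences; they are not defined in this snippet.
model = Event2Vec(
    num_event_types=len(vocab),
    geometry="euclidean",
    embedding_dim=128,
    pad_sequences=True,  # padded, masked minibatches for faster training
    num_epochs=50,
)
model.fit(train_sequences, verbose=True)
train_embeddings = model.transform(train_sequences)  # one vector per sequence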

Hyperbolic variant (training, then using the trained weights):

from event2vector import Event2Vec, HyperbolicUtils
import torch

hyp_model = Event2Vec(
    num_event_types=len(vocab),
    geometry="hyperbolic",
    curvature=1.0,
    embedding_dim=128,
    pad_sequences=True,
    num_epochs=50,
)
hyp_model.fit(train_sequences, verbose=True)

# Use the trained weights: encode sequences as PyTorch tensors and keep a handle on the underlying torch module
seq_embeddings = hyp_model.transform(test_sequences, as_numpy=False)
torch_model = hyp_model.model

# Möbius addition and distance between two sequence embeddings on the Poincaré ball
u = seq_embeddings[0]
v = seq_embeddings[1]
uv_added = HyperbolicUtils.mobius_add(u, v, hyp_model.curvature)
uv_dist = HyperbolicUtils.poincare_dist_sq(u, v, hyp_model.curvature).sqrt()
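For readers who want to see what these two calls compute, here is a NumPy sketch of the standard Poincaré-ball formulas. It assumes that HyperbolicUtils.poincare_dist_sq returns the squared geodesic distance (as its name and the .sqrt() call suggest) and that curvature=c denotes a ball of curvature -c; the helper names below are ours, not the package's.

```python
import numpy as np


def mobius_add(x, y, c=1.0):
    """Standard Möbius addition on the Poincaré ball of curvature -c."""
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    return num / (1 + 2 * c * xy + c ** 2 * x2 * y2)


def poincare_dist(u, v, c=1.0):
    """Geodesic distance d_c(u, v) = (2 / sqrt(c)) * artanh(sqrt(c) * ||(-u) (+)_c v||)."""
    sqrt_c = np.sqrt(c)
    return 2.0 / sqrt_c * np.arctanh(sqrt_c * np.linalg.norm(mobius_add(-u, v, c)))


def poincare_dist_arcosh(u, v, c=1.0):
    """Equivalent closed form via arcosh, used here as a cross-check."""
    sq = np.dot(u - v, u - v)
    denom = (1 - c * np.dot(u, u)) * (1 - c * np.dot(v, v))
    return np.arccosh(1 + 2 * c * sq / denom) / np.sqrt(c)


u, v = np.array([0.1, 0.2]), np.array([-0.3, 0.4])
d = poincare_dist(u, v)
d_sq = d ** 2  # the squared form, comparable to poincare_dist_sq under our assumption
print(np.isclose(d, poincare_dist_arcosh(u, v)))  # True
```

The Möbius form mirrors how distances relate to addition in this geometry, while the arcosh form avoids the extra Möbius step; agreement between the two is a cheap sanity check.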

Key methods and options:

  • fit: optimizes embeddings with the additive loss from the paper.
  • fit_transform: convenience helper returning the encoded sequences after fitting.
  • transform: freezes weights and encodes arbitrary sequences, optionally returning PyTorch tensors for downstream models.
  • most_similar: gensim-style nearest-neighbor lookup over learned event embeddings using tokens or full sequences as queries.
  • pad_sequences=True: enables fully vectorized batches with masking for substantial throughput gains on large corpora.
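The padding option and the nearest-neighbour lookup can be illustrated with a small NumPy sketch. It is not the package's code: pad_batch, encode_padded, and most_similar_cosine are hypothetical helpers, the embedding table is random, the encoder is the Euclidean additive one, and cosine similarity is our assumption for the similarity measure (the hyperbolic model may rank neighbours differently).

```python
import numpy as np


def pad_batch(sequences, pad_id=0):
    """Right-pad integer ID sequences to a common length; return (ids, mask)."""
    max_len = max(len(s) for s in sequences)
    ids = np.full((len(sequences), max_len), pad_id, dtype=np.int64)
    mask = np.zeros((len(sequences), max_len), dtype=bool)
    for i, seq in enumerate(sequences):
        ids[i, : len(seq)] = seq
        mask[i, : len(seq)] = True
    return ids, mask


def encode_padded(ids, mask, emb):
    """Encode the whole batch at once; masked (padded) positions add zero to the sum."""
    vectors = emb[ids]  # (batch, max_len, dim)
    return (vectors * mask[..., None]).sum(axis=1)  # (batch, dim)


def most_similar_cosine(query, emb, topn=3):
    """Return (row index, cosine similarity) pairs for the topn rows of emb."""
    norms = np.linalg.norm(emb, axis=1) * np.linalg.norm(query)
    sims = emb @ query / np.maximum(norms, 1e-12)
    best = np.argsort(-sims)[:topn]
    return [(int(i), float(sims[i])) for i in best]


rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 4))  # hypothetical table: 6 event types, 4 dimensions
ids, mask = pad_batch([[1, 2, 3], [4], [5, 1]])
batch_vecs = encode_padded(ids, mask, emb)
print(batch_vecs.shape)  # (3, 4)
print(most_similar_cosine(batch_vecs[0], emb, topn=2))
```

Because the mask zeroes padded positions, the batched result equals encoding each sequence on its own, which is why padding can speed things up without changing the output. Unlike gensim's most_similar, this sketch does not exclude the query itself from the results.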

Device control: set use_gpu=False to force CPU even if CUDA/MPS is present, or pass an explicit device (e.g., "cuda:0" or "cpu").
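The device rule can be summarised as a small precedence function. This is a hypothetical sketch of the documented behaviour, not the package's code: the function name, every parameter other than use_gpu, and the choice that an explicit device takes priority over use_gpu are assumptions. In PyTorch the availability flags would typically come from torch.cuda.is_available() and torch.backends.mps.is_available().

```python
from typing import Optional


def resolve_device(use_gpu: bool = True, device: Optional[str] = None,
                   cuda_available: bool = False, mps_available: bool = False) -> str:
    """Pick a device string (assumed precedence: explicit > use_gpu=False > CUDA > MPS > CPU)."""
    if device is not None:  # an explicit choice such as "cuda:0" or "cpu" wins
        return device
    if not use_gpu:  # force CPU even when an accelerator is present
        return "cpu"
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


print(resolve_device(use_gpu=False, cuda_available=True))  # cpu
```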

Brown Corpus POS tagging example

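After installation, you can run the Brown corpus part-of-speech (POS) tagging example from the paper. Run the scripts from the root of a source checkout, and note that python3 -m expects a module path without the .py suffix:

python3 -m experiments.prepare_brown_data
python3 -m experiments.train_brown_data
python3 -m experiments.visualize_brown_corpus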

Minimal example script

The repository includes a runnable minimal example that trains a tiny model end-to-end and prints example outputs (loss, embeddings, and nearest tokens). Run it from the repo root:

python3 examples/minimal_example.py

To try a hyperbolic run, open examples/minimal_example.py and set geometry="hyperbolic" in the Event2Vec constructor, then rerun the script.

References

For citations, please use the following BibTeX entry:

@article{sulc2025event2vec,
  title={Event2Vec: A Geometric Approach to Learning Composable Representations of Event Sequences},
  author={Sulc, Antonin},
  journal={arXiv preprint arXiv:2509.12188},
  year={2025}
}

Download files


Source Distribution

event2vector-0.1.2.1.tar.gz (12.6 kB)

Uploaded Source

Built Distribution


event2vector-0.1.2.1-py3-none-any.whl (10.3 kB)

Uploaded Python 3

File details

Details for the file event2vector-0.1.2.1.tar.gz.

File metadata

  • Download URL: event2vector-0.1.2.1.tar.gz
  • Size: 12.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.18

File hashes

Hashes for event2vector-0.1.2.1.tar.gz
Algorithm Hash digest
SHA256 649f662bb61c683a8eb6f370aac14ea4b32d739c1aa71942107c9452e3ba0f8c
MD5 13a41b875ba5f721794c7996469231c1
BLAKE2b-256 e690a1d48c5c0762c4d0d44ca7701b9554e8d478f81c2df80e9cc2e4350e20aa


File details

Details for the file event2vector-0.1.2.1-py3-none-any.whl.

File metadata

  • Download URL: event2vector-0.1.2.1-py3-none-any.whl
  • Size: 10.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.18

File hashes

Hashes for event2vector-0.1.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 98840e951c2141c7a99bae7995a68ca7d868d9239120f8c51ae7e0071825f40f
MD5 f27d30c66aec656924de75fe0ab4def4
BLAKE2b-256 d1ad99fef9b3d8c34218b8231066a591c9055bca9d140c3ac4d89750831a84a8

