Lifestream data analysis with PyTorch
Project description
pytorch-lifestream (ptls) is a library built upon PyTorch for building embeddings on discrete event sequences using self-supervision. It can process terabyte-size volumes of raw events such as game history events, clickstream data, purchase history or card transactions.
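To make "discrete event sequences" concrete, here is a minimal plain-Python sketch that groups flat event records into one time-ordered sequence per entity. It does not use the library's own preprocessing; the function `build_sequences` and the field names `client_id`, `event_time`, `mcc_code` and `amount` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_sequences(events, id_key="client_id", time_key="event_time"):
    """Group flat event records into one time-ordered sequence per entity.

    Hypothetical helper for illustration; not part of pytorch-lifestream.
    """
    by_entity = defaultdict(list)
    for event in events:
        by_entity[event[id_key]].append(event)

    sequences = {}
    for entity_id, entity_events in by_entity.items():
        # Events must be ordered in time before they form a sequence.
        entity_events.sort(key=lambda e: e[time_key])
        feature_names = [k for k in entity_events[0] if k != id_key]
        # One list per feature, all lists aligned by event position.
        sequences[entity_id] = {
            name: [e[name] for e in entity_events] for name in feature_names
        }
    return sequences


# Made-up card transactions for two clients, in arbitrary order.
raw_events = [
    {"client_id": 1, "event_time": 3, "mcc_code": 5411, "amount": 12.5},
    {"client_id": 2, "event_time": 1, "mcc_code": 4111, "amount": 2.0},
    {"client_id": 1, "event_time": 1, "mcc_code": 5812, "amount": 30.0},
]
sequences = build_sequences(raw_events)
print(sequences[1])
```

Each entity ends up with a dictionary of aligned feature lists, which is the kind of per-entity structure an embedding model for event sequences would consume.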
It supports various methods of self-supervised training, adapted for event sequences:
- Contrastive Learning for Event Sequences (CoLES)
- Contrastive Predictive Coding (CPC)
- Replaced Token Detection (RTD) from ELECTRA
- Next Sequence Prediction (NSP) from BERT
- Sequences Order Prediction (SOP) from ALBERT
- Masked Language Model (MLM) from RoBERTa
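As an illustration of how a contrastive method such as CoLES can obtain training pairs without labels, the sketch below draws random contiguous sub-sequences from a single entity's event sequence; slices taken from the same entity are treated as positives, slices from different entities as negatives. This is a simplified plain-Python sketch, not the library's implementation; the function `sample_slices` and its parameters are assumptions made for the example.

```python
import random


def sample_slices(sequence, num_slices, min_len, max_len, rng):
    """Draw random contiguous sub-sequences from one entity's events.

    Hypothetical helper; returns an empty list if the sequence is too short.
    """
    if len(sequence) < min_len:
        return []
    slices = []
    for _ in range(num_slices):
        # Slice length is bounded by both max_len and the sequence length.
        length = rng.randint(min_len, min(max_len, len(sequence)))
        start = rng.randint(0, len(sequence) - length)
        slices.append(sequence[start:start + length])
    return slices


rng = random.Random(0)          # fixed seed for reproducibility
events = list(range(20))        # stand-in for 20 consecutive events
views = sample_slices(events, num_slices=3, min_len=5, max_len=10, rng=rng)
for view in views:
    print(view)
```

Every returned view preserves the original event order, so an encoder sees a realistic fragment of the entity's history.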
It supports several types of encoders, including Transformer and RNN. It also supports many types of self-supervised losses.
The following variants of the contrastive losses are supported:
- Contrastive loss (paper)
- Triplet loss (paper)
- Binomial deviance loss (paper)
- Histogram loss (paper)
- Margin loss (paper)
- VICReg loss (paper)
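For orientation, the first item in the list above can be sketched in its common textbook form (up to constant factors): pairs from the same entity are penalized by their squared distance, and pairs from different entities are penalized only when they are closer than a margin. The code is plain Python for readability and is not the library's implementation; `contrastive_loss` and its arguments are hypothetical names.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same_entity, margin=1.0):
    """Pairwise contrastive loss in its textbook form (illustrative only)."""
    d = euclidean_distance(emb_a, emb_b)
    if same_entity:
        # Positive pair: pull the embeddings together.
        return d ** 2
    # Negative pair: push apart until the distance reaches the margin.
    return max(0.0, margin - d) ** 2


print(contrastive_loss([0, 0], [3, 4], same_entity=True))
print(contrastive_loss([0, 0], [3, 4], same_entity=False, margin=1.0))
print(contrastive_loss([0, 0], [0, 0], same_entity=False, margin=1.0))
```

A negative pair that is already farther apart than the margin contributes nothing, which is why the second call yields zero.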
Install from PyPI
pip install pytorch-lifestream
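A quick way to check that the installation is visible to the current interpreter is to look the package up without importing it. The sketch below uses only the standard library; it assumes the importable module is named `ptls`, matching the short name used in this description, and `is_installed` is a hypothetical helper.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: the package installs a top-level module called "ptls".
print("ptls available:", is_installed("ptls"))
```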
Install from source
# Ubuntu 20.04
sudo apt install python3.8 python3-venv
pip3 install pipenv
pipenv sync --dev # install packages exactly as specified in Pipfile.lock
pipenv shell
pytest
Demo notebooks
Demo notebooks are available here. Some of them:
- Supervised model training notebook
- Self-supervised training and embeddings for downstream task notebook
- Self-supervised embeddings in CatBoost notebook
- Self-supervised training and fine-tuning notebook
- Self-supervised TrxEncoder only training with Masked Language Model task and fine-tuning notebook
- Pandas data preprocessing options notebook
- PySpark and Parquet for data preprocessing notebook
- Fast inference on large dataset notebook
- Supervised multilabel classification notebook
- CoLES multimodal notebook
Tutorials are also available here.
Docs
Library description index
Experiments on public datasets
pytorch-lifestream usage experiments on several public event datasets are available in the separate repo
PyTorch-LifeStream in ML Competitions
- Data Fusion Contest 2022 report (in Russian)
- Data Fusion Contest 2022 report, Sber AI Lab team (in Russian)
- VK.com Graph ML Hackathon report (in Russian)
- VK.com Graph ML Hackathon report, AlfaBank team (in Russian)
- American Express - Default Prediction Kaggle contest report (in Russian)
- Data Fusion Contest 2024, Sber AI Lab team
- Data Fusion Contest 2024, Ivan Alexandrov
- American Express - Default Prediction
- COTIC: pytorch-lifestream is used in experiments for the Continuous-time convolutions model of event sequences
How to contribute
- Make your changes via fork and pull request.
- Write unit tests for new code in ptls_tests.
- Check unit tests via pytest: Example.
Citation
We have a paper you can cite:
@inproceedings{sakhno2025pytorch,
title={PyTorch-Lifestream: Learning Embeddings on Discrete Event Sequences},
author={Sakhno, Artem and Kireev, Ivan and Babaev, Dmitrii and Savchenko, Maxim and Gusev, Gleb and Savchenko, Andrey},
booktitle={Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence},
pages={11104--11108},
year={2025}
}
Download files
Source Distribution
File details
Details for the file pytorch_lifestream-0.7.0.tar.gz.
File metadata
- Download URL: pytorch_lifestream-0.7.0.tar.gz
- Upload date:
- Size: 190.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bdc88eea69a7db96a1a8df5055e2da0bbe2cc05944ac95fd48eb600a9002e3fc |
| MD5 | 7a5ce7aba176891a88563f2ac13c309a |
| BLAKE2b-256 | b0274bf8c7cbe567223599d442a984935c2422369d57e52c30a9d5204471af11 |