Package for training and using the Dataset2Vec meta-model.
Dataset2Vec
Introduction
This package implements the approach proposed in Dataset2Vec: Learning Dataset Meta-Features by Jomaa et al. It makes training the Dataset2Vec dataset encoder much more approachable by providing an API that is compatible with pytorch-lightning's Trainer API. The output logs, including TensorBoard logs and checkpoints, are stored in lightning_logs, or under default_root_dir from pytorch_lightning.Trainer if specified.
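Since checkpoints end up under the log directory, it can be handy to locate the newest one after a run. The sketch below is only an illustration: `latest_checkpoint` is a hypothetical helper, not part of this package, and it assumes Lightning's default layout of `<root>/lightning_logs/version_<N>/checkpoints/*.ckpt`, which changes if a custom logger or checkpoint callback is configured.

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(root_dir: str = ".") -> Optional[Path]:
    """Return the most recently modified .ckpt file under root_dir, or None.

    Hypothetical helper: assumes the default Lightning layout
    <root_dir>/lightning_logs/version_<N>/checkpoints/*.ckpt.
    """
    pattern = "lightning_logs/version_*/checkpoints/*.ckpt"
    checkpoints = list(Path(root_dir).glob(pattern))
    if not checkpoints:
        return None
    # Pick the file with the newest modification time.
    return max(checkpoints, key=lambda p: p.stat().st_mtime)
```

For a run started with default_root_dir="output_logs", calling `latest_checkpoint("output_logs")` should return the newest checkpoint path, or None if no run has been saved yet.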
Installation
To install the package run the following command (you need Python 3.9 or higher):
pip install -r requirements.txt
Usage
Here is a simple example of the usage of the package:
from pathlib import Path

from pytorch_lightning import Trainer

from dataset2vec import (
    Dataset2Vec,
    Dataset2VecLoader,
    RepeatableDataset2VecLoader,
)

train_loader = Dataset2VecLoader(Path("data/train"))  # Path with .csv files
val_loader = RepeatableDataset2VecLoader(
    Path("data/val")
)  # Path with .csv files

model = Dataset2Vec()

trainer = Trainer(
    max_epochs=2, log_every_n_steps=1, default_root_dir="output_logs"
)  # output of the training will be stored in output_logs
trainer.fit(model, train_loader, val_loader)
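The loaders above expect directories that contain .csv files. To try the example without real data, a few synthetic files can be generated first. This is a rough sketch under stated assumptions: `write_toy_csvs` is a hypothetical helper, and the column layout (a header row, numeric features, and a binary target in the last column) is a guess that should be checked against the package documentation.

```python
import csv
import random
from pathlib import Path


def write_toy_csvs(
    directory: Path,
    n_files: int = 3,
    n_rows: int = 20,
    n_features: int = 4,
    seed: int = 0,
) -> list[Path]:
    """Write small synthetic CSV datasets into `directory` and return their paths.

    Assumed layout (not confirmed by the package): header row, numeric
    feature columns, binary target as the last column.
    """
    rng = random.Random(seed)
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(n_files):
        path = directory / f"toy_{i}.csv"
        with path.open("w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow([f"f{j}" for j in range(n_features)] + ["target"])
            for _ in range(n_rows):
                features = [round(rng.gauss(0.0, 1.0), 4) for _ in range(n_features)]
                writer.writerow(features + [rng.randint(0, 1)])
        paths.append(path)
    return paths
```

Calling `write_toy_csvs(Path("data/train"))` and `write_toy_csvs(Path("data/val"), seed=1)` would populate the two directories used in the example.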
Development
Here are the snippets useful for the development of the package:
- ./scripts/check_code.sh - runs code quality checks using black, flake8, isort and mypy.
- pytest - runs all unit tests
- cd docs && make html - generates documentation
- python -m build - builds the package
- twine upload dist/* - uploads the package to PyPI
File details
Details for the file dataset2vec-1.0.0.tar.gz.
File metadata
- Download URL: dataset2vec-1.0.0.tar.gz
- Upload date:
- Size: 19.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6f1bce2cf0daeae73302b6a7058f7114c2af2d1319e897ff87c28109a5a00f9c
MD5 | 1088648f3d293263c970c2aee85f89c1
BLAKE2b-256 | bd0a6beae3e5a8c0a1c56ad9852ea841c73d12b696869141cebc7046c3ce40ea
File details
Details for the file dataset2vec-1.0.0-py3-none-any.whl.
File metadata
- Download URL: dataset2vec-1.0.0-py3-none-any.whl
- Upload date:
- Size: 11.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9cd4729b5a5de37277c550533b276e1c976161443b5647fbabfc80af8291a2e3
MD5 | 4bce304e981bd596b9b052ad1a042522
BLAKE2b-256 | 13a4dd630a06cf7f8af2c6e2a4925a43c97b525d707b3265c9e9a0838b7aab99
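The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch using only the standard library. `sha256_of`, `verify` and `EXPECTED` are hypothetical names introduced for illustration, and the digests are copied from the SHA256 rows listed for the two release files.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details listed for release 1.0.0.
EXPECTED = {
    "dataset2vec-1.0.0.tar.gz": (
        "6f1bce2cf0daeae73302b6a7058f7114c2af2d1319e897ff87c28109a5a00f9c"
    ),
    "dataset2vec-1.0.0-py3-none-any.whl": (
        "9cd4729b5a5de37277c550533b276e1c976161443b5647fbabfc80af8291a2e3"
    ),
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Return True only if the file name is known and its digest matches."""
    expected = EXPECTED.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

For example, `verify(Path("dist/dataset2vec-1.0.0.tar.gz"))` should return True for an unmodified copy of the source distribution, assuming it was downloaded to that location.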