Package for training and further usage of Dataset2Vec meta-model.
Dataset2Vec
Introduction
This package implements the approach proposed in Dataset2Vec: Learning Dataset Meta-Features by Jomaa et al. It makes training the Dataset2Vec dataset encoder much more approachable by providing an API compatible with pytorch-lightning's Trainer API. The output logs, including TensorBoard logs and checkpoints, are stored in lightning_logs or, if specified, in the default_root_dir passed to pytorch_lightning.Trainer.
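To make the log location concrete, here is a minimal sketch of a helper that finds the newest checkpoint after a training run. It is not part of this package: `latest_checkpoint` is a hypothetical name, and the directory layout `lightning_logs/version_<N>/checkpoints/*.ckpt` under the root directory is an assumption based on pytorch-lightning's default logger and checkpoint callback, so it may differ if a custom logger or callback is configured.

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(root_dir: str = ".") -> Optional[Path]:
    """Return the most recently modified .ckpt file under root_dir, or None.

    Hypothetical helper, not provided by dataset2vec.
    """
    logs_dir = Path(root_dir) / "lightning_logs"
    # Assumed layout: lightning_logs/version_<N>/checkpoints/<name>.ckpt
    checkpoints = list(logs_dir.glob("version_*/checkpoints/*.ckpt"))
    if not checkpoints:
        return None
    # Pick the file with the newest modification time.
    return max(checkpoints, key=lambda p: p.stat().st_mtime)
```

If the Trainer is created with default_root_dir="output_logs", as in the Usage section, the call would be `latest_checkpoint("output_logs")`; without default_root_dir, the current working directory is the likely root.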
Installation
To install the package run the following command (you need Python 3.9 or higher):
pip install -r requirements.txt
Usage
Here is a simple example of the usage of the package:
from pathlib import Path

from pytorch_lightning import Trainer

from dataset2vec import (
    Dataset2Vec,
    Dataset2VecLoader,
    RepeatableDataset2VecLoader,
)

train_loader = Dataset2VecLoader(Path("data/train"))  # Path with .csv files
val_loader = RepeatableDataset2VecLoader(
    Path("data/val")
)  # Path with .csv files

model = Dataset2Vec()

trainer = Trainer(
    max_epochs=2, log_every_n_steps=1, default_root_dir="output_logs"
)  # output of the training will be stored in output_logs
trainer.fit(model, train_loader, val_loader)
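The example above expects data/train and data/val to contain .csv files. For a quick smoke test, the sketch below writes a few small synthetic files using only the standard library. The file format is an assumption, not something this README specifies: each file gets a header row, numeric feature columns, and a binary target in the last column. `write_toy_csvs` is a hypothetical helper and not part of the package, so check the package documentation for the format the loaders actually require.

```python
import csv
import random
from pathlib import Path


def write_toy_csvs(directory, n_files=3, n_rows=50, n_features=4, seed=0):
    """Write n_files small synthetic CSV datasets into directory.

    Hypothetical helper. Assumed format: header row, numeric features,
    binary target in the last column.
    """
    rng = random.Random(seed)
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(n_files):
        path = out_dir / f"toy_{i}.csv"
        with path.open("w", newline="") as f:
            writer = csv.writer(f)
            # Header: x0..x{n_features-1}, then the target column.
            writer.writerow([f"x{j}" for j in range(n_features)] + ["target"])
            for _ in range(n_rows):
                features = [round(rng.gauss(0.0, 1.0), 4) for _ in range(n_features)]
                writer.writerow(features + [rng.randint(0, 1)])
        paths.append(path)
    return paths
```

Calling `write_toy_csvs("data/train")` and `write_toy_csvs("data/val", seed=1)` before running the example would populate both directories with different random data.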
Development
Here are the snippets useful for the development of the package:
- ./scripts/check_code.sh - runs code quality checks using black, flake8, isort and mypy
- pytest - runs all unit tests
- cd docs && make html - generates the documentation
- python -m build - builds the package
- twine upload dist/* - uploads the package to PyPI
Download files
File details
Details for the file dataset2vec-1.0.0.tar.gz.
File metadata
- Download URL: dataset2vec-1.0.0.tar.gz
- Upload date:
- Size: 19.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6f1bce2cf0daeae73302b6a7058f7114c2af2d1319e897ff87c28109a5a00f9c |
| MD5 | 1088648f3d293263c970c2aee85f89c1 |
| BLAKE2b-256 | bd0a6beae3e5a8c0a1c56ad9852ea841c73d12b696869141cebc7046c3ce40ea |
File details
Details for the file dataset2vec-1.0.0-py3-none-any.whl.
File metadata
- Download URL: dataset2vec-1.0.0-py3-none-any.whl
- Upload date:
- Size: 11.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9cd4729b5a5de37277c550533b276e1c976161443b5647fbabfc80af8291a2e3 |
| MD5 | 4bce304e981bd596b9b052ad1a042522 |
| BLAKE2b-256 | 13a4dd630a06cf7f8af2c6e2a4925a43c97b525d707b3265c9e9a0838b7aab99 |
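The digests listed above can be compared against a downloaded file. The following is a minimal sketch using Python's standard hashlib module; `file_digest` and `matches` are hypothetical helper names, and the sketch covers only the SHA256 and MD5 rows (BLAKE2b-256 would need `hashlib.blake2b` with `digest_size=32`).

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks."""
    hasher = hashlib.new(algorithm)
    with Path(path).open("rb") as f:
        # Read fixed-size chunks until read() returns empty bytes.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches(path, expected_hex, algorithm="sha256"):
    """Return True if the file's digest equals the expected hex string."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

For example, `matches("dataset2vec-1.0.0.tar.gz", "<SHA256 value from the table>")` should return True for an unmodified download, assuming the file sits in the current directory.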