Package for training and using the Dataset2Vec meta-model.

Dataset2Vec

Introduction

This package aims to implement the approach proposed in "Dataset2Vec: Learning Dataset Meta-Features" by Jomaa et al. It makes training the Dataset2Vec dataset encoder much more approachable by providing an API that is compatible with PyTorch Lightning's Trainer API. Training output, including TensorBoard logs and checkpoints, is stored in a lightning_logs directory, created in the current working directory or in the default_root_dir passed to pytorch_lightning.Trainer if one is specified.
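Because every training run gets its own subdirectory, it can help to locate the most recent one programmatically. The sketch below assumes PyTorch Lightning's default logger layout, <root>/lightning_logs/version_<n>/, with checkpoints written to a checkpoints/ folder inside each version directory. The helper latest_run_dir is hypothetical and not part of this package.

```python
import re
from pathlib import Path
from typing import Optional


def latest_run_dir(root: Path = Path(".")) -> Optional[Path]:
    """Return the newest lightning_logs/version_<n> directory under root.

    Hypothetical helper: assumes Lightning's default logger layout and
    returns None when no run directory exists yet.
    """
    logs = root / "lightning_logs"
    if not logs.is_dir():
        return None
    runs = []
    for child in logs.iterdir():
        match = re.fullmatch(r"version_(\d+)", child.name)
        if child.is_dir() and match:
            runs.append((int(match.group(1)), child))
    if not runs:
        return None
    # Compare version numbers numerically so version_10 comes after version_9.
    return max(runs, key=lambda item: item[0])[1]
```

For the usage example in this README, latest_run_dir(Path("output_logs")) would point at the last run; its checkpoints can then be listed with sorted((run / "checkpoints").glob("*.ckpt")), and tensorboard --logdir output_logs/lightning_logs shows the logged metrics.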

Installation

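To install the package from PyPI, run the following command (you need Python 3.9 or higher):

pip install dataset2vec

To install only the dependencies from a source checkout of the repository, run:

pip install -r requirements.txt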
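To confirm that an environment meets the requirement above, a small check like the following can be run. It is a sketch using only the standard library; check_environment is a hypothetical name, and it assumes the distribution is installed under the name dataset2vec, as the file names on this page suggest.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def check_environment(
    min_python: Tuple[int, int] = (3, 9), dist: str = "dataset2vec"
) -> Tuple[bool, Optional[str]]:
    """Return (Python version is supported, installed version of dist or None)."""
    python_ok = sys.version_info[:2] >= min_python
    try:
        installed = version(dist)
    except PackageNotFoundError:
        installed = None  # the distribution is not installed in this environment
    return python_ok, installed
```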

Usage

Here is a simple example of how to use the package:

from pathlib import Path

from pytorch_lightning import Trainer

from dataset2vec import (
    Dataset2Vec,
    Dataset2VecLoader,
    RepeatableDataset2VecLoader,
)

# Both paths point to directories containing .csv files
train_loader = Dataset2VecLoader(Path("data/train"))
val_loader = RepeatableDataset2VecLoader(Path("data/val"))

model = Dataset2Vec()

# The output of the training will be stored in output_logs
trainer = Trainer(max_epochs=2, log_every_n_steps=1, default_root_dir="output_logs")

trainer.fit(model, train_loader, val_loader)
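The loaders in the example take directories of .csv files for training and validation. If your datasets start out in a single folder, a hypothetical helper like split_csv_files below can copy them into data/train and data/val. It only moves files around; it does not check the CSV contents against whatever format the loaders expect, which this README does not specify.

```python
import random
import shutil
from pathlib import Path
from typing import List, Tuple


def split_csv_files(
    source: Path,
    train_dir: Path,
    val_dir: Path,
    val_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[Path], List[Path]]:
    """Copy the .csv files in source into separate train and val directories.

    Hypothetical helper; returns the source paths assigned to each split.
    """
    files = sorted(source.glob("*.csv"))
    if len(files) < 2:
        raise ValueError("need at least two .csv files to make a split")
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    rng.shuffle(files)
    # Keep at least one file on each side of the split.
    n_val = min(max(1, round(len(files) * val_fraction)), len(files) - 1)
    val_files, train_files = files[:n_val], files[n_val:]
    for target, group in ((train_dir, train_files), (val_dir, val_files)):
        target.mkdir(parents=True, exist_ok=True)
        for path in group:
            shutil.copy2(path, target / path.name)
    return train_files, val_files
```

Calling split_csv_files(Path("data/all"), Path("data/train"), Path("data/val")) before building the loaders would produce the directory layout used above. Splitting by whole files keeps each dataset entirely in one split.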

Development

The following commands are useful when developing the package:

  • ./scripts/check_code.sh - runs code-quality checks with black, flake8, isort and mypy.
  • pytest - runs all unit tests.
  • cd docs && make html - generates the documentation.
  • python -m build - builds the package.
  • twine upload dist/* - uploads the package to PyPI.
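On platforms without a shell, the code-quality step can be approximated from Python. This is a sketch that assumes check_code.sh simply runs each tool in turn; the flags in DEFAULT_CHECKS are standard options of these tools, but the actual script may use different ones, and run_checks is a hypothetical name.

```python
import subprocess
from typing import Sequence

# Assumed equivalent of ./scripts/check_code.sh; the real script's flags may differ.
DEFAULT_CHECKS = (
    ("black", "--check", "."),
    ("isort", "--check-only", "."),
    ("flake8", "."),
    ("mypy", "."),
)


def run_checks(commands: Sequence[Sequence[str]] = DEFAULT_CHECKS) -> int:
    """Run each command in order; return the first non-zero exit code, or 0."""
    for command in commands:
        print("$", " ".join(command), flush=True)
        result = subprocess.run(list(command))
        if result.returncode != 0:
            return result.returncode  # stop at the first failing check
    return 0
```

Stopping at the first failure mirrors a shell script run with set -e; collecting all exit codes instead would report every failing tool in one pass.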


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dataset2vec-1.0.0.tar.gz (19.8 kB)

Uploaded Source

Built Distribution

dataset2vec-1.0.0-py3-none-any.whl (11.6 kB)

Uploaded Python 3

File details

Details for the file dataset2vec-1.0.0.tar.gz.

File metadata

  • Download URL: dataset2vec-1.0.0.tar.gz
  • Upload date:
  • Size: 19.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.10.12

File hashes

Hashes for dataset2vec-1.0.0.tar.gz
  • SHA256: 6f1bce2cf0daeae73302b6a7058f7114c2af2d1319e897ff87c28109a5a00f9c
  • MD5: 1088648f3d293263c970c2aee85f89c1
  • BLAKE2b-256: bd0a6beae3e5a8c0a1c56ad9852ea841c73d12b696869141cebc7046c3ce40ea

See more details on using hashes here.

File details

Details for the file dataset2vec-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: dataset2vec-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 11.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.10.12

File hashes

Hashes for dataset2vec-1.0.0-py3-none-any.whl
  • SHA256: 9cd4729b5a5de37277c550533b276e1c976161443b5647fbabfc80af8291a2e3
  • MD5: 4bce304e981bd596b9b052ad1a042522
  • BLAKE2b-256: 13a4dd630a06cf7f8af2c6e2a4925a43c97b525d707b3265c9e9a0838b7aab99
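To check a downloaded file against the SHA256 digests listed on this page, the file can be hashed in chunks with the standard library. This is a minimal sketch; the helper names are illustrative. pip can also enforce digests itself through its hash-checking mode.

```python
import hashlib
from pathlib import Path
from typing import Mapping

# SHA256 digests copied from the file details on this page.
EXPECTED_SHA256 = {
    "dataset2vec-1.0.0.tar.gz": "6f1bce2cf0daeae73302b6a7058f7114c2af2d1319e897ff87c28109a5a00f9c",
    "dataset2vec-1.0.0-py3-none-any.whl": "9cd4729b5a5de37277c550533b276e1c976161443b5647fbabfc80af8291a2e3",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in fixed-size chunks so large files need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected: Mapping[str, str] = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 matches the published digest for its name."""
    if path.name not in expected:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected[path.name]
```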

