A dataset utils repository based on tf.data. For tensorflow 2.x only!
datasets
A dataset utils repository based on tf.data. For tensorflow>=2.0.0b only!
Requirements
- python 3.6
- tensorflow>=2.0.0b
Installation
pip install nlp-datasets
Contents
- Build datasets for seq2seq models: seq2seq_dataset.py
- Build datasets for NMT: nmt_dataset.py
- Build datasets for DSSM: dssm_dataset.py
- Build datasets for MatchPyramid: matchpyramid_dataset.py
Usage
For NMT task
# Features and labels stored in the same files
from nlp_datasets import NMTSameFileDataset
o = NMTSameFileDataset(config=None, logger_name=None)
train_files = []  # your files
# train_dataset is an instance of tf.data.Dataset
train_dataset = o.build_train_dataset(train_files)

# Features and labels stored in separate files
from nlp_datasets import NMTSeparateFileDataset
o = NMTSeparateFileDataset(config=None, logger_name=None)
feature_files = []  # your files
label_files = []  # your files
train_dataset = o.build_train_dataset(feature_files, label_files)
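The separate-file variant takes feature files and label files as two lists. This README does not state how they are paired; a reasonable assumption is that line i of a feature file corresponds to line i of the label file at the same list position. Under that assumption, it can help to check line counts before building the dataset. The helpers `count_lines` and `check_aligned` below are hypothetical and not part of nlp-datasets.

```python
from typing import List


def count_lines(path: str) -> int:
    """Count the lines in a UTF-8 text file."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)


def check_aligned(feature_files: List[str], label_files: List[str]) -> None:
    """Raise ValueError if the file lists or their line counts do not match.

    Hypothetical helper: assumes feature_files[i] pairs with label_files[i]
    and that each pair is aligned line by line.
    """
    if len(feature_files) != len(label_files):
        raise ValueError("expected the same number of feature and label files")
    for feat, lab in zip(feature_files, label_files):
        n_feat, n_lab = count_lines(feat), count_lines(lab)
        if n_feat != n_lab:
            raise ValueError(f"{feat} has {n_feat} lines but {lab} has {n_lab}")
```

Calling `check_aligned(feature_files, label_files)` just before `build_train_dataset` would surface a truncated or mismatched file early, instead of as silently misaligned training pairs.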
For DSSM task
# Queries, docs and labels stored in the same files
from nlp_datasets import DSSMSameFileDataset
o = DSSMSameFileDataset(config=None, logger_name=None)
train_dataset = o.build_train_dataset(train_files=[])  # your files

# Queries, docs and labels stored in separate files
from nlp_datasets import DSSMSeparateFileDataset
o = DSSMSeparateFileDataset(config=None, logger_name=None)
query_files = []  # your files
doc_files = []  # your files
label_files = []  # your files
train_dataset = o.build_train_dataset(query_files, doc_files, label_files)
For MatchPyramid task
# Queries, docs and labels stored in the same files
from nlp_datasets import MatchPyramidSameFileDataset
o = MatchPyramidSameFileDataset(config=None, logger_name=None)
train_dataset = o.build_train_dataset(train_files=[])  # your files

# Queries, docs and labels stored in separate files
from nlp_datasets import MatchPyramidSeparateFilesDataset
o = MatchPyramidSeparateFilesDataset(config=None, logger_name=None)
query_files = []  # your files
doc_files = []  # your files
label_files = []  # your files
train_dataset = o.build_train_dataset(query_files, doc_files, label_files)
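The NMT example notes that `build_train_dataset` returns a tf.data.Dataset; assuming the same holds for the other builders, the result can be inspected with the standard tf.data iteration API. The sketch below uses a stand-in dataset created with `tf.data.Dataset.from_tensor_slices`, because the element structure produced by nlp-datasets is not documented here. The `(features, labels)` layout and the toy values are assumptions for illustration only.

```python
import tensorflow as tf

# Stand-in for the result of o.build_train_dataset(...). The real element
# structure depends on nlp-datasets; (features, labels) is an assumption.
features = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
labels = [0, 1, 0]
train_dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(2)

# Pull a single batch to inspect shapes and dtypes before training.
for batch_features, batch_labels in train_dataset.take(1):
    print(batch_features.shape, batch_features.dtype)
    print(batch_labels.shape, batch_labels.dtype)
```

Checking one batch this way is a cheap sanity test that the input files were parsed as expected before the dataset is passed to a training loop.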