A dataset utils repository. For tensorflow 2.x only!
datasets
Utilities for building tf.data.Dataset inputs for common NLP models. For tensorflow>=2.0.0b only!
Requirements
- python 3.6
- tensorflow>=2.0.0b
Installation
pip install nlp-datasets
Contents
- Build datasets for seq2seq models: seq2seq_dataset.py
- Build datasets for NMT: nmt_dataset.py
- Build datasets for DSSM: dssm_dataset.py
- Build datasets for MatchPyramid: matchpyramid_dataset.py
Usage
For NMT task
# Features and labels stored in the same files
from nlp_datasets import NMTSameFileDataset

o = NMTSameFileDataset(config=None, logger_name=None)
train_files = []  # your files
# train_dataset is an instance of tf.data.Dataset
train_dataset = o.build_train_dataset(train_files)

# Features and labels stored in separate files
from nlp_datasets import NMTSeparateFileDataset

o = NMTSeparateFileDataset(config=None, logger_name=None)
feature_files = []  # your files
label_files = []  # your files
train_dataset = o.build_train_dataset(feature_files, label_files)
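The separate-file variant takes two file lists, which suggests that line i of the feature files is paired with line i of the label files. The following is a minimal pure-Python sketch of that pairing idea only. It is not the library's implementation, and the helper names `read_lines` and `read_parallel_pairs`, as well as the one-example-per-line layout, are assumptions made for illustration.

```python
import os
import tempfile


def read_lines(paths):
    """Yield lines (without the trailing newline) from each file, in order."""
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                yield line.rstrip("\n")


def read_parallel_pairs(feature_files, label_files):
    """Pair the i-th feature line with the i-th label line (hypothetical helper)."""
    features = list(read_lines(feature_files))
    labels = list(read_lines(label_files))
    if len(features) != len(labels):
        raise ValueError("feature and label files must have the same number of lines")
    return list(zip(features, labels))


# Small demonstration with temporary files standing in for real corpora.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "train.src")
    tgt = os.path.join(tmp, "train.tgt")
    with open(src, "w", encoding="utf-8") as f:
        f.write("hello world\ngood morning\n")
    with open(tgt, "w", encoding="utf-8") as f:
        f.write("hallo welt\nguten morgen\n")
    pairs = read_parallel_pairs([src], [tgt])
```

Checking that both sides have the same number of lines before pairing catches misaligned corpora early, which is a common source of silent errors in NMT training data.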
For DSSM task
# Queries, docs and labels stored in the same files
from nlp_datasets import DSSMSameFileDataset

o = DSSMSameFileDataset(config=None, logger_name=None)
train_dataset = o.build_train_dataset(train_files=[])  # your files

# Queries, docs and labels stored in separate files
from nlp_datasets import DSSMSeparateFileDataset

o = DSSMSeparateFileDataset(config=None, logger_name=None)
query_files = []  # your files
doc_files = []  # your files
label_files = []  # your files
train_dataset = o.build_train_dataset(query_files, doc_files, label_files)
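For the same-file DSSM variant, each training line presumably carries a query, a document and a label together. The sketch below shows one way such a line could be split into a triple. The tab separator, the integer label and the helper name `parse_triple` are all assumptions for illustration; the actual line format is determined by the library's configuration, which is not described here.

```python
def parse_triple(line, sep="\t"):
    """Split one line into (query, doc, label).

    The separator and the integer label type are hypothetical choices.
    """
    parts = line.rstrip("\n").split(sep)
    if len(parts) != 3:
        raise ValueError("expected 3 fields, got %d: %r" % (len(parts), line))
    query, doc, label = parts
    return query, doc, int(label)


# Two made-up example lines: a matching pair and a non-matching pair.
lines = [
    "how to install tensorflow\tpip install tensorflow guide\t1\n",
    "how to install tensorflow\tbest pizza in town\t0\n",
]
examples = [parse_triple(line) for line in lines]
```

Rejecting lines that do not have exactly three fields keeps malformed rows from shifting a document into the label position.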
For MatchPyramid task
# Queries, docs and labels stored in the same files
from nlp_datasets import MatchPyramidSameFileDataset

o = MatchPyramidSameFileDataset(config=None, logger_name=None)
train_dataset = o.build_train_dataset(train_files=[])  # your files

# Queries, docs and labels stored in separate files
from nlp_datasets import MatchPyramidSeparateFilesDataset

o = MatchPyramidSeparateFilesDataset(config=None, logger_name=None)
query_files = []  # your files
doc_files = []  # your files
label_files = []  # your files
train_dataset = o.build_train_dataset(query_files, doc_files, label_files)