A dataset utilities repository. For TensorFlow 2.x only!
Project description
datasets
Requirements
- Python 3.6
- tensorflow>=2.0.0b
Installation
pip install nlp-datasets
Contents
- Build datasets for seq2seq models: seq2seq_dataset.py
- Build datasets for NMT: nmt_dataset.py
- Build datasets for DSSM: dssm_dataset.py
- Build datasets for MatchPyramid: matchpyramid_dataset.py
Usage
For the NMT task
from nlp_datasets import NMTSameFileDataset
o = NMTSameFileDataset(config=None, logger_name=None)
train_files = [] # your files
# train_dataset is an instance of tf.data.Dataset
train_dataset = o.build_train_dataset(train_files)
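The README does not describe the on-disk format that `NMTSameFileDataset` expects. As a hedged illustration only, the plain-Python sketch below assumes each line holds a source and a target sentence separated by a tab, tokenizes on whitespace, and builds decoder input/output sequences with hypothetical `<s>`/`</s>` markers, a common teacher-forcing setup in NMT pipelines. `parse_nmt_line`, `read_same_file`, `SOS` and `EOS` are illustrative names, not part of `nlp_datasets`.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical markers; the package's real special tokens (if any) may differ.
SOS = "<s>"
EOS = "</s>"

Example = Tuple[List[str], List[str], List[str]]


def parse_nmt_line(line: str, sep: str = "\t") -> Example:
    """Split one 'source<sep>target' line (assumed format) into three token lists.

    Raises ValueError if the separator is missing.
    """
    source, target = line.rstrip("\n").split(sep, 1)
    src_tokens = source.split()
    tgt_tokens = target.split()
    # Teacher forcing: the decoder reads SOS + target and learns to emit target + EOS.
    return src_tokens, [SOS] + tgt_tokens, tgt_tokens + [EOS]


def read_same_file(lines: Iterable[str], sep: str = "\t") -> Iterator[Example]:
    """Yield parsed examples, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield parse_nmt_line(line, sep)
```

For example, `parse_nmt_line("hello world\tbonjour le monde")` gives the source tokens `["hello", "world"]`, the decoder input `["<s>", "bonjour", "le", "monde"]` and the decoder target `["bonjour", "le", "monde", "</s>"]`.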
from nlp_datasets import NMTSeparateFileDataset
o = NMTSeparateFileDataset(config=None, logger_name=None)
feature_files = [] # your files
label_files = []
train_dataset = o.build_train_dataset(feature_files, label_files)
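For the separate-file variant, a natural reading of the `feature_files`/`label_files` arguments is that line *i* of a feature file pairs with line *i* of the matching label file. The sketch below assumes exactly that (it is not taken from the package source): it walks both inputs in lockstep and fails loudly if one runs out early. `pair_lines` and `pair_files` are hypothetical helpers.

```python
import itertools
from typing import Iterable, Iterator, List, Tuple


def pair_lines(features: Iterable[str], labels: Iterable[str]) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (feature_tokens, label_tokens), assuming line i of one input matches line i of the other."""
    sentinel = object()  # marks the shorter input once it is exhausted
    for feat, label in itertools.zip_longest(features, labels, fillvalue=sentinel):
        if feat is sentinel or label is sentinel:
            raise ValueError("feature and label inputs have different line counts")
        yield feat.split(), label.split()


def pair_files(feature_path: str, label_path: str, encoding: str = "utf-8") -> Iterator[Tuple[List[str], List[str]]]:
    """Open one hypothetical feature file and one label file and pair their lines."""
    with open(feature_path, encoding=encoding) as f, open(label_path, encoding=encoding) as g:
        yield from pair_lines(f, g)
```

`itertools.zip_longest` with a sentinel is used instead of `zip(..., strict=True)` because the latter needs Python 3.10, while this package targets Python 3.6.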
For the DSSM task
from nlp_datasets import DSSMSameFileDataset
o = DSSMSameFileDataset(config=None, logger_name=None)
train_dataset = o.build_train_dataset(train_files=[])
from nlp_datasets import DSSMSeparateFileDataset
o = DSSMSeparateFileDataset(config=None, logger_name=None)
query_files = []
doc_files = []
label_files = []
train_dataset = o.build_train_dataset(query_files, doc_files, label_files)
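DSSM-style models score a (query, document) pair, so each training example is a query, a document and a relevance label. The README does not show how `nlp_datasets` tokenizes or pads these fields; the sketch below is a hypothetical illustration that maps whitespace tokens to ids through a toy vocabulary, truncates or right-pads each side to a fixed length, and parses the label as an integer. `PAD_ID`, `UNK_ID`, `to_ids` and `make_dssm_example` are assumptions, not the package's API.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical special ids; the package's own vocabulary handling is not shown in this README.
PAD_ID = 0
UNK_ID = 1


def to_ids(tokens: Sequence[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map tokens to ids (unknown words -> UNK_ID), truncate to max_len, then right-pad with PAD_ID."""
    ids = [vocab.get(t, UNK_ID) for t in tokens][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


def make_dssm_example(query: str, doc: str, label: str, vocab: Dict[str, int],
                      query_len: int = 5, doc_len: int = 8) -> Tuple[List[int], List[int], int]:
    """Build one (query_ids, doc_ids, label) triple from raw, whitespace-tokenized text."""
    return (to_ids(query.split(), vocab, query_len),
            to_ids(doc.split(), vocab, doc_len),
            int(label))
```

With the toy vocabulary `{"cheap": 2, "flights": 3, "to": 4, "paris": 5}`, `make_dssm_example("cheap flights", "flights to paris today", "1", vocab, query_len=3, doc_len=5)` returns `([2, 3, 0], [3, 4, 5, 1, 0], 1)`: "today" is out of vocabulary and both sides are padded to their fixed lengths.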
For the MatchPyramid task
from nlp_datasets import MatchPyramidSameFileDataset
o = MatchPyramidSameFileDataset(config=None, logger_name=None)
train_dataset = o.build_train_dataset(train_files=[])
from nlp_datasets import MatchPyramidSeparateFilesDataset
o = MatchPyramidSeparateFilesDataset(config=None, logger_name=None)
query_files = []
doc_files = []
label_files = []
train_dataset = o.build_train_dataset(query_files, doc_files, label_files)
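MatchPyramid (Pang et al., 2016) treats text matching like image recognition: it builds a word-by-word interaction matrix between the query and the document and runs a CNN over it. The paper considers several interaction functions, including an indicator (exact word match) and a dot product of word embeddings. Whether `MatchPyramidSameFileDataset` builds this matrix itself or only yields token ids for the model to combine is not stated here, so the following is a standalone sketch of those two interactions; the function names are illustrative.

```python
from typing import List, Sequence


def indicator_matching_matrix(query: Sequence[str], doc: Sequence[str]) -> List[List[int]]:
    """M[i][j] = 1 if query[i] == doc[j], else 0 (exact-match interaction)."""
    return [[1 if q == d else 0 for d in doc] for q in query]


def dot_matching_matrix(query_vecs: Sequence[Sequence[float]],
                        doc_vecs: Sequence[Sequence[float]]) -> List[List[float]]:
    """M[i][j] = <query_vecs[i], doc_vecs[j]> (dot-product interaction over word embeddings)."""
    return [[sum(a * b for a, b in zip(qv, dv)) for dv in doc_vecs] for qv in query_vecs]
```

For the query `["a", "b"]` and the document `["b", "a", "c"]`, the indicator matrix is `[[0, 1, 0], [1, 0, 0]]`; its shape is always (query length) × (document length), which is why fixed-length padding of both sides gives the CNN a fixed-size input.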
Download files
Source Distribution
naivenmt_datasets-0.0.7.tar.gz (12.0 kB)
Built Distribution
naivenmt_datasets-0.0.7-py3-none-any.whl (27.3 kB)
File details
Details for the file naivenmt_datasets-0.0.7.tar.gz.
File metadata
- Download URL: naivenmt_datasets-0.0.7.tar.gz
- Upload date:
- Size: 12.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | 704debe74263a3e42ea7d71dabcc5134002638e933d299eca4ce3e45c7547266
MD5 | 8c406e280363e660078dc1d569542253
BLAKE2b-256 | 81abaec3f7623a547a27ac1af7ad1da0fa1fb059e8f171e613ef8a1332873a70
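The digests above let you check that a downloaded archive is intact. A minimal sketch using Python's standard `hashlib` computes a file's SHA256 in chunks and compares it with the published value; `sha256_of` and `verify` are illustrative names.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """True if the file's SHA256 equals the expected digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.lower()
```

`verify("naivenmt_datasets-0.0.7.tar.gz", "704debe74263a3e42ea7d71dabcc5134002638e933d299eca4ce3e45c7547266")` returns `True` only if the local file matches the SHA256 listed above. pip can enforce the same check at install time through its hash-checking mode (`--hash=sha256:...` entries in a requirements file).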
File details
Details for the file naivenmt_datasets-0.0.7-py3-none-any.whl.
File metadata
- Download URL: naivenmt_datasets-0.0.7-py3-none-any.whl
- Upload date:
- Size: 27.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9ae339aab797486eb8285469e5c2c5d34f9599777783e1f6f075a53f4f1d8947
MD5 | 59bfc9a4fc1cca8e060986111d4bd8a2
BLAKE2b-256 | e8da5a6fd3efdc41e7c3ab9e62b79819a03b21d82998ac6e3b899e83b5d5bcd1