A dataset utilities repository. For TensorFlow 2.x only!
datasets
A dataset utilities repository. For tensorflow>=2.0.0b only!
Requirements
- Python 3.6
- tensorflow>=2.0.0b
Installation
pip install nlp-datasets
Contents
- seq2seq_dataset.py: builds datasets for seq2seq models
- nmt_dataset.py: builds datasets for NMT
- dssm_dataset.py: builds datasets for DSSM
- matchpyramid_dataset.py: builds datasets for MatchPyramid
Usage
For NMT tasks
from nlp_datasets import NMTSameFileDataset
o = NMTSameFileDataset(config=None, logger_name=None)
train_files = []  # your files
# train_dataset is an instance of tf.data.Dataset
train_dataset = o.build_train_dataset(train_files)
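The README does not document the on-disk format of a "same file" NMT corpus. As a minimal sketch, assuming each line holds a source sentence and its target translation separated by a tab, with whitespace tokenization (both assumptions, not confirmed by the package), parsing the records in plain Python could look like this. `parse_same_file` is a hypothetical helper, not part of nlp_datasets.

```python
from typing import Iterable, Iterator, List, Tuple

Pair = Tuple[List[str], List[str]]


def parse_same_file(lines: Iterable[str], sep: str = "\t") -> Iterator[Pair]:
    """Yield (source_tokens, target_tokens) from 'source<sep>target' lines.

    Hypothetical helper: the tab separator and whitespace tokenization are
    assumptions, not the documented input format of nlp_datasets.
    """
    for line in lines:
        line = line.rstrip("\n")
        if sep not in line:
            continue  # skip blank or malformed lines
        source, target = line.split(sep, 1)
        yield source.split(), target.split()


pairs = list(parse_same_file(["hello world\tbonjour le monde\n", "\n"]))
print(pairs)  # [(['hello', 'world'], ['bonjour', 'le', 'monde'])]
```

Splitting only on the first separator keeps any further tabs inside the target text instead of raising an unpacking error.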
from nlp_datasets import NMTSeparateFileDataset
o = NMTSeparateFileDataset(config=None, logger_name=None)
feature_files = []  # your feature files
label_files = []  # your label files
train_dataset = o.build_train_dataset(feature_files, label_files)
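With separate feature and label files, the natural reading is that line i of one file corresponds to line i of the other. The sketch below assumes that line alignment and whitespace tokenization; neither is stated in the README, and `pair_separate_files` is a hypothetical helper, not part of nlp_datasets.

```python
import itertools
from typing import Iterable, Iterator, List, Tuple

_MISSING = object()  # fill value marking the end of the shorter file


def pair_separate_files(feature_lines: Iterable[str],
                        label_lines: Iterable[str]
                        ) -> Iterator[Tuple[List[str], List[str]]]:
    """Pair line i of the feature file with line i of the label file.

    Hypothetical helper: assumes both files are line-aligned and
    whitespace-tokenized. Raises ValueError if the line counts differ.
    """
    for feature, label in itertools.zip_longest(feature_lines, label_lines,
                                                fillvalue=_MISSING):
        if feature is _MISSING or label is _MISSING:
            raise ValueError("feature and label files have different line counts")
        yield feature.split(), label.split()


features = ["hello world\n", "good morning\n"]
labels = ["bonjour le monde\n", "bonjour\n"]
pairs = list(pair_separate_files(features, labels))
print(pairs)
# [(['hello', 'world'], ['bonjour', 'le', 'monde']), (['good', 'morning'], ['bonjour'])]
```

Using `itertools.zip_longest` instead of `zip` is a deliberate choice: plain `zip` would silently drop trailing lines when the files are misaligned.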
For DSSM tasks
from nlp_datasets import DSSMSameFileDataset
o = DSSMSameFileDataset(config=None, logger_name=None)
train_dataset = o.build_train_dataset(train_files=[])
from nlp_datasets import DSSMSeparateFileDataset
o = DSSMSeparateFileDataset(config=None, logger_name=None)
query_files = []
doc_files = []
label_files = []
train_dataset = o.build_train_dataset(query_files, doc_files, label_files)
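For the three-file DSSM layout, each training example plausibly combines one query line, one document line and one label line. The sketch below assumes whitespace-tokenized text and a single integer relevance label (for example 0 or 1) per line; these are assumptions, and `DSSMExample` and `read_dssm_triples` are hypothetical names, not the package's API.

```python
from typing import Iterable, Iterator, List, NamedTuple


class DSSMExample(NamedTuple):
    query: List[str]
    doc: List[str]
    label: int


def read_dssm_triples(query_lines: Iterable[str],
                      doc_lines: Iterable[str],
                      label_lines: Iterable[str]) -> Iterator[DSSMExample]:
    """Combine line-aligned query, doc and label files into examples.

    Hypothetical helper: assumes whitespace-tokenized text and one integer
    relevance label per line of the label file.
    """
    # zip() stops at the shortest input, so files of unequal length are
    # silently truncated; a real loader should check the line counts.
    for query, doc, label in zip(query_lines, doc_lines, label_lines):
        yield DSSMExample(query.split(), doc.split(), int(label.strip()))


examples = list(read_dssm_triples(["cheap flights\n"],
                                  ["low cost airline tickets\n"],
                                  ["1\n"]))
print(examples[0].label)  # 1
```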
For MatchPyramid tasks
from nlp_datasets import MatchPyramidSameFileDataset
o = MatchPyramidSameFileDataset(config=None, logger_name=None)
train_dataset = o.build_train_dataset(train_files=[])
from nlp_datasets import MatchPyramidSeparateFilesDataset
o = MatchPyramidSeparateFilesDataset(config=None, logger_name=None)
query_files = []
doc_files = []
label_files = []
train_dataset = o.build_train_dataset(query_files, doc_files, label_files)
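The MatchPyramid model treats text matching as image recognition over a word-by-word interaction matrix between a query and a document. The README does not say whether this package builds that matrix or leaves it to the model, so the sketch below only illustrates the idea, using the exact-match (indicator) similarity over token lists. `exact_match_matrix` is a hypothetical helper, not part of nlp_datasets.

```python
from typing import List


def exact_match_matrix(query: List[str], doc: List[str]) -> List[List[int]]:
    """Return a len(query) x len(doc) matrix with M[i][j] = 1 if query[i] == doc[j], else 0.

    Indicator-function interaction matrix in the style of MatchPyramid;
    hypothetical helper, not the package's API.
    """
    return [[1 if q == d else 0 for d in doc] for q in query]


matrix = exact_match_matrix(["cheap", "flights"], ["cheap", "airline", "flights"])
for row in matrix:
    print(row)
# [1, 0, 0]
# [0, 0, 1]
```

Other similarity functions, such as cosine similarity or dot products of word embeddings, fill the same matrix shape with real values instead of 0/1.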
Download files
Source Distribution
naivenmt_datasets-0.0.7.tar.gz (12.0 kB)
Built Distribution
File details
Details for the file naivenmt_datasets-0.0.7.tar.gz.
File metadata
- Download URL: naivenmt_datasets-0.0.7.tar.gz
- Upload date:
- Size: 12.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 704debe74263a3e42ea7d71dabcc5134002638e933d299eca4ce3e45c7547266 |
| MD5 | 8c406e280363e660078dc1d569542253 |
| BLAKE2b-256 | 81abaec3f7623a547a27ac1af7ad1da0fa1fb059e8f171e613ef8a1332873a70 |
File details
Details for the file naivenmt_datasets-0.0.7-py3-none-any.whl.
File metadata
- Download URL: naivenmt_datasets-0.0.7-py3-none-any.whl
- Upload date:
- Size: 27.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9ae339aab797486eb8285469e5c2c5d34f9599777783e1f6f075a53f4f1d8947 |
| MD5 | 59bfc9a4fc1cca8e060986111d4bd8a2 |
| BLAKE2b-256 | e8da5a6fd3efdc41e7c3ab9e62b79819a03b21d82998ac6e3b899e83b5d5bcd1 |