A dataset utils repository based on tf.data. For tensorflow 2.x only!
datasets
A dataset utils repository based on tf.data. For tensorflow>=2.0 only!
Requirements
- python 3.6
- tensorflow>=2.0
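One way to check an environment against these requirements before installing is sketched below. The helpers `parse_version` and `meets_minimum` are hypothetical names introduced for illustration and are not part of nlp-datasets. The sketch only compares simple dotted version strings and does not try to cover every version format.

```python
import re
import sys


def parse_version(version):
    """Turn a dotted version string such as '2.4.1' into a tuple of ints.

    Only the leading digits of each component are kept, and parsing stops
    at the first component that does not start with a digit.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(version, minimum):
    """Return True if `version` is at least `minimum` (a tuple of ints)."""
    parsed = parse_version(version)
    # Pad short versions with zeros so that '2' compares equal to (2, 0).
    padded = parsed + (0,) * (len(minimum) - len(parsed))
    return padded >= minimum


# Python requirement from the list above.
python_ok = sys.version_info >= (3, 6)

# Hypothetical usage once tensorflow is installed:
#   import tensorflow as tf
#   assert meets_minimum(tf.__version__, (2, 0))
```

Keeping the comparison on integer tuples avoids the usual pitfall of comparing version strings lexically, where '10.0' would sort before '2.0'.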
Installation
pip install nlp-datasets
Usage
seq2seq models
from nlp_datasets import XYSameFileDataset
from nlp_datasets import SpaceTokenizer

# Build the vocabulary from one or more corpus files
tokenizer = SpaceTokenizer()
corpus_files = ['/path/to/corpus']
tokenizer.build_from_corpus(corpus_files, max_vocab_size=10000)

# Use the same tokenizer for both the x and y sequences
dataset = XYSameFileDataset(x_tokenizer=tokenizer, y_tokenizer=tokenizer, config=None)

# Build the training dataset from the training files
train_files = ['/path/to/train/files']
train_dataset = dataset.build_train_dataset(train_files=train_files)
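To make the tokenizer step above more concrete, here is a minimal pure-Python sketch of what a whitespace tokenizer with a capped vocabulary might do. It is not the nlp-datasets implementation: the class `ToySpaceTokenizer`, its method names, the `<unk>` token, and the choice to read in-memory lines instead of files are all assumptions made for illustration.

```python
from collections import Counter


class ToySpaceTokenizer:
    """Hypothetical stand-in for a whitespace tokenizer; not the library's class."""

    def __init__(self, unk_token="<unk>"):
        self.unk_token = unk_token
        # Id 0 is reserved for tokens that are not in the vocabulary.
        self.token_to_id = {unk_token: 0}

    def build_from_lines(self, lines, max_vocab_size=10000):
        """Keep the most frequent tokens, up to max_vocab_size entries including <unk>."""
        counts = Counter(token for line in lines for token in line.split())
        # Sort by descending frequency, then alphabetically so ties are deterministic.
        ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        for token, _ in ranked[: max(max_vocab_size - 1, 0)]:
            if token not in self.token_to_id:
                self.token_to_id[token] = len(self.token_to_id)

    def encode(self, text):
        """Map each whitespace-separated token to its id, or to the <unk> id."""
        unk_id = self.token_to_id[self.unk_token]
        return [self.token_to_id.get(token, unk_id) for token in text.split()]


corpus = ["hello world", "hello there", "goodbye world"]
tokenizer = ToySpaceTokenizer()
tokenizer.build_from_lines(corpus, max_vocab_size=3)
ids = tokenizer.encode("hello world again")  # ids == [1, 2, 0]
```

With `max_vocab_size=3` only the two most frequent tokens are kept alongside `<unk>`, so the unseen word "again" falls back to id 0.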