lineflow: Framework-Agnostic NLP Data Loader in Python
lineflow is a simple text dataset loader for NLP deep learning tasks.
- lineflow is designed to be used with any deep learning framework.
- lineflow enables you to build data preprocessing pipelines.
- lineflow supports a functional API and lazy evaluation.
Installation
To install lineflow, run:
$ pip install lineflow
If you'd like to use lineflow with AllenNLP:
$ pip install "lineflow[allennlp]"
Also, if you'd like to use lineflow with torchtext:
$ pip install "lineflow[torchtext]"
Usage
lineflow.TextDataset expects line-oriented text files:
import lineflow as lf
def preprocess(x):
    return x.split()
'''/path/to/text will look like below:
i 'm a line 1 .
i 'm a line 2 .
i 'm a line 3 .
'''
ds = lf.TextDataset('/path/to/text')
ds.first() # "i 'm a line 1 ."
ds[1] # "i 'm a line 2 ."
ds = ds.map(preprocess)
ds.first() # ["i", "'m", "a", "line", "1", "."]
# Passing a list of paths yields one tuple per line position, with one element per file.
ds = lf.TextDataset(['/path/to/text', '/path/to/text'])
ds.first() # ("i 'm a line 1 .", "i 'm a line 1 .")
ds = ds.map(lambda x: (x[0].split(), x[1].split()))
ds.first() # (["i", "'m", "a", "line", "1", "."], ["i", "'m", "a", "line", "1", "."])
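The lazy behavior of map can be pictured with a small standard-library sketch. This is not lineflow's implementation: the LazyLines class, the calls list, and the temporary sample file are hypothetical names introduced only for illustration, assuming that a mapped function runs when an item is accessed rather than when map is called.

```python
import os
import tempfile


class LazyLines:
    """Hypothetical stand-in for a line-oriented dataset; not lineflow's code."""

    def __init__(self, path, funcs=()):
        self._path = path
        self._funcs = tuple(funcs)  # transformations applied only on access

    def _lines(self):
        # Stream the file one line at a time instead of loading it all at once.
        with open(self._path, encoding="utf-8") as f:
            for line in f:
                yield line.rstrip("\n")

    def _apply(self, x):
        for func in self._funcs:
            x = func(x)
        return x

    def map(self, func):
        # Record the function only; nothing is read or computed here.
        return LazyLines(self._path, self._funcs + (func,))

    def __getitem__(self, index):
        for i, line in enumerate(self._lines()):
            if i == index:
                return self._apply(line)
        raise IndexError(index)

    def first(self):
        return self[0]


# Write a small sample file so the sketch runs on its own.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("i 'm a line 1 .\ni 'm a line 2 .\ni 'm a line 3 .\n")
    sample_path = tmp.name

calls = []  # records every line that preprocess actually receives


def preprocess(x):
    calls.append(x)
    return x.split()


ds = LazyLines(sample_path).map(preprocess)
assert calls == []  # map() alone did not process any line

first = ds.first()
second = ds[1]
os.remove(sample_path)
```

In this sketch only the two accessed lines are ever passed to preprocess; the third line is never processed. This is the property that makes lazy pipelines cheap to build over large files.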
lineflow with Deep Learning Frameworks
Use lineflow with AllenNLP:
import math
from allennlp.common.tqdm import Tqdm
from allennlp.data.vocabulary import Vocabulary
from allennlp.data.iterators import BucketIterator
from lineflow.datasets import Seq2SeqDataset
ds = Seq2SeqDataset(
    source_file_path='/path/to/source',
    target_file_path='/path/to/target'
).to_allennlp()
vocab = Vocabulary.from_instances(ds)
iterator = BucketIterator(sorting_keys=[('source_tokens', 'num_tokens')])
iterator.index_with(vocab)
num_batches = math.ceil(len(ds) / iterator._batch_size)
# Iterate over the dataset defined above for one epoch.
for batch in Tqdm.tqdm(iterator(ds, num_epochs=1), total=num_batches):
    ...  # Your training code here
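The num_batches value in the example above is a ceiling division: a final, smaller batch still counts as one batch. A minimal sketch of that arithmetic follows; count_batches is a hypothetical helper, not part of lineflow or AllenNLP, and the dataset sizes are made up for illustration.

```python
import math


def count_batches(num_examples, batch_size):
    """Batches per epoch when the last batch may be smaller than batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(num_examples / batch_size)


print(count_batches(100, 32))  # 4: three full batches plus one batch of 4
print(count_batches(96, 32))   # 3: the examples divide evenly
```

Passing this count as total lets the progress bar finish exactly when the epoch ends.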
You can find other examples here.