opensubtitles-dataloader

Download, preprocess and use sentences from the OpenSubtitles v2018 dataset without ever needing to load all of it into memory.

Download

See possible languages here.

opensubtitles-download en

Download the tokenized version.

opensubtitles-download en --token

Use in Python

Load

opensubtites_dataset = OpenSubtitlesDataset('en')

Load only the first 1 million lines.

opensubtites_dataset = OpenSubtitlesDataset('en', first_n_lines=1_000_000)

Group sentences into groups of 5.

opensubtites_dataset = OpenSubtitlesDataset('en', 5)

Group sentences into groups whose size ranges from 2 to 5.

opensubtites_dataset = OpenSubtitlesDataset('en', (2,5))

Split sentences using "\n".

opensubtites_dataset = OpenSubtitlesDataset('en', delimiter="\n")
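
To make the grouping and delimiter options concrete, here is a minimal pure-Python sketch of what they could mean. It is not the package's implementation: group_sentences is a hypothetical helper, and it assumes that the delimiter is the string placed between the sentences of one group and that a (low, high) tuple means each group size is drawn from that inclusive range.

```python
import random
from typing import List, Optional, Tuple, Union


def group_sentences(
    sentences: List[str],
    n_sents: Union[int, Tuple[int, int]] = 1,
    delimiter: str = " ",
    seed: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper: join consecutive sentences into groups."""
    rng = random.Random(seed)
    groups = []
    i = 0
    while i < len(sentences):
        if isinstance(n_sents, tuple):
            low, high = n_sents
            size = rng.randint(low, high)  # both ends inclusive
        else:
            size = n_sents
        # The last group may be shorter if the sentences run out.
        groups.append(delimiter.join(sentences[i:i + size]))
        i += size
    return groups


lines = ["Hi.", "How are you?", "Fine.", "And you?", "Great."]
print(group_sentences(lines, 2))
# ['Hi. How are you?', 'Fine. And you?', 'Great.']
print(group_sentences(lines, (2, 5), delimiter="\n", seed=0))
```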

Apply a custom preprocessing function.

opensubtites_dataset = OpenSubtitlesDataset('en', preprocess_function=my_preprocessing_function)
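
The README does not say what signature the preprocessing function should have. The sketch below assumes it receives one string and returns the cleaned string; check the package source before relying on that assumption.

```python
import re


def my_preprocessing_function(text: str) -> str:
    """Assumed signature (str -> str): lowercase and normalize whitespace."""
    # Collapse every run of whitespace into a single space, then trim.
    return re.sub(r"\s+", " ", text).strip().lower()


print(my_preprocessing_function("  Hello,   World! "))
# hello, world!
```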

Split for Training

train, valid, test = opensubtites_dataset.split()

Set the fractions of the original dataset used for the train, validation and test sets.

train, valid, test = opensubtites_dataset.split([0.7, 0.15, 0.15])

Use a seed.

train, valid, test = opensubtites_dataset.split(seed=42)
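
For readers unfamiliar with seeded splits, the following sketch shows the general idea on plain indices. It is an illustration only, not the package's code: split_indices is a hypothetical helper, the default fractions are an assumption, and whether the package shuffles before splitting is not stated in this README.

```python
import random
from typing import List, Optional, Sequence


def split_indices(
    n_items: int,
    fractions: Sequence[float] = (0.8, 0.1, 0.1),  # assumed defaults
    seed: Optional[int] = None,
) -> List[List[int]]:
    """Hypothetical helper: shuffle indices and cut them by fractions."""
    indices = list(range(n_items))
    # A seeded generator makes the shuffle, and so the split, reproducible.
    random.Random(seed).shuffle(indices)
    parts = []
    start = 0
    for k, fraction in enumerate(fractions):
        if k == len(fractions) - 1:
            end = n_items  # last part takes the remainder
        else:
            end = start + round(n_items * fraction)
        parts.append(indices[start:end])
        start = end
    return parts


train_idx, valid_idx, test_idx = split_indices(100, [0.7, 0.15, 0.15], seed=42)
print(len(train_idx), len(valid_idx), len(test_idx))
# 70 15 15
```

Calling the helper twice with the same seed returns the same three index lists, which is the property a seed is meant to give.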

Access

By index.

train, valid, test = OpenSubtitlesDataset('en').split()
train[20_000]
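
Index access of this kind does not require holding the whole file in memory. One common way to get that behavior is to record the byte offset of every line once and then seek to the requested line on demand. The sketch below shows that technique with a hypothetical LineIndexedFile class; it is not necessarily how this package does it.

```python
import os
import tempfile
from typing import List


class LineIndexedFile:
    """Hypothetical helper: random access to the lines of a text file."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.offsets: List[int] = []
        # One pass over the file records where each line starts. Only the
        # integer offsets are kept in memory, not the text itself.
        with open(path, "rb") as f:
            offset = 0
            for line in f:
                self.offsets.append(offset)
                offset += len(line)

    def __len__(self) -> int:
        return len(self.offsets)

    def __getitem__(self, index: int) -> str:
        # Jump to the start of the requested line and read only that line.
        with open(self.path, "rb") as f:
            f.seek(self.offsets[index])
            return f.readline().decode("utf-8").rstrip("\r\n")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sentences.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("first line\nsecond line\nthird line\n")
    dataset = LineIndexedFile(path)
    print(len(dataset), dataset[1])
    # 3 second line
```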

With PyTorch.

from torch.utils.data import DataLoader
train, valid, test = OpenSubtitlesDataset('en').split()
train_loader = DataLoader(train, batch_size=16)
