
Text utilities and datasets for PyTorch


Supporting Rapid Prototyping with a Deep Learning NLP Toolkit

PyTorch-NLP, or torchnlp for short, is a library of neural network layers, text processing modules and datasets designed to accelerate Natural Language Processing (NLP) research.

Join our community, add datasets and neural network layers! Chat with us on Gitter and join the Google Group, we're eager to collaborate with you.


Logo by Chloe Yeo

Installation

Make sure you have Python 3.6+ and PyTorch 1.0+. You can then install pytorch-nlp using pip:

pip install pytorch-nlp

Or install the latest code from GitHub:

pip install git+https://github.com/PetrochukM/PyTorch-NLP.git
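Before installing, a quick check of the prerequisites can save a confusing failure later. The sketch below is a hypothetical helper (not part of PyTorch-NLP) that compares the running Python version and, if PyTorch is importable, its version string against the Python 3.6+ / PyTorch 1.0+ minimums stated above. It assumes only the leading `major.minor` numbers matter and ignores local suffixes such as `+cu117`.

```python
import itertools
import sys

# Minimums taken from the installation notes above.
MIN_PYTHON = (3, 6)
MIN_TORCH = (1, 0)


def parse_version(version):
    """Return the leading (major, minor) integers of a version string.

    '1.13.1+cu117' -> (1, 13); '1.0.0a0' -> (1, 0); a missing minor part counts as 0.
    """
    release = version.split("+")[0]  # drop local suffixes such as '+cu117'
    parts = []
    for piece in release.split(".")[:2]:
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    while len(parts) < 2:
        parts.append(0)
    return tuple(parts)


def check_prerequisites():
    """Return (python_ok, torch_ok); torch_ok is None if PyTorch is not importable."""
    python_ok = sys.version_info[:2] >= MIN_PYTHON
    try:
        import torch  # imported lazily so the check also runs without PyTorch
    except ImportError:
        return python_ok, None
    return python_ok, parse_version(torch.__version__) >= MIN_TORCH
```

Calling `check_prerequisites()` on a suitable setup should return `(True, True)`; a `None` in the second slot means PyTorch still needs to be installed first.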

Docs 📖

The complete documentation for PyTorch-NLP is available via our ReadTheDocs website.

Basics

Add PyTorch-NLP to your project by following one of the common use cases:

Load a Dataset

Load the IMDB dataset, for example:

from torchnlp.datasets import imdb_dataset

# Load the imdb training dataset
train = imdb_dataset(train=True)
train[0]  # RETURNS: {'text': 'For a movie that gets..', 'sentiment': 'pos'}
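Each example is a plain dictionary, so standard Python is enough to prepare it for training. The sketch below uses hypothetical in-memory rows shaped like the `imdb_dataset` output (only the first row's text comes from the example above; the others are placeholders) so it runs without downloading IMDB. It maps the `sentiment` strings to integer class ids and counts examples per label.

```python
from collections import Counter

# Hypothetical rows with the same keys as `imdb_dataset` examples;
# real rows would come from `imdb_dataset(train=True)`.
rows = [
    {"text": "For a movie that gets..", "sentiment": "pos"},
    {"text": "placeholder negative review", "sentiment": "neg"},
    {"text": "placeholder positive review", "sentiment": "pos"},
]

# Map each label string to an integer class id (sorted for a stable order).
label_to_id = {label: i for i, label in enumerate(sorted({r["sentiment"] for r in rows}))}

# Pair every review text with its integer label.
examples = [(r["text"], label_to_id[r["sentiment"]]) for r in rows]

# Count how many reviews fall under each label, e.g. to check class balance.
label_counts = Counter(r["sentiment"] for r in rows)
```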

Apply Neural Network Layers

For example, apply the state-of-the-art LockedDropout from the neural network package:

import torch
from torchnlp.nn import LockedDropout

input_ = torch.randn(6, 3, 10)
dropout = LockedDropout(0.5)

# Apply a LockedDropout to `input_`
dropout(input_) # RETURNS: torch.FloatTensor (6x3x10)
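LockedDropout samples one dropout mask per sequence and reuses it at every time step, rather than drawing a fresh mask for each step as standard dropout does. The stdlib-only sketch below is a hypothetical `locked_dropout` function operating on nested lists, not the library's implementation. It assumes the input is laid out as [sequence length][batch][hidden], matching the (6, 3, 10) tensor above, and scales kept units by 1 / (1 - p) so the expected activation is unchanged.

```python
import random


def locked_dropout(x, p=0.5, training=True, rng=random):
    """Variational ("locked") dropout over a nested list shaped [seq_len][batch][hidden].

    One Bernoulli mask of shape [batch][hidden] is sampled and reused at every
    time step; kept units are scaled by 1 / (1 - p).
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return x
    batch, hidden = len(x[0]), len(x[0][0])
    scale = 1.0 / (1.0 - p)
    # Sample the mask once: each unit is kept with probability 1 - p.
    mask = [[scale if rng.random() >= p else 0.0 for _ in range(hidden)]
            for _ in range(batch)]
    # Apply the same mask to every time step of the sequence.
    return [[[v * m for v, m in zip(row, mask_row)]
             for row, mask_row in zip(step, mask)]
            for step in x]
```

Because the mask is shared across time, a hidden unit that is dropped stays dropped for the whole sequence, which is the property that distinguishes locked dropout from applying ordinary dropout independently at each step.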

Encode Text

Tokenize and encode text as a tensor. For example, a WhitespaceEncoder breaks text into terms whenever it encounters a whitespace character.

from torchnlp.encoders.text import WhitespaceEncoder

# Create a `WhitespaceEncoder` with a corpus of text
encoder = WhitespaceEncoder(["now this ain't funny", "so don't you dare laugh"])

# Encode and decode phrases
encoder.encode("this ain't funny.") # RETURNS: torch.Tensor([6, 7, 1])
encoder.decode(encoder.encode("This ain't funny.")) # RETURNS: "<unk> ain't <unk>" ("This" and "funny." are not in the corpus)
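The indices above follow from how the vocabulary is built: reserved tokens come first, then corpus tokens in the order they are first seen, and any token outside the vocabulary maps to the unknown index. The sketch below is a hypothetical `ToyWhitespaceEncoder`, not the library class. It assumes five reserved tokens (`<pad>`, `<unk>`, `</s>`, `<s>`, `<copy>`, with `<unk>` at index 1), which is consistent with the `[6, 7, 1]` output above, and it returns plain lists instead of tensors.

```python
from collections import Counter

# Assumed reserved tokens and their order; index 1 is the unknown token.
RESERVED_TOKENS = ["<pad>", "<unk>", "</s>", "<s>", "<copy>"]
UNKNOWN_INDEX = RESERVED_TOKENS.index("<unk>")


class ToyWhitespaceEncoder:
    """Hypothetical, simplified whitespace encoder for illustration only."""

    def __init__(self, corpus, min_occurrences=1):
        counts = Counter()
        for sample in corpus:
            counts.update(sample.split())  # split on any run of whitespace
        # Reserved tokens first, then corpus tokens in first-seen order.
        self.vocab = RESERVED_TOKENS + [
            token for token, count in counts.items() if count >= min_occurrences
        ]
        self.token_to_index = {token: i for i, token in enumerate(self.vocab)}

    def encode(self, text):
        # Tokens missing from the vocabulary map to the unknown index.
        return [self.token_to_index.get(token, UNKNOWN_INDEX) for token in text.split()]

    def decode(self, indices):
        return " ".join(self.vocab[i] for i in indices)
```

With the same corpus as above, `ToyWhitespaceEncoder(["now this ain't funny", "so don't you dare laugh"]).encode("this ain't funny.")` gives `[6, 7, 1]`: "funny." (with the period) never appeared in the corpus, so it becomes `<unk>`.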

Load Word Vectors

For example, load FastText, state-of-the-art English word vectors:

from torchnlp.word_to_vector import FastText

vectors = FastText()
# Load vectors for any word as a `torch.FloatTensor`
vectors['hello']  # RETURNS: [torch.FloatTensor of size 300]
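FastText publishes its English vectors as plain-text `.vec` files: a header line with the word count and dimension, then one word followed by its floats per line. The sketch below parses that layout from a tiny made-up string and looks words up. The helper names are hypothetical, and the all-zero fallback for unknown words is an assumption about sensible default behaviour, not a statement of the library's internals.

```python
import io

# A tiny, made-up file in the plain-text ".vec" layout:
# "<num_words> <dim>" on the first line, then "word v1 v2 ... v_dim" per line.
VEC_TEXT = """2 3
hello 0.1 0.2 0.3
world 0.4 0.5 0.6
"""


def load_vec(stream):
    """Parse a .vec stream into (dim, {word: [floats]})."""
    _num_words, dim = (int(n) for n in stream.readline().split())
    table = {}
    for line in stream:
        parts = line.rstrip().split(" ")  # rstrip drops trailing spaces and newlines
        if len(parts) != dim + 1:
            continue  # skip malformed lines
        table[parts[0]] = [float(v) for v in parts[1:]]
    return dim, table


def lookup(table, dim, word):
    # Assumed fallback: out-of-vocabulary words get an all-zero vector.
    return table.get(word, [0.0] * dim)


dim, table = load_vec(io.StringIO(VEC_TEXT))
```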

Compute Metrics

Finally, compute common metrics such as the BLEU score.

from torchnlp.metrics import get_moses_multi_bleu

hypotheses = ["The brown fox jumps over the dog 笑"]
references = ["The quick brown fox jumps over the lazy dog 笑"]

# Compute BLEU score with the official BLEU perl script
get_moses_multi_bleu(hypotheses, references, lowercase=True)  # RETURNS: 47.9
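`get_moses_multi_bleu` runs the Moses `multi-bleu.perl` script. To show what the score measures, here is a simplified pure-Python sketch (a hypothetical `corpus_bleu` helper, not the library's function): clipped n-gram precisions for n = 1 to 4, combined by geometric mean and multiplied by a brevity penalty when the hypotheses are shorter than the references. It assumes whitespace tokenization and a single reference per hypothesis. On the example above it reproduces 47.9 after rounding, but it is not guaranteed to match the Perl script on every input.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4, lowercase=False):
    """Simplified corpus BLEU with one reference per hypothesis, on a 0-100 scale."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        if lowercase:
            hyp, ref = hyp.lower(), ref.lower()
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Clipped matches: an n-gram counts at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp_tokens) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty for hypotheses shorter than their references.
    brevity = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)
```

For the fox example, the precisions are 8/8, 5/7, 3/6 and 2/5, and the 8-token hypothesis against a 10-token reference gives a brevity penalty of exp(1 - 10/8), so the score is about 47.88.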

Help ❓

For longer examples, see the examples/ directory.

Need more help? We are happy to answer your questions via Gitter Chat.

Contributing

We've released PyTorch-NLP because we found a lack of basic toolkits for NLP in PyTorch. We hope that other organizations can benefit from the project. We are thankful for any contributions from the community.

Contributing Guide

Read our contributing guide to learn about our development process, how to propose bugfixes and improvements, and how to build and test your changes to PyTorch-NLP.

Related Work

torchtext

torchtext and PyTorch-NLP differ in architecture and feature set; otherwise, they are similar. Both provide pre-trained word vectors, datasets, iterators and text encoders, and PyTorch-NLP additionally provides neural network modules and metrics. Architecturally, both are object-oriented, but torchtext's components have external coupling while PyTorch-NLP's have low coupling.

AllenNLP

AllenNLP is designed to be a platform for research. PyTorch-NLP is designed to be a lightweight toolkit.

Authors

Citing

If you find PyTorch-NLP useful for an academic publication, then please use the following BibTeX to cite it:

@misc{pytorch-nlp,
  author = {Petrochuk, Michael},
  title = {PyTorch-NLP: Rapid Prototyping with PyTorch Natural Language Processing (NLP) Tools},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/PetrochukM/PyTorch-NLP}},
}


Download files

Source Distribution

pytorch-nlp-0.4.1.tar.gz (49.6 kB)

Built Distribution

pytorch_nlp-0.4.1-py3-none-any.whl (82.3 kB)

File details

Details for the file pytorch-nlp-0.4.1.tar.gz.

File metadata

  • Download URL: pytorch-nlp-0.4.1.tar.gz
  • Upload date:
  • Size: 49.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.3

File hashes

Hashes for pytorch-nlp-0.4.1.tar.gz
Algorithm Hash digest
SHA256 bb99b7a5cb02cb8e0aac3e6685b5b96aa2c3746326decb6a52e8b8df3aec29e7
MD5 30f10bd580e8e1e3e545e0eaf5c58dd3
BLAKE2b-256 b0ae24e0a9dab242747fc6bab8ca67967c8b0a8c564c05b6d9ffc66bf464820f


File details

Details for the file pytorch_nlp-0.4.1-py3-none-any.whl.

File metadata

  • Download URL: pytorch_nlp-0.4.1-py3-none-any.whl
  • Upload date:
  • Size: 82.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.3

File hashes

Hashes for pytorch_nlp-0.4.1-py3-none-any.whl
Algorithm Hash digest
SHA256 03e13b01b875c8b4c4277a914a6dc1fdb5653459bccd4ab93a525e5cca7dd1f6
MD5 6e98b27c515cba2f27a181e1c672c574
BLAKE2b-256 dfaeb6d18c3f37da5a78e83701469e6153811f4b0ecb3f9387bb3e9a65ca48ee
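To check a downloaded file against the SHA256 digests listed above, a few lines of `hashlib` are enough. The sketch below uses hypothetical helper names; it hashes the file in chunks and compares the result with the published digest, which is copied from the tables above.

```python
import hashlib

# SHA256 digests as published in the file details above.
PUBLISHED_SHA256 = {
    "pytorch-nlp-0.4.1.tar.gz": "bb99b7a5cb02cb8e0aac3e6685b5b96aa2c3746326decb6a52e8b8df3aec29e7",
    "pytorch_nlp-0.4.1-py3-none-any.whl": "03e13b01b875c8b4c4277a914a6dc1fdb5653459bccd4ab93a525e5cca7dd1f6",
}


def file_sha256(path, chunk_size=1 << 16):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Return True when the file's SHA256 digest matches the expected hex digest."""
    return file_sha256(path) == expected_sha256.strip().lower()
```

For example, `verify_download("pytorch-nlp-0.4.1.tar.gz", PUBLISHED_SHA256["pytorch-nlp-0.4.1.tar.gz"])` should return True for an intact download. pip can enforce the same check at install time through its hash-checking mode (`--hash=sha256:...` entries in a requirements file).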
