MXNet Gluon NLP Toolkit

GluonNLP is a toolkit that enables easy text preprocessing, dataset
loading, and neural model building to help you speed up your Natural
Language Processing (NLP) research.

- `Quick Start Guide <>`__
- `Resources <>`__


- GluonNLP is featured in:

- **AWS re:Invent 2018 in Las Vegas, 2018-11-28**! Check out `details <>`_.
- **KDD 2018 London, 2018-08-21, Apache MXNet Gluon tutorial**! Check out ****.


Make sure you have Python 2.7 or Python 3.6 and a recent version of MXNet.
You can install ``MXNet`` and ``GluonNLP`` using pip:


.. code:: bash

    pip install --pre --upgrade mxnet
    pip install gluonnlp
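
After installing, it can be useful to confirm that Python is able to locate both packages. The snippet below is a minimal sketch, assuming Python 3 (``importlib.util`` is not available on Python 2.7); the helper name ``installed`` is hypothetical and not part of GluonNLP.

```python
import importlib.util


def installed(packages):
    """Map each top-level package name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    # Report whether the two packages installed above are importable.
    for name, found in installed(["mxnet", "gluonnlp"]).items():
        print("{}: {}".format(name, "found" if found else "missing"))
```

This only checks that the packages can be found; it does not import them or verify their versions.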

Docs 📖

GluonNLP documentation is available at `our
website <>`__.


GluonNLP is a community that believes in sharing.

For questions, comments, and bug reports, `GitHub issues <>`__ is the best way to reach us.

We now have a new Slack channel `here <>`__ (`register <>`__).

How to Contribute

The GluonNLP community welcomes contributions from anyone!

There are many ways to become one of our `contributors <>`__:

- Ask or answer questions on `GitHub issues <>`__.
- Propose ideas, or review proposed design ideas on `GitHub issues <>`__.
- Improve the `documentation <>`__.
- Contribute bug reports on `GitHub issues <>`__.
- Write new `scripts <>`__ to reproduce
  state-of-the-art results.
- Write new `examples <>`__ to explain
  key ideas in NLP methods and models.
- Write new `public datasets <>`__
  (license permitting).
- Most importantly, if you have an idea of how to contribute, then do it!

For a list of open starter tasks, check `good first issues <>`__.

Also see our `contributing
guide <>`__ on simple how-tos,
contribution guidelines and more.


Check out how to use GluonNLP for your own research or projects.

If you are new to Gluon, please check out our `60-minute crash course <>`__.

To get started quickly, refer to the runnable notebook examples at
`Examples <>`__.

For advanced examples, check out our
`Scripts <>`__.

For experienced users, check out our
`API Notes <>`__.

Quick Start Guide

`Dataset Loading <>`__

Load the Wikitext-2 dataset, for example:

.. code:: python

    >>> import gluonnlp as nlp
    >>> train = nlp.data.WikiText2(segment='train')
    >>> train[0][0:5]
    ['=', 'Valkyria', 'Chronicles', 'III', '=']

`Vocabulary Construction <>`__

Build a vocabulary based on the above dataset, for example:

.. code:: python

    >>> vocab = nlp.Vocab(counter=nlp.data.Counter(train[0]))
    >>> vocab
    Vocab(size=33280, unk="<unk>", reserved="['<pad>', '<bos>', '<eos>']")
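
To make the idea of a vocabulary concrete, here is a minimal pure-Python sketch of mapping tokens to integer indices, with an unknown token and reserved tokens placed first. It is an illustration only, not GluonNLP's implementation: the helpers ``build_vocab`` and ``lookup`` are hypothetical, and the ordering rule (most frequent first, ties broken alphabetically) is an assumption made for this example.

```python
from collections import Counter


def build_vocab(tokens, unknown="<unk>", reserved=("<pad>", "<bos>", "<eos>")):
    """Return (idx_to_token, token_to_idx) built from a list of tokens."""
    counts = Counter(tokens)
    # Special tokens take the lowest indices.
    idx_to_token = [unknown] + list(reserved)
    special = set(idx_to_token)
    # Assumed ordering: descending frequency, ties broken alphabetically.
    for token, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if token not in special:
            idx_to_token.append(token)
    token_to_idx = {token: idx for idx, token in enumerate(idx_to_token)}
    return idx_to_token, token_to_idx


def lookup(token_to_idx, token, unknown="<unk>"):
    """Return the index of a token, falling back to the unknown token's index."""
    return token_to_idx.get(token, token_to_idx[unknown])


tokens = ["=", "Valkyria", "Chronicles", "III", "="]
idx_to_token, token_to_idx = build_vocab(tokens)
print(idx_to_token)
print(lookup(token_to_idx, "="), lookup(token_to_idx, "unseen"))
```

Tokens that never appeared in the input fall back to the index of the unknown token, which is the role ``<unk>`` plays in the vocabulary shown above.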

`Neural Models Building <>`__

From the models package, apply a Standard RNN language model to the
above dataset:

.. code:: python

    >>> model = nlp.model.language_model.StandardRNN('lstm', len(vocab),
    ...                                              200, 200, 2, 0.5, True)
    >>> model
    StandardRNN(
      (embedding): HybridSequential(
        (0): Embedding(33280 -> 200, float32)
        (1): Dropout(p = 0.5, axes=())
      )
      (encoder): LSTM(200 -> 200.0, TNC, num_layers=2, dropout=0.5)
      (decoder): HybridSequential(
        (0): Dense(200 -> 33280, linear)
      )
    )
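
As a rough sanity check on the layer sizes printed above, the number of trainable parameters can be estimated by hand. The sketch below rests on several assumptions that are not stated in this README: a standard four-gate LSTM with two bias vectors per layer, a decoder with a bias term, and the final positional argument ``True`` meaning that the decoder shares its weight matrix with the embedding. The function names are hypothetical.

```python
def lstm_layer_params(input_size, hidden_size):
    """Parameters in one LSTM layer: 4 gates, input and recurrent weights, 2 biases."""
    gates = 4
    weights = gates * hidden_size * (input_size + hidden_size)
    biases = 2 * gates * hidden_size
    return weights + biases


def estimate_params(vocab_size, embed_size, hidden_size, num_layers, tie_weights):
    """Estimate parameter counts per component of an embedding-LSTM-decoder model."""
    embedding = vocab_size * embed_size
    encoder = sum(
        lstm_layer_params(embed_size if layer == 0 else hidden_size, hidden_size)
        for layer in range(num_layers)
    )
    # With tied weights the decoder reuses the embedding matrix and adds only a bias.
    decoder = vocab_size + (0 if tie_weights else hidden_size * vocab_size)
    return {
        "embedding": embedding,
        "encoder": encoder,
        "decoder": decoder,
        "total": embedding + encoder + decoder,
    }


for name, count in estimate_params(33280, 200, 200, 2, True).items():
    print(name, count)
```

Under these assumptions the embedding dominates the total, which is why sharing its weights with the decoder is attractive for large vocabularies.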

`Word Embeddings Loading <>`__

For example, load a GloVe word embedding, one of the state-of-the-art
English word embeddings:

.. code:: python

    >>> glove = nlp.embedding.create('glove', source='glove.6B.50d')
    >>> # Obtain vectors for 'baby' in the GloVe word embedding
    >>> type(glove['baby'])
    <class 'mxnet.ndarray.ndarray.NDArray'>
    >>> glove['baby'].shape
    (50,)
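
A common use of word vectors is comparing words by cosine similarity. The following is a minimal pure-Python sketch of that computation. The three-dimensional vectors in ``toy_vectors`` are made up for illustration and are not GloVe values; with real embeddings you would first convert each vector to a plain list or array.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors, chosen only to make the comparison visible.
toy_vectors = {
    "baby": [0.9, 0.1, 0.3],
    "infant": [0.8, 0.2, 0.4],
    "car": [-0.5, 0.9, 0.0],
}

print(cosine_similarity(toy_vectors["baby"], toy_vectors["infant"]))
print(cosine_similarity(toy_vectors["baby"], toy_vectors["car"]))
```

Values close to 1 indicate vectors pointing in nearly the same direction, while values near 0 or below indicate unrelated or opposed directions.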

