A little library for text analysis with RNNs.

Warning: very alpha, work in progress.

Install

Via GitHub (version under active development):

git clone http://github.com/IndicoDataSolutions/passage.git
cd passage
python setup.py develop

or via pip

sudo pip install passage
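
After either install route, a quick sanity check is to ask Python whether the package can be found. The sketch below uses only the standard library and assumes the import name is `passage`, as in the example that follows; it reports whether the package is discoverable without actually importing it.

```python
import importlib.util

# find_spec returns None when no top-level module of that name is on the path.
spec = importlib.util.find_spec("passage")
passage_installed = spec is not None

if passage_installed:
    print("passage found at:", spec.origin)
else:
    print("passage is not importable from this environment")
```

If the check fails after `python setup.py develop`, it may be that a different interpreter or virtual environment is active than the one used for installation.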

Example

Using Passage to do binary classification of text, this example:

  • Tokenizes some training text, converting it to a format Passage can use.

  • Defines the model’s structure as a list of layers.

  • Creates the model with that structure and a cost to be optimized.

  • Trains the model for one iteration over the training text.

  • Uses the model and tokenizer to predict on new text.

  • Saves and loads the model.

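from passage.preprocessing import Tokenizer
from passage.layers import Embedding, GatedRecurrent, Dense
from passage.models import RNN
from passage.utils import save, load

# Fit the tokenizer on the training text and convert the text to tokens.
tokenizer = Tokenizer()
train_tokens = tokenizer.fit_transform(train_text)

# Define the model's structure as a list of layers.
layers = [
    Embedding(size=128, n_features=tokenizer.n_features),
    GatedRecurrent(size=128),
    Dense(size=1, activation='sigmoid')
]

# Create the model with a cost to optimize, then train it on the training text.
model = RNN(layers=layers, cost='BinaryCrossEntropy')
model.fit(train_tokens, train_labels)

# Predict on new text, reusing the fitted tokenizer.
model.predict(tokenizer.transform(test_text))

# Save the model to disk and load it back.
save(model, 'save_test.pkl')
model = load('save_test.pkl')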

Where:

  • train_text is a list of strings ['hello world', 'foo bar']

  • train_labels is a list of labels [0, 1]

  • test_text is another list of strings
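
To make these inputs concrete, here is a minimal pure-Python sketch of what "converting text to a format Passage can use" might look like conceptually: each word is mapped to an integer id. This is an illustration only, not Passage's Tokenizer implementation. The helpers `build_vocab` and `encode` are hypothetical names, and the real Tokenizer may differ in lowercasing, frequency cutoffs, and how it handles unseen words.

```python
def build_vocab(texts):
    """Assign an integer id to each distinct lowercase word.

    Id 0 is reserved here for words not seen during fitting (an assumption
    of this sketch, not a documented Passage convention).
    """
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab


def encode(texts, vocab):
    """Convert each string to a list of integer ids, using 0 for unknown words."""
    return [[vocab.get(word, 0) for word in text.lower().split()] for text in texts]


train_text = ['hello world', 'foo bar']
train_labels = [0, 1]  # one label per training string

vocab = build_vocab(train_text)
train_ids = encode(train_text, vocab)

# New text reuses the vocabulary learned from the training text.
test_text = ['hello foo baz']
test_ids = encode(test_text, vocab)
```

The point of the sketch is the shape of the data: one sequence of ids per input string, one label per training string, and a vocabulary fitted once on the training text and then reused for new text.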

Datasets

Without sizeable datasets, RNNs have difficulty achieving better results than traditional sparse linear models. Below are a few appropriately sized datasets that are useful for experimentation. Hopefully this list will grow over time. Please feel free to propose new datasets for inclusion through either an issue or a pull request.

**Note**: None of these datasets were created by indico, nor should their inclusion here indicate any kind of endorsement.

Blogger Dataset: http://www.cs.biu.ac.il/~koppel/blogs/blogs.zip (Age and gender data)
