Skip to main content

End-to-end NLP package for seamless integration of Pandas Series, DataFrame and Keras model

Project description

Convectors: build end-to-end NLP pipelines

Inspired by the Keras syntax, Convectors allows you to build NLP pipelines by adding different processing Layers. Fully compatible with pandas and Keras, it can either process list or pandas series on the fly, or apply processing to a whole DataFrame by using columns as inputs and outputs. Tensorflow's Keras models can be added as a layer, embedded and saved within a larger end-to-end NLP model.

pip install convectors

Simple classification example

In this basic example, we create an NLP pipeline consisting of: tokenizer, Snowball stemmer, TfIdf vectorizer, SVD and a linear SVM classifier.

from convectors.layers import Tokenize, Snowball, TfIdf, SVD, SVM
from sklearn.datasets import fetch_20newsgroups

# load data
training_set = fetch_20newsgroups(subset="train")
testing_set = fetch_20newsgroups(subset="test")

# create encoder model
nlp = Tokenize(stopwords=["en"])
nlp += Snowball(lang="english")
nlp += TfIdf(max_features=20000, max_df=.3)
nlp += SVD(n_components=400)
nlp += SVM()

# fit and train model with just one call
nlp(training_set.data,
    y=training_set.target)

# use trained model for inference
y_pred = nlp(testing_set.data)
y_true = testing_set.target

# compute accuracy
print((y_true == y_pred).mean())

# persist model for future usage
nlp.save("model.p")

Seamless Keras integration

from sklearn.datasets import fetch_20newsgroups
from convectors.layers import Tokenize, TfIdf, SVD, Keras, OneHot
from convectors import load_model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# load data
training_set = fetch_20newsgroups(subset="train")
testing_set = fetch_20newsgroups(subset="test")

# create one hot encoder for categorical data
one_hot = OneHot(to_categorical=True)
# fit and return data
y = one_hot(training_set.target)

# create encoder model
nlp = Tokenize(stopwords=["en"])
nlp += TfIdf(max_features=20000, max_df=.3)
nlp += SVD(n_components=100)

# create keras model
model = Sequential()
model.add(Dense(100, activation="tanh"))
model.add(Dense(100, activation="tanh"))
model.add(Dense(one_hot.n_features, activation="softmax"))
model.compile("adam", "categorical_crossentropy", metrics=["accuracy"])
# add Keras model to NLP model
nlp += Keras(model=model, epochs=6, batch_size=200, validation_split=.1)
# add one_hot decoder to NLP model
nlp += one_hot.get_decoder()

# this will fit the model !
X = nlp(training_set.data, y=y)

# save model
nlp.save("model.p")

# retrieve model as well as the Keras neural network in it
nlp = load_model("model.p")
y_pred = nlp(testing_set.data)
y_true = testing_set.target
print((y_pred == y_true).mean())

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

convectors-0.0.13.tar.gz (2.4 MB view hashes)

Uploaded Source

Built Distribution

convectors-0.0.13-py3-none-any.whl (2.4 MB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page