End-to-end NLP package for seamless integration of pandas Series, DataFrames and Keras models

Convectors: build end-to-end NLP pipelines

Inspired by the Keras syntax, Convectors lets you build NLP pipelines by stacking processing layers. It is fully compatible with pandas and Keras: it can process a list or a pandas Series on the fly, or apply processing to a whole DataFrame by using columns as inputs and outputs. TensorFlow Keras models can be added as a layer, then embedded and saved within a larger end-to-end NLP model.

pip install convectors
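
To make the layer-stacking idea concrete, here is a minimal sketch of how a `+=`-based pipeline can be composed in plain Python. It is an illustration only and not the Convectors implementation: the `Layer`, `Pipeline`, `Lowercase` and `SplitWords` classes are hypothetical names invented for this example.

```python
# Hypothetical illustration only: these classes are NOT part of Convectors.
class Layer:
    """A processing step; `+=` chains steps into a pipeline."""

    def process(self, items):
        raise NotImplementedError

    def __call__(self, items):
        # accept any iterable (list, pandas Series, ...) and work on a list
        return self.process(list(items))

    def __iadd__(self, other):
        # layer += other_layer returns a pipeline holding both layers
        return Pipeline([self, other])


class Pipeline(Layer):
    """Runs its layers in order, feeding each output to the next layer."""

    def __init__(self, layers):
        self.layers = list(layers)

    def process(self, items):
        for layer in self.layers:
            items = layer(items)
        return items

    def __iadd__(self, other):
        # pipeline += layer appends in place
        self.layers.append(other)
        return self


class Lowercase(Layer):
    def process(self, items):
        return [text.lower() for text in items]


class SplitWords(Layer):
    def process(self, items):
        return [text.split() for text in items]


pipeline = Lowercase()
pipeline += SplitWords()
result = pipeline(["Hello World", "Keras and pandas"])
print(result)  # [['hello', 'world'], ['keras', 'and', 'pandas']]
```

Overloading `__iadd__` keeps the pipeline a single callable object, which is what makes it possible to apply, extend and persist the whole chain as one unit.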

Simple classification example

In this basic example, we create an NLP pipeline for a sequence classification task:

from convectors import load_model
from convectors.layers import Argmax, Keras, Sequence, Tokenize
from sklearn.datasets import fetch_20newsgroups
from tensorflow.keras.layers import LSTM, Dense, Embedding
from tensorflow.keras.models import Sequential

# load data
training_set = fetch_20newsgroups(subset="train")
testing_set = fetch_20newsgroups(subset="test")

# create encoder model
encoder = Tokenize(stopwords=["en"])
encoder += Sequence(max_features=20000, pad=True, maxlen=200)

# get and transform training data
X_train = encoder(training_set.data)  # fit and transform
y_train = training_set.target  # training labels

# infer number of features and classes
# (+1 on the feature count: with mask_zero=True, Keras reserves index 0 for padding)
N_FEATURES = encoder["Sequence"].n_features + 1
N_CLASSES = y_train.max() + 1
EMBEDDING_DIM = 32

# create keras model and fit
model = Sequential()
model.add(Embedding(N_FEATURES, EMBEDDING_DIM, mask_zero=True))
model.add(LSTM(32, activation="tanh", return_sequences=False))
model.add(Dense(32, activation="tanh"))
model.add(Dense(N_CLASSES, activation="softmax"))
model.compile("nadam", "sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=1, batch_size=800)

# once trained, add the Keras model to the pipeline
encoder += Keras(model=model, trained=True)
encoder += Argmax()
encoder.verbose = False  # turn verbosity off

# for model persistence:
encoder.save("model.p")
encoder = load_model("model.p")

# predict for new data
y_pred = encoder(testing_set.data)
y_true = testing_set.target
# print accuracy
print((y_pred == y_true).mean())
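
The `Sequence(max_features=20000, pad=True, maxlen=200)` step and the `+ 1` in `N_FEATURES` are easier to follow with a small stand-alone sketch of integer encoding and padding. This is not the Convectors implementation: `fit_vocabulary` and `encode_and_pad` are hypothetical helpers, and dropping unknown tokens and padding on the right are assumptions made for this example.

```python
from collections import Counter


def fit_vocabulary(tokenized_docs, max_features):
    """Map the most frequent tokens to indices 1..n; index 0 stays free for padding."""
    counts = Counter(token for doc in tokenized_docs for token in doc)
    most_common = [token for token, _ in counts.most_common(max_features)]
    return {token: index for index, token in enumerate(most_common, start=1)}


def encode_and_pad(tokenized_docs, vocabulary, maxlen):
    """Encode tokens as indices, drop unknown tokens, truncate, then right-pad with 0."""
    encoded = []
    for doc in tokenized_docs:
        indices = [vocabulary[token] for token in doc if token in vocabulary][:maxlen]
        encoded.append(indices + [0] * (maxlen - len(indices)))
    return encoded


docs = [["the", "cat", "sat"], ["the", "dog"]]
vocabulary = fit_vocabulary(docs, max_features=10)
padded = encode_and_pad(docs, vocabulary, maxlen=4)

# The embedding needs one row per token index plus one row for the padding index 0.
n_features = len(vocabulary)
embedding_input_dim = n_features + 1

print(padded)
print(embedding_input_dim)
```

Because index 0 never refers to a real token, an `Embedding` layer created with `mask_zero=True` can skip the padded positions, which is why its input dimension is the vocabulary size plus one.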
