
End-to-end NLP package for seamless integration of pandas Series and DataFrames with Keras models

Convectors: build end-to-end NLP pipelines

Inspired by the Keras syntax, Convectors lets you build NLP pipelines by chaining processing layers. It is fully compatible with pandas and Keras: it can process lists or pandas Series on the fly, or apply processing to a whole DataFrame by using columns as inputs and outputs. TensorFlow's Keras models can be added as a layer, so that they are embedded and saved within a larger end-to-end NLP model.
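To make the layer-chaining idea concrete, here is a minimal pure-Python sketch of the pattern. It is not Convectors' implementation: the `Layer`, `Pipeline`, `Lowercase`, `Split` and `Vocabulary` classes are hypothetical names invented for illustration. It only shows how `+=` can append a step and how a pipeline can fit each step on the first call and reuse the learned state afterwards.

```python
class Layer:
    """Toy processing step (hypothetical): fits on first call, then only transforms."""

    def __init__(self):
        self.trained = False

    def fit(self, series):
        # Stateless layers have nothing to learn.
        pass

    def transform(self, series):
        raise NotImplementedError

    def __call__(self, series):
        if not self.trained:
            self.fit(series)
            self.trained = True
        return self.transform(series)


class Lowercase(Layer):
    def transform(self, series):
        return [text.lower() for text in series]


class Split(Layer):
    def transform(self, series):
        return [text.split() for text in series]


class Vocabulary(Layer):
    def fit(self, series):
        # Ids start at 1; 0 is kept for tokens not seen during fitting.
        tokens = sorted({tok for doc in series for tok in doc})
        self.index = {tok: i + 1 for i, tok in enumerate(tokens)}

    def transform(self, series):
        return [[self.index.get(tok, 0) for tok in doc] for doc in series]


class Pipeline:
    """Toy container (hypothetical) that applies its layers in order."""

    def __init__(self, *layers):
        self.layers = list(layers)

    def __iadd__(self, layer):
        # `pipeline += layer` appends a processing step.
        self.layers.append(layer)
        return self

    def __call__(self, series):
        for layer in self.layers:
            series = layer(series)
        return series


pipeline = Pipeline(Lowercase())
pipeline += Split()
pipeline += Vocabulary()

train_ids = pipeline(["The cat sat", "the dog sat"])  # first call: fit and transform
new_ids = pipeline(["The bird sat"])                  # later calls: transform only
print(train_ids, new_ids)
```

The second call reuses the vocabulary learned on the first one, which is why the unseen word "bird" maps to 0 in this sketch.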

pip install convectors

Simple classification example

In this basic example, we create an NLP pipeline for a sequence classification task:

from convectors import load_model
from convectors.layers import Argmax, Keras, Sequence, Tokenize
from sklearn.datasets import fetch_20newsgroups
from tensorflow.keras.layers import LSTM, Dense, Embedding
from tensorflow.keras.models import Sequential

# load data
training_set = fetch_20newsgroups(subset="train")
testing_set = fetch_20newsgroups(subset="test")

# create encoder model
encoder = Tokenize(stopwords=["en"])
encoder += Sequence(max_features=20000, pad=True, maxlen=200)

# transform the training texts and get the training labels
X_train = encoder(training_set.data)  # fit and transform
y_train = training_set.target  # training labels

# infer number of features and classes
N_FEATURES = encoder["Sequence"].n_features + 1
N_CLASSES = y_train.max() + 1
EMBEDDING_DIM = 32

# create keras model and fit
model = Sequential()
model.add(Embedding(N_FEATURES, EMBEDDING_DIM, mask_zero=True))
model.add(LSTM(32, activation="tanh", return_sequences=False))
model.add(Dense(32, activation="tanh"))
model.add(Dense(N_CLASSES, activation="softmax"))
model.compile("nadam", "sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=1, batch_size=800)

# once trained, append the Keras model to the pipeline
encoder += Keras(model=model, trained=True)
encoder += Argmax()
encoder.verbose = False  # turn verbosity off

# for model persistence:
encoder.save("model.p")
encoder = load_model("model.p")

# predict for new data
y_pred = encoder(testing_set.data)
y_true = testing_set.target
# print accuracy
print((y_pred == y_true).mean())
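The example above sizes the embedding with `n_features + 1` and sets `mask_zero=True`. In Keras, `mask_zero=True` treats index 0 as padding, so the embedding needs one more row than the number of real token ids. The sketch below illustrates that arithmetic in plain Python. The helpers `build_index` and `encode_and_pad` are hypothetical, and the padding side (right) and truncation rule are assumptions made for readability; they are not taken from Convectors' `Sequence` layer.

```python
PAD_ID = 0  # assumption: index 0 is reserved for padding


def build_index(docs):
    """Assign ids 1..n to the distinct tokens (hypothetical helper)."""
    tokens = sorted({tok for doc in docs for tok in doc})
    return {tok: i + 1 for i, tok in enumerate(tokens)}


def encode_and_pad(docs, index, maxlen):
    """Map tokens to ids, truncate to maxlen, and right-pad with PAD_ID."""
    rows = []
    for doc in docs:
        ids = [index[tok] for tok in doc if tok in index][:maxlen]
        rows.append(ids + [PAD_ID] * (maxlen - len(ids)))
    return rows


docs = [["good", "movie"], ["bad", "movie", "bad", "plot"]]
index = build_index(docs)
padded = encode_and_pad(docs, index, maxlen=3)

n_features = len(index)          # number of real token ids
embedding_rows = n_features + 1  # one extra row for the padding id 0

print(padded)
print(n_features, embedding_rows)
```

Every row has the same length, and the largest id equals `n_features`, so an embedding table indexed from 0 must have `n_features + 1` rows.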
