
A text classification engine that uses machine learning and is designed as a client-server architecture

Project description

OpenTC is a text classification engine using machine learning. It is designed as a client-server architecture and uses the Python libraries scikit-learn and tensorflow for its machine learning algorithms. The following algorithms are currently supported:

  • Naive Bayes

  • Support Vector Machine

  • Convolutional Neural Network

In the future it will also support FastText from Facebookresearch.

The engine runs as a server that listens for commands and for text to classify. By default it listens on localhost, port 3333; this can be changed in the YAML configuration file.

OpenTC can be used for text classification (an online demo website, OpenTC demo, is available for this purpose) or for other purposes such as Data Leak Prevention (DLP). An example DLP implementation has been created as an ICAP server: opentc-icap


  • Python 3.x

  • numpy

  • pyparsing

  • PyYAML

  • scikit-learn

  • scipy

  • tensorflow 1.x
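
Before starting the engine it can help to confirm that the libraries listed above are importable. The following is a minimal sketch, not part of OpenTC: the helper name `missing_modules` is hypothetical, and the import names are the usual ones for these packages (`yaml` for PyYAML, `sklearn` for scikit-learn). It does not check versions, so it will not notice a tensorflow release other than 1.x.

```python
import importlib.util

# Import names for the dependencies listed above
# (PyYAML imports as "yaml", scikit-learn as "sklearn").
REQUIRED_MODULES = ["numpy", "pyparsing", "yaml", "sklearn", "scipy", "tensorflow"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```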

How to use


Install the module using pip:

$ pip install opentc

or clone the repository

$ git clone
$ cd opentc
$ python setup.py install





The opentc command line tool trains the application on the datasets defined in the configuration file. The result of the training (pre-trained data) can be used by the opentcd server.


$ python opentc -h
usage: opentc [-h] [-c CLASSIFIER] [-C CONFIGURATION_FILE] [-d DATASET]
              [-l LOG_CONFIGURATION_FILE]

optional arguments:
  -h, --help            show this help message and exit
  -c CLASSIFIER, --classifier CLASSIFIER
                        set classifier to use for the training (support
                        currently bayesian, svm or cnn)
  -C CONFIGURATION_FILE
                        set the configuration file
  -d DATASET, --dataset DATASET
                        set dataset to use for the training
  -l LOG_CONFIGURATION_FILE
                        set the log configuration file
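
When training is driven from a script, the options above can be assembled programmatically. The sketch below is only an illustration under stated assumptions: `build_train_command` and `train` are hypothetical helpers, not part of OpenTC, and the dataset value shown in the usage note is a placeholder. Only the `-c`, `-d` and `-C` flags and the classifier names from the help text are used.

```python
import subprocess

# Classifier names taken from the help text above.
SUPPORTED_CLASSIFIERS = ("bayesian", "svm", "cnn")


def build_train_command(classifier, dataset=None, configuration_file=None):
    """Build the argument list for a training run of the opentc command."""
    if classifier not in SUPPORTED_CLASSIFIERS:
        raise ValueError("unsupported classifier: %r" % (classifier,))
    command = ["python", "opentc", "-c", classifier]
    if dataset is not None:
        command += ["-d", dataset]
    if configuration_file is not None:
        command += ["-C", configuration_file]
    return command


def train(classifier, dataset=None, configuration_file=None):
    """Run the training; raises CalledProcessError if the command fails."""
    subprocess.run(
        build_train_command(classifier, dataset, configuration_file),
        check=True,
    )
```

For example, `build_train_command("svm", dataset="my_dataset")` returns the list `["python", "opentc", "-c", "svm", "-d", "my_dataset"]`, where `my_dataset` stands in for a dataset defined in your own configuration file.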





The daemon listens for incoming connections on a TCP port (3333 by default) and classifies files or text strings on demand. It looks for a configuration file in the following order: ./opentc.yml, ~/.opentc/opentc.yml, then /etc/opentc/opentc.yml.
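
The lookup order above can be reproduced in a few lines, which is handy for checking which file a given environment would pick up. This is a sketch only, assuming that the first existing file wins; `find_configuration_file` is a hypothetical name and the code is not taken from opentcd.

```python
import os

# Search order taken from the description above.
CONFIG_CANDIDATES = (
    "./opentc.yml",
    "~/.opentc/opentc.yml",
    "/etc/opentc/opentc.yml",
)


def find_configuration_file(candidates=CONFIG_CANDIDATES):
    """Return the first candidate path that exists as a file, or None."""
    for candidate in candidates:
        path = os.path.expanduser(candidate)  # resolve a leading "~"
        if os.path.isfile(path):
            return path
    return None
```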


Opentcd uses the configuration file opentc.yml to define almost all of its settings. Only a few settings can be overridden with command line options.
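
One common way to layer command line options over file settings is sketched below. It is an assumption about the general pattern, not a description of how opentcd is implemented: `merge_settings` is a hypothetical helper, and the key names and the timeout value are illustrative rather than OpenTC's real configuration schema.

```python
def merge_settings(file_settings, cli_overrides):
    """Return the file settings with any given command-line values on top."""
    merged = dict(file_settings)
    for key, value in cli_overrides.items():
        if value is not None:  # None means the option was not given
            merged[key] = value
    return merged


# Hypothetical values: key names and the timeout are illustrative only.
file_settings = {"address": "localhost", "port": 3333, "timeout": 10}
cli_overrides = {"address": None, "port": 4444, "timeout": None}
settings = merge_settings(file_settings, cli_overrides)
print(settings["port"])  # 4444
```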

List of arguments:

$ python opentcd -h
usage: opentcd [-h] [-a ADDRESS] [-C CONFIGURATION_FILE]
               [-l LOG_CONFIGURATION_FILE] [-p PORT] [-t TIMEOUT]

optional arguments:
  -h, --help            show this help message and exit
  -a ADDRESS, --address ADDRESS
                        define the address for the server
  -C CONFIGURATION_FILE
                        set the configuration file
  -l LOG_CONFIGURATION_FILE
                        set the log configuration file
  -p PORT, --port PORT  define the port number which the server uses to listen
  -t TIMEOUT, --timeout TIMEOUT
                        define the time out

Run it as a background application:

$ python opentcd&
2017-05-02 13:33:22,276 - opentc.core.classifier.cnn_text - DEBUG - Load the checkpoint:
INFO:tensorflow:Restoring parameters from data/input/cnn_twenty_newsgroup_20170301_090000-all/checkpoints/model-2210
2017-05-02 13:33:23,899 - tensorflow - INFO - Restoring parameters from data/input/cnn_twenty_newsgroup_20170301_090000-all/checkpoints/model-2210
2017-05-02 13:33:27,375 - __main__ - INFO - Server start
2017-05-02 13:33:28,019 - opentc.core.server - INFO - Server loop running in thread: Thread-1

Datasets and pre-trained data

The configuration file defines the paths to the datasets and the pre-trained data. Pre-trained data for testing purposes can be downloaded from data; it is around 1.4 GB. Uncompress it and change the path to the pre-trained data in the opentc.yml file accordingly.


Commands are delimited by a newline character. If opentcd doesn’t recognize a command, or the command doesn’t follow the requirements specified below, it replies with an error message but keeps waiting for further commands (this behaviour may change in the future).
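
A minimal client for this newline-delimited protocol might look like the sketch below. It is not the official client. It assumes the default address (localhost, port 3333), and it assumes that the reply ends with a newline or that the server closes the connection; if neither happens, the call raises a timeout error. `send_command` is a hypothetical helper, and the command string is left to the caller.

```python
import socket


def send_command(command, host="localhost", port=3333, timeout=5.0):
    """Send one newline-terminated command and return the server's reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as connection:
        connection.sendall(command.encode("utf-8") + b"\n")
        reply = b""
        # Assumption: the reply ends with a newline, or the server closes
        # the connection once it has answered.
        while not reply.endswith(b"\n"):
            chunk = connection.recv(4096)
            if not chunk:
                break
            reply += chunk
    return reply.decode("utf-8").rstrip("\r\n")
```

Opening a new connection per command keeps the sketch short; a longer-lived client would keep the socket open and send several commands over it.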


Check the server’s state. It should reply with “PONG”.


Print the program version


Reload the engine


List the supported classifiers (at the moment three classifiers are supported: Bayesian, Support Vector Machine and Convolutional Neural Network). It also shows the status of each classifier, either True (enabled) or False (disabled).


Enable or disable a specific classifier


Classify text streams. A newline character is used as the delimiter between sentences.


Classify a file. A newline character is used as the delimiter between sentences.
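
Because each line is treated as one sentence, input usually has to be prepared with one sentence per line. The sketch below shows one way to iterate over such input on the client side. The helper names are hypothetical, and stripping whitespace and skipping empty lines are choices made for this sketch, not documented behaviour of opentcd.

```python
def iter_sentences(text):
    """Yield each non-empty line of `text`, treating every line as one sentence."""
    for line in text.split("\n"):
        sentence = line.strip()  # also removes a trailing "\r"
        if sentence:
            yield sentence


def iter_file_sentences(path, encoding="utf-8"):
    """Yield the sentences of a text file, one per non-empty line."""
    with open(path, encoding=encoding) as handle:
        yield from iter_sentences(handle.read())
```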


Close the connection


  • Multilabel classification

  • Include FastText from Facebookresearch

  • Use pyzmq and Google’s protobuf to improve the protocol and network communication


Source Distribution

opentc-0.4.5.tar.gz (18.1 kB)
