streamndr

Stream Novelty Detection for River

Stream Novelty Detection for River (StreamNDR) is a Python library for online novelty detection in data streams. It is built on the river API and is currently in an early stage of development. Contributors are welcome.

📚 Documentation

StreamNDR provides Python implementations of novelty detection algorithms that have been proposed in the literature, following river's implementation conventions and format. At this stage, the following algorithms are implemented:

MINAS [1]
ECSMiner [2]
ECSMiner-WF (Version of ECSMiner [2] without feedback, as proposed in [1])
ECHO [3]

Full documentation is available here.

🛠 Installation

Note: StreamNDR is intended to be used with Python 3.6 or above. It requires the package ClusOpt-Core, which in turn needs a C/C++ compiler (such as gcc) and the Boost.Thread library to build. On Debian systems, the Boost.Thread library can be installed with the following command:

sudo apt install libboost-thread-dev

The package can then be installed with pip:

pip install streamndr
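
Before moving on, it can help to confirm that the interpreter meets the stated Python requirement and that the package is importable. The following is a minimal sketch using only the standard library; `check_environment` is a hypothetical helper written for this illustration and is not part of StreamNDR.

```python
import importlib.util
import sys


def check_environment(package="streamndr", min_version=(3, 6)):
    """Return (python_ok, installed) for the given top-level package name.

    Hypothetical helper for illustration only; not part of StreamNDR.
    """
    # Compare the running interpreter against the minimum version.
    python_ok = sys.version_info >= min_version
    # find_spec returns None when a top-level package cannot be found.
    installed = importlib.util.find_spec(package) is not None
    return python_ok, installed


python_ok, installed = check_environment()
print(f"Python >= 3.6: {python_ok}, streamndr importable: {installed}")
```

If the second value is False after running pip, the build of ClusOpt-Core is a likely place to look, since it depends on the compiler and Boost.Thread mentioned above.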

⚡️ Quickstart

As a quick example, we'll train three models (MINAS, ECSMiner-WF, and ECHO) to classify a synthetic dataset created using RandomRBF. The models are trained on only two of the four generated classes ([0,1]) and will try to detect the other classes ([2,3]) as novelty patterns in the dataset in an online fashion.

Let's first generate the dataset.

import numpy as np
from river.datasets import synth

ds = synth.RandomRBF(seed_model=42, seed_sample=42, n_classes=4, n_features=5, n_centroids=10)

offline_size = 1000
online_size = 5000
X_train = []
y_train = []
X_test = []
y_test = []

for x,y in ds.take(10*(offline_size+online_size)):
    
    #Create our training data (known classes)
    if len(y_train) < offline_size:
        if y == 0 or y == 1: #Only showing two first classes in the training set
            X_train.append(np.array(list(x.values())))
            y_train.append(y)
    
    #Create our online stream of data
    elif len(y_test) < online_size:
        X_test.append(x)
        y_test.append(y)
        
    else:
        break

X_train = np.array(X_train)
y_train = np.array(y_train)
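
The split logic above can be easier to follow in isolation. The sketch below reproduces it with the standard library only, using a hypothetical `toy_stream` generator in place of RandomRBF and a hypothetical `split_stream` helper; neither name comes from StreamNDR or river. It shows that samples of unseen classes arriving during the offline phase are discarded, while the online stream keeps every class.

```python
import random
from collections import Counter


def toy_stream(n, n_classes=4, n_features=5, seed=42):
    """Yield (features_dict, label) pairs; a stand-in for a river dataset."""
    rng = random.Random(seed)
    for _ in range(n):
        y = rng.randrange(n_classes)
        # Features are centered on the class index so classes are separable.
        x = {i: rng.gauss(y, 0.5) for i in range(n_features)}
        yield x, y


def split_stream(stream, known_classes, offline_size, online_size):
    """Fill the offline set with known classes first, then the online set."""
    X_train, y_train, X_test, y_test = [], [], [], []
    for x, y in stream:
        if len(y_train) < offline_size:
            # Offline phase: keep known classes, silently drop the others.
            if y in known_classes:
                X_train.append(list(x.values()))
                y_train.append(y)
        elif len(y_test) < online_size:
            # Online phase: keep every sample as a dictionary.
            X_test.append(x)
            y_test.append(y)
        else:
            break
    return X_train, y_train, X_test, y_test


X_tr, y_tr, X_te, y_te = split_stream(toy_stream(2000), {0, 1}, 100, 300)
print("offline classes:", Counter(y_tr))
print("online classes:", Counter(y_te))
```

The offline features are stored as plain lists (ready to be converted to arrays), while the online samples stay as dictionaries, mirroring the two input formats used in the rest of this quickstart.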

MINAS

Let's train our MINAS model on the offline (known) data.

from streamndr.model import Minas
clf = Minas(kini=100, cluster_algorithm='clustream', 
            window_size=600, threshold_strategy=1, threshold_factor=1.1, 
            min_short_mem_trigger=100, min_examples_cluster=20, verbose=1, random_state=42)

clf.learn_many(np.array(X_train), np.array(y_train)) #learn_many expects numpy arrays or pandas dataframes

Let's now test our algorithm in an online fashion. Note that our unsupervised clusters are automatically updated with each call to predict_one.

from streamndr.metrics import ConfusionMatrixNovelty, MNew, FNew, ErrRate

known_classes = [0,1]

conf_matrix = ConfusionMatrixNovelty(known_classes)
m_new = MNew(known_classes)
f_new = FNew(known_classes)
err_rate = ErrRate(known_classes)

i = 1
for x, y_true in zip(X_test, y_test):

    y_pred = clf.predict_one(x) #predict_one takes python dictionaries as per River API
    
    if y_pred is not None: #Update our metrics
        conf_matrix.update(y_true, y_pred[0])
        m_new.update(y_true, y_pred[0])
        f_new.update(y_true, y_pred[0])
        err_rate.update(y_true, y_pred[0])


    #Show progress
    if i % 100 == 0:
        print(f"{i}/{len(X_test)}")
    i += 1
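
The same predict-then-score loop is repeated for each model in this quickstart, so it could be wrapped in a small helper. The sketch below is one possible way to do that; `evaluate_online`, `StubModel`, and `CountingMetric` are hypothetical names introduced for illustration. The only assumptions taken from the code above are that `predict_one` may return None and that the predicted label sits at index 0 of its result.

```python
def evaluate_online(model, X_stream, y_stream, metrics,
                    pass_label=False, report_every=0):
    """Run a model over a stream and update each metric; return scored count."""
    n_scored = 0
    for i, (x, y_true) in enumerate(zip(X_stream, y_stream), start=1):
        # Some models need the true label during the online phase.
        if pass_label:
            y_pred = model.predict_one(x, y_true)
        else:
            y_pred = model.predict_one(x)
        if y_pred is not None:
            for metric in metrics:
                metric.update(y_true, y_pred[0])
            n_scored += 1
        if report_every and i % report_every == 0:
            print(f"{i}/{len(X_stream)}")
    return n_scored


class StubModel:
    """Hypothetical stand-in: no prediction for the first sample, then a threshold."""

    def __init__(self):
        self.seen = 0

    def predict_one(self, x, y=None):
        self.seen += 1
        if self.seen == 1:
            return None
        return [0 if x["a"] < 0.5 else 1]


class CountingMetric:
    """Hypothetical stand-in metric that counts correct predictions."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, y_true, y_pred):
        self.total += 1
        self.correct += int(y_true == y_pred)


X_demo = [{"a": 0.1}, {"a": 0.9}, {"a": 0.2}, {"a": 0.8}]
y_demo = [0, 1, 1, 1]
metric = CountingMetric()
scored = evaluate_online(StubModel(), X_demo, y_demo, [metric])
print(scored, metric.correct, metric.total)
```

With the objects defined earlier, the call would look like `evaluate_online(clf, X_test, y_test, [conf_matrix, m_new, f_new, err_rate], report_every=100)`, with `pass_label=True` for a model whose `predict_one` also takes the true label.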

Let's look at the results. Of course, the hyperparameters of the model can be tuned to get better results.

#print(conf_matrix) #Shows the confusion matrix of the given problem, can be very wide due to one class being detected as multiple Novelty Patterns
print(m_new) #Percentage of novel class instances misclassified as known.
print(f_new) #Percentage of known classes misclassified as novel.
print(err_rate) #Total misclassification error percentage

MNew: 17.15%
FNew: 40.11%
ErrRate: 36.80%
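
To make the three numbers concrete, here is a minimal sketch of how such metrics can be computed from paired labels. It follows the comments above and the usual definitions in the novelty detection literature; `novelty_metrics` is a hypothetical function, and it assumes that any predicted label outside the known classes counts as "novel". Whether StreamNDR's metric classes treat every edge case the same way is not confirmed here.

```python
def novelty_metrics(y_true, y_pred, known_classes):
    """Return MNew, FNew and ErrRate as percentages (illustrative sketch)."""
    known = set(known_classes)
    n_total = len(y_true)
    n_novel = sum(1 for t in y_true if t not in known)
    n_known = n_total - n_novel

    fn = fp = fe = 0
    for t, p in zip(y_true, y_pred):
        true_is_novel = t not in known
        pred_is_novel = p not in known  # assumption: unknown label means novel
        if true_is_novel and not pred_is_novel:
            fn += 1  # novel instance classified as a known class
        elif not true_is_novel and pred_is_novel:
            fp += 1  # known instance flagged as novel
        elif not true_is_novel and t != p:
            fe += 1  # known instance assigned the wrong known class

    def pct(num, den):
        return 100.0 * num / den if den else 0.0

    return {
        "MNew": pct(fn, n_novel),
        "FNew": pct(fp, n_known),
        "ErrRate": pct(fn + fp + fe, n_total),
    }


demo_true = [0, 1, 2, 3, 2, 0, 1, 3]
demo_pred = [0, 1, 2, 0, -1, -1, 0, 3]
print(novelty_metrics(demo_true, demo_pred, known_classes=[0, 1]))
```

In this toy example one of the four novel instances is labeled as known, one of the four known instances is flagged as novel, and one known instance receives the wrong known class.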

ECSMiner-WF

Let's train our model on the offline (known) data.

from streamndr.model import ECSMinerWF
clf = ECSMinerWF(K=50, min_examples_cluster=10, verbose=1, random_state=42, ensemble_size=7, init_algorithm="kmeans")
clf.learn_many(np.array(X_train), np.array(y_train))

Once again, let's use our model in an online fashion.

conf_matrix = ConfusionMatrixNovelty(known_classes)
m_new = MNew(known_classes)
f_new = FNew(known_classes)
err_rate = ErrRate(known_classes)

for x, y_true in zip(X_test, y_test):

    y_pred = clf.predict_one(x) #predict_one takes python dictionaries as per River API

    if y_pred is not None: #Update our metrics
        conf_matrix.update(y_true, y_pred[0])
        m_new.update(y_true, y_pred[0])
        f_new.update(y_true, y_pred[0])
        err_rate.update(y_true, y_pred[0])

#print(conf_matrix) #Shows the confusion matrix of the given problem, can be very wide due to one class being detected as multiple Novelty Patterns
print(m_new) #Percentage of novel class instances misclassified as known.
print(f_new) #Percentage of known classes misclassified as novel.
print(err_rate) #Total misclassification error percentage

MNew: 60.93%
FNew: 26.78%
ErrRate: 39.40%
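
The confusion matrix is left commented out because a single true class can be spread over several novelty patterns, each adding a column. The sketch below illustrates this with plain counters; `confusion_counts` is a hypothetical helper, the pattern labels in the toy data are made up, and treating every prediction outside the known classes as one "novel" column is an assumption of this example rather than StreamNDR behavior.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, known_classes=None):
    """Count (true, predicted) pairs; optionally merge novelty patterns."""
    known = set(known_classes) if known_classes is not None else None
    counts = Counter()
    for t, p in zip(y_true, y_pred):
        # When known classes are given, fold every other label into "novel".
        if known is not None and p not in known:
            p = "novel"
        counts[(t, p)] += 1
    return counts


toy_true = [0, 2, 2, 3, 3, 1]
toy_pred = [0, 2, 3, 4, 2, 1]  # labels 2, 3, 4 stand for novelty patterns

wide = confusion_counts(toy_true, toy_pred)
collapsed = confusion_counts(toy_true, toy_pred, known_classes=[0, 1])

print("columns (wide):", len({p for _, p in wide}))
print("columns (collapsed):", len({p for _, p in collapsed}))
print(dict(collapsed))
```

Collapsing the patterns loses the distinction between them, so it is only useful for a compact overview.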

ECHO

Let's train our ECHO model on the offline (known) data. Note that ECHO requires the true label during the online phase.

from streamndr.model import Echo
clf = Echo(K=50, min_examples_cluster=10, verbose=1, random_state=42, ensemble_size=7, W=500, tau=0.9, init_algorithm="kmeans")
clf.learn_many(np.array(X_train), np.array(y_train))

Once again, let's use our model in an online fashion.

conf_matrix = ConfusionMatrixNovelty(known_classes)
m_new = MNew(known_classes)
f_new = FNew(known_classes)
err_rate = ErrRate(known_classes)

for x, y_true in zip(X_test, y_test):

    y_pred = clf.predict_one(x, y_true) #predict_one takes a python dictionary and the true label

    if y_pred is not None: #Update our metrics
        conf_matrix.update(y_true, y_pred[0])
        m_new.update(y_true, y_pred[0])
        f_new.update(y_true, y_pred[0])
        err_rate.update(y_true, y_pred[0])

#print(conf_matrix) #Shows the confusion matrix of the given problem, can be very wide due to one class being detected as multiple Novelty Patterns
print(m_new) #Percentage of novel class instances misclassified as known.
print(f_new) #Percentage of known classes misclassified as novel.
print(err_rate) #Total misclassification error percentage

MNew: 24.20%
FNew: 16.16%
ErrRate: 22.74%

Special Thanks

Special thanks go to Vítor Bernardes, whose implementation served as the basis for some of the MINAS code.

💬 References

[1] de Faria, E.R., Ponce de Leon Ferreira Carvalho, A.C. & Gama, J. MINAS: multiclass learning algorithm for novelty detection in data streams. Data Min Knowl Disc 30, 640–680 (2016). https://doi.org/10.1007/s10618-015-0433-y

[2] M. Masud, J. Gao, L. Khan, J. Han and B. M. Thuraisingham, "Classification and Novel Class Detection in Concept-Drifting Data Streams under Time Constraints," in IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 6, pp. 859-874, June 2011, doi: 10.1109/TKDE.2010.61.

[3] A. Haque, L. Khan, M. Baron, B. Thuraisingham and C. Aggarwal, "Efficient handling of concept drift and concept evolution over stream data," 2016 IEEE 32nd International Conference on Data Engineering (ICDE), Helsinki, Finland, 2016, pp. 481-492, doi: 10.1109/ICDE.2016.7498264.

Release history

0.2.0 (this version): Mar 1, 2026
0.1.6: Feb 24, 2024
0.1.5: Nov 30, 2023
0.1.4: Mar 21, 2023
0.1.3: Mar 3, 2023
0.1.2: Feb 28, 2023
0.1.0: Feb 28, 2023
