Project description

Conditional Evidence Stream Generator

Data stream generator for Certainty-based Domain Selection Framework for TinyML Devices paper.

Installation guide

Installation is straightforward. Either run make install in the main directory of this repository, or use pip for the current stable version:

pip install cesg

Processing example

from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import balanced_accuracy_score
import cesg

# Define parameters
n_cycles = 3
n_chunks = 1000
chunk_size = 200
random_state = 1410
n_concepts = 500
modes = {
    'instant': {'mode': 'instant'},
    'linear': {'mode': 'linear'},
    'normal': {'mode': 'normal', 'sigma': 1},
}

# Prepare data
X, y = make_classification(n_samples=10000)

# Transform to components
X_pca = cesg.utils.normalized_PCA(X)

# Prepare factor
factor = cesg.utils.mix_to_factor(X_pca)

# Prepare condition map
condition_map = cesg.utils.make_condition_map(n_cycles=n_cycles,
                                              n_concepts=n_concepts,
                                              factor=factor,
                                              factor_range=(0.1, 0.9))

# Calculate concept proba
concept_probabilities = cesg.concepts.concept_proba(n_concepts=n_concepts,
                                                    n_chunks=n_chunks,
                                                    normalize=True,
                                                    **modes['normal'])

# Initialize stream
stream = cesg.ConditionalEvidenceStream(X, y,
                                        condition_map.T,
                                        concept_probabilities,
                                        chunk_size=chunk_size,
                                        fragile=False,
                                        random_state=random_state)

# Iterate stream and report scores
clf = MLPClassifier()
scores = []

while chunk := stream.get_chunk():
    X, y = chunk

    if stream.chunk_idx > 1:
        y_pred = clf.predict(X)
        score = balanced_accuracy_score(y, y_pred)
        
        scores.append(score)
    
    clf.partial_fit(X, y, classes=stream.classes_)
    
print(scores)

Generation procedure

The streams are synthesized using an original generator based on conditional evidence. The input to the stream synthesis procedure is a stationary data set $DS$.

The first processing step is to determine the $F$ factor of the set: a value in the range $0-1$, correlated with the difficulty of an object and determined for each object in the $DS$ data set. To estimate the $F$ factor (an illustrative sketch follows the list):

  1. Transform $DS$ to its components $DS'$, using Principal Component Analysis, leaving 80% of the explained variance and standardizing the result.
  2. Model a Gaussian Mixture for $DS'$ with 10 mixture components, assuming that each component has its own single variance.
  3. Estimate the density of the Gaussian Mixture distribution for each point of $DS'$, remembering that support is estimated separately for each component of the mixture.
  4. Quantile-normalize the obtained density to a uniform distribution along the object axis -- independently in each component.
  5. Flatten the obtained representation with the sum of components and perform another quantile normalization to uniform distribution, so that for each point from the original set its mapping to the $F$ factor is obtained.
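
An illustrative Python sketch of this five-step procedure is shown below. In the package these steps are covered by cesg.utils.normalized_PCA and cesg.utils.mix_to_factor; the sketch relies only on scikit-learn and SciPy, and the internal details of the library may differ.

import numpy as np
from scipy.stats import rankdata, multivariate_normal
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

def quantile_to_uniform(values):
    # Rank-based mapping of values to the (0, 1) interval.
    return rankdata(values) / (len(values) + 1)

def factor_sketch(X, random_state=1410):
    # 1. PCA keeping 80% of the explained variance, standardized components.
    X_pca = StandardScaler().fit_transform(PCA(n_components=0.8).fit_transform(X))

    # 2. Gaussian Mixture with 10 components, a single variance per component.
    gmm = GaussianMixture(n_components=10, covariance_type='spherical',
                          random_state=random_state).fit(X_pca)

    # 3. Density of every object under each mixture component separately.
    densities = np.column_stack([
        multivariate_normal(mean=m, cov=c).pdf(X_pca)
        for m, c in zip(gmm.means_, gmm.covariances_)
    ])

    # 4. Quantile-normalize each component's densities along the object axis.
    densities_u = np.apply_along_axis(quantile_to_uniform, 0, densities)

    # 5. Flatten by summing the components and quantile-normalize again.
    return quantile_to_uniform(densities_u.sum(axis=1))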

Having the vector of factors $F$, it is possible to determine the conditional map $CM$. It informs the generator about the availability of each $DS$ object for each metaconcept building the data stream. Here it is possible to configure the number of metaconcepts ($m$), the number of difficulty oscillation cycles ($c$) and the thresholding range of the difficulty factor ($r$). To obtain the conditional map $CM$ (an illustrative sketch follows the list):

  1. Build a condition basis vector constituting an interval-normalized ($0-1$) sampling of a sinusoid at $m$ points over the period from $0$ to $2 \pi c$. Scale the result to the thresholding range $r$.
  2. Calculate the conditional map $CM$ by comparing the condition basis vector with the vector $F$, so as to obtain a logical matrix indicating whether the $F$ factor of a given object exceeds the metaconcept threshold value.
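
A minimal sketch of both steps is shown below; in the package this corresponds to cesg.utils.make_condition_map, and the orientation of the returned matrix as well as the exact normalization are assumptions of the sketch.

import numpy as np

def make_condition_map_sketch(factor, n_concepts, n_cycles, factor_range):
    lo, hi = factor_range
    # 1. Condition basis: a sinusoid sampled at n_concepts points over the
    #    period from 0 to 2*pi*n_cycles, normalized to 0-1, rescaled to r.
    t = np.linspace(0, 2 * np.pi * n_cycles, n_concepts)
    basis = lo + (hi - lo) * (np.sin(t) + 1) / 2
    # 2. Logical matrix: does the F factor of an object exceed the threshold
    #    of a given metaconcept? Shape (n_concepts, n_objects) is assumed.
    return factor[None, :] >= basis[:, None]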

The final, third component of the processing metadata is the metaconcept probability map ($CP$). It informs the generator about the probability of selecting an object from a given metaconcept in a given batch of the generated stream. It is calculated according to instant, linear or normal dynamics, in accordance with the standard procedure for generating data streams.
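
As an illustration, the normal dynamics could look roughly as sketched below. In the package this map is computed by cesg.concepts.concept_proba; the centre placement and width scaling used here are assumptions of the sketch, not the library's exact parameterization.

import numpy as np

def concept_proba_normal_sketch(n_concepts, n_chunks, sigma=1.0, normalize=True):
    # Each metaconcept is centred on its own chunk index; Gaussian weights of
    # width proportional to sigma govern the transitions between metaconcepts.
    centres = np.linspace(0, n_chunks - 1, n_concepts)
    chunks = np.arange(n_chunks)[:, None]
    width = sigma * n_chunks / n_concepts
    proba = np.exp(-0.5 * ((chunks - centres[None, :]) / width) ** 2)
    if normalize:
        # Make the metaconcept probabilities sum to 1 within every chunk.
        proba /= proba.sum(axis=1, keepdims=True)
    return proba  # shape (n_chunks, n_concepts)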

To establish a data stream, it is necessary to pass $DS$, $F$, $CM$, and $CP$ to the ConditionalEvidenceStream control object. In each subsequent batch it uses only the objects allowed for processing by the conditional map $CM$, following the metaconcept probabilities given for that batch in $CP$.
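
Conceptually, drawing a single batch can be thought of as in the sketch below. This is only an illustration of how $CM$ and $CP$ interact, not the actual ConditionalEvidenceStream implementation.

import numpy as np

def draw_chunk_sketch(X, y, condition_map, concept_proba, chunk_idx, chunk_size, rng):
    # Pick a metaconcept for every sample of the batch according to CP for this chunk.
    concepts = rng.choice(condition_map.shape[0], size=chunk_size,
                          p=concept_proba[chunk_idx])
    indices = np.empty(chunk_size, dtype=int)
    for i, c in enumerate(concepts):
        # Sample only among the objects allowed by CM for metaconcept c.
        allowed = np.flatnonzero(condition_map[c])
        indices[i] = rng.choice(allowed)
    return X[indices], y[indices]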
