
Collection of operational Machine Learning models and tools.


numalogic


Background

Numalogic is a collection of ML models and algorithms for operational data analytics and AIOps. At Intuit, we use Numalogic at scale for continuous real-time data enrichment, including anomaly scoring: we assign an anomaly score (ML inference) to every time-series datum, event, or message we receive on our streaming platform (for example, Kafka). 95% of our data sets are time series, and we have a complex flowchart for executing ML inference on our high-throughput sources.

We run multiple models on the same datum: for example, one model that is sensitive to positive sentiment, another tuned for negative sentiment, and another optimized for neutral sentiment. We also have a couple of ML models trained for the same data source, which provide more accurate scores based on the data density in our model store. An ensemble of models is required because some composite keys in the data are less dense than others; for example, forgot-password interactions are less frequent than status-check interactions. At runtime, models are picked for each arriving datum by a conditional forwarding filter set on the data density. ML engineers only need to worry about their inference container; they do not have to worry about data movement or quality assurance.
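The density-based routing described above can be pictured with a small, self-contained Python sketch. Everything in it is an illustrative assumption rather than numalogic's API: the `Datum` type, the `DENSITY_THRESHOLD` cut-off, and the two toy scorers (a z-score for dense keys and a median-deviation score for sparse keys) merely stand in for trained models held in a model store.

```python
import statistics
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical cut-off: keys with at least this many observations count as "dense".
DENSITY_THRESHOLD = 100


@dataclass
class Datum:
    # Composite key such as ("login", "forgot-password"); illustrative only.
    key: Tuple[str, ...]
    # Recent window of values for this key; the last element is the new point.
    values: List[float]


def zscore_score(values: List[float]) -> float:
    """Dense-key scorer: distance of the newest point from the window mean, in std units."""
    history, latest = values[:-1], values[-1]
    std = statistics.pstdev(history) or 1.0  # guard against a flat history
    return abs(latest - statistics.mean(history)) / std


def median_score(values: List[float]) -> float:
    """Sparse-key scorer: relative deviation of the newest point from the window median."""
    history, latest = values[:-1], values[-1]
    med = statistics.median(history)
    return abs(latest - med) / (abs(med) or 1.0)


# Hypothetical registry of scorers, keyed by route name.
SCORERS: Dict[str, Callable[[List[float]], float]] = {
    "dense": zscore_score,
    "sparse": median_score,
}


def score(datum: Datum, key_counts: Dict[Tuple[str, ...], int]) -> Tuple[str, float]:
    """Forward the datum to a scorer chosen by the data density of its composite key."""
    if len(datum.values) < 2:
        raise ValueError("need at least one historical point plus the new point")
    density = key_counts.get(datum.key, 0)
    route = "dense" if density >= DENSITY_THRESHOLD else "sparse"
    return route, SCORERS[route](datum.values)
```

In a real pipeline the per-key counts would come from the model store and each route would call a trained inference container; the sketch only shows the shape of the conditional forwarding decision.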

Numalogic realtime training

For an always-on ML platform, the key requirement is the ability to train or retrain models automatically based on the incoming messages. At runtime, a composite key is built for each message and used to look up a matching model; if that model turns out to be stale or missing, retraining is triggered automatically. The platform's conditional forwarding feature improves ML developers' velocity when they have to decide whether to forward a result further or drop it after a retraining request has been triggered.
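As a rough illustration of the stale-or-missing check and the forward-or-drop decision, here is a minimal Python sketch under stated assumptions: `InMemoryModelStore`, `KEY_FIELDS`, `STALE_AFTER_SECONDS`, and the forwarding policy in `handle` are all hypothetical and are not part of numalogic or MLflow.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical freshness window: models older than this are treated as stale.
STALE_AFTER_SECONDS = 24 * 60 * 60
# Hypothetical message fields that make up the composite key.
KEY_FIELDS = ("app", "metric")

Key = Tuple[str, ...]


@dataclass
class ModelEntry:
    artifact: object
    trained_at: float  # Unix timestamp of the last training run


@dataclass
class InMemoryModelStore:
    """Toy stand-in for a model registry; not numalogic's API."""
    models: Dict[Key, ModelEntry] = field(default_factory=dict)
    retrain_queue: List[Key] = field(default_factory=list)

    def lookup(self, key: Key, now: float) -> Tuple[Optional[object], bool]:
        """Return (model or None, needs_retrain) and queue a retrain request if needed."""
        entry = self.models.get(key)
        needs_retrain = entry is None or now - entry.trained_at > STALE_AFTER_SECONDS
        if needs_retrain and key not in self.retrain_queue:
            self.retrain_queue.append(key)  # trigger retraining at most once per key
        artifact = entry.artifact if entry is not None else None
        return artifact, needs_retrain


def build_key(msg: dict) -> Key:
    """Build the composite key for one message from the hypothetical KEY_FIELDS."""
    return tuple(str(msg[f]) for f in KEY_FIELDS)


def handle(store: InMemoryModelStore, msg: dict, now: float) -> str:
    model, _ = store.lookup(build_key(msg), now)
    # Hypothetical forwarding policy: keep serving a stale model while it retrains,
    # but drop the message when no model exists yet for this key.
    return "forward" if model is not None else "drop"
```

A production store would persist models and hand retrain requests to a training pipeline rather than to a list. Serving a stale model while it retrains is one possible policy, not necessarily the one Numalogic uses.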

Key Features

  1. Ease of use: simple and efficient tools for predictive data analytics
  2. Reusability: all functionality can be reused in various contexts
  3. Model selection: easy to compare, validate, fine-tune, and choose the model that works best with each data set
  4. Data processing: readily available feature extraction, scaling, transformation, and normalization tools
  5. Extensibility: add your own functions or extend the existing capabilities
  6. Model storage: out-of-the-box support for MLflow, plus support for other ML model lifecycle management tools
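To make the data processing and model selection ideas concrete, the following sketch min-max scales a series and picks whichever candidate forecaster has the lowest one-step-ahead validation error. The two forecasters, the 25% holdout, and all function names are illustrative assumptions, not numalogic's built-in models or selection API.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Forecaster = Callable[[Sequence[float]], float]


def min_max_scale(series: Sequence[float]) -> List[float]:
    """Rescale values into [0, 1]; a flat series maps to all zeros."""
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0
    return [(x - lo) / span for x in series]


def naive_forecast(history: Sequence[float]) -> float:
    return history[-1]  # predict the last observed value


def moving_average_forecast(history: Sequence[float], window: int = 3) -> float:
    recent = history[-window:]
    return sum(recent) / len(recent)  # predict the mean of the last `window` values


# Hypothetical candidate pool; real candidates would be trained models.
CANDIDATES: Dict[str, Forecaster] = {
    "naive": naive_forecast,
    "moving_average": moving_average_forecast,
}


def validation_mae(model: Forecaster, series: Sequence[float], start: int) -> float:
    """Mean absolute one-step-ahead error over series[start:], using only past data."""
    errors = [abs(model(series[:t]) - series[t]) for t in range(start, len(series))]
    return sum(errors) / len(errors)


def select_model(
    series: Sequence[float], candidates: Dict[str, Forecaster], holdout: float = 0.25
) -> Tuple[str, Dict[str, float]]:
    """Return the name of the lowest-error candidate plus every candidate's score."""
    start = max(1, int(len(series) * (1 - holdout)))
    if start >= len(series):
        raise ValueError("series too short for the requested holdout")
    scores = {name: validation_mae(fn, series, start) for name, fn in candidates.items()}
    return min(scores, key=scores.get), scores
```

On a steadily trending series the naive forecaster tends to win, while on a rapidly alternating one the moving average does, which is the kind of per-data-set difference model selection is meant to catch.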

Use Cases

  1. Deployment failure detection
  2. System failure detection for node failures or crashes
  3. Fraud detection
  4. Network intrusion detection
  5. Forecasting on time series data

Getting Started

For setup instructions and a walkthrough of running your first numalogic pipeline, please see our getting started guide.

Installation

Numalogic requires Python 3.8 or higher.

Prerequisites

Numalogic needs PyTorch and PyTorch Lightning to work. Because these packages are platform-dependent, they are not included in the numalogic package itself, so install them first.

Numalogic supports PyTorch 2.0.0 and above.
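Before installing numalogic, a quick check that the prerequisites are in place can save a confusing import error later. This is a minimal sketch using only the standard library (`importlib.metadata` needs Python 3.8, matching numalogic's minimum). The helper names are hypothetical, and the import name `pytorch_lightning` is an assumption (recent Lightning releases are also importable as `lightning`), so adjust the module list to whatever your numalogic version expects.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version
from typing import Iterable, List, Tuple

MIN_TORCH = (2, 0, 0)  # from the "PyTorch 2.0.0 and above" requirement


def version_tuple(v: str) -> Tuple[int, ...]:
    """Turn '2.1.0+cu118' into (2, 1, 0) and '2.0.0rc1' into (2, 0, 0)."""
    parts = []
    for piece in v.split("+")[0].split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1'
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_prerequisites(modules: Iterable[str] = ("torch", "pytorch_lightning")) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for name in modules:
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing module: {name}")
        elif name == "torch":
            try:
                if version_tuple(version("torch")) < MIN_TORCH:
                    problems.append("torch >= 2.0.0 is required")
            except PackageNotFoundError:
                pass  # importable but without distribution metadata; skip the check
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```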

numalogic can be installed using pip.

pip install numalogic

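If you use MLflow as the model registry, install the mlflow extra (the quotes stop shells such as zsh from treating the brackets as a glob pattern):

pip install "numalogic[mlflow]"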

Build locally

  1. Install Poetry:
    curl -sSL https://install.python-poetry.org | python3 -
    
  2. Activate the virtual environment:
    poetry shell
    
  3. Install dependencies:
    poetry install --with dev,torch
    
    If extra dependencies are needed:
    poetry install --all-extras
    
  4. Run the unit tests:
    make test
    
  5. Format and lint the code with black and ruff:
    make lint
    
  6. Set up pre-commit hooks:
    pre-commit install
    

Contributing

We welcome contributions to the numalogic project in areas including, but not limited to:

  • Adding new time series anomaly detection models
  • Making it easier to add users' custom models
  • Support for additional model registry frameworks

For contribution guidelines, please refer to the contributing guide.


Download files


Source Distribution

numalogic-0.6.1.dev4.tar.gz (74.7 kB)

Built Distribution

numalogic-0.6.1.dev4-py3-none-any.whl (123.0 kB)

File details

Details for the file numalogic-0.6.1.dev4.tar.gz.

File metadata

  • Download URL: numalogic-0.6.1.dev4.tar.gz
  • Size: 74.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.6.1 CPython/3.10.12 Linux/6.2.0-1016-azure

File hashes

Hashes for numalogic-0.6.1.dev4.tar.gz
Algorithm Hash digest
SHA256 9e0258e805a63b06e06eff46bbb8fdbd25ade6a496a94eaf5dfbe5bc5c8dd896
MD5 9f3dc0e54fe0f0a803032d4247f528d5
BLAKE2b-256 b6cc4c7e1ef1d74dea2751a8529941c70714a12da10c5b3607cd15fcba16e630


File details

Details for the file numalogic-0.6.1.dev4-py3-none-any.whl.

File metadata

  • Download URL: numalogic-0.6.1.dev4-py3-none-any.whl
  • Size: 123.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.6.1 CPython/3.10.12 Linux/6.2.0-1016-azure

File hashes

Hashes for numalogic-0.6.1.dev4-py3-none-any.whl
Algorithm Hash digest
SHA256 490276dc049d6d61c6c2272bb3e84e7f669b4220a85a2686ac5c191bc9751af2
MD5 51bb0164e8f97b166a273894ddacf0bd
BLAKE2b-256 93768a397cf1b898b3fb828f78a40fcecce60c4428673345a0adc38c4a5e86f0

