A Python package that helps identify important metric changes and quickly find the clusters in the data that changed the metric's trend or caused an anomaly.

Project description

Introduction to Anomeda

The anomeda package helps you analyze non-aggregated time-series data with Python.

Here is a brief example of how anomeda can work for you.

"Why did the number of our website visits decrease a week ago? What kind of users caused that?" - anomeda will answer such questions quickly by processing the non-aggregated visits to your website.

It will show you, for instance, that users from country X using device Y suddenly stopped visiting your website. Moreover, even if you are not aware of any significant change in the number of visits, anomeda will highlight the cluster of events where one happened.

Is it fraudulent activity, a paused marketing campaign or technical issues? It's up to you to investigate.

The package is easy to use and adjustable enough to cover a wide range of real scenarios. The basic object, anomeda.DataFrame, inherits from pandas.DataFrame, so you will find the API familiar. In addition, there are various options for fine-tuning the algorithms used under the hood.

Some of what anomeda can do for your non-aggregated data:

  • Highlight time points and clusters when the trend, mean or variance changed
  • Fit trends for any cluster considering the points where trends change
  • Highlight time points and clusters where anomalies were observed, taking into account the trend at that moment
  • Compare time periods and find clusters changing the metric

Find the project in its GitHub repo.

Explore the Documentation of anomeda.

Quick start

Let's imagine you oversee the number of visits to a website.

You have a table with visits. Typically you just aggregate them by a datetime column and monitor 3 to 5 dashboards showing the overall number of visits, as well as visits to important pages, visits from specific systems, visits from specific user clusters, etc. Here is what you would do with anomeda.

Let's define an anomeda object.

import anomeda

anomeda_df = anomeda.DataFrame(
    df, # pandas.DataFrame
    measures_names=['country', 'system', 'url', 'duration'], # columns representing measures or characteristics of your events
    measures_types={
        'categorical': ['country', 'system', 'url'],
        'continuous': ['duration'] # measures can also be continuous - anomeda will take care of clustering them properly
    },
    index_name='date',
    metric_name='visit', # dummy metric, always 1
    agg_func='sum' # function that is used to aggregate metric
)
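The call above takes df as given. As a minimal sketch, assuming df holds one row per visit, the table below uses the column names from the call with made-up values, and then shows the manual per-dashboard aggregation described earlier. The countries, systems, URLs and date range are purely illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_events = 500

# Hypothetical non-aggregated log: one row per visit (all values are invented).
day_offsets = rng.integers(0, 14, size=n_events)
df = pd.DataFrame({
    "date": pd.Timestamp("2024-01-01") + pd.to_timedelta(day_offsets, unit="D"),
    "country": rng.choice(["Germany", "France", "Spain"], size=n_events),
    "system": rng.choice(["iOS", "Android", "Windows"], size=n_events),
    "url": rng.choice(["/", "/pricing", "/docs"], size=n_events),
    "duration": rng.exponential(scale=60.0, size=n_events).round(1),
    "visit": 1,  # dummy metric, always 1
})

# The manual approach: one group-by per dashboard.
total_per_day = df.groupby("date")["visit"].sum()
per_country_per_day = (
    df.groupby(["date", "country"])["visit"].sum().unstack(fill_value=0)
)

print(total_per_day.head())
print(per_country_per_day.head())
```

Each extra breakdown (system, URL, combinations of them) needs another group-by like these, which is the repetitive work the quick start aims to avoid.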

anomeda.DataFrame inherits from pandas.DataFrame, so you can treat the two similarly.


NOTE

Some pandas methods are not yet adapted for anomeda. They return a new pandas.DataFrame instead of an anomeda.DataFrame. In that case, you just need to initialize a new anomeda object from the returned object.
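The behaviour in the note is common to pandas subclasses in general. The sketch below uses two hypothetical classes, NaiveFrame and AwareFrame, to show how a subclass can lose its type after a pandas method call and how re-wrapping the result restores it. It illustrates pandas subclassing only and is not a description of how anomeda is implemented.

```python
import pandas as pd

class NaiveFrame(pd.DataFrame):
    """Hypothetical subclass that does not tell pandas which type to return."""

class AwareFrame(pd.DataFrame):
    """Hypothetical subclass that overrides the pandas `_constructor` hook."""

    @property
    def _constructor(self):
        return AwareFrame

naive = NaiveFrame({"visit": [1, 1, 1]})
aware = AwareFrame({"visit": [1, 1, 1]})

# A pandas method applied to each subclass.
naive_head = naive.head(2)
aware_head = aware.head(2)
print(type(naive_head).__name__, type(aware_head).__name__)

# Re-wrapping the plain result restores the subclass type.
restored = NaiveFrame(naive_head)
print(type(restored).__name__)
```

With anomeda, the re-wrapping step corresponds to calling anomeda.DataFrame again with the same arguments as in the first example.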


Let's try to extract trends for important clusters from the data.

trends = anomeda.fit_trends(
    anomeda_df,
    trend_fitting_conf={'max_trends': 'auto', 'min_var_reduction': 0.75}, # choose the number of trends automatically;
                                                                          # aim to reduce the error variance by 75% relative to fitting a single-line trend
    breakdown='all-clusters', # fit trends for clusters extracted from all possible sets of measures
    min_cluster_size=3 # skip small clusters
)

Typically you will see something like this:

[Image: anomeda.fit_trends method]
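To make the idea behind min_var_reduction more concrete, here is a minimal numpy sketch of comparing a single-line trend with a two-segment trend by residual variance. It is an illustration under stated assumptions, not anomeda's actual algorithm: the helper names, the brute-force breakpoint scan and the synthetic series are all hypothetical.

```python
import numpy as np

def line_residual_variance(x, y):
    """Residual variance after fitting one least-squares line to (x, y)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return float(np.var(y - (slope * x + intercept)))

def best_two_trend_fit(x, y, min_points=3):
    """Scan every breakpoint; return (index, pooled residual variance) of the best split."""
    best_idx, best_var = None, np.inf
    for i in range(min_points, len(x) - min_points + 1):
        pooled = (
            line_residual_variance(x[:i], y[:i]) * i
            + line_residual_variance(x[i:], y[i:]) * (len(x) - i)
        ) / len(x)
        if pooled < best_var:
            best_idx, best_var = i, pooled
    return best_idx, best_var

# Synthetic metric: rises until day 30, then declines, with a mild wiggle.
x = np.arange(60, dtype=float)
y = np.where(x < 30, 100 + 2 * x, 160 - 1.5 * (x - 30)) + np.sin(x)

one_trend_var = line_residual_variance(x, y)
breakpoint_idx, two_trend_var = best_two_trend_fit(x, y)
var_reduction = 1 - two_trend_var / one_trend_var

# Keep the second trend only if it removes at least 75% of the error variance.
accept_second_trend = var_reduction >= 0.75
print(breakpoint_idx, round(var_reduction, 3), accept_second_trend)
```

A threshold of this kind guards against splitting a series into extra trends that barely improve the fit.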

You can then plot the trends using the plot_trends method. You can choose a specific cluster or plot them all together.

anomeda.plot_trends(anomeda_df, clusters=['`country`=="Germany"'])

The output will look like this:

[Image: anomeda.plot_trends method]

Of course, you may have no idea which cluster caused the problem and what to plot. Almost always you only know that an overall metric has decreased and you need to find the culprits. Let's use another method -- anomeda.compare_clusters.

anomeda.compare_clusters(
    anomeda_df,
    period1='date < 30',
    period2='date >= 30'
)

You see the clusters you fitted before and a comparison of their characteristics. The result is quite hefty, but you can easily add your own metrics and sort the clusters so that the one you are looking for ends up on top. For example, look at how different the means in the second cluster are. The second cluster corresponds to Germany (the first cluster consists of all events, so we are not interested in it now).

[Image: anomeda.compare_clusters method]
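The general idea of comparing two periods per cluster can be sketched with plain pandas. This is an illustration only and does not reproduce the output of anomeda.compare_clusters: the event log, the helper mean_daily_metric and the column abs_change are hypothetical, and the period strings are assumed to behave like pandas query expressions.

```python
import pandas as pd

# Hypothetical event log: one row per visit, `date` is an integer day index.
rows = []
for day in range(60):
    for country, before, after in [("Germany", 20, 5), ("France", 10, 10), ("Spain", 8, 9)]:
        n_visits = before if day < 30 else after
        rows.extend({"date": day, "country": country, "visit": 1} for _ in range(n_visits))
events = pd.DataFrame(rows)

def mean_daily_metric(frame, period, by):
    """Mean per-day sum of `visit` for each value of `by` within a query-style period."""
    daily = frame.query(period).groupby([by, "date"])["visit"].sum()
    return daily.groupby(level=by).mean()

comparison = pd.DataFrame({
    "mean_period1": mean_daily_metric(events, "date < 30", "country"),
    "mean_period2": mean_daily_metric(events, "date >= 30", "country"),
})
comparison["abs_change"] = comparison["mean_period2"] - comparison["mean_period1"]

# Sort so that the cluster with the largest drop comes first.
comparison = comparison.sort_values("abs_change")
print(comparison)
```

Sorting by a custom column like this is one way to bring the cluster you are looking for to the top.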

Finally, you can check if there are any point anomalies present in any of your clusters.

anomeda.find_anomalies(
    anomeda_df,
    anomalies_conf={'p_large': 1, 'p_low': 1, 'n_neighbors': 3}
)

The output will look like this:

[Image: anomeda.find_anomalies method]

If you plot the metric with its clusters, the result looks quite reasonable.

[Image: anomeda.find_anomalies method]
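Detecting point anomalies relative to a trend can be sketched in a few lines of numpy. The example below is a simplified stand-in and not anomeda's method: it removes a fitted line and scores each residual by its mean distance to its nearest neighbours. The function knn_anomaly_scores and the synthetic data are hypothetical, and the meaning of p_large and p_low is not modelled here.

```python
import numpy as np

def knn_anomaly_scores(values, n_neighbors=3):
    """Mean absolute distance from each value to its n_neighbors closest other values."""
    values = np.asarray(values, dtype=float)
    dist = np.abs(values[:, None] - values[None, :])
    np.fill_diagonal(dist, np.inf)  # ignore the distance of a point to itself
    nearest = np.sort(dist, axis=1)[:, :n_neighbors]
    return nearest.mean(axis=1)

# Synthetic metric: rising trend with a mild wiggle and one injected spike.
x = np.arange(40, dtype=float)
y = 50 + 0.5 * x + np.cos(x)
y[25] += 15

# Judge each point relative to the trend, not by its raw value.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

scores = knn_anomaly_scores(residuals, n_neighbors=3)
most_anomalous = int(np.argmax(scores))
print(most_anomalous, round(float(scores[most_anomalous]), 2))
```

Working on residuals rather than raw values matters here: the last points of the series are higher than the spike-free value at day 25, yet only the spike stands out once the trend is removed.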

There are some nuances to using anomeda wisely and powerfully. For example, you can use the same anomeda methods directly with numpy arrays, without creating DataFrames. See the full Documentation for more details and hints.

Installing

The GitHub repo contains the source and built distribution files in the dist folder.

The following packages must be installed:

  • pandas
  • numpy
  • scikit-learn (imported as sklearn)
  • scipy
  • matplotlib
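As a small convenience, the following sketch checks whether the listed dependencies are importable in the current environment. It uses only the standard library. The helper missing_modules is hypothetical, and the names are import names, so scikit-learn appears as sklearn.

```python
import importlib.util

# Import names of the required packages (scikit-learn is imported as `sklearn`).
REQUIRED_MODULES = ["pandas", "numpy", "sklearn", "scipy", "matplotlib"]

def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```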

Contribution

You are very welcome to contribute to the project. The contribution guide is coming soon.

Contacts

If you have any questions related to the anomeda project, feel free to reach out to the author.
