Python package which helps to identify important metric changes and quickly find clusters in data which changed the trend of the metric or caused the anomaly

These details have not been verified by PyPI

Project links

Project description

Introduction to Anomeda

The anomeda package helps you analyze non-aggregated time-series data with Python and quickly identify important changes in your metric.

Here is a brief example of how anomeda can work for you.

"Why did the number of our website visits decrease a week ago? What kind of users caused that?" - anomeda answers such questions quickly by processing the non-aggregated visits to your website.

It will show you, for instance, that users from the X country using the Y device suddenly stopped visiting your website. Not only that, even if you are not aware of any significant change of the number of visits, anomeda will highlight the cluster of events where it happened.

Is it fraudulent activity, a paused marketing campaign or technical issues? It's up to you to investigate.

The package is easy to use and adjustable enough to cover a wide range of real scenarios. The basic object, anomeda.DataFrame, inherits from pandas.DataFrame, so you will find the API familiar. In addition, there are different options for fine-tuning the algorithms used under the hood.

Some of what anomeda can do for your non-aggregated data:

- Highlight time points and clusters where the trend, mean or variance changed
- Fit trends for any cluster, taking into account the points where trends change
- Highlight time points and clusters where anomalies were observed, taking into account the trend at that moment
- Compare time periods and find the clusters that changed the metric

Find the project in its GitHub repo.

Explore the Documentation of anomeda.

Quick start

Let's imagine you monitor the number of visits to a website.

You have a table with visits. Typically you just aggregate them by a datetime column and monitor 3 to 5 dashboards with the overall number of visits, as well as visits to important pages, visits from specific systems, visits from specific user clusters, etc. Here is what you would do with anomeda.

Let's define an anomeda object.

import anomeda

anomeda_df = anomeda.DataFrame(
    df, # pandas.DataFrame
    measures_names=['country', 'system', 'url', 'duration'], # columns representing measures or characteristics of your events
    measures_types={
        'categorical': ['country', 'system', 'url'],
        'continuous': ['duration'] # measures can also be continuous - anomeda will take care of clustering them properly
    },
    index_name='date',
    metric_name='visit', # dummy metric, always 1
    agg_func='sum' # function that is used to aggregate the metric
)
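The constructor above expects `df` to hold one row per event. A minimal sketch of such a table, built with plain pandas, might look like the following. The values are synthetic and chosen only for illustration; only the column names come from the example above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_events = 500
days = pd.date_range('2024-01-01', periods=60, freq='D')

# One row per visit (non-aggregated data); all values are made up.
df = pd.DataFrame({
    'date': days[rng.integers(0, len(days), size=n_events)],
    'country': rng.choice(['Germany', 'France', 'Spain'], size=n_events),
    'system': rng.choice(['Android', 'iOS', 'Windows'], size=n_events),
    'url': rng.choice(['/', '/pricing', '/docs'], size=n_events),
    'duration': rng.exponential(scale=30.0, size=n_events).round(1),  # seconds
    'visit': 1,  # dummy metric, always 1
})
df = df.sort_values('date').reset_index(drop=True)
```

Each visit is its own row, so summing the `visit` column over any subset of rows gives the number of visits in that subset.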

anomeda.DataFrame inherits pandas.DataFrame, so you can treat them similarly.

NOTE

Some pandas methods are not yet adapted for anomeda. They return a new pandas.DataFrame instead of an anomeda.DataFrame. In that case, you just need to initialize a new anomeda object from the returned object.
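This is ordinary behavior for pandas subclasses in general. The sketch below uses a hypothetical `LabeledFrame` subclass (it is not part of anomeda) to show a filter returning a plain pandas.DataFrame, followed by the re-initialization that restores the subclass. It illustrates the pattern only and makes no claim about how anomeda is implemented.

```python
import pandas as pd

class LabeledFrame(pd.DataFrame):
    """Hypothetical subclass that carries one extra attribute."""
    _metadata = ['metric_name']  # tell pandas this attribute is not a column

    def __init__(self, data=None, metric_name=None, **kwargs):
        super().__init__(data, **kwargs)
        self.metric_name = metric_name

frame = LabeledFrame(
    {'country': ['Germany', 'France', 'Germany'], 'visit': [1, 1, 1]},
    metric_name='visit',
)

# This subclass does not override pandas' `_constructor`, so filtering
# returns the base class and the extra attribute is lost.
filtered = frame[frame['country'] == 'Germany']
print(type(filtered).__name__)

# Re-initialize the subclass from the returned object to restore it.
restored = LabeledFrame(filtered, metric_name=frame.metric_name)
print(type(restored).__name__, restored.metric_name)
```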

Let's try to extract trends for important clusters from the data.

trends = anomeda.fit_trends(
    anomeda_df,
    trend_fitting_conf={'max_trends': 'auto', 'min_var_reduction': 0.75}, # set the number of trends automatically;
                                                                          # try to reduce the error variance by 75% compared to the error of a single-line trend
    breakdown='all-clusters', # fit trends for clusters extracted from all possible sets of measures
    metric_propagate='zeros', # if some index values are missing after aggregation for a cluster, fill them with zeros
    min_cluster_size=3 # skip small clusters; they are all combined into a 'skipped' cluster
)
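To make the idea behind a variance-reduction threshold more concrete, here is a small NumPy sketch. It is not anomeda's algorithm: the helpers `residual_variance` and `best_two_segment_fit` are hypothetical, and the sketch only compares a single-line fit with the best two-segment fit on synthetic data, then checks whether the error variance dropped by at least 75%.

```python
import numpy as np

def residual_variance(x, y):
    """Variance of the residuals after a least-squares straight-line fit."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return np.var(y - (slope * x + intercept))

def best_two_segment_fit(x, y, min_points=3):
    """Try every breakpoint; return (breakpoint index, pooled residual variance)."""
    best_idx, best_var = None, np.inf
    for i in range(min_points, len(x) - min_points + 1):
        left = residual_variance(x[:i], y[:i]) * i
        right = residual_variance(x[i:], y[i:]) * (len(x) - i)
        pooled = (left + right) / len(x)
        if pooled < best_var:
            best_idx, best_var = i, pooled
    return best_idx, best_var

rng = np.random.default_rng(42)
x = np.arange(60, dtype=float)
# The metric grows for 30 days, then declines: one change of trend near x = 30.
y = np.where(x < 30, 100 + 2 * x, 160 - 3 * (x - 30)) + rng.normal(0, 2, size=60)

one_line_var = residual_variance(x, y)
break_idx, two_line_var = best_two_segment_fit(x, y)
reduction = 1 - two_line_var / one_line_var

# Accept the extra trend only if it removes at least 75% of the error variance.
accept_second_trend = reduction >= 0.75
print(break_idx, round(reduction, 3), accept_second_trend)
```

A higher threshold makes the procedure more conservative: an additional trend is accepted only when it explains most of the error left by the simpler fit.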

Typically you will see something like this:

anomeda.fit_trends method

You can then plot the trends using the plot_trends method. You can choose a specific cluster or plot them all together.

anomeda.plot_trends(anomeda_df, clusters=['`country`=="Germany"'])

The output will look like this:

anomeda.plot_trends method

Of course, you may have no idea which cluster caused the problem and what to plot. Almost always you know only that there is a decrease of an overall metric and you need to find the culprits. Let's utilize another method -- anomeda.compare_clusters.

anomeda.compare_clusters(
    anomeda_df,
    period1='date < "2024-01-30"',
    period2='date >= "2024-01-30"'
)

You see the clusters you fitted before and a comparison of their characteristics. The result is quite large, but you can easily add your own metrics and sort the clusters so that the one you are looking for ends up on top. For example, look at how different the means in the second cluster are. The second cluster corresponds to Germany (the first cluster consists of all events, so we are not interested in it now).

anomeda.compare_clusters method
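The kind of comparison described above can be sketched with plain pandas. This is only an illustration under assumed data, not the output format or the logic of anomeda.compare_clusters: the event log is synthetic, and the column names `mean_period1`, `mean_period2` and `abs_change` are made up for this example.

```python
import pandas as pd

# Synthetic event log: visits from Germany stop on 2024-01-30.
dates = pd.date_range('2024-01-20', '2024-02-08', freq='D')
split = pd.Timestamp('2024-01-30')
rows = []
for day in dates:
    rows += [{'date': day, 'country': 'France', 'visit': 1}] * 5
    if day < split:
        rows += [{'date': day, 'country': 'Germany', 'visit': 1}] * 4
events = pd.DataFrame(rows)

# Daily visits per country; days without events for a country become 0.
daily = (
    events.groupby(['date', 'country'])['visit'].sum()
    .unstack(fill_value=0)
    .reindex(dates, fill_value=0)
)

# Mean daily visits per cluster before and after the split date.
comparison = pd.DataFrame({
    'mean_period1': daily[daily.index < split].mean(),
    'mean_period2': daily[daily.index >= split].mean(),
})
comparison['abs_change'] = comparison['mean_period2'] - comparison['mean_period1']

# Sort so that the cluster with the largest drop comes first.
comparison = comparison.sort_values('abs_change')
print(comparison)
```

Filling missing days with zeros matters here: without it, the days on which Germany had no visits would simply be absent and the drop in its mean would be hidden.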

Finally, you can check if there are any point anomalies present in any of your clusters.

anomeda.find_anomalies(
    anomeda_df,
    anomalies_conf={'p_large': 1, 'p_low': 1, 'n_neighbors': 3}
)

The output will look like this:

anomeda.find_anomalies method

If you plot the metric with its clusters, it would look quite reasonable.

anomeda.find_anomalies method
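To illustrate what "anomalies considering the trend" means, here is a generic NumPy sketch. It does not reproduce anomeda.find_anomalies or its `p_large`, `p_low` and `n_neighbors` parameters. The helper `find_point_anomalies` and its `threshold` argument are hypothetical: it removes a straight-line trend and flags points whose residual is far from the typical residual.

```python
import numpy as np

def find_point_anomalies(y, threshold=4.0):
    """Flag points whose detrended value is far from the typical residual.

    Hypothetical helper: fits one straight line, then compares each residual
    with a robust spread estimate based on the median absolute deviation (MAD).
    """
    x = np.arange(len(y), dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    centered = residuals - np.median(residuals)
    mad = np.median(np.abs(centered))
    robust_sigma = 1.4826 * mad  # MAD scaled to match the std of normal data
    return np.abs(centered) > threshold * robust_sigma

rng = np.random.default_rng(7)
y = 50 + 0.5 * np.arange(40) + rng.normal(0, 1, size=40)  # upward trend with noise
y[25] += 15  # inject a spike
y[10] -= 15  # inject a dip

anomaly_mask = find_point_anomalies(y)
anomaly_idx = np.flatnonzero(anomaly_mask)
print(anomaly_idx)
```

Because the trend is removed first, a value that is high only because the metric is growing is not flagged, while a spike or a dip relative to the trend is.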

There are some nuances to using anomeda wisely and powerfully. For example, you may use the same anomeda methods directly with numpy arrays, without creating DataFrames! See the full Documentation for more details and hints.

Installing

anomeda is available from PyPI. You can install it with pip:

pip install anomeda

Also, the GitHub repo contains the source and built distribution files in the dist folder.

The following packages must be installed:

pandas
numpy
scikit-learn (imported as sklearn)
scipy
matplotlib
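If you want to check your environment before installing, a short standard-library sketch like the one below reports which of these packages cannot be imported. The helper name `missing_modules` is made up for this example; note that scikit-learn is looked up under its import name, `sklearn`.

```python
from importlib.util import find_spec

# Import names of the packages listed above (scikit-learn is imported as `sklearn`).
REQUIRED_MODULES = ['pandas', 'numpy', 'sklearn', 'scipy', 'matplotlib']

def missing_modules(names=REQUIRED_MODULES):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]

missing = missing_modules()
if missing:
    print('Missing packages:', ', '.join(missing))
else:
    print('All required packages are importable.')
```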

Contribution

You are very welcome to contribute to the project. You may solve the current issues or add new functionality - it is up to you to decide.

Here is what your workflow may look like:

Preparing your Fork
- Click ‘Fork’ on GitHub, creating e.g. yourname/theproject.
- Clone your project: git clone git@github.com:yourname/theproject.
- cd theproject
- Create and activate a virtual environment.
- Install the development requirements: pip install -r dev-requirements.txt.
- Create a branch: git checkout -b my_branch
Making your Changes
- Make the changes
- Write tests checking that your code works for different scenarios
- Run tests, make sure they pass.
- Commit your changes: git commit -m "Foo the bars"
Creating Pull Requests
- Push your commit to get it back up to your fork: git push origin HEAD
- Visit GitHub and click the handy “Pull request” button that it shows upon noticing your new branch.
- In the description field, write down issue number (if submitting code fixing an existing issue) or describe the issue + your fix (if submitting a wholly new bugfix).
- Hit ‘submit’!

Reporting issues

To report an issue, use the Issues section of the project's page on GitHub. We will try to solve the issue as soon as possible.

Contacts

If you have any questions related to the anomeda project, feel free to reach out to the author.

Release history

0.1.4 (May 12, 2024)
0.1.3 (Apr 15, 2024)
0.1.2 (Apr 2, 2024)
0.1.1 (Feb 26, 2024), this version
0.1.0 (Feb 25, 2024)

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

anomeda-0.1.1.tar.gz (25.9 kB)

Uploaded Feb 26, 2024 Source

Built Distribution

anomeda-0.1.1-py3-none-any.whl (22.5 kB)

Uploaded Feb 26, 2024 Python 3

File details

Details for the file anomeda-0.1.1.tar.gz.

File metadata

Download URL: anomeda-0.1.1.tar.gz
Upload date: Feb 26, 2024
Size: 25.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.0.0 CPython/3.9.12

File hashes

Hashes for anomeda-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`ce4689e6f83912d3fcf0426e505c9ba0f76603484c99332ab2220ba2e60a57ed`
MD5	`c006436e110cd8b5178090e95ab282cf`
BLAKE2b-256	`3cbf1c75b4fe5444912604fbe8c6f6728429af07265f4f0c6437873ac858ba57`


File details

Details for the file anomeda-0.1.1-py3-none-any.whl.

File metadata

Download URL: anomeda-0.1.1-py3-none-any.whl
Upload date: Feb 26, 2024
Size: 22.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.0.0 CPython/3.9.12

File hashes

Hashes for anomeda-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7b0f36a35c5b92c2f94a544b887303a349f5d75dc3af698d631fda8fb073fe63`
MD5	`f08ed87046f8ad0278354c53bff20743`
BLAKE2b-256	`d365c01ad436e23e7b5450ecf4b71333d58fae3ad98c60fe2b29827039d8051b`
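
The SHA256 digests listed above can be used to check a downloaded file. The following is a minimal standard-library sketch; the helper names `sha256_of_file` and `matches_published_hash` are made up for this example, and the usage line assumes the file was downloaded into the current directory.

```python
import hashlib

def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published one."""
    return sha256_of_file(path) == expected_hex.strip().lower()

# SHA256 digest published above for the source distribution.
EXPECTED_SDIST_SHA256 = 'ce4689e6f83912d3fcf0426e505c9ba0f76603484c99332ab2220ba2e60a57ed'

# Usage (assumes the file is in the current directory):
# matches_published_hash('anomeda-0.1.1.tar.gz', EXPECTED_SDIST_SHA256)
```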