One Stop Anomaly Shop

License
- OSI Approved :: Apache Software License
Operating System
- OS Independent
Programming Language
- Python :: 3.0

Project description

One Stop Anomaly Shop (OSAS)

This repository implements the models, methods and techniques presented in our paper: A Principled Approach to Enriching Security-related Data for Running Processes through Statistics and Natural Language Processing.

Introduction video (follows quick start guide)

This video is a recording of our Hack In The Box (HITB) Security Conference 2021 Amsterdam presentation.

Quick start guide

Step 1: Get/build the docker image

Option 1: Use the prebuilt image (might not reflect the latest changes):

docker pull tiberiu44/osas:latest
docker image tag tiberiu44/osas:latest osas:latest

Option 2: Build the image locally

git clone https://github.com/adobe/OSAS.git
cd OSAS
docker build . -f docker/osas-elastic/Dockerfile -t osas:latest

Step 2: Once you have the docker image (pulled or built locally), you can start OSAS by typing:

docker run -p 8888:8888/tcp -p 5601:5601/tcp -v <ABSOLUTE PATH TO DATA FOLDER>:/app osas

IMPORTANT NOTE: Modify the command above by replacing <ABSOLUTE PATH TO DATA FOLDER> with the absolute path to your data folder.
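
If you start the container from a script, the absolute path can be resolved programmatically. The following is a minimal sketch and not part of OSAS: the helper name build_docker_run and the example folder are illustrative assumptions, while the ports, the /app mount point and the osas image name are taken from the command above.

```python
import shlex
from pathlib import Path


def build_docker_run(data_dir: str, image: str = "osas") -> list[str]:
    """Return the `docker run` argument list with an absolute data path.

    Hypothetical helper: it only assembles the command shown in this guide.
    """
    # Resolve "~" and relative segments so the bind mount gets an absolute path.
    host_path = Path(data_dir).expanduser().resolve()
    return [
        "docker", "run",
        "-p", "8888:8888/tcp",
        "-p", "5601:5601/tcp",
        "-v", f"{host_path}:/app",
        image,
    ]


def as_shell_line(args: list[str]) -> str:
    """Quote the argument list so it can be pasted into a POSIX shell."""
    return shlex.join(args)
```

For example, as_shell_line(build_docker_run("./data")) produces a command line you can paste into a terminal, and the argument list itself can be passed to subprocess.run.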

After OSAS has started (it might take 1-2 minutes) you can use your browser to access some standard endpoints:

http://localhost:5601/app/home#/ - access to Kibana frontend (this is where you will see your data)
http://localhost:8888/?token=osas - access to Jupyter Lab (open Terminal or create a Notebook)
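
Because start-up can take a minute or two, a script that depends on these endpoints may want to wait for them first. The sketch below shows one way to poll a URL until it answers. The helper names, the 180-second limit and the 5-second interval are assumptions, as is the choice to treat HTTP 5xx responses as "not ready yet".

```python
import http.client
import time
import urllib.error
import urllib.request


def is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a non-5xx HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        # The server answered; assume 5xx means it is still starting up.
        return err.code < 500
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # Connection refused, timeout, or a malformed response.
        return False


def wait_until_up(url, probe=is_up, max_wait=180.0, interval=5.0, sleep=time.sleep):
    """Poll `url` until `probe` succeeds or about `max_wait` seconds have passed."""
    waited = 0.0
    while True:
        if probe(url):
            return True
        if waited >= max_wait:
            return False
        sleep(interval)
        waited += interval
```

Calling wait_until_up("http://localhost:5601/app/home") before opening Kibana, or wait_until_up("http://localhost:8888/?token=osas") before using Jupyter Lab, returns True once the service responds and False if it does not respond within the limit.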

For debugging (in case you need it), start the container with an interactive shell:

docker run -p 8888:8888/tcp -p 5601:5601/tcp -v <ABSOLUTE PATH TO DATA FOLDER>:/app -ti osas /bin/bash

Building the test pipeline

This guide will take you through all the necessary steps to configure, train and run your own pipeline on your own dataset.

Prerequisite: Add your own CSV dataset to your data folder (the one provided in the docker run command).

Once you have started your docker container, use the OSAS console to gain CLI access to all the tools.

In what follows, we assume that your dataset is called dataset.csv. Please update the commands as necessary in case you use a different name/location.
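
The commands in the steps below all derive their file names from the dataset name, so a small helper can keep them consistent when your file is not called dataset.csv. This is a sketch under that assumption: the function name pipeline_commands is hypothetical, while the script paths and flags are the ones used in this guide.

```python
from pathlib import PurePosixPath


def pipeline_commands(dataset_csv: str = "/app/dataset.csv") -> dict[str, list[str]]:
    """Build the autoconfig, train and run commands for one dataset.

    Hypothetical helper: file names follow the dataset.conf, dataset.json
    and dataset-out.csv convention used in this guide.
    """
    csv_path = PurePosixPath(dataset_csv)
    conf = csv_path.with_suffix(".conf")
    model = csv_path.with_suffix(".json")
    out = csv_path.with_name(csv_path.stem + "-out.csv")
    return {
        "autoconfig": [
            "python3", "osas/main/autoconfig.py",
            f"--input-file={csv_path}", f"--output-file={conf}",
        ],
        "train": [
            "python3", "osas/main/train_pipeline.py",
            f"--conf-file={conf}", f"--input-file={csv_path}",
            f"--model-file={model}",
        ],
        "run": [
            "python3", "osas/main/run_pipeline.py",
            f"--conf-file={conf}", f"--model-file={model}",
            f"--input-file={csv_path}", f"--output-file={out}",
        ],
    }
```

Each list can be passed to subprocess.run with cwd set to the OSAS root folder, since the script paths are relative to it.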

Be sure you are running scripts in the root folder of OSAS:

cd /osas

Step 1: Build a custom pipeline configuration file. This can be done fully manually or by bootstrapping with our configuration autogenerator script:

python3 osas/main/autoconfig.py --input-file=/app/dataset.csv --output-file=/app/dataset.conf

The above command will generate a custom configuration file for your dataset. It will try to guess field types and optimal combinations between fields. You can edit the generated file (which should be available in the shared data folder) using your favourite editor.

Standard templates for label generator types are:

[LG_MULTINOMIAL]
generator_type = MultinomialField
field_name = <FIELD_NAME>
absolute_threshold = 10
relative_threshold = 0.1
group_by = None # this is an optional field - it can be a single attribute name or a list of names

[LG_TEXT]
generator_type = TextField
field_name = <FIELD_NAME>
lm_mode = char
ngram_range = (3, 5)

[LG_NUMERIC]
generator_type = NumericField
field_name = <FIELD_NAME>
group_by = None # this is an optional field - it can be a single attribute name or a list of names

[LG_MULTINOMIAL_COMBINER]
generator_type = MultinomialFieldCombiner
field_names = ['<FIELD_1>', '<FIELD_2>', ...]
absolute_threshold = 10
relative_threshold = 0.1
group_by = None # this is an optional field - it can be a single attribute name or a list of names

[LG_KEYWORD]
generator_type = KeywordBased
field_name = <FIELD_NAME>
keyword_list = ['<KEYWORD_1>', '<KEYWORD_2>', '<KEYWORD_3>', ...]

[LG_REGEX]
generator_type = KnowledgeBased
field_name = <FIELD_NAME>
rules_and_labels_tuple_list = [('<REGEX_1>','<LABEL_1>'), ('<REGEX_2>','<LABEL_2>'), ...]
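
To make the shape of rules_and_labels_tuple_list concrete, here is a rough sketch of how a list of (regex, label) pairs could be applied to a field value. It is an illustration only, not OSAS's implementation: the example rules and the function name labels_for are hypothetical, and this guide does not state whether OSAS matches anywhere in the value (as assumed here with re.search) or only at the start.

```python
import re

# Hypothetical rules in the same (regex, label) shape as the LG_REGEX template.
RULES = [
    (r"\bcurl\b.*\|\s*(ba)?sh\b", "PIPE_TO_SHELL"),
    (r"/etc/(passwd|shadow)", "SENSITIVE_FILE"),
]


def labels_for(value: str, rules=RULES) -> list[str]:
    """Return the label of every rule whose regex is found in `value`."""
    return [label for pattern, label in rules if re.search(pattern, value)]
```

A value can receive several labels when more than one rule matches, and none when no rule matches.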

You can use the above templates to add as many label generators as you want. Just make sure that the header IDs are unique in the configuration file.
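
After editing the file by hand, a quick sanity check can catch duplicate header IDs before training. The sketch below assumes the configuration is plain INI text as the templates suggest, and uses Python's standard configparser. The function name load_generators is hypothetical, and the parsing choices (stripping inline # comments, evaluating values as Python literals) are assumptions rather than a description of how OSAS reads the file.

```python
import ast
import configparser


def load_generators(conf_text: str) -> dict[str, dict[str, object]]:
    """Parse label generator sections and convert values to Python literals.

    Raises configparser.DuplicateSectionError if a header ID is repeated.
    """
    parser = configparser.ConfigParser(
        inline_comment_prefixes=("#",),  # drop "# this is an optional field ..."
        interpolation=None,              # keep "%" in regexes untouched
        strict=True,                     # reject duplicate section headers
    )
    parser.read_string(conf_text)

    generators = {}
    for section in parser.sections():
        options = {}
        for key, raw in parser.items(section):
            try:
                # "10" -> 10, "(3, 5)" -> (3, 5), "None" -> None
                options[key] = ast.literal_eval(raw)
            except (ValueError, SyntaxError):
                # Bare words such as TextField or char stay as strings.
                options[key] = raw
        generators[section] = options
    return generators
```

Note that with this setting any value containing a space followed by # is cut at that point, so regex rules that need such a sequence would require a different comment setting.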

Step 2: Train the pipeline

python3 osas/main/train_pipeline.py --conf-file=/app/dataset.conf --input-file=/app/dataset.csv --model-file=/app/dataset.json

The above command trains the pipeline using the previously created configuration file and the dataset, and saves the resulting pretrained model to /app/dataset.json.

Step 3: Run the pipeline on a dataset

python3 osas/main/run_pipeline.py --conf-file=/app/dataset.conf --model-file=/app/dataset.json --input-file=/app/dataset.csv --output-file=/app/dataset-out.csv

The above command runs the pretrained pipeline on any compatible dataset. In this example the pipeline runs on the training data, but you can also use previously unseen data. The command generates an output file with labels and anomaly scores, and it also imports your data into Elasticsearch/Kibana. To view the results, use the web interface.
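
If you prefer to inspect the output file directly rather than through Kibana, something like the following can list the highest-scoring rows. This is a sketch with stated assumptions: the name of the score column is not given in this guide, so it is passed in as a parameter, and the code assumes that a higher score means a more anomalous row. The function name top_anomalies is hypothetical.

```python
import csv


def top_anomalies(csv_path: str, score_column: str, n: int = 10) -> list[dict[str, str]]:
    """Return the `n` rows with the highest value in `score_column`."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if score_column not in (reader.fieldnames or []):
            raise KeyError(f"column {score_column!r} not found in {csv_path}")
        rows = list(reader)

    def score(row: dict[str, str]) -> float:
        # Rows with a missing or non-numeric score sort last.
        try:
            return float(row[score_column])
        except (TypeError, ValueError):
            return float("-inf")

    return sorted(rows, key=score, reverse=True)[:n]
```

Check the header row of /app/dataset-out.csv to find the actual column name before calling the function.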

Developing models

Now that everything is up and running, we have prepared a set of development guidelines to help you apply OSAS to your own dataset:

Pipeline configuration: explains how label generators and anomaly scoring work in OSAS.
Rule-based score modifiers and labeling: once you have a working OSAS pipeline, you can further refine your results by adding new labels and modifying the anomaly scores based on static rules.

Citing and attribution

Full-text paper: A Principled Approach to Enriching Security-related Data for Running Processes through Statistics and Natural Language Processing.

If you want to use this repository in any academic work, please cite the following work:

MLA

Boros, Tiberiu, et al. ‘A Principled Approach to Enriching Security-Related Data for Running Processes through Statistics and Natural Language Processing’. IoTBDS 2021 - 6th International Conference on Internet of Things, Big Data and Security, 2021.

APA

Boros, T., Cotaie, A., Vikramjeet, K., Malik, V., Park, L., & Pachis, N. (2021). A principled approach to enriching security-related data for running processes through statistics and natural language processing. IoTBDS 2021 - 6th International Conference on Internet of Things, Big Data and Security.

Chicago

Boros, Tiberiu, Andrei Cotaie, Kumar Vikramjeet, Vivek Malik, Lauren Park, and Nick Pachis. ‘A Principled Approach to Enriching Security-Related Data for Running Processes through Statistics and Natural Language Processing’. In IoTBDS 2021 - 6th International Conference on Internet of Things, Big Data and Security, 2021.

BibTeX

@inproceedings{boros2021principled,
  title={A Principled Approach to Enriching Security-related Data for Running Processes through Statistics and Natural Language Processing},
  author={Boros, Tiberiu and Cotaie, Andrei and Vikramjeet, Kumar and Malik, Vivek and Park, Lauren and Pachis, Nick},
  year={2021},
  booktitle={IoTBDS 2021 - 6th International Conference on Internet of Things, Big Data and Security}
}

Release history

- 0.9.3 (this version): May 19, 2025
- 0.9.2: May 19, 2025
- 0.9.1: May 7, 2025

File details

Details for the file osas-0.9.3.tar.gz.

File metadata

Download URL: osas-0.9.3.tar.gz
Upload date: May 19, 2025
Size: 36.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.17

File hashes

Hashes for osas-0.9.3.tar.gz
Algorithm	Hash digest
SHA256	`7654cd79628a9f152f4188b0812acead56ccfa8b7851b5ad5dac4db05949fe39`
MD5	`93cdcea804e3b883cbbc25b87e482bde`
BLAKE2b-256	`6bef68963b68e0519debcb7eec942023769a7edd2cfe56e600e39999baf105f1`

File details

Details for the file osas-0.9.3-py3-none-any.whl.

File metadata

Download URL: osas-0.9.3-py3-none-any.whl
Upload date: May 19, 2025
Size: 46.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.17

File hashes

Hashes for osas-0.9.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f815f8c3902c5de1608c8cc42c67f6a5440eedd031ac06f3eb55ead1be47f06c`
MD5	`15ae9e7022fb63438dd28ee88efadb5a`
BLAKE2b-256	`4a7e5aa0b7b352d7f0a61930dd0eb75b08cb66a82ce48abfc3a5317fbbdcd0cc`
