This is the server component of Argus: the alert aggregation system

Argus

Argus is a platform for aggregating incidents across network management systems, and sending notifications to users. Users build notification profiles that define which incidents they subscribe to.

This repository hosts the backend built with Django, while the frontend is hosted here: https://github.com/Uninett/Argus-frontend.

Installation

In development, provided you have the code repository that this README-file is in, see Project setup or Alternative setup using Docker Compose under Setup.

You can also install with pip via PyPI. The package name is argus-server:

$ pip install argus-server

If you're going to run the frontend, we also recommend having Redis running somewhere to back the websockets, so that realtime updates can be pushed to the frontend.

If you're using the PyPI package in production, the file requirements.txt contains the exact versions of dependencies that release was tested on. You can update all the dependencies for instance with pip-compile:

$ pip install pip-tools
$ pip-compile -o your-updated-requirements.txt

Then use $ pip install --upgrade -r your-updated-requirements.txt to upgrade all the dependencies at once.

Setup

Prerequisites

  • Python 3.7+
  • pip

Dataporten setup

  • Register a new application with the following redirect URL: {server_url}/oidc/complete/dataporten_feide/
    • {server_url} must be replaced with the URL to the server running this project, like http://localhost:8000
  • Add the following permission scopes:
    • profile
    • userid
    • userid-feide

Project setup

  • Create a Python 3.7+ virtual environment
  • pip install -r requirements.txt
  • python manage.py migrate
  • python manage.py initial_setup

Start the server with python manage.py runserver.

Alternative setup using Docker Compose

Site- and deployment-specific settings

Site-specific settings are set as per 12 factor, with environment variables. For more details, see the relevant section in the docs: Setting site-specific settings.

A recap of the environment variables that can be set by default follows.

Environment variables

  • ARGUS_DATAPORTEN_KEY, which holds the id/key for using dataporten for authentication.
  • ARGUS_DATAPORTEN_SECRET, which holds the password for using dataporten for authentication.
  • ARGUS_COOKIE_DOMAIN, the domain the cookie is set for
  • ARGUS_FRONTEND_URL, for redirecting back to frontend after logging in through Feide, and also CORS. Must either be a subdomain of or the same as ARGUS_COOKIE_DOMAIN
  • ARGUS_SEND_NOTIFICATIONS, True in production and False by default, to allow suppressing notifications
  • DEBUG, 1 for True, 0 for False
  • TEMPLATE_DEBUG. By default set to the same as DEBUG.
  • DEFAULT_FROM_EMAIL, the email From-address used for notifications sent via email
  • EMAIL_HOST, smarthost (domain name) to send email through
  • EMAIL_HOST_USER, (optional) if the host in EMAIL_HOST needs authentication
  • EMAIL_HOST_PASSWORD, (optional) password if the smarthost needs that
  • EMAIL_PORT, in production by default set to 587
  • SECRET_KEY, used internally by django, should be about 50 chars of ascii noise (but avoid backspaces!)
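As an illustration of the conventions listed above, here is a minimal sketch of reading a few of these variables. It is not how the Argus settings modules are actually written; the helper names _as_bool and read_site_settings are made up for this example, and only the defaults stated in the list (TEMPLATE_DEBUG follows DEBUG, EMAIL_PORT 587) are taken from this README.

```python
import os


def _as_bool(value, default=False):
    """Interpret "1" as True and "0" as False; use default when unset or empty."""
    if value is None or value == "":
        return default
    return value.strip() == "1"


def read_site_settings(env=None):
    """Collect a few site-specific settings from environment variables.

    Hypothetical helper: `env` defaults to os.environ but can be any mapping,
    which makes the function easy to try out without touching the real
    environment.
    """
    env = os.environ if env is None else env
    debug = _as_bool(env.get("DEBUG"))
    return {
        "DEBUG": debug,
        # TEMPLATE_DEBUG is set to the same as DEBUG unless given explicitly.
        "TEMPLATE_DEBUG": _as_bool(env.get("TEMPLATE_DEBUG"), default=debug),
        "EMAIL_HOST": env.get("EMAIL_HOST"),
        # 587 is the production default mentioned in the list above.
        "EMAIL_PORT": int(env.get("EMAIL_PORT", 587)),
        # SECRET_KEY has no default here: a missing key raises KeyError.
        "SECRET_KEY": env["SECRET_KEY"],
    }
```

Calling read_site_settings({"DEBUG": "1", "SECRET_KEY": "x"}) gives DEBUG and TEMPLATE_DEBUG both True and EMAIL_PORT 587.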

There are also settings (not env-variables) for which notification plugins to use:

DEFAULT_SMS_MEDIA, which by default is unset, since there is no standardized way of sending SMSes. See Notifications and notification plugins.

DEFAULT_EMAIL_MEDIA, which is included and uses Django's email backend. It is better to switch out the email backend than to replace this plugin.

A Gmail account with "Allow less secure apps" turned on was used in the development of this project.

Production gotchas

The frontend and backend currently need to be on either the same domain or be subdomains of the same domain (ARGUS_COOKIE_DOMAIN).

When running on localhost for dev and test, ARGUS_COOKIE_DOMAIN may be empty.
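The domain rule can be checked before deploying. The following is only a sketch: host_matches_cookie_domain is a hypothetical helper, not part of Argus, and treating an empty cookie domain as "anything goes" is an assumption based on the localhost note above.

```python
from urllib.parse import urlsplit


def host_matches_cookie_domain(url, cookie_domain):
    """Return True if the host of `url` equals `cookie_domain` or is a subdomain of it."""
    host = (urlsplit(url).hostname or "").lower()
    domain = cookie_domain.lower().lstrip(".")
    if not domain:
        # Assumption: an empty ARGUS_COOKIE_DOMAIN (dev/test on localhost)
        # places no restriction on the host.
        return True
    # The "." prefix prevents "badexample.org" from matching "example.org".
    return host == domain or host.endswith("." + domain)
```

For instance, host_matches_cookie_domain("https://argus.example.org", "example.org") is True, while "https://badexample.org" does not match the same cookie domain.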

Running tests locally

Either:

  • python manage.py test

Or, if you have installed tox:

  • tox

The latter will test several Django versions and several Python versions, and automatically compute coverage. An HTML coverage report is also autogenerated.

See tox.ini for what other things tox can do.

Mock data

Generating
PYTHONPATH=src python src/argus/incident/fixtures/generate_fixtures.py

This creates the file src/argus/incident/fixtures/incident/mock_data.json.

Loading
python manage.py loaddata incident/mock_data

Running in development

The fastest way is to use virtualenv, virtualenvwrapper or similar to create a safe place to stash all the dependencies.

  1. Create the virtualenv
  2. Fill the activated virtualenv with dependencies:
$ pip install -r requirements/prod.txt
$ pip install -r requirements/dev.txt

Copy the cmd.sh-template to a new name ending with ".sh", make it executable and set the environment variables within. This file must not be checked in to version control, since it contains passwords. You must set DATABASE_URL, DJANGO_SETTINGS_MODULE and SECRET_KEY. If you want to test the frontend you must also set all the DATAPORTEN-settings. Get the values from https://dashboard.dataporten.no/ or create a new application there.

For the database we recommend postgres as we use a postgres-specific feature in the Incident-model.

DJANGO_SETTINGS_MODULE can be set to "argus.site.settings.dev" but we recommend having a localsettings.py in the same directory as manage.py with any overrides. This file also does not belong in version control since it reflects a specific developer's preferences. Smart things first tested in a localsettings can be moved to the other settings-files later on. If you copy the entire logging-setup from "argus.site.settings.dev" to "localsettings.py" remember to set "disable_existing_loggers" to True or logentries will occur twice.

This repository uses black as a code formatter. Black will automatically install with the dev requirements.

A pre-commit hook formats new code automatically before committing. To enable this pre-commit hook, please run

pre-commit install

Debugging tips

To test/debug notifications as a whole, use the email subsystem (Media: Email in a NotificationProfile). Set EMAIL_HOST to "localhost", EMAIL_PORT to "1025", and run a dummy mailserver:

$ python3 -m smtpd -n -c DebuggingServer localhost:1025

Notifications sent will then be dumped to the console where the dummy server runs.

Endpoints

/admin/ to access the project's admin pages.

All endpoints require requests to contain a header with key Authorization and value Token {token}, where {token} is replaced by a registered auth token; these are generated per user by logging in through Feide, and can be found at /admin/authtoken/token/.
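A minimal sketch of attaching that header from Python, using only the standard library. The function name build_api_request and the token value are placeholders for this example; the header format is the one described above.

```python
from urllib.request import Request


def build_api_request(server_url, path, token):
    """Build a request for an Argus endpoint with the Authorization header set."""
    url = server_url.rstrip("/") + "/" + path.lstrip("/")
    # Header key "Authorization", value "Token {token}".
    return Request(url, headers={"Authorization": "Token " + token})


# Placeholder server URL and token, for illustration only.
request = build_api_request("http://localhost:8000", "/api/v1/auth/user/", "abc123")
```

Passing the resulting object to urllib.request.urlopen would send the request to a running server.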

Auth endpoints
  • GET to /api/v1/auth/user/: returns the logged in user

  • GET to /api/v1/auth/users/<int:pk>/: returns a user by PK

  • POST to /oidc/api-token-auth/: returns an auth token for the posted user

    • Note that this token will expire after 14 days, and can be replaced by posting to the same endpoint.
    • Example request body: { "username": "<username>", "password": "<password>" }
  • /oidc/login/dataporten_feide/: redirects to Feide login

  • /api/v1/auth/phone-number/:

    • GET: returns the phone numbers of the logged in user

      Example response body:
      [
        {
          "pk": 2,
          "user": 1,
          "phone_number": "+4767676767"
        },
        {
          "pk": 1,
          "user": 1,
          "phone_number": "+4790909090"
        }
      ]
      
    • POST: creates and returns a phone number for the logged in user

      Example request body:
      {
        "pk": 2,
        "phone_number": "+4767676767"
      }
      
  • /api/v1/auth/phone-number/<int:pk>/:

    • GET: returns the specific phone number of the logged in user

      Example response body:
      {
        "pk": 2,
        "user": 1,
        "phone_number": "+4767676767"
      }
      
    • PUT: updates and returns one of the logged in user's phone numbers by PK

      • Example request body: same as POST to /api/v1/auth/phone-number/
    • DELETE: deletes one of the logged in user's phone numbers by PK

    The phone number is validated with a python version of the Google library libphonenumber. It will check that the number is in a valid number series. Using a random number with enough digits that is not in a valid series will not work.

Incident endpoints
  • /api/v1/incidents/:

    • GET: returns all incidents - both open and historic

      Query parameters: All query parameters are optional. If a query parameter is omitted or left empty, for instance `acked=`, that filter is not applied and rows with any value are returned, for instance both "acked" and "unacked" incidents in the case of `acked=`.

      Filtering parameters:

      acked=true|false
      Fetch only acked (true) or unacked (false) incidents.
      open=true|false
      Fetch only open (true) or closed (false) incidents.
      stateful=true|false
      Fetch only stateful (true) or stateless (false) incidents.
      source__id__in=ID1[,ID2,..]
      Fetch only incidents with a source with numeric id ID1 or ID2 or..
      source__name__in=NAME1[,NAME2,..]
      Fetch only incidents with a source with name NAME1 or NAME2 or..
      source_incident_id=ID
      Fetch only incidents with source_incident_id set to ID.
      tags=key1=value1,key1=value2,key2=value
      Fetch only incidents with one or more of the tags. Tag-format is "key=value". If there are multiple tags with the same key, only one of the tags need match. If there are multiple keys, one of each key must match.

      So: /api/v1/incidents/?acked=false&open=true&stateful=true&source__id__in=1&tags=location=broomcloset,location=understairs,problem=onfire will fetch incidents that are all of "open", "unacked", "stateful", from source number 1, with "location" either "broomcloset" or "understairs", and that is on fire (problem=onfire). If the boolean parameters are not given a value or are left out, that is interpreted as not filtering at all on that parameter, showing both true and false entries.
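The filtering parameters can be assembled programmatically. This is a sketch with a made-up helper name (incident_query); a parameter left as None is simply omitted, which the text above describes as not filtering on it.

```python
from urllib.parse import urlencode


def incident_query(acked=None, is_open=None, stateful=None, source_ids=(), tags=()):
    """Build the path and query string for GET /api/v1/incidents/."""
    params = {}
    for name, value in (("acked", acked), ("open", is_open), ("stateful", stateful)):
        if value is not None:
            params[name] = "true" if value else "false"
    if source_ids:
        params["source__id__in"] = ",".join(str(i) for i in source_ids)
    if tags:
        # Each tag is a "key=value" string; several tags are comma-separated.
        params["tags"] = ",".join(tags)
    base = "/api/v1/incidents/"
    if not params:
        return base
    # Keep "=" and "," readable instead of percent-encoding them.
    return base + "?" + urlencode(params, safe="=,")
```

Calling incident_query(acked=False, is_open=True, stateful=True, source_ids=[1], tags=["location=broomcloset", "location=understairs", "problem=onfire"]) reproduces the example URL above.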

      Paginating parameters:

      cursor=LONG RANDOM STRING|null
      Go to the page of that cursor. The cursor strings for the next and previous pages are part of the response body.
      page_size=INTEGER
      The number of rows to return. Default is 100.

      So: /api/v1/incidents/?cursor=cD0yMDIwLTA5LTIzKzEzJTNBMDIlM0ExNi40NTU4MzIlMkIwMCUzQTAw&page_size=10 will go to the page indicated by "cD0yMDIwLTA5LTIzKzEzJTNBMDIlM0ExNi40NTU4MzIlMkIwMCUzQTAw" and show the next 10 rows from that point onward. Do not attempt to guess the cursor string. null means there is no more to fetch.

      Example response body:
      {
          "next": "http://localhost:8000/api/v1/incidents/?cursor=cD0yMDIwLTA5LTIzKzEzJTNBMDIlM0ExNi40NTU4MzIlMkIwMCUzQTAw&page_size=10",
          "previous": null,
          "results": [
              {
                  "pk": 10101,
                  "start_time": "2011-11-11T11:11:11+02:00",
                  "end_time": "2011-11-11T11:11:12+02:00",
                  "source": {
                      "pk": 11,
                      "name": "Uninett GW 3",
                      "type": {
                          "name": "nav"
                      },
                      "user": 12,
                      "base_url": "https://somenav.somewhere.com"
                  },
                  "source_incident_id": "12345",
                  "details_url": "https://uninett.no/api/alerts/12345/",
                  "description": "Netbox 11 <12345> down.",
                  "ticket_url": "https://tickettracker.com/tickets/987654/",
                  "tags": [
                      {
                          "added_by": 12,
                          "added_time": "2011-11-11T11:11:11.111111+02:00",
                          "tag": "object=Netbox 4"
                      },
                      {
                          "added_by": 12,
                          "added_time": "2011-11-11T11:11:11.111111+02:00",
                          "tag": "problem_type=boxDown"
                      },
                      {
                          "added_by": 200,
                          "added_time": "2020-08-10T11:26:14.550951+02:00",
                          "tag": "color=red"
                      }
                  ],
                  "stateful": true,
                  "open": false,
                  "acked": false
              }
          ]
      }
      

      Pagination-support:

      `next`
      The link to the next page, according to the cursor, or `null` if on the last page.
      `previous`
      The link to the previous page, according to the cursor, or `null` if on the first page.
      `results`
      An array of the resulting subset of rows, or an empty array if no results.

      Refer to this section for an explanation of the other fields.
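Following the `next` links until they become `null` walks through every page. The sketch below assumes a caller-supplied fetch_page function (hypothetical, not part of Argus) that performs the authenticated GET and returns the decoded JSON body as a dict.

```python
def iter_incidents(first_url, fetch_page):
    """Yield every incident, following the "next" link of each page.

    `fetch_page` is a hypothetical callable: it takes a URL and returns the
    decoded response body with the keys "next", "previous" and "results".
    """
    url = first_url
    while url is not None:
        page = fetch_page(url)
        yield from page["results"]
        # "next" is null (None) on the last page, which ends the loop.
        url = page["next"]
```

The cursor strings are never constructed by the client here; they are only read back from the response, as recommended above.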

    • POST: creates and returns an incident

      Example request body:
      {
          "source": 11,
          "start_time": "2011-11-11 11:11:11.11111",
          "end_time": null,
          "source_incident_id": "12345",
          "details_url": "https://uninett.no/api/alerts/12345/",
          "description": "Netbox 11 <12345> down.",
          "ticket_url": "https://tickettracker.com/tickets/987654/",
          "tags": [
              {"tag": "object=Netbox 4"},
              {"tag": "problem_type=boxDown"}
          ]
      }
      

      Refer to this section for an explanation of the fields.

  • /api/v1/incidents/<int:pk>/:

    • GET: returns an incident by PK

    • PATCH: modifies parts of an incident and returns it

      Example request body:
      {
          "ticket_url": "https://tickettracker.com/tickets/987654/",
          "tags": [
              {"tag": "object=Netbox 4"},
              {"tag": "problem_type=boxDown"}
          ]
      }
      

      The fields allowed to be modified are:

      • details_url
      • ticket_url
      • tags
  • /api/v1/incidents/<int:pk>/ticket_url/:

    • PUT: modifies just the ticket url of an incident and returns it

      Example request body:
      {
          "ticket_url": "https://tickettracker.com/tickets/987654/"
      }
      

      Only ticket_url may be modified.

  • /api/v1/incidents/<int:pk>/events/:

    • GET: returns all events related to the specified incident

      Example response body:
      [
          {
              "pk": 1,
              "incident": 10101,
              "actor": {
                  "pk": 12,
                  "username": "nav.oslo.uninett.no"
              },
              "timestamp": "2011-11-11T11:11:11+02:00",
              "received": "2011-11-11T11:12:11+02:00",
              "type": {
                  "value": "STA",
                  "display": "Incident start"
              },
              "description": ""
          },
          {
              "pk": 20,
              "incident": 10101,
              "actor": {
                  "pk": 12,
                  "username": "nav.oslo.uninett.no"
              },
              "timestamp": "2011-11-11T11:11:12+02:00",
              "received": "2011-11-11T11:11:13+02:00",
              "type": {
                  "value": "END",
                  "display": "Incident end"
              },
              "description": ""
          }
      ]
      
      Note that `received` is set by argus on reception of an event. Normally,
      this should be the same as, or a little later than, `timestamp`. If there
      is a large gap (in minutes), or `received` is earlier than `timestamp`, it
      is likely something wrong with the internal clock either on the argus
      server or the event source.
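A small check along these lines can flag suspicious events. This is a sketch only: the function name clock_skew_warning and the five-minute tolerance are assumptions, since the text above does not say how large a gap counts as large.

```python
from datetime import datetime, timedelta


def clock_skew_warning(event, tolerance=timedelta(minutes=5)):
    """Return a warning string if `received` and `timestamp` look inconsistent, else None.

    `event` is a dict shaped like the example response bodies above.
    The default tolerance of five minutes is an arbitrary choice.
    """
    timestamp = datetime.fromisoformat(event["timestamp"])
    received = datetime.fromisoformat(event["received"])
    delay = received - timestamp
    if delay < timedelta(0):
        return "received is earlier than timestamp"
    if delay > tolerance:
        return "received is more than %s after timestamp" % tolerance
    return None
```

With the first example event above (received one minute after timestamp) the function returns None.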
      
    • POST: creates and returns an event related to the specified incident

      Example request body:
      {
          "timestamp": "2020-02-20 20:02:20.202021",
          "type": "OTH",
          "description": "The investigation is still ongoing."
      }
      

      If posted by an end user (a user with no associated source system), the timestamp field is optional, and will be set to the time the server received it if omitted.

      The valid types are:

      • STA - Incident start
        • An incident automatically creates an event of this type when the incident is created, but cannot have more than one. In other words, it's never allowed to post an event of this type.
      • END - Incident end
        • Only source systems can post an event of this type, which is the standard way of closing an incident. An incident cannot have more than one event of this type.
      • CLO - Close
        • Only end users can post an event of this type, which manually closes the incident.
      • REO - Reopen
        • Only end users can post an event of this type, which reopens the incident if it's been closed (either manually or by a source system).
      • ACK - Acknowledge
        • Use the /api/v1/incidents/<int:pk>/acks/ endpoint.
      • OTH - Other
        • Any other type of event, which simply provides information on something that happened related to an incident, without changing its state in any way.
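The posting rules above can be summarised as a lookup table. This is an illustrative sketch, not the server's validation code. Two entries are assumptions: OTH is treated as postable by both kinds of actor because the list names no restriction, and ACK is treated as not postable on this endpoint because the list points to the acks endpoint instead.

```python
SOURCE_SYSTEM = "source system"
END_USER = "end user"

# Who may POST each event type to /api/v1/incidents/<int:pk>/events/.
POSTABLE_BY = {
    "STA": set(),                      # created automatically, never posted
    "END": {SOURCE_SYSTEM},
    "CLO": {END_USER},
    "REO": {END_USER},
    "ACK": set(),                      # assumption: use the acks endpoint instead
    "OTH": {SOURCE_SYSTEM, END_USER},  # assumption: no restriction is stated
}


def may_post_event(event_type, actor_kind):
    """Return True if an actor of the given kind may post this event type."""
    if event_type not in POSTABLE_BY:
        raise ValueError("unknown event type: %r" % event_type)
    return actor_kind in POSTABLE_BY[event_type]
```

For example, may_post_event("END", SOURCE_SYSTEM) is True while may_post_event("END", END_USER) is False.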
  • GET to /api/v1/incidents/<int:pk>/events/<int:pk>/: returns a specific event related to the specified incident

  • /api/v1/incidents/<int:pk>/acks/:

    • GET: returns all acknowledgements of the specified incident

      Example response body:
      [
          {
              "pk": 2,
              "event": {
                  "pk": 2,
                  "incident": 10101,
                  "actor": {
                      "pk": 140,
                      "username": "jp@example.org"
                  },
                  "timestamp": "2011-11-11T11:11:11.235877+02:00",
                  "received": "2011-11-11T11:11:11.235897+02:00",
                  "type": {
                      "value": "ACK",
                      "display": "Acknowledge"
                  },
                  "description": "The incident is being investigated."
              },
              "expiration": "2011-11-13T12:00:00+02:00"
          },
          {
              "pk": 20,
              "event": {
                  "pk": 20,
                  "incident": 10101,
                  "actor": {
                      "pk": 130,
                      "username": "ferrari.testarossa@example.com"
                  },
                  "timestamp": "2011-11-12T11:11:11+02:00",
                  "received": "2011-11-12T11:11:11+02:00",
                  "type": {
                      "value": "ACK",
                      "display": "Acknowledge"
                  },
                  "description": "The situation is under control!"
              },
              "expiration": null
          }
      ]
      
    • POST: creates and returns an acknowledgement of the specified incident

      Example request body:
      {
          "event": {
              "timestamp": "2011-11-11 11:11:11.235877",
              "description": "The incident is being investigated."
          },
          "expiration": "2011-11-13 12:00:00"
      }
      

      Only end users can post acknowledgements.

      The timestamp field is optional, and will be set to the time the server received it if omitted.

  • GET to /api/v1/incidents/<int:pk>/acks/<int:pk>/: returns a specific acknowledgement of the specified incident

  • /api/v1/incidents/sources/:

    • GET: Returns a list of all sources
      Example response body: [ { "pk": 1, "name": "argus", "type": { "name": "argus" }, "user": 1, "base_url": "" } ]
  • GET to /api/v1/incidents/mine/: behaves like /api/v1/incidents/ except only showing the incidents added by the logged-in user, and no filtering on source or source type is possible.

  • GET to /api/v1/incidents/open/: returns all open incidents

  • GET to /api/v1/incidents/open+unacked/: returns all open incidents that have not been acked

  • GET to /api/v1/incidents/metadata/: returns relevant metadata for all incidents

Notification profile endpoints
  • /api/v1/notificationprofiles/:

    • GET: returns the logged in user's notification profiles

    • POST: creates and returns a notification profile which is then connected to the logged in user

      Example request body:
      {
          "timeslot": 1,
          "filters": [
              1,
              2
          ],
          "media": [
              "EM",
              "SM"
          ],
          "phone_number": 1,
          "active": true
      }
      

      The phone number field is optional and may also be null.

  • /api/v1/notificationprofiles/<int:pk>/:

    • GET: returns one of the logged in user's notification profiles by PK
    • PUT: updates and returns one of the logged in user's notification profiles by PK
      • Note that if timeslot is changed, the notification profile's PK will also change. This consequently means that the URL containing the previous PK will return a 404 Not Found status code.
      • Example request body: same as POST to /api/v1/notificationprofiles/
    • DELETE: deletes one of the logged in user's notification profiles by PK
  • GET to /api/v1/notificationprofiles/<int:pk>/incidents/: returns all incidents - both open and historic - filtered by one of the logged in user's notification profiles by PK

  • /api/v1/notificationprofiles/timeslots/:

    • GET: returns the logged in user's time slots

    • POST: creates and returns a time slot which is then connected to the logged in user

      Example request body:
      {
          "name": "Weekdays",
          "time_recurrences": [
              {
                  "days": [1, 2, 3, 4, 5],
                  "start": "08:00:00",
                  "end": "12:00:00"
              },
              {
                  "days": [1, 2, 3, 4, 5],
                  "start": "12:30:00",
                  "end": "16:00:00"
              }
          ]
      }
      

      The optional key "all_day" indicates that Argus should use Time.min and Time.max as "start" and "end" respectively. This also overrides any provided values for "start" and "end". An example request body:

      {
          "name": "All the time",
          "time_recurrences": [
              {
                  "days": [1, 2, 3, 4, 5, 6, 7],
                  "all_day": true
              }
          ]
      }
      

      which would yield the response:

      {
          "pk": 2,
          "name": "All the time",
          "time_recurrences": [
              {
                  "days": [1, 2, 3, 4, 5, 6, 7],
                  "start": "00:00:00",
                  "end": "23:59:59.999999",
                  "all_day": true
              }
          ]
      }
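The effect of "all_day" can be mimicked with the standard library's datetime.time.min and datetime.time.max, which is presumably what "Time.min" and "Time.max" refer to. The helper name normalize_recurrence is made up for this sketch and is not the server's own code.

```python
from datetime import time


def normalize_recurrence(recurrence):
    """Return a copy of a time recurrence with "all_day" applied.

    When "all_day" is true, "start" and "end" are overridden with the
    earliest and latest representable times of day.
    """
    result = dict(recurrence)
    if result.get("all_day"):
        result["start"] = time.min.isoformat()  # "00:00:00"
        result["end"] = time.max.isoformat()    # "23:59:59.999999"
    return result
```

Any "start" or "end" supplied together with "all_day": true is overwritten, matching the override described above.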
      
  • /api/v1/notificationprofiles/timeslots/<int:pk>/:

    • GET: returns one of the logged in user's time slots by PK
    • PUT: updates and returns one of the logged in user's time slots by PK
      • Example request body: same as POST to /api/v1/notificationprofiles/timeslots/
    • DELETE: deletes one of the logged in user's time slots by PK
  • /api/v1/notificationprofiles/filters/:

    • GET: returns the logged in user's filters

    • POST: creates and returns a filter which is then connected to the logged in user

      Example request body:
      {
          "name": "Critical incidents",
          "filter_string": "{\"sourceSystemIds\": [<SourceSystem.pk>, ...], \"tags\": [\"key1=value1\", ...]}"
      }
      
  • /api/v1/notificationprofiles/filters/<int:pk>/:

    • GET: returns one of the logged in user's filters by PK
    • PUT: updates and returns one of the logged in user's filters by PK
      • Example request body: same as POST to /api/v1/notificationprofiles/filters/
    • DELETE: deletes one of the logged in user's filters by PK
  • POST to /api/v1/notificationprofiles/filterpreview/: returns all incidents - both open and historic - filtered by the values in the body

    Example request body:
    {
        "sourceSystemIds": [<SourceSystem.pk>, ...]
    }
    

Models

Explanation of terms

  • incident: an unplanned interruption in the source system.
  • event: something that happened related to an incident.
  • acknowledgement: an acknowledgement of an incident by a user, which hides the incident from the other open incidents.
    • If expiration is an instance of datetime, the incident will be shown again after the expiration time.
    • If expiration is null, the acknowledgement will never expire.
    • An incident is considered "acked" if it has one or more acknowledgements that have not expired.
  • start_time: the time the incident was created.
  • end_time: the time the incident was resolved or closed.
    • If null: the incident is stateless.
    • If "infinity": the incident is stateful, but has not yet been resolved or closed - i.e. open.
    • If an instance of datetime: the incident is stateful, and was resolved or closed at the given time; if it's in the future, the incident is also considered open.
  • source: the source system that the incident originated in.
  • object: the most specific object that the incident is about.
  • parent_object: an object that the object is possibly a part of.
  • problem_type: the type of problem that the incident is about.
  • tag: a key-value pair separated by an equality sign (=), in the shape of a string.
    • The key can consist of lowercase letters, numbers and underscores.
    • The value can consist of any length of any characters.

ER diagram

ER diagram

Notifications and notification plugins

A notification plugin is a class that inherits from argus.notificationprofile.media.base.NotificationMedium. It has a send(incident, user, **kwargs) static method that does the actual sending.

The included argus.notificationprofile.media.email.EmailNotification needs only incident and user, while an SMS medium in addition needs a phone_number. A phone_number is a string that includes the international calling code, see for instance Wikipedia: List of mobile telephone prefixes by country.
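A rough sketch of what such a plugin could look like. To keep the example runnable without Argus installed, it defines a stand-in NotificationMedium class; a real plugin would instead inherit from argus.notificationprofile.media.base.NotificationMedium. The class ConsoleSMSNotification, the message format, and the returned string are all made up for illustration, and the incident object is a simple placeholder with only a description attribute.

```python
from types import SimpleNamespace


class NotificationMedium:
    """Stand-in for argus.notificationprofile.media.base.NotificationMedium."""

    @staticmethod
    def send(incident, user, **kwargs):
        raise NotImplementedError


class ConsoleSMSNotification(NotificationMedium):
    """Hypothetical SMS medium that prints the message instead of sending it."""

    @staticmethod
    def send(incident, user, **kwargs):
        # An SMS medium needs a phone_number in addition to incident and user.
        phone_number = kwargs.get("phone_number")
        if not phone_number:
            raise ValueError("an SMS medium needs a phone_number")
        message = "To %s: %s" % (phone_number, incident.description)
        print(message)
        # Returning the message is a convenience of this sketch only.
        return message


# Placeholder incident; the phone number includes the international calling code.
incident = SimpleNamespace(description="Netbox 11 <12345> down.")
sent = ConsoleSMSNotification.send(incident, user=None, phone_number="+4767676767")
```

A real SMS plugin would replace the print call with a call to whatever SMS gateway the site uses.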
