Collector Package for Insights for AAP

Insights Analytics Collector

This package helps with collecting data via user-defined collector methods. It packs the collected data into one or more tarballs and sends them to a user-defined URL.

Some data and classes have to be implemented. By function:

  • persisting settings
  • data such as credentials and content type for shipping (POST request)

By classes:

  • Collector
  • Package
  • collector_module:
    • functions with @register decorator, one with config=True, format='json'
    • slicing functions (optional) for splitting large data (db tables) by time intervals

Collector

Entry point with the gather() method.

Implementation

Collector is an abstract class; implement its abstract methods:

  • _package_class: returns the class of your Package implementation
  • _is_valid_license: checks for a valid license specific to your service
  • _is_shipping_configured: checks whether shipping to the cloud is configured
  • _last_gathering: returns a datetime of the last successful run, loaded from persistent storage
  • _save_last_gather: persists the timestamp of the last successful run
  • _load_last_gathered_entries: has to fill the dictionary self.last_gathered_entries from persistent storage; it contains keys equal to the collector's registered functions' keys (from the @register decorator)
  • _save_last_gathered_entries: persists self.last_gathered_entries

An example can be found in the Test collector.
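
For orientation, a minimal sketch of such a subclass follows. It is not the actual Test collector: the in-memory storage, the MyPackage import and the exact method signatures are assumptions, so consult the abstract Collector class for the real interface.

from datetime import datetime, timezone

from insights_analytics_collector import Collector

from myapp.package import MyPackage  # hypothetical module with your Package implementation


class MyCollector(Collector):
    # Stand-in for real persistent storage (database, settings, file, ...)
    _storage = {"last_gather": None, "last_entries": {}}

    def _package_class(self):
        return MyPackage

    def _is_valid_license(self):
        return True  # replace with your service's license check

    def _is_shipping_configured(self):
        return True  # e.g. check that credentials and the ingress URL are set

    def _last_gathering(self):
        return self._storage["last_gather"]

    def _save_last_gather(self):
        self._storage["last_gather"] = datetime.now(timezone.utc)

    def _load_last_gathered_entries(self):
        self.last_gathered_entries = dict(self._storage["last_entries"])

    def _save_last_gathered_entries(self, last_gathered_entries):
        self._storage["last_entries"] = last_gathered_entries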

Package

One package represents one .tar.gz file which will be uploaded to Analytics. The outputs of registered collectors are placed into it as JSON/CSV files as follows:

  • The upload limit is 100MB. The maximum size of uncompressed data is MAX_DATA_SIZE (200MB by default; redefine if needed)
  • JSON collectors are processed first; they are not expected to exceed this size
    • if they do, use the CSV format instead
  • CSV files can be collected in two modes:
    • with a slicing function
      • splits data by a custom function, usually by time interval
      • the purpose is to keep SQL queries reasonable on big databases
      • @register(fnc_slicing=...)
    • without a slicing function
  • CSV files are expected to be large (db data), so they can be split by CsvFileSplitter in the collector function
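
For illustration, a collector function for a large table might write its output through CsvFileSplitter so that it is chunked at a size limit. This is only a sketch: the full_path keyword argument and the CsvFileSplitter API used here (filespec, max_file_size, write(), file_list()) are assumptions, so check the package source and the test collector for the real interface.

import os

from insights_analytics_collector import CsvFileSplitter, register


@register('big_table', '1.0', format='csv')
def big_table(full_path, **kwargs):
    # full_path: directory for temporary CSV files, assumed to be passed in by the framework
    splitter = CsvFileSplitter(filespec=os.path.join(full_path, 'big_table.csv'),
                               max_file_size=100 * 1024 * 1024)  # chunk at ~100MB
    splitter.write("id,name\n")
    for i in range(1000):
        splitter.write(f"{i},row-{i}\n")
    # return the list of files the output was split into
    return splitter.file_list()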

How files are included in packages:

  • JSON files go into the first package
  • CSVs without slicing are included in the first free package with enough space (they can be added alongside JSON files)
    • if a function collects e.g. 900MB, it's sent in the first 5 packages
    • two functions cannot have the same name in the @register() decorator
  • CSVs with slicing are sent after each slice is collected (to keep volume usage small when running in OpenShift/Docker)
    • each slice can also be split by CsvFileSplitter if it's bigger than MAX_DATA_SIZE
      • then each part of the slice is sent immediately
    • two functions can have the same name in the @register() decorator

The number of packages (tarballs) is the larger of:

  • the number of files collected by the biggest registered CSV collector without slicing
  • the number of files collected by all registered CSV collectors with slicing
  • possibly +1 for the JSON files

See test_gathering.py for details.

Implementation

Package is also an abstract class. You have to implement the basic information needed for the POST request to the cloud:

  • PAYLOAD_CONTENT_TYPE: contains the content type registered for the cloud's ingress service
  • MAX_DATA_SIZE: maximum size in bytes of uncompressed data for one tarball. Ingress limits uploads to 100MB. Defaults to 200MB.
  • get_ingress_url: the cloud's ingress service URL
  • _get_rh_user: username for the POST request
  • _get_rh_password: password for the POST request
  • _get_x_rh_identity: X-RH-Identity header, used for local testing instead of username and password
  • _get_http_request_headers: dict with any custom headers for the POST request

An example can be found in the Test package.
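
Again for orientation only, a minimal subclass might look like the sketch below. The content type, URL and credentials are placeholders, and the base class may require further methods not listed above; consult the Package abstract class and the Test package for the full interface.

from insights_analytics_collector import Package


class MyPackage(Package):
    PAYLOAD_CONTENT_TYPE = "application/vnd.redhat.example.filename+tgz"  # placeholder content type
    MAX_DATA_SIZE = 100 * 1024 * 1024  # e.g. lower the 200MB default to 100MB

    def get_ingress_url(self):
        return "https://console.example.com/api/ingress/v1/upload"  # placeholder URL

    def _get_rh_user(self):
        return "basic-auth-user"

    def _get_rh_password(self):
        return "basic-auth-password"

    def _get_x_rh_identity(self):
        return "x-rh-identity-header-for-local-testing"

    def _get_http_request_headers(self):
        return {}  # any extra headers for the POST request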

Collector module

The module with gathering functions is the main part you need to implement. It should contain functions returning data either as a dict or as a list of CSV files.

A function is registered by the @register decorator:

from insights_analytics_collector import register

@register('json_data', '1.0', format='json', description="Data description")
def json_data(**kwargs):
    return {'my_data': 'True'}
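
A CSV collector, in turn, returns a list of file paths instead of a dict. A minimal sketch follows; the full_path keyword argument (the directory the files should be written to) is an assumption based on the test collector.

import csv
import os

from insights_analytics_collector import register


@register('hosts', '1.0', format='csv', description="Example CSV data")
def hosts(full_path, **kwargs):
    path = os.path.join(full_path, 'hosts.csv')
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['id', 'hostname'])
        writer.writerow([1, 'host-1.example.com'])
    return [path]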

The @register decorator has the following attributes:

  • key: (string) name of the output file (usually the same as the function name)
  • version: (string) e.g. '1.0'. Version of the data, added to manifest.json for parsing on the cloud side
  • description: (string) not used yet
  • format: (string) Default: 'json'. Extension of the output file; can be "json" or "csv". Also determines the expected function output.
  • config: (bool) Default: False. There has to be exactly one function with config=True, format='json' (see the example below)
  • fnc_slicing: intended for large data. Described in Slicing function below
  • shipping_group: (string) Default: 'default'. Splits data into packages by group, if required.
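
For example, the single required config collector is just another registered JSON function; the field names below are purely illustrative:

from insights_analytics_collector import register


@register('config', '1.0', format='json', description="Configuration data", config=True)
def config(**kwargs):
    return {'install_uuid': 'e7a1c4d0-0000-0000-0000-000000000000',  # illustrative fields
            'version': '1.0.0'}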

Gathering is then started by instantiating your Collector implementation and calling gather():

from <your-namespace> import Collector  # your implementation

collector = Collector(...)  # constructor arguments depend on your implementation
collector.gather()

Slicing function
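
A slicing function splits the requested collection interval into smaller chunks, so that each chunk produces a reasonably sized SQL query and CSV file; it is passed to @register via fnc_slicing, and each slice is collected and shipped separately. The sketch below is only illustrative: the slicing function's parameters (key, since, until, last_gather) and the collector's parameters are assumptions, so check the test collector in this repository for the signatures the package actually expects.

import os
from datetime import timedelta

from insights_analytics_collector import register


def one_day_slicing(key, since, until, last_gather, **kwargs):
    """Split the requested interval into one-day slices (illustrative only)."""
    start = since or last_gather
    slices = []
    while start < until:
        end = min(start + timedelta(days=1), until)
        slices.append((start, end))
        start = end
    return slices


@register('events_table', '1.0', format='csv', fnc_slicing=one_day_slicing)
def events_table(since, until, full_path, **kwargs):
    # In reality: query your database for rows between `since` and `until`,
    # write them to CSV file(s) under `full_path` (optionally through
    # CsvFileSplitter) and return the list of files.
    path = os.path.join(full_path, f"events_{since:%Y%m%d%H%M%S}.csv")
    with open(path, 'w', newline='') as f:
        f.write("id,timestamp\n")
    return [path]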

Diagrams: Collectors, Registered collectors, Abstract classes, Tarballs (see the project repository).

