
A simple preview for DSP digital advertising information

Project description

A tool for merging DSP data from multiple providers into a single view. CLI commands launch the workers responsible for parsing .csv files and storing them in a MySQL database, and a web app presents a complete report of the operation.

Configuration

You must specify the following environment variables prior to usage:

  • DB_HOST the URL or IP address of the MySQL server

  • DB_PORT the MySQL server port

  • DB_NAME the name of the database

  • DB_USER a user with write permission

  • DB_PASS the user’s password
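
For example, the variables can be exported in the shell before running any command (the values here are placeholders):

$ export DB_HOST=localhost
$ export DB_PORT=3306
$ export DB_NAME=dspreview
$ export DB_USER=dspuser
$ export DB_PASS=secret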

If you intend to develop or change something, the following are also needed:

  • DB_TEST_NAME the name of the database (for testing purposes)

  • DB_TEST_USER a user with write permission (for testing purposes)

  • DB_TEST_PASS the user’s password (for testing purposes)

Since this project currently supports only GCP, the following environment variables are also mandatory:

  • GOOGLE_APPLICATION_CREDENTIALS the path to the JSON key file for a service account with admin permissions for the Storage service.

  • GCP_BUCKET the bucket where the .csv files will be placed
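
With the credentials in place, the .csv files can be uploaded to the bucket with gsutil, for example (the bucket name is a placeholder):

$ gsutil cp dcm.csv gs://my-dsp-bucket/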

It is possible to set the workers to consume a RabbitMQ queue, in which case the server info must also be specified:

  • MQ_HOST the RabbitMQ server ip address

  • MQ_PORT the RabbitMQ port

  • MQ_VHOST the RabbitMQ virtual host

  • MQ_USER the RabbitMQ user

  • MQ_PASS the RabbitMQ password

  • MQ_QUEUE the RabbitMQ queue name (this may later change to workers operating through exchanges, so that multiple workers can operate at the same time, but for now a single queue is used).

A much better option is to set all these variables in a file named .dspreview.csg in the user’s home folder:

{
    "GOOGLE_APPLICATION_CREDENTIALS": "/home/user/service_account.json",
    "GCP_BUCKET": "...",
    "DB_HOST": "...",
    "DB_PORT": "3306",
    "DB_NAME": "...",
    "DB_USER": "...",
    "DB_PASS": "..."
}
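
The test-database and RabbitMQ variables described above can presumably live in the same file, for example:

{
    "DB_TEST_NAME": "...",
    "DB_TEST_USER": "...",
    "DB_TEST_PASS": "...",
    "MQ_HOST": "...",
    "MQ_PORT": "5672",
    "MQ_VHOST": "/",
    "MQ_USER": "...",
    "MQ_PASS": "...",
    "MQ_QUEUE": "..."
}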

Once the above environment variables are set, you can initialize the system, which will create the database, tables, and so on:

$ dspreview init

There are currently two workers: dcm and dsp. The dcm worker expects to find a file named dcm.csv inside the bucket given by GCP_BUCKET, with the following structure:

[date, campaign_id, campaign, placement_id, placement, impressions, clicks, reach]

where:

  • date should be in format YYYY-MM-DD

  • campaign_id is an integer

  • campaign is a string with the information brand_subbrand

  • placement_id is an integer

  • placement is a string with the information dsp_adtype

  • impressions is an integer

  • clicks is an integer

  • reach is a float; take care not to repeat this value, since it is a calculated metric
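
An illustrative dcm.csv row (the values are made up, and the header row is an assumption, shown only for clarity):

date,campaign_id,campaign,placement_id,placement,impressions,clicks,reach
2018-01-01,128115,acme_asprin,6011070,dbm_youtube,500000,1200,350000.0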

The dsp worker, in turn, expects to find a file named after the DSP (like dbm.csv or mediamath.csv), with the following structure:

[date, campaign_id, campaign, impressions, clicks, cost]

where:

  • date should be in format YYYY-MM-DD

  • campaign_id is an integer

  • campaign is a string with the information brand_subbrand_adtype

  • impressions is an integer

  • clicks is an integer

  • cost is a float
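
An illustrative dbm.csv row, reusing the example values from the Classifications section below (header row again an assumption):

date,campaign_id,campaign,impressions,clicks,cost
2018-01-01,128115,acme_asprin_youtube,6011070,11889,40334.2797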

In order to launch a worker, you might use the command:

$ dspreview --worker dcm

or:

$ dspreview --worker dsp

If the DSP is known in advance, you might run:

$ dspreview --worker dsp --dsp dbm

or

$ dspreview --worker dsp --dsp mediamath

When all files are stored in the MySQL database, the following command generates the full report:

$ dspreview --generate-report
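
One way to automate the pipeline is a crontab like the sketch below (the times are arbitrary, and the environment variables above must be available to cron):

# hypothetical schedule: parse DCM at 02:00, DSPs at 02:10, report at 03:00
0 2 * * * dspreview --worker dcm
10 2 * * * dspreview --worker dsp
0 3 * * * dspreview --generate-report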

The web app might be run through:

$ dspreview serve --port 80

The default port is 80.

Finally, it is possible to run the worker in a loop, in which case it will consume a RabbitMQ queue. The messages must be one of:

  • dcm for the DCM worker

  • dsp for running all DSP workers

  • dsp.dbm for running a specific DSP worker (DBM in this case)

  • report for generating the full report

The worker might be launched as:

$ dspreview operate

It is possible to add items to the queue through:

$ dspreview --poke "dsp.dbm"
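
The --poke command presumably just publishes the message to the configured queue; the same can be done directly with pika, as in this sketch (assuming the MQ_* variables above and the default exchange):

import os
import pika

# connect using the MQ_* environment variables described above
credentials = pika.PlainCredentials(os.environ["MQ_USER"], os.environ["MQ_PASS"])
params = pika.ConnectionParameters(
    host=os.environ["MQ_HOST"],
    port=int(os.environ["MQ_PORT"]),
    virtual_host=os.environ["MQ_VHOST"],
    credentials=credentials,
)
connection = pika.BlockingConnection(params)
channel = connection.channel()

# publish through the default exchange straight to the named queue
channel.basic_publish(exchange="",
                      routing_key=os.environ["MQ_QUEUE"],
                      body="dsp.dbm")
connection.close()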

Classifications

Besides the configuration described above, it is also important to understand the classifications. Classifications can be managed through the corresponding section of the web app.

The idea is that each line in the DCM and DSP files can be classified according to a brand, a sub-brand, and a DSP. You create a regex that is checked against a combination of fields.

For instance, consider this line in a DBM file:

date        campaign_id  campaign             impressions  clicks  cost
2018-01-01  128115       acme_asprin_youtube  6011070      11889   40334.2797

You might choose to apply the regex .*acme.* to classify the brand as Acme, using campaign as the only field.

Chances are that you have another brand, say Umbrella Corp, and for some reason, you end up with a line like:

date        campaign_id  campaign       impressions  clicks  cost
2018-03-01  475987       acme_solution  4867867      46454   87897.4558

Now, the regex above would classify both these lines as Acme.

A solution would be to change the regex to ^128115acme.*, applied to the concatenation of the campaign_id and campaign fields, in order to identify ads belonging to the brand Acme.

For identifying ads belonging to Umbrella Corp, the regex could be ^475987acme.*, applied to the same concatenation.

The fields that can be used are:

  1. campaign_id

  2. campaign

  3. placement_id (from DCM files only)

  4. placement (from DCM files only)

Fields will be concatenated in this order.

The regex patterns are applied in the order in which they are registered in the database. The first one matching the combination of fields defines the classification, so overly broad regexes should be avoided.
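
To make the first-match rule concrete, here is a sketch of the matching logic (the rule list and row are hypothetical; the real rules live in the database):

import re

# hypothetical rules, in registration order: (pattern, brand)
rules = [
    (r"^128115acme.*", "Acme"),
    (r"^475987acme.*", "Umbrella Corp"),
]

row = {"campaign_id": "475987", "campaign": "acme_solution"}

# fields are concatenated in the fixed order described above
combined = row["campaign_id"] + row["campaign"]

for pattern, brand in rules:
    if re.match(pattern, combined):
        print(brand)  # first match wins: prints "Umbrella Corp"
        break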

Preparing for Development

  1. Ensure pip and pipenv are installed.

  2. Make sure you also have default-libmysqlclient-dev or libmysqlclient-dev installed.

  3. Clone repository: https://github.com/thiagolcmelo/dspreview

  4. Fetch development dependencies: make install

Running Tests

Run tests locally using make if the virtualenv is active:

$ make

If the virtualenv isn’t active, then use:

$ pipenv run make
