
An application computing top songs by country or user_id over 7 days

Topper

Topper is a library that parses and processes log files of music listening. It computes the 50 most-listened songs over the last 7 days, grouped by country or by user ID.

You must provide:

  • landing_folder: where daily files are sent
  • checkpoint_directory: used by the application to process, archive and persist data across days
  • output_directory: where result files are written every day
  • mode (optional): the aggregation mode, either country (default) or user

Input files

Each log file name must match the pattern listen-YYYYMMDD.log. The file must contain data structured as follows:

  • One row per stream (one listen).
  • Each row is in the following format: song_id|user_id|country
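
The row format above can be parsed as in the following minimal sketch. The names Stream and parse_row are hypothetical and are not part of Topper's API. The sketch also assumes that all three fields are kept as non-empty strings, since the format does not say whether the identifiers are numeric.

```python
from typing import NamedTuple, Optional


class Stream(NamedTuple):
    """One listening event (hypothetical helper type, not Topper's own)."""
    song_id: str
    user_id: str
    country: str


def parse_row(line: str) -> Optional[Stream]:
    """Parse one 'song_id|user_id|country' row.

    Returns None when the row does not have exactly three non-empty
    fields (an assumption about what counts as an invalid row).
    """
    parts = line.strip().split("|")
    if len(parts) != 3 or not all(parts):
        return None
    return Stream(*parts)
```

A caller could skip rows for which parse_row returns None, or treat the whole file as invalid; the description does not specify which policy Topper applies.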

File management

Input files will be moved to the directory checkpoint/current/

Invalid files will be moved to the directory checkpoint/errors/

Data files older than 7 days will be moved to the directory checkpoint/archive/
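
As an illustration of the 7-day rule, the sketch below derives a file's date from its listen-YYYYMMDD.log name and decides whether it falls outside the window. The function names are hypothetical, and the boundary is an assumption: the window is taken to hold the seven most recent calendar days including today, so a file dated exactly 7 days before today counts as expired. Topper's actual rule may differ.

```python
import re
from datetime import date, datetime, timedelta
from typing import Optional

# File name pattern taken from the "Input files" section.
LOG_NAME = re.compile(r"^listen-(\d{8})\.log$")


def log_date(filename: str) -> Optional[date]:
    """Return the date encoded in the file name, or None if it is invalid."""
    match = LOG_NAME.match(filename)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d").date()
    except ValueError:
        # Eight digits that do not form a real calendar date.
        return None


def is_expired(filename: str, today: date, window_days: int = 7) -> bool:
    """True when the file is outside the window (assumed boundary, see above)."""
    day = log_date(filename)
    if day is None:
        raise ValueError(f"not a valid log file name: {filename!r}")
    return today - day >= timedelta(days=window_days)
```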

Output files

Mode 'country'

Produced files have the following format:

country1|sng_id1:n1,sng_id2:n2,sng_id3:n3,...,sng_id50:n50
country2|sng_id1:n1,sng_id2:n2,sng_id3:n3,...,sng_id50:n50

Where country is the ISO2 country code, sng_id1:n1 is the identifier of the most-listened song followed by its number of streams, sng_id2:n2 is the identifier of the second most-listened song followed by its number of streams, and so on.
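
One way to produce lines in this format is sketched below with collections.Counter. This is not Topper's implementation. The names top_songs and format_line are hypothetical, and the order of songs with equal stream counts is an assumption (here, first-seen order).

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def top_songs(
    rows: Iterable[Tuple[str, str]], limit: int = 50
) -> Dict[str, List[Tuple[str, int]]]:
    """Count streams per (group key, song) and keep the `limit` best songs.

    `rows` yields (group_key, song_id) pairs, where group_key is a country
    code in 'country' mode or a user ID in 'user' mode.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for key, song_id in rows:
        counts[key][song_id] += 1
    return {key: counter.most_common(limit) for key, counter in counts.items()}


def format_line(key: str, ranked: List[Tuple[str, int]]) -> str:
    """Render one output line: key|sng_id1:n1,sng_id2:n2,..."""
    songs = ",".join(f"{song_id}:{n}" for song_id, n in ranked)
    return f"{key}|{songs}"
```

For example, three streams in FR (two of song 1, one of song 2) give the line FR|1:2,2:1.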

Mode 'user'

Produced files have the following format:

user_id1|sng_id1:n1,sng_id2:n2,sng_id3:n3,...,sng_id50:n50
user_id2|sng_id1:n1,sng_id2:n2,sng_id3:n3,...,sng_id50:n50

Where user_id is the user identifier, sng_id1:n1 is the identifier of the most-listened song followed by its number of streams, sng_id2:n2 is the identifier of the second most-listened song followed by its number of streams, and so on.
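
A downstream consumer may want to read these result lines back. The sketch below parses one line into its key and ranked list; it works for both modes because the two formats differ only in the key. The name parse_output_line is hypothetical, and the sketch assumes that stream counts are integers and that song identifiers contain no comma.

```python
from typing import List, Tuple


def parse_output_line(line: str) -> Tuple[str, List[Tuple[str, int]]]:
    """Split 'key|sng_id1:n1,sng_id2:n2,...' into (key, [(song_id, n), ...])."""
    key, _, songs = line.strip().partition("|")
    ranked: List[Tuple[str, int]] = []
    # An empty song list yields no entries instead of failing on int("").
    for item in songs.split(",") if songs else []:
        song_id, _, count = item.rpartition(":")
        ranked.append((song_id, int(count)))
    return key, ranked
```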

Usage

Supported Python Versions

Python >= 3.6

Installation

virtualenv -p python3 venv
source venv/bin/activate
make install

Display usage

topper -h

Example

topper --landing_folder sample/ --checkpoint_directory checkpoint --output_directory output --mode country

Development

make test # coverage tests
make linter # runs pylint
make build

Files for topper, version 1.0.1

  • topper-1.0.1-py3.8.egg (18.0 kB): Egg, Python 3.8
  • topper-1.0.1.tar.gz (10.3 kB): Source
