An application computing top songs by country or user_id over 7 days

Topper

Topper is a library that parses and processes music-listening log files. It computes the 50 most-listened songs over the last 7 days, grouped by country or by user ID.

You must provide:

  • landing_folder: where daily files are sent
  • checkpoint_directory: used by the application to process, archive and persist data across days
  • output_directory: where result files are written every day
  • mode (optional): the aggregation mode, either country (default) or user

Input files

Each log file name must match the pattern listen-YYYYMMDD.log. The file must contain data structured as follows:

  • One row per stream (one listen).
  • Each row is in the following format: song_id|user_id|country
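The row format above can be parsed with a plain split on the pipe character. The following is a minimal sketch, not Topper's own code: the Stream type and the parse_row function are hypothetical names, and treating a row with a missing or empty field as invalid is an assumption.

```python
from typing import NamedTuple, Optional


class Stream(NamedTuple):
    """One listening event (hypothetical type, for illustration only)."""
    song_id: str
    user_id: str
    country: str


def parse_row(line: str) -> Optional[Stream]:
    """Parse one 'song_id|user_id|country' row.

    Returns None when the row does not have exactly three non-empty
    fields (assumed validation rule, not taken from Topper).
    """
    parts = line.strip().split("|")
    if len(parts) != 3 or not all(parts):
        return None
    song_id, user_id, country = parts
    return Stream(song_id, user_id, country)
```

For example, parse_row("1234|5678|FR") yields a Stream with country "FR", while a row such as "1234|5678" yields None.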

File management

Input files will be moved to the directory checkpoint/current/

Invalid files will be moved to the directory checkpoint/errors/

Data files older than 7 days will be moved to the directory checkpoint/archive/
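The routing described above could be sketched as follows. This is an illustration under stated assumptions, not Topper's implementation: the function names are hypothetical, only the file name is checked for validity (Topper may also validate content), and "older than 7 days" is interpreted as an age of 7 days or more relative to the current day.

```python
import re
from datetime import date, datetime, timedelta
from typing import Optional

# File names must look like listen-YYYYMMDD.log
LOG_NAME = re.compile(r"^listen-(\d{8})\.log$")


def log_date(filename: str) -> Optional[date]:
    """Extract the date from a log file name, or None if the name is invalid."""
    match = LOG_NAME.match(filename)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d").date()
    except ValueError:
        # Eight digits that do not form a real calendar date
        return None


def destination(filename: str, today: date, window_days: int = 7) -> str:
    """Return the checkpoint subdirectory a file would go to (assumed rules)."""
    day = log_date(filename)
    if day is None:
        return "errors"
    if today - day >= timedelta(days=window_days):
        return "archive"
    return "current"
```

With today set to 10 January 2021, listen-20210104.log stays in current under this interpretation, while listen-20210103.log goes to archive.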

Output files

Mode 'country'

Produced files have the following format:

country1|sng_id1:n1,sng_id2:n2,sng_id3:n3,...,sng_id50:n50
country2|sng_id1:n1,sng_id2:n2,sng_id3:n3,...,sng_id50:n50

Where country is the two-letter ISO country code, sng_id1:n1 is the identifier of the most-listened song followed by its number of streams, sng_id2:n2 is the identifier of the second most-listened song followed by its number of streams, and so on.
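One way to produce lines in this format is to count streams per country and keep the 50 most frequent songs. The sketch below uses collections.Counter; the function names are hypothetical, and the tie-breaking order (first song encountered wins) is a property of Counter.most_common, not something the Topper documentation specifies.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def top_songs(
    rows: Iterable[Tuple[str, str, str]], limit: int = 50
) -> Dict[str, List[Tuple[str, int]]]:
    """Rank songs per country from (song_id, user_id, country) rows."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for song_id, _user_id, country in rows:
        counts[country][song_id] += 1
    # most_common returns (song_id, count) pairs, highest count first
    return {country: c.most_common(limit) for country, c in counts.items()}


def format_line(key: str, ranked: List[Tuple[str, int]]) -> str:
    """Render one output line: key|sng_id1:n1,sng_id2:n2,..."""
    songs = ",".join(f"{song_id}:{n}" for song_id, n in ranked)
    return f"{key}|{songs}"
```

Grouping by user instead of country would only change which field is used as the dictionary key.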

Mode 'user'

Produced files have the following format:

user_id1|sng_id1:n1,sng_id2:n2,sng_id3:n3,...,sng_id50:n50
user_id2|sng_id1:n1,sng_id2:n2,sng_id3:n3,...,sng_id50:n50

Where user_id is the user identifier, sng_id1:n1 is the identifier of the most-listened song followed by its number of streams, sng_id2:n2 is the identifier of the second most-listened song followed by its number of streams, and so on.
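Because both modes share the same line layout, a consumer can read either kind of result file with one routine. The following sketch assumes that keys contain no pipe character and that counts are plain integers; parse_output_line is a hypothetical helper, not part of Topper.

```python
from typing import List, Tuple


def parse_output_line(line: str) -> Tuple[str, List[Tuple[str, int]]]:
    """Split 'key|sng_id1:n1,sng_id2:n2,...' into the key and ranked songs."""
    key, _, ranking = line.strip().partition("|")
    songs: List[Tuple[str, int]] = []
    for item in ranking.split(","):
        if not item:
            continue  # tolerate an empty ranking
        # Split on the last colon so the count is always the final field
        song_id, _, count = item.rpartition(":")
        songs.append((song_id, int(count)))
    return key, songs
```

The returned list preserves the order of the file, so the first element is the most-listened song for that country or user.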

Usage

Supported Python Versions

Python >= 3.6

Installation

virtualenv -p python3 venv
source venv/bin/activate
make install

Display usage

topper -h

Example

topper --landing_folder sample/ --checkpoint_directory checkpoint --output_directory output --mode country
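For readers who want to wrap or test a similar command line, the flags shown above can be mirrored with argparse. This is a hypothetical sketch and not Topper's actual entry point: the build_parser function, the help texts, and the choice to make the three directories required are assumptions based only on the options documented here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser accepting the flags documented above (illustrative)."""
    parser = argparse.ArgumentParser(prog="topper")
    parser.add_argument("--landing_folder", required=True,
                        help="where daily files are sent")
    parser.add_argument("--checkpoint_directory", required=True,
                        help="working, archive and persistence directory")
    parser.add_argument("--output_directory", required=True,
                        help="where result files are written")
    parser.add_argument("--mode", choices=["country", "user"],
                        default="country", help="aggregation mode")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Omitting --mode falls back to country, matching the documented default.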

Development

make test # coverage tests
make linter # runs pylint
make build

Download files

Source distribution: topper-1.1.1.tar.gz (10.2 kB)

Built distribution: topper-1.1.1-py3.8.egg (17.9 kB)