An application computing top songs by country or user_id over 7 days
Topper
Topper is a library that parses and processes log files of music listening. Its purpose is to compute the 50 most-listened songs over the last 7 days, grouped by country or by user ID.
You must provide:
- landing_folder: where daily files are sent
- checkpoint_directory: used by the application to process, archive and persist data across days
- output_directory: where result files are written everyday
- mode (optional): the aggregation mode, either country (default) or user
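To make the parameters concrete, here is a minimal argparse sketch that mirrors the flags used in this README. It is an illustration only and not Topper's actual source; the function name build_parser and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented command-line flags."""
    parser = argparse.ArgumentParser(prog="topper")
    parser.add_argument("--landing_folder", required=True,
                        help="where daily files are sent")
    parser.add_argument("--checkpoint_directory", required=True,
                        help="working directory used to process and archive data")
    parser.add_argument("--output_directory", required=True,
                        help="where result files are written")
    # The README states that mode is optional and defaults to country.
    parser.add_argument("--mode", choices=["country", "user"], default="country",
                        help="aggregation mode")
    return parser
```

Parsing the three required flags without --mode yields args.mode == "country" in this sketch.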
Input files
Each log file name must match the pattern listen-YYYYMMDD.log. The file must contain data structured as follows:
- One row per stream (one listen).
- Each row is in the following format:
song_id|user_id|country
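The input rules above can be sketched in Python as follows. This is not Topper's own code: the names FILENAME_RE, Stream, parse_filename and parse_row are hypothetical, and treating empty fields as malformed is an assumption.

```python
import re
from datetime import date, datetime
from typing import NamedTuple, Optional

# Documented file name pattern: listen-YYYYMMDD.log
FILENAME_RE = re.compile(r"listen-(\d{8})\.log")


class Stream(NamedTuple):
    song_id: str
    user_id: str
    country: str


def parse_filename(name: str) -> Optional[date]:
    """Return the date encoded in the file name, or None if it does not match."""
    match = FILENAME_RE.fullmatch(name)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d").date()
    except ValueError:
        # Eight digits that do not form a real calendar date.
        return None


def parse_row(line: str) -> Optional[Stream]:
    """Parse one song_id|user_id|country row; return None if it is malformed."""
    fields = line.strip().split("|")
    # Assumption: a row needs exactly three non-empty fields.
    if len(fields) != 3 or not all(fields):
        return None
    return Stream(*fields)
```

Returning None rather than raising lets the caller decide whether a bad row or a bad file name should send the whole file to the errors directory.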
File management
Input files will be moved to the directory checkpoint/current/
Invalid files will be moved to the directory checkpoint/errors/
Data files older than 7 days will be moved to the directory checkpoint/archive/
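The routing rules above could be expressed as a small pure function. This is a sketch under assumptions: the function name destination is hypothetical, "older than 7 days" is read literally as an age strictly greater than 7 days, and checkpoint stands for the checkpoint_directory given on the command line.

```python
from datetime import date
from pathlib import Path
from typing import Optional

RETENTION_DAYS = 7  # window length stated in this README


def destination(checkpoint: Path, file_date: Optional[date], today: date) -> Path:
    """Pick the checkpoint sub-directory for a file.

    file_date is None when the file is invalid (bad name or bad content).
    """
    if file_date is None:
        return checkpoint / "errors"
    # Assumption: the exact boundary (> 7 versus >= 7) is not specified.
    if (today - file_date).days > RETENTION_DAYS:
        return checkpoint / "archive"
    return checkpoint / "current"
```

Keeping the decision separate from the actual file move (for example with shutil.move) makes the date logic easy to test without touching the file system.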
Output files
Mode 'country'
Produced files have the following format:
country1|sng_id1:n1,sng_id2:n2,sng_id3:n3,...,sng_id50:n50
country2|sng_id1:n1,sng_id2:n2,sng_id3:n3,...,sng_id50:n50
Where country is the two-letter ISO country code, sng_id1:n1 is the identifier of the most-listened song followed by its number of streams, sng_id2:n2 is the same for the second most-listened song, and so on.
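One way to produce lines in this format is to count streams per key and keep the 50 most common songs. The sketch below uses collections.Counter; it is not necessarily how Topper does it, and the names top_songs and format_line are hypothetical. The same functions work for the user mode if the key is a user ID instead of a country code.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

TOP_N = 50  # number of songs kept per key, as stated in this README


def top_songs(rows: Iterable[Tuple[str, str]],
              n: int = TOP_N) -> Dict[str, List[Tuple[str, int]]]:
    """Count streams per key; rows yields (key, song_id) pairs."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for key, song_id in rows:
        counts[key][song_id] += 1
    # most_common returns (song_id, count) pairs sorted by descending count.
    return {key: counter.most_common(n) for key, counter in counts.items()}


def format_line(key: str, ranking: List[Tuple[str, int]]) -> str:
    """Render one output line: key|sng_id1:n1,sng_id2:n2,..."""
    return key + "|" + ",".join(f"{song}:{count}" for song, count in ranking)
```

For example, two streams of song 1 and one stream of song 2 in FR give the line FR|1:2,2:1. How ties between songs with equal counts are ordered is not specified in this README.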
Mode 'user'
Produced files have the following format:
user_id1|sng_id1:n1,sng_id2:n2,sng_id3:n3,...,sng_id50:n50
user_id2|sng_id1:n1,sng_id2:n2,sng_id3:n3,...,sng_id50:n50
Where user_id is the identifier of a user, sng_id1:n1 is the identifier of that user's most-listened song followed by its number of streams, sng_id2:n2 is the same for the second most-listened song, and so on.
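A consumer of these result files may want to read a line back into structured data. The following is a minimal sketch; parse_output_line is a hypothetical helper, not part of Topper, and it assumes that identifiers contain no "|" or "," characters.

```python
from typing import List, Tuple


def parse_output_line(line: str) -> Tuple[str, List[Tuple[str, int]]]:
    """Split key|sng_id1:n1,... into the key and a list of (song_id, count)."""
    key, _, payload = line.strip().partition("|")
    ranking: List[Tuple[str, int]] = []
    for item in payload.split(","):
        if not item:
            continue  # empty payload, nothing to parse
        # Split on the last colon so the count is always the trailing part.
        song_id, _, count = item.rpartition(":")
        ranking.append((song_id, int(count)))
    return key, ranking
```

The same helper applies to both modes, since country and user files share the key|song:count layout.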
Usage
Supported Python Versions
Python >= 3.6
Installation
virtualenv -p python3 venv
source venv/bin/activate
make install
Display usage
topper -h
Example
topper --landing_folder sample/ --checkpoint_directory checkpoint --output_directory output --mode country
Development
make test # coverage tests
make linter # runs pylint
make build
File details
Details for the file topper-1.1.1.tar.gz.
File metadata
- Download URL: topper-1.1.1.tar.gz
- Upload date:
- Size: 10.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | f70a0f1fa46b999840539921d3f5d1ad0247aef5ff700dcaf540e2f5a80778f1
MD5 | 1d78e9bd69ece7e63389c4ca3718ac9e
BLAKE2b-256 | aa012e98cd8aaf28dd1b1d905eafea5621c0ae45d3e1019cd1c4bb8f6d494856
File details
Details for the file topper-1.1.1-py3.8.egg.
File metadata
- Download URL: topper-1.1.1-py3.8.egg
- Upload date:
- Size: 17.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 4edeb23a62538fcf97164f405b6b0979f1491aba4941df429de097d34f35c7f0
MD5 | 13d2ca0bdd69e25bf543ce5b2ff70dad
BLAKE2b-256 | 330b67114ed9c017f6c9b239e8ae5a3c3853fb1d87eb21d8f353b6bfbb7c9968