Machine Learning module for Discogs

DiscogsLearner - ML library for Discogs

[comment]: # Version: 0.2

Introduction

This package predicts releases similar to those in your Discogs Wantlist and/or Collection. It works in two steps: data retrieval from the monthly Discogs data dumps, followed by learning from a list of release identifiers obtained from your Wantlist and/or Collection. The output is a set of release identifiers, each with a probability of similarity to your input. See Details for an in-depth explanation. Processing the whole 'Electronic' genre requires about 3 GB of free RAM.

Installation

pip install discogslearner

Usage

  1. Obtain a Discogs personal access token. You can generate one at https://www.discogs.com/settings/developers.
  2. Execute a script like the following:
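import discogslearner

if __name__ == "__main__":
    output_file = "Data/discogs_db.tsv"  # where the extracted releases are written
    my_genre = "Electronic"
    my_token = "your_token_here"  # your Discogs personal access token

    # Step 1: extract the releases of the chosen genre from the monthly data dump.
    extracter = discogslearner.Extracter(genre=my_genre)
    extracter.extract(output=output_file)

    # Step 2: learn from your Wantlist and/or Collection and score other releases.
    learner = discogslearner.Learner(
        db_path=output_file,
        use_wantlist=True,
        use_collection=True,
        token=my_token,
    )

    outcome = learner.learn_and_predict(n_models=10)
    print(outcome)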

Details

To learn from Discogs data, each Release is described by the fields Format, Year, Country, Style(s) and Number of Tracks. The categorical fields (Format, Country and Styles) are one-hot encoded, with the categories taken only from the Releases in the given Wantlist and/or Collection. A PCA transformation is then fitted on these Releases and applied to all Releases extracted from Discogs. Note that during this process only the Styles that occur in the Wantlist and/or Collection are kept in the database, since Releases with other Styles are most likely not of interest.
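
The encoding and PCA step described above can be sketched roughly as follows. This is a minimal illustration using NumPy only, not the package's actual code: the helper names (`fit_one_hot`, `one_hot`, `fit_pca`, `apply_pca`), the toy data, and the absence of any feature scaling are all assumptions made for the example.

```python
import numpy as np

def fit_one_hot(values):
    """Collect the categories seen in the user's Releases only (hypothetical helper)."""
    return sorted(set(values))

def one_hot(values, categories):
    """Encode values against fixed categories; unseen values become all-zero rows."""
    index = {c: i for i, c in enumerate(categories)}
    out = np.zeros((len(values), len(categories)))
    for row, value in enumerate(values):
        if value in index:
            out[row, index[value]] = 1.0
    return out

def fit_pca(X, n_components):
    """Fit PCA via SVD on mean-centered data; return (mean, components)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def apply_pca(X, mean, components):
    """Project rows of X onto previously fitted components."""
    return (X - mean) @ components.T

# Toy data (made up): three Releases from the user, four extracted Releases.
user_countries = ["Germany", "UK", "Germany"]
user_years = [1995, 1998, 2001]
all_countries = ["Germany", "UK", "US", "Germany"]
all_years = [1995, 1998, 2005, 2001]

# Categories come from the user's Releases only, so "US" gets no column.
categories = fit_one_hot(user_countries)
X_user = np.column_stack([one_hot(user_countries, categories), user_years])
X_all = np.column_stack([one_hot(all_countries, categories), all_years])

# Fit on the user's Releases, then apply the same transformation to everything.
mean, components = fit_pca(X_user, n_components=2)
Z_all = apply_pca(X_all, mean, components)
print(Z_all.shape)
```

Fitting on the user's Releases and only then transforming the full database keeps the components focused on the variation that matters within the Wantlist and/or Collection.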

Artists, Labels and Companies are treated as groups of Releases. To incorporate them, the mean and variance of the PCA data are computed per group and appended to each Release's original PCA data. In the current version, collaborating groups (e.g. two Artists together) are treated as a single entity, but this will be updated in future versions.
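
As a rough illustration of the grouping step above, the sketch below appends per-group mean and variance columns to toy PCA data. The function name `attach_group_stats`, the use of population variance, and the single group identifier per Release are assumptions for the example; the package may compute these differently.

```python
import numpy as np

def attach_group_stats(Z, group_ids):
    """Append the per-group mean and variance of Z to every row (hypothetical helper)."""
    Z = np.asarray(Z, dtype=float)
    group_ids = np.asarray(group_ids)
    n_cols = Z.shape[1]
    stats = np.zeros((Z.shape[0], 2 * n_cols))
    for group in np.unique(group_ids):
        mask = group_ids == group
        stats[mask, :n_cols] = Z[mask].mean(axis=0)
        # Population variance (ddof=0) is an assumption; a group with a
        # single Release therefore gets a variance of zero.
        stats[mask, n_cols:] = Z[mask].var(axis=0)
    return np.hstack([Z, stats])

# Toy PCA data (made up): two Releases by artist "A", one by artist "B".
Z = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 0.0]])
artists = ["A", "A", "B"]

features = attach_group_stats(Z, artists)
print(features)
```

Each row keeps its own PCA coordinates and gains the coordinates of its group as a whole, so a Release can score well because of the Artist or Label it belongs to.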

The Releases in the Wantlist and/or Collection serve as positive examples, but negative examples (Releases you are not interested in) are usually not recorded anywhere. Therefore, a random set of Releases, equal in size to the set of positive examples, is drawn to serve as negative examples. Because a single random draw introduces bias, this package combines 10 models, each trained with a different random set of negative examples, and multiplies their probabilities to obtain a single score for each Release. Note that Releases that are part of the Wantlist and/or Collection are not returned in the predictions.
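
The sampling and combination scheme above can be sketched as follows. The text does not name the model that is trained, so the small logistic regression here is purely a stand-in; `fit_logistic`, `predict_proba`, `ensemble_scores` and the toy data are likewise hypothetical and only meant to show the idea.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, steps=500):
    """Stand-in classifier (an assumption): logistic regression by gradient descent."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # add a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_proba(X, w):
    """Probability of the positive class for every row of X."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def ensemble_scores(X, positive_idx, n_models=10, seed=0):
    """Train n_models on different random negatives and multiply the probabilities."""
    rng = np.random.default_rng(seed)
    positive_idx = np.asarray(positive_idx)
    candidates = np.setdiff1d(np.arange(X.shape[0]), positive_idx)
    scores = np.ones(X.shape[0])
    for _ in range(n_models):
        # Draw as many random negatives as there are positives.
        negative_idx = rng.choice(candidates, size=len(positive_idx), replace=False)
        train_idx = np.concatenate([positive_idx, negative_idx])
        y = np.concatenate([np.ones(len(positive_idx)), np.zeros(len(negative_idx))])
        w = fit_logistic(X[train_idx], y)
        scores *= predict_proba(X, w)
    # Releases from the Wantlist and/or Collection are not returned.
    return candidates, scores[candidates]

# Toy data (made up): 20 liked Releases in one cluster, 180 others elsewhere.
rng = np.random.default_rng(42)
liked = rng.normal(loc=2.0, size=(20, 2))
others = rng.normal(loc=0.0, size=(180, 2))
X = np.vstack([liked, others])

candidates, scores = ensemble_scores(X, positive_idx=np.arange(20), n_models=10)
top = candidates[np.argsort(scores)[::-1][:5]]
print(top)
```

Multiplying the probabilities means a Release only receives a high score when every model agrees, which dampens the effect of any single unlucky draw of negatives.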
