Machine Learning module for Discogs
Project description
DiscogsLearner - ML library for Discogs
[comment]: # Version: 0.2
Introduction
This package enables predicting similar releases using your Discogs Wantlist and/or Collection. To accomplish this, a 2-step process is executed: Data retrieval using the monthly data dumps and data learning using a list of identifiers obtained from your Wantlist and/or Collection. It produces release identifiers together with probabilities of similarity to your input. See Details for an in-depth explanation. This package requires about 3GB of free RAM to process the whole 'Electronic' genre.
Installation
pip install discogslearner
Usage
- Obtain a Discogs personal access token. See https://www.discogs.com/settings/developers on how to obtain one.
- Execute a script like the following:
import discogslearner
if __name__ == "__main__":
output_file = "Data/discogs_db.tsv"
my_genre = "Electronic"
my_token = "your_token_here"
extracter = discogslearner.Extracter(genre = my_genre)
extracter.extract(output = output_file)
learner = discogslearner.Learner(db_path = output_file,
use_wantlist=True,
use_collection=True,
token = my_token)
outcome = learner.learn_and_predict(n_models = 10)
print(outcome)
Details
In order to learn from Discogs data, the fields Format, Year, Country, Style(s) and Number of Tracks are considered factors of a Release. Fields with categorical values (Format, Country & Styles) are formatted using One-Hot encoding, using only Releases from the given Wantlist and/or Collection. Next, a PCA transformation is applied on these Releases, before applying the transformation on all extracted Releases from Discogs. Note that during this process, only the Styles within the Wantlist and/or Collection are kept in the database as Releases with other styles are most likely not interesting.
Artists, Labels, and Companies are considered to be groups of Releases, so to incorporate these, the mean and variance of the grouped PCA data is taken and attached to the original PCA data. In the current version, collaborating groups (e.g. two Artists together) are seen as a single entity, but this will be updated in future versions.
The Wantlist and/or Collection are seen as positive predictors, but negative predictors are usually not saved. Therefore, a random set of Releases of equal size as the positive predictors is taken as negative predictors. This introduces bias and thus, this package combines 10 models with 10 different negative predictors and multiplies the probabilities to obtain a single score for each Release. Note that Releases part of the Wantlist and/or Collection are not returned in the predictions.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file discogslearner-0.2.tar.gz
.
File metadata
- Download URL: discogslearner-0.2.tar.gz
- Upload date:
- Size: 12.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.4
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 926845c072e17b8cd3455d50f2ec1dd4b2586cf784217e1ce6955bbc94509969 |
|
MD5 | 95923e9b8238bde18736d933e0af0120 |
|
BLAKE2b-256 | c8ec457b3642f0e07b10eb3e9449558ed28dd76aa0f3f6f55d323e0dff9a4a4d |