A convenience package for creating datasets of track features from self-requested Spotify data.


Spotify Rehydrator

License: GPLv3. Python: 3.9.

Recreate a full dataset of audio features for the songs in a listening history downloaded through Spotify’s download my data facility.

This requires the files named StreamingHistory{n}.json, where {n} is the file number, starting at 0 and increasing by one for each additional file retrieved.
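As a minimal sketch of the naming convention above, the following helper collects the StreamingHistory{n}.json files in a folder and orders them by {n}. The function name find_history_files is hypothetical and is not part of the package; it only illustrates which files are expected in the input folder.

```python
import re
from pathlib import Path

# Matches StreamingHistory0.json, StreamingHistory1.json, and so on.
_PATTERN = re.compile(r"^StreamingHistory(\d+)\.json$")


def find_history_files(folder):
    """Return the StreamingHistory{n}.json files in `folder`, ordered by n.

    Hypothetical helper for illustration; not part of spotifyrehydrator.
    """
    numbered = []
    for path in Path(folder).iterdir():
        match = _PATTERN.match(path.name)
        if match and path.is_file():
            numbered.append((int(match.group(1)), path))
    # Sort numerically so that file 10 comes after file 9, not after file 1.
    return [path for _, path in sorted(numbered)]
```

Files that do not follow the pattern are ignored, so other files from the same data download can stay in the folder.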

Quick start

Extended documentation is available on ReadTheDocs. First, install the package using pip. The following example then rehydrates a folder of JSON files:

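# main.py
import os
import pathlib

from spotifyrehydrator import Rehydrator

if __name__ == "__main__":
    # Resolve the input and output folders relative to this script.
    base_dir = pathlib.Path(__file__).parent.absolute()

    Rehydrator(
        os.path.join(base_dir, "input"),  # folder with the StreamingHistory{n}.json files
        os.path.join(base_dir, "output"),  # folder the rehydrated dataset is saved into
        # Spotify API credentials, read from environment variables.
        client_id=os.getenv("SPOTIFY_CLIENT_ID"),
        client_secret=os.getenv("SPOTIFY_CLIENT_SECRET"),
    ).run(return_all=True)  # retrieve both audio features and artist info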

run() takes boolean arguments for audio_features and artist info, or return_all, which retrieves both. These determine how much information is retrieved to make up the full dataset that is saved into the output folder.

How it works

  1. The files for each person are read from the specified input folder.

  2. The track name and artist provided are searched for using the Spotify API. The first result is taken to be the track, and its track ID is recorded.

  3. Additional information is requested from other endpoints if audio_features, artist info or return_all were set to True.

  4. The matched track ID and audio features are saved as one tab-delimited .tsv file per person in the specified output folder.
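The search step can be sketched as follows. This is an illustration under assumptions, not the package's own code: it builds query parameters for the Spotify Web API search endpoint using the track: and artist: field filters, and takes the first item from a response shaped like the API's track search result. The helper names build_search_params and first_track_id are hypothetical, and the package may construct its query differently.

```python
from typing import Optional

# Spotify Web API search endpoint (the request itself is not made here).
SEARCH_URL = "https://api.spotify.com/v1/search"


def build_search_params(track_name: str, artist_name: str) -> dict:
    """Query parameters for a track search limited to a single result.

    Hypothetical helper; the field filters are an assumption about
    how the name and artist might be combined into one query.
    """
    return {
        "q": f"track:{track_name} artist:{artist_name}",
        "type": "track",
        "limit": 1,
    }


def first_track_id(response: dict) -> Optional[str]:
    """Return the ID of the first track in a search response, or None."""
    items = response.get("tracks", {}).get("items", [])
    return items[0]["id"] if items else None
```

Returning None when the search has no items corresponds to the case where a track cannot be found and no ID can be recorded.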

Good to know

  • Not all tracks can be retrieved from the API. In our experience, about 5% of tracks cannot be found. These will have a value of NONE in the output files.

  • There is no guarantee that the first item returned by a search is the track you want. Comparing msPlayed with the track length is a good way to test this, since msPlayed should not exceed the track length.
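The msPlayed check described above could look like the sketch below. It assumes each row is a dict with an msPlayed value and a track length stored under duration_ms (the field name used by Spotify track objects); the actual column names in the output files may differ. The helper names and the tolerance_ms parameter are hypothetical.

```python
MISSING = "NONE"  # value the output files use for tracks that were not found


def is_plausible_match(ms_played, duration_ms, tolerance_ms=0):
    """True if the play time does not exceed the matched track's length."""
    return ms_played <= duration_ms + tolerance_ms


def suspect_rows(rows, tolerance_ms=0):
    """Return the rows whose msPlayed exceeds the matched track's duration.

    Rows without a duration are skipped, since there is nothing to compare.
    The "msPlayed" and "duration_ms" keys are assumed column names.
    """
    suspects = []
    for row in rows:
        duration = row.get("duration_ms")
        if duration in (None, MISSING):
            continue
        if not is_plausible_match(int(row["msPlayed"]), int(duration), tolerance_ms):
            suspects.append(row)
    return suspects
```

A row flagged this way is only a candidate mismatch; a row that passes the check can still be matched to the wrong track.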

P.S. Thanks to Pixel perfect for the title icon. 🙂
