
Extract public Spotify data — tracks, albums, artists, playlists, podcasts, and lyrics — without the official API. Sync + async, typed, one dependency.


SpotifyScraper


Extract public Spotify data — tracks, albums, artists, playlists, and podcasts — without the official API or an API key.

🎧 Try it live in your browser: paste any Spotify link and watch SpotifyScraper pull typed data, cover art, and a preview, alongside the exact Python that does it. (How it was built.)

SpotifyScraper bootstraps an anonymous token from Spotify's own public embed pages and reads the same JSON endpoints the web player uses, returning typed, immutable models. v3 is a ground-up rewrite focused on reliability and a clean, modern API. Public data needs no login; the opt-in logged-in features (lyrics, podcast transcripts, and account info) require your own Spotify sp_dc cookie, never a password or an API key.

Upgrading from v2? See the migration guide. The previous line lives on the v2.x branch.

Install

pip install spotifyscraper                 # core (only depends on httpx)
pip install "spotifyscraper[media]"        # + cover/preview embedding (mutagen)
pip install "spotifyscraper[browser]"      # + Playwright browser fallback & login
pip install "spotifyscraper[cli]"          # + the spotifyscraper command-line tool
pip install "spotifyscraper[keyring]"      # + store the login cookie in the OS keyring
pip install "spotifyscraper[all]"          # everything

Requires Python 3.10+.

Quickstart

from spotify_scraper import SpotifyClient

with SpotifyClient() as client:
    track = client.get_track("https://open.spotify.com/track/4uLU6hMCjMI75M1A2tKUQC")
    print(track.name, "—", track.artists[0].name)
    print(track.duration_ms, "ms |", track.preview_url)

    print(track.to_dict())          # JSON-safe dict, if you prefer dicts

Every entity has its own method — get_track, get_album, get_artist, get_playlist, get_episode, get_show — each accepting a URL, URI, or bare ID.

Async

import asyncio
from spotify_scraper import AsyncSpotifyClient

async def main():
    async with AsyncSpotifyClient() as client:
        track, album = await asyncio.gather(
            client.get_track("4uLU6hMCjMI75M1A2tKUQC"),
            client.get_album("4aawyAB9vmqN3uQ7FjRGTy"),
        )
        print(track.name, "|", album.name)

asyncio.run(main())

Download a cover and preview

from spotify_scraper import SpotifyClient

with SpotifyClient() as client:
    track = client.get_track("4uLU6hMCjMI75M1A2tKUQC")
    client.download_cover(track, dest="covers/")
    client.download_preview(track, dest="previews/", embed_cover=True)  # needs [media]

Localized display names

Pass locale, a BCP-47 language tag, to localize display names. It can be a bare language subtag ("de", "ja") or a language-region tag ("ja-JP"). Set it per client or override it per call:

with SpotifyClient(locale="ja-JP") as client:        # default for every call
    track = client.get_track("4uLU6hMCjMI75M1A2tKUQC")
    other = client.get_track("4uLU6hMCjMI75M1A2tKUQC", locale="de-DE")  # per-call wins

The locale is sent as the Accept-Language header and changes only how names are spelled. It is not a country or market code: a bare "US" is meaningless as a language and is ignored. It also does not filter regional availability or vary preview URLs, because anonymous Spotify resolves the country from the request IP and its pathfinder silently ignores a market variable. True market/availability filtering requires the authenticated Web API, which this library does not implement; for region-specific results, route the client through a proxy in the target region. See the localization guide.

Features

  • All core entities + podcasts — tracks, albums, artists, playlists, shows, episodes.
  • Search across every entity type, returning one typed SearchResults.
  • Localized display names — pass a BCP-47 language tag (locale) to set the language of names.
  • Lyrics & podcast transcripts — cookie-authenticated, time-synced, one token for both.
  • Browser-assisted login + session persistence — log in once, then run headless (no stored passwords).
  • Account-aware: get_account() / is_premium(), plus cookie-free session_info().
  • Batch helpers — plural get_*s([...]) with partial-failure-safe results and managed concurrency.
  • Sync & async clients sharing one sans-io core.
  • Typed, frozen models with JSON-safe to_dict() / from_dict().
  • Two-tier resilience — Spotify's GraphQL API with automatic fallback to the embed page.
  • One core dependency (httpx); media and browser support are optional extras.
  • Optional response cache — opt-in, persistent, token-safe (only token-free pathfinder GETs).
  • Anti-ban built in — per-host rate limiting, retries with backoff, UA rotation, proxies.
  • Browser fallback via Playwright when you need a real browser.
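The "retries with backoff" and "two-tier resilience" bullets describe a common pattern: retry the primary source with exponential backoff and, only if it keeps failing, try the secondary one. The sketch below shows that pattern in isolation. fetch_with_fallback is a hypothetical helper, the primary/fallback callables stand in for the GraphQL and embed-page tiers, and the retry count, base delay, and full-jitter strategy are assumptions rather than SpotifyScraper's actual settings.

```python
import random
import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base: float = 0.5, cap: float = 8.0) -> list[float]:
    """Full-jitter exponential backoff: delay i is uniform in [0, min(cap, base * 2**i)]."""
    return [random.uniform(0, min(cap, base * 2**i)) for i in range(retries)]


def fetch_with_fallback(
    primary: Callable[[], T],
    fallback: Callable[[], T],
    retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try `primary` up to retries + 1 times, sleeping between attempts, then `fallback`."""
    delays = backoff_delays(retries)
    for attempt in range(retries + 1):
        try:
            return primary()
        except Exception:  # a real client would retry only transient errors (429, 5xx, timeouts)
            if attempt < retries:
                sleep(delays[attempt])
    # Primary tier exhausted: switch tiers; errors from the fallback propagate.
    return fallback()
```

Injecting sleep keeps the helper testable without real delays; jitter spreads retries out so many clients do not hammer the same host in lockstep.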

Command line

With the cli extra installed, a spotifyscraper command is available:

spotifyscraper track 4uLU6hMCjMI75M1A2tKUQC          # entity metadata as JSON
spotifyscraper playlist <id> --max-tracks 50 --pretty
spotifyscraper download preview <id> -o ./previews --embed-cover

Every command emits JSON, so it composes with tools like jq. See the CLI guide.

Batch helpers

Each getter has a plural sibling (get_tracks, get_albums, …) that fetches many inputs and returns one BatchItem per input — index-aligned, and a dead input never aborts the rest:

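from spotify_scraper import SpotifyClient

with SpotifyClient() as client:
    items = client.get_tracks(["4uLU6hMCjMI75M1A2tKUQC", "bad-id"])
ok = [i.result for i in items if i.ok]                  # successful results, in input order
failed = {i.value: i.error for i in items if not i.ok}  # failed input -> its error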

The async client runs them concurrently, bounded by max_concurrency (default 5). See the batch guide.

Response caching

For repeated lookups, enable an opt-in persistent cache. It only stores token-free pathfinder responses — never the embed pages that carry the anonymous token — so no credential is ever written to disk:

from spotify_scraper import SpotifyClient, CacheConfig, FileCache

with SpotifyClient(cache=CacheConfig(store=FileCache())) as client:
    client.get_track("4uLU6hMCjMI75M1A2tKUQC")   # first call hits the network
    client.get_track("4uLU6hMCjMI75M1A2tKUQC")   # served from the cache

Default TTL is 24h; the FileCache is stdlib-only and the backend is pluggable. See the caching guide.

Search

search() runs one anonymous, aggregate query across every entity type and returns a typed SearchResults:

from spotify_scraper import SpotifyClient

with SpotifyClient() as client:
    results = client.search("daft punk", types=("track", "artist"), limit=5)
    print(results.total, "track matches")
    for track in results.tracks:
        print(track.name, "—", track.artists[0].name)

Hits are sparse: pass a hit's id to get_album() or get_show() to fetch the full entity. Note that total is the track-match count, not a total across all types. See the search guide.

Lyrics & transcripts

Lyrics and podcast transcripts need a Spotify account cookie (sp_dc); the library handles the token handshake for you, and one cookie powers both:

from spotify_scraper import SpotifyClient

with SpotifyClient(cookies="cookies.txt") as client:   # or cookies={"sp_dc": "..."}
    lyrics = client.get_lyrics("4uLU6hMCjMI75M1A2tKUQC")
    for line in lyrics.lines:
        print(line.start_ms, line.text)

    transcript = client.get_transcript("512ojhOuo1ktJprKbVcKyQ")   # a podcast episode
    for line in transcript.lines:
        print(line.start_ms, line.text)

Your cookie is sent only to Spotify and never logged. An episode with no transcript raises NotFoundError. See the lyrics & cookies guide.
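Because every lyric and transcript line carries a start_ms offset, exporting to a time-tagged format is straightforward. The sketch below formats lines as LRC-style [mm:ss.xx] tags. to_lrc is a hypothetical helper, not a library method; it assumes only that each line object exposes the start_ms (an integer, by assumption) and text attributes used in the example above, and DemoLine stands in for the real line type.

```python
from collections.abc import Iterable
from dataclasses import dataclass
from typing import Protocol


class TimedLine(Protocol):
    start_ms: int
    text: str


def lrc_timestamp(start_ms: int) -> str:
    """Format milliseconds as an LRC tag, e.g. 83450 -> [01:23.45]."""
    minutes, rest = divmod(start_ms, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"[{minutes:02d}:{seconds:02d}.{millis // 10:02d}]"


def to_lrc(lines: Iterable[TimedLine]) -> str:
    """Join timed lines into LRC text, one tagged line per entry."""
    return "\n".join(f"{lrc_timestamp(line.start_ms)}{line.text}" for line in lines)


@dataclass
class DemoLine:  # hypothetical stand-in for the library's line objects
    start_ms: int
    text: str
```

Under those assumptions, to_lrc(lyrics.lines) would produce the body of a .lrc file.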

Browser-assisted login

Don't want to copy a cookie by hand? login() opens a real browser, you sign in once, and the captured sp_dc is persisted (no password is ever collected or stored). Later runs reconnect headlessly — ideal for servers:

from spotify_scraper import SpotifyClient

with SpotifyClient() as client:
    client.login()                              # reuse a valid session, else open a browser
    print(client.get_lyrics("4uLU6hMCjMI75M1A2tKUQC").sync_type)

# A later, headless run — no browser needed:
with SpotifyClient.from_saved_session() as client:
    account = client.get_account()              # who am I?
    print(account.product, account.country, client.is_premium())
    transcript = client.get_transcript("512ojhOuo1ktJprKbVcKyQ")

login() reuses a valid saved session by default (browser only the first time); from_saved_session() never needs the browser extra. The cookie is stored in an owner-only file, or the OS keyring with store="keyring" (the keyring extra). get_account()/is_premium() report the logged-in account, and SpotifyClient.session_info() checks a saved session without exposing the cookie. See the authenticated sessions guide.
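An "owner-only file" usually means permission bits 0o600: readable and writable by the owning user, inaccessible to everyone else. A minimal POSIX sketch with the standard library; save_secret and the path in the test are hypothetical, and this is not necessarily how SpotifyScraper writes its session file. Note that os.fchmod is available only on Unix before Python 3.13.

```python
import os
from pathlib import Path


def save_secret(path: Path, secret: str) -> None:
    """Write `secret` to `path`, readable and writable by the owner only (POSIX)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode 0o600 applies at creation time; the umask can only remove bits from it.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        # A pre-existing file keeps its old bits, so tighten them before writing.
        os.fchmod(fh.fileno(), 0o600)
        fh.write(secret)
```

Setting the mode when the file is created, rather than calling chmod after writing, avoids a window in which the cookie sits in a world-readable file.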

Roadmap

Shipped

Version Scope
3.0 The library: all entities, pagination, media downloads, browser fallback, docs
3.1 Command-line interface
3.2 Cookie-authenticated lyrics
3.3 Cookie-authenticated podcast transcripts (get_transcript); browser-assisted login, session persistence & account-awareness (get_account/is_premium)
3.4 Search across every entity type (search()) · display-language localization (locale)
3.5 Optional response cache (cache=CacheConfig(...)) · batch helpers with managed concurrency

What's next — the v3.3–3.5 work above completes the published roadmap. Future ideas are tracked in the GitHub milestones and issues — 👍 or weigh in on the ones that matter most to you. Scope is subject to change.

Documentation

Full docs, guides, and the API reference: https://spotifyscraper.readthedocs.io

Legal

SpotifyScraper is an unofficial, independent project, not affiliated with Spotify. It reads publicly available data and the ~30-second previews Spotify publishes; it does not download full tracks or circumvent DRM. Use it for educational and personal purposes, and in line with Spotify's Terms of Service. See the legal notice.

Contributing

Contributions are welcome — see CONTRIBUTING.md. The project is developed spec-first with OpenSpec; specs live in openspec/.

License

MIT © Ali Akhtari

Download files


Source Distribution

spotifyscraper-3.4.0.tar.gz (369.7 kB)

Uploaded Source

Built Distribution


spotifyscraper-3.4.0-py3-none-any.whl (104.8 kB)

Uploaded Python 3

File details

Details for the file spotifyscraper-3.4.0.tar.gz.

File metadata

  • Download URL: spotifyscraper-3.4.0.tar.gz
  • Size: 369.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for spotifyscraper-3.4.0.tar.gz
Algorithm Hash digest
SHA256 39da75bca29c01bedfa68f37d2880bede89f4ebf86778c2378ef96d8fcb2d019
MD5 0bff0d56ebe86566b60cde5f68112628
BLAKE2b-256 7826b0294a9b188b3dcd84c24399f394443a4f0a7b67c1ecd554cbc0f40d706c


File details

Details for the file spotifyscraper-3.4.0-py3-none-any.whl.

File metadata

  • Download URL: spotifyscraper-3.4.0-py3-none-any.whl
  • Size: 104.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for spotifyscraper-3.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 db108ebe1a5ca25e0c7692606feb87da7f34224c27ef41bba6bccb6441ff8e94
MD5 4bc2e93724b0b61949c96442ed06c700
BLAKE2b-256 037f2c2bd3cfe6de5b922b29320144ef2a27976123df6733d1bdc627d5937f4a
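To check a downloaded file against the SHA256 digests listed above, hash it in chunks and compare. A minimal sketch using only hashlib; sha256_of and verify_sha256 are illustrative names. hashlib.file_digest would also work, but it requires Python 3.11+, while this project supports 3.10.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 with a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage, after downloading the source distribution listed above:
# verify_sha256(Path("spotifyscraper-3.4.0.tar.gz"),
#               "39da75bca29c01bedfa68f37d2880bede89f4ebf86778c2378ef96d8fcb2d019")
```

Reading in chunks keeps memory use flat regardless of file size.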

