Fetch artists from YouTube Music and Spotify with MusicBrainz IDs

Artist Scraper

A production-ready tool to fetch artists from YouTube Music and Spotify, look up their MusicBrainz IDs, and optionally add them to Lidarr for monitoring.

Recent Updates

November 25, 2025: Major update to YouTube Music integration

Replaced unreliable ytmusicapi internal API with official YouTube Data API v3
Fixed HTTP 400 errors that prevented YouTube Music from working
Improved artist extraction from video titles and channel names
See CHANGELOG_YOUTUBE_FIX.md for full details

Features

Fetch artists from multiple sources:
- Spotify: Liked tracks, followed artists, and all playlists (public & private)
- YouTube Music: Liked videos, channel subscriptions, and all playlists (via YouTube Data API v3)
Look up MusicBrainz IDs for all artists
Track play counts for each artist
Export to CSV with artist names, MusicBrainz IDs, sources, and play counts
Import CSV to Lidarr with optional filtering by play count
Automatic deduplication of artists
Optional Lidarr integration to automatically add and monitor artists
Beautiful CLI with colors, progress bars, and clear feedback
Comprehensive logging of skipped artists
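
The deduplication and play-count features can be pictured as a single merge step. The sketch below is an illustration only: `ArtistRecord` and `merge_artists` are hypothetical names, and matching names case-insensitively is an assumption rather than the tool's documented behaviour.

```python
from dataclasses import dataclass, field


@dataclass
class ArtistRecord:
    name: str
    sources: set[str] = field(default_factory=set)
    play_count: int = 0  # number of tracks seen for this artist


def merge_artists(entries):
    """Merge (artist_name, source) pairs, one pair per track, into unique artists.

    Assumption: names that differ only in case or surrounding whitespace
    refer to the same artist.
    """
    merged: dict[str, ArtistRecord] = {}
    for name, source in entries:
        key = name.casefold().strip()
        record = merged.setdefault(key, ArtistRecord(name=name.strip()))
        record.sources.add(source)
        record.play_count += 1
    # Most-played artists first, then alphabetical for a stable order.
    return sorted(merged.values(), key=lambda r: (-r.play_count, r.name))
```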

Installation

Prerequisites

Python 3.12 or higher
Poetry (for dependency management)

Setup

Clone the repository, then change into the project directory:

cd /path/to/artistscraper

Install dependencies:

poetry install

Copy the example configuration:

cp config.example.json config.json

Configure your API credentials (see Configuration section below)

Configuration

Edit config.json with your credentials:

Spotify Configuration

Go to Spotify Developer Dashboard
Create a new app
Note your Client ID and Client Secret
Add http://localhost:8888/callback as a Redirect URI
Get a refresh token:
- Use a tool like spotify-refresh-token-generator or the Spotipy documentation
- Required scopes: user-library-read, user-follow-read, playlist-read-private, playlist-read-collaborative

Add to config.json:

"spotify": {
  "client_id": "your_spotify_client_id",
  "client_secret": "your_spotify_client_secret",
  "refresh_token": "your_spotify_refresh_token"
}
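
For context on how these three values work together, the sketch below builds (but does not send) the request that exchanges a refresh token for an access token at Spotify's token endpoint. The project lists spotipy in its acknowledgments, which normally handles this step for you; `build_refresh_request` is a hypothetical helper shown only to make the flow concrete.

```python
import base64
import urllib.parse
import urllib.request

SPOTIFY_TOKEN_URL = "https://accounts.spotify.com/api/token"


def build_refresh_request(client_id: str, client_secret: str, refresh_token: str) -> urllib.request.Request:
    """Build the POST request that trades a refresh token for a new access token."""
    # Client credentials go in an HTTP Basic Authorization header.
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    basic = base64.b64encode(credentials).decode("ascii")
    # The body is form-encoded, not JSON.
    body = urllib.parse.urlencode(
        {"grant_type": "refresh_token", "refresh_token": refresh_token}
    ).encode("ascii")
    return urllib.request.Request(
        SPOTIFY_TOKEN_URL,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would perform the exchange; a failure at that point usually means one of the three values in config.json is wrong.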

YouTube Music Configuration

Note: As of November 2025, the YouTube Music integration uses the YouTube Data API v3 for better reliability. See YOUTUBE_MUSIC_SETUP.md for detailed setup instructions and troubleshooting.

Quick Setup

Create a Google Cloud Project and enable YouTube Data API v3
Create OAuth 2.0 credentials (Desktop app type)
Run the OAuth flow:

poetry run ytmusicapi oauth --client-id YOUR_CLIENT_ID --client-secret YOUR_CLIENT_SECRET

Update config.json:

"youtube_music": {
  "auth_file": "ytmusic_auth.json",
  "client_id": "your_google_client_id.apps.googleusercontent.com",
  "client_secret": "your_google_client_secret"
}

For detailed instructions, troubleshooting, and information about the recent API changes, see YOUTUBE_MUSIC_SETUP.md.

MusicBrainz Configuration

Update with your email:

"musicbrainz": {
  "user_agent": "artistscraper/0.1.0 (your-email@example.com)"
}

Lidarr Configuration (Optional)

If you plan to use the --lidarr flag:

Open your Lidarr instance
Go to Settings → General
Copy your API Key

Add to config.json:

"lidarr": {
  "url": "http://localhost:8686",
  "api_key": "your_lidarr_api_key"
}
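
Before a first run it can help to check that config.json contains every key shown in the sections above. The following is a rough sketch; `find_missing_keys` and `load_config` are hypothetical helpers, and treating the Spotify, YouTube Music, and MusicBrainz sections as always required is an assumption (the tool itself may be more lenient, for example with --spotify-only).

```python
import json
from pathlib import Path

# Keys taken from the configuration snippets above.
REQUIRED_KEYS = {
    "spotify": ("client_id", "client_secret", "refresh_token"),
    "youtube_music": ("auth_file", "client_id", "client_secret"),
    "musicbrainz": ("user_agent",),
}
# Lidarr is optional, but if the section exists it should be complete.
OPTIONAL_KEYS = {"lidarr": ("url", "api_key")}


def find_missing_keys(config: dict) -> list[str]:
    """Return dotted names such as 'spotify.client_id' for every missing or empty value."""
    missing: list[str] = []
    for section, keys in REQUIRED_KEYS.items():
        values = config.get(section) or {}
        missing.extend(f"{section}.{key}" for key in keys if not values.get(key))
    for section, keys in OPTIONAL_KEYS.items():
        if section in config:
            values = config[section] or {}
            missing.extend(f"{section}.{key}" for key in keys if not values.get(key))
    return missing


def load_config(path: str = "config.json") -> dict:
    """Load the JSON configuration and fail early if values are missing."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = find_missing_keys(config)
    if missing:
        raise ValueError("Missing configuration values: " + ", ".join(missing))
    return config
```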

Usage

Scrape Command

Fetch artists from both Spotify and YouTube Music:

poetry run artistscraper scrape

Or if installed globally:

artistscraper scrape

Scrape Command Options

Options:
  --config, -c PATH          Path to configuration file (default: config.json)
  --spotify-only             Fetch artists from Spotify only
  --youtube-only             Fetch artists from YouTube Music only
  --skip-musicbrainz         Skip MusicBrainz ID lookup
  --lidarr                   Add artists to Lidarr after export
  --output, -o PATH          Output CSV file path (overrides config)
  --verbose, -v              Enable verbose output
  --help                     Show this message and exit

Scrape Examples

Fetch from Spotify only:

artistscraper scrape --spotify-only

Fetch from YouTube Music only:

artistscraper scrape --youtube-only

Fetch and add to Lidarr:

artistscraper scrape --lidarr

Custom output file:

artistscraper scrape --output my_artists.csv

Skip MusicBrainz lookup (faster, just export artist names):

artistscraper scrape --skip-musicbrainz

Import Command

Import artists from a CSV file to Lidarr:

poetry run artistscraper import artists.csv

Import Command Options

Options:
  --config, -c PATH          Path to configuration file (default: config.json)
  --min-plays INTEGER        Only import artists with at least this many plays
  --verbose, -v              Enable verbose output
  --help                     Show this message and exit

Import Examples

Import all artists from CSV:

artistscraper import artists.csv

Import only artists with at least 10 plays:

artistscraper import artists.csv --min-plays 10

Import only artists with at least 50 plays (with verbose output):

artistscraper import artists.csv --min-plays 50 --verbose
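
The effect of --min-plays can be reproduced outside the tool when you want to preview which artists would be imported. This sketch assumes the CSV uses the column names shown in the Output section; `filter_by_min_plays` is a hypothetical helper, not part of the artistscraper CLI.

```python
import csv
import io


def filter_by_min_plays(csv_text: str, min_plays: int) -> list[dict]:
    """Return the CSV rows whose 'Play Count' is at least min_plays."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # An empty play count is treated as zero plays.
    return [row for row in reader if int(row["Play Count"] or 0) >= min_plays]
```

To preview an import, read the file with `Path("artists.csv").read_text(encoding="utf-8")` and pass the text along with the threshold you intend to use.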

Output

The tool generates two files:

artists.csv (or custom name): Main output with four columns:
- Artist Name
- MusicBrainz ID (format: lidarr:ID)
- Source (Spotify, YouTube Music, or both)
- Play Count (number of tracks by this artist)
skipped_artists.log: List of artists without MusicBrainz IDs

Example CSV output:

Artist Name,MusicBrainz ID,Source,Play Count
Taylor Swift,lidarr:20244d07-534f-4eff-b4d4-930878889970,Spotify,45
The Beatles,lidarr:b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d,"Spotify, YouTube Music",23
Radiohead,lidarr:a74b1b7f-71a5-4011-9441-d0b5e4122711,YouTube Music,12
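
If you want to post-process the export yourself, the format above parses cleanly with Python's csv module, including the quoted multi-source field. The sketch below is an illustration; `parse_artists_csv` and the keys of the returned dictionaries are hypothetical choices.

```python
import csv
import io

SAMPLE_CSV = """Artist Name,MusicBrainz ID,Source,Play Count
Taylor Swift,lidarr:20244d07-534f-4eff-b4d4-930878889970,Spotify,45
The Beatles,lidarr:b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d,"Spotify, YouTube Music",23
Radiohead,lidarr:a74b1b7f-71a5-4011-9441-d0b5e4122711,YouTube Music,12
"""


def parse_artists_csv(text: str) -> list[dict]:
    """Turn the exported CSV into dictionaries with typed fields."""
    artists = []
    for row in csv.DictReader(io.StringIO(text)):
        artists.append(
            {
                "name": row["Artist Name"],
                # Drop the 'lidarr:' prefix to get the bare MusicBrainz ID.
                "mbid": row["MusicBrainz ID"].removeprefix("lidarr:"),
                # 'Source' may hold several comma-separated sources.
                "sources": [s.strip() for s in row["Source"].split(",") if s.strip()],
                "play_count": int(row["Play Count"]),
            }
        )
    return artists
```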

Lidarr Integration

When using the --lidarr flag:

The tool connects to your local Lidarr instance
Artists with MusicBrainz IDs are searched in Lidarr
New artists are added with monitoring enabled
Existing artists are skipped
Uses your Lidarr default settings for:
- Root folder
- Quality profile
- Metadata profile
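
The add-or-skip behaviour described above boils down to partitioning the artist list. The sketch below shows that decision in isolation, with no calls to Lidarr; `plan_lidarr_import` and the dictionary keys it reads are hypothetical, and the set of existing IDs is assumed to have been fetched from Lidarr beforehand.

```python
def plan_lidarr_import(artists: list[dict], existing_mbids: set[str]):
    """Split artists into those to add, those already in Lidarr, and those without an ID.

    Each artist is a dict with a 'name' and an optional 'mbid' (MusicBrainz ID).
    """
    to_add, already_present, missing_id = [], [], []
    seen = set(existing_mbids)
    for artist in artists:
        mbid = artist.get("mbid")
        if not mbid:
            missing_id.append(artist["name"])       # cannot be matched in Lidarr
        elif mbid in seen:
            already_present.append(artist["name"])  # existing artists are skipped
        else:
            to_add.append(artist["name"])           # new artist, added with monitoring
            seen.add(mbid)                          # guard against duplicates in the input
    return to_add, already_present, missing_id
```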

Troubleshooting

"Configuration file not found"

Make sure you've copied config.example.json to config.json and filled in your credentials.

"Failed to authenticate with Spotify"

Verify your client_id, client_secret, and refresh_token are correct
Make sure your refresh token hasn't expired (regenerate if needed)

"Failed to authenticate with YouTube Music"

Run poetry run ytmusicapi oauth --client-id YOUR_ID --client-secret YOUR_SECRET to regenerate authentication
Make sure the auth file path in config.json matches the generated file
Ensure YouTube Data API v3 is enabled in your Google Cloud project
See YOUTUBE_MUSIC_SETUP.md for detailed troubleshooting

"HTTP 400: Bad Request" from YouTube Music

This error was fixed in the November 2025 update. If you're still seeing it:

Make sure you have the latest version of the code
Regenerate your OAuth token with your Google Cloud credentials
See CHANGELOG_YOUTUBE_FIX.md for migration instructions

"HTTP 403: Forbidden" from YouTube Music

YouTube Data API v3 is not enabled in your Google Cloud project
Your OAuth token doesn't have the correct scopes
Regenerate your token after enabling the API

"Failed to connect to Lidarr"

Verify Lidarr is running
Check the URL and API key in config.json
Ensure you're using http:// or https:// in the URL
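
A quick way to catch a malformed Lidarr URL before running the tool is to parse it. This is a minimal sketch with a hypothetical helper name, `normalize_lidarr_url`; it only checks the URL's shape and does not contact Lidarr.

```python
from urllib.parse import urlparse


def normalize_lidarr_url(url: str) -> str:
    """Validate that the URL has an http(s) scheme and a host, and drop any trailing slash."""
    cleaned = url.strip()
    parsed = urlparse(cleaned)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Lidarr URL must start with http:// or https://, got: {url!r}")
    return cleaned.rstrip("/")
```

A value such as `localhost:8686` is rejected because it has no scheme, which is the most common cause of this connection error.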

MusicBrainz rate limiting

The tool automatically respects MusicBrainz's rate limit (1 request per second). For large libraries, this process may take a while.
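
To give a sense of why large libraries take time, here is a minimal sketch of a limiter that spaces calls at least one second apart. `RateLimiter` is a hypothetical class written for illustration; the tool's own mechanism is not shown in this README and may simply rely on its MusicBrainz client library.

```python
import time


class RateLimiter:
    """Block so that consecutive calls to wait() are at least min_interval seconds apart."""

    def __init__(self, min_interval: float = 1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable for testing
        self._sleep = sleep      # injectable for testing
        self._last_call = None   # time of the previous call, if any

    def wait(self) -> None:
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now
```

Calling `wait()` before each lookup means a library of 1,000 artists needs roughly 1,000 seconds, or about 17 minutes, for the MusicBrainz step alone.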

Low MusicBrainz match rate

MusicBrainz matching uses a 90% similarity threshold
Some artists may not be in the MusicBrainz database
Check skipped_artists.log for artists without matches
You can manually add MusicBrainz IDs for important artists

Contributing

This is a personal project, but suggestions and improvements are welcome!

Acknowledgments

ytmusicapi - YouTube Music API
spotipy - Spotify API
musicbrainzngs - MusicBrainz API
typer - CLI framework
rich - Terminal formatting

Release history

1.2.0: Nov 26, 2025
1.1.3: Nov 25, 2025
1.1.2: Nov 25, 2025 (this version)
1.1.1: Nov 25, 2025
1.1.0: Nov 25, 2025

Download files

Source Distribution

artistscraper-1.1.2.tar.gz (20.2 kB view details)

Uploaded Nov 25, 2025 Source

Built Distribution

artistscraper-1.1.2-py3-none-any.whl (21.6 kB view details)

Uploaded Nov 25, 2025 Python 3

File details

Details for the file artistscraper-1.1.2.tar.gz.

File metadata

Download URL: artistscraper-1.1.2.tar.gz
Upload date: Nov 25, 2025
Size: 20.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for artistscraper-1.1.2.tar.gz
Algorithm	Hash digest
SHA256	`736ce814fd50f7182c2a0612807393815580b344a081e53eb7b05fbe060ddbbf`
MD5	`bf50caa868083f54d92bcdbc0f9970bb`
BLAKE2b-256	`6ec46baec61694b408c69dcab692b26e7e2c0bb22962182f76dbebe6aa461f3f`

File details

Details for the file artistscraper-1.1.2-py3-none-any.whl.

File metadata

Download URL: artistscraper-1.1.2-py3-none-any.whl
Upload date: Nov 25, 2025
Size: 21.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for artistscraper-1.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`364a3910d9895bf4bf84beec5fdb0fee5c4c1914ebc4cf312183ebf9f07f86a9`
MD5	`561aed61253c6880d7a5a3b6dd652386`
BLAKE2b-256	`a146e8e9c23a4ce689509f51a667d6352757b3e56201509143d30b4852421ba0`
