Fetch artists from YouTube Music and Spotify with MusicBrainz IDs
Artist Scraper 
A production-ready tool to fetch artists from YouTube Music and Spotify, look up their MusicBrainz IDs, and optionally add them to Lidarr for monitoring.
Recent Updates
November 25, 2025: Major update to YouTube Music integration
- Replaced unreliable ytmusicapi internal API with official YouTube Data API v3
- Fixed HTTP 400 errors that prevented YouTube Music from working
- Improved artist extraction from video titles and channel names
- See CHANGELOG_YOUTUBE_FIX.md for full details
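As a rough illustration of what extracting an artist from a video title or channel name can involve, here is a minimal sketch. It is not the project's actual logic: the function names, the "Artist - Title" convention, and the " - Topic" / "VEVO" channel suffixes are all assumptions made for this example.

```python
import re

# Assumed channel-name suffixes (hypothetical list for this sketch).
_CHANNEL_SUFFIXES = (" - Topic", "VEVO")
# A hyphen or en dash surrounded by whitespace, as in "Artist - Title".
_SEPARATOR = re.compile(r"\s+[-\u2013]\s+")


def artist_from_channel(channel: str) -> str:
    """Strip the assumed suffixes from a channel name."""
    name = channel.strip()
    for suffix in _CHANNEL_SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)].strip()
    return name


def artist_from_video(title: str, channel: str) -> str:
    """Prefer the 'Artist - Title' pattern; fall back to the channel name."""
    parts = _SEPARATOR.split(title.strip(), maxsplit=1)
    if len(parts) == 2 and parts[0]:
        return parts[0].strip()
    return artist_from_channel(channel)
```

A heuristic like this will misfire on titles that put the song first, which is one reason a later MusicBrainz lookup is useful as a sanity check.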
Features
- Fetch artists from multiple sources:
- Spotify: Liked tracks, followed artists, and all playlists (public & private)
- YouTube Music: Liked videos, channel subscriptions, and all playlists (via YouTube Data API v3)
- Look up MusicBrainz IDs for all artists
- Track play counts for each artist
- Export to CSV with artist names, MusicBrainz IDs, sources, and play counts
- Import CSV to Lidarr with optional filtering by play count
- Automatic deduplication of artists
- Optional Lidarr integration to automatically add and monitor artists
- Beautiful CLI with colors, progress bars, and clear feedback
- Comprehensive logging of skipped artists
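To make the deduplication and play-count features concrete, the sketch below merges artist records from several sources into one entry per artist. It only illustrates the idea: the `ArtistEntry` class, the `merge_artists` function, the case-insensitive key, and the choice to sum counts across sources are assumptions, not the tool's confirmed behavior.

```python
from dataclasses import dataclass, field


@dataclass
class ArtistEntry:
    name: str
    sources: set[str] = field(default_factory=set)
    play_count: int = 0


def merge_artists(records):
    """Merge (name, source, plays) tuples into one entry per artist.

    Artists are keyed on a trimmed, case-folded name (an assumption).
    """
    merged: dict[str, ArtistEntry] = {}
    for name, source, plays in records:
        key = name.strip().casefold()
        entry = merged.setdefault(key, ArtistEntry(name=name.strip()))
        entry.sources.add(source)
        entry.play_count += plays
    return list(merged.values())
```

For example, `("The Beatles", "Spotify", 20)` and `("the beatles ", "YouTube Music", 3)` collapse into a single entry with both sources and a play count of 23.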
Installation
Prerequisites
- Python 3.12 or higher
- Poetry (for dependency management)
Setup
- Clone the repository and change into its directory:
cd /path/to/artistscraper
- Install dependencies:
poetry install
- Copy the example configuration:
cp config.example.json config.json
- Configure your API credentials (see Configuration section below)
Configuration
Edit config.json with your credentials:
Spotify Configuration
- Go to Spotify Developer Dashboard
- Create a new app
- Note your Client ID and Client Secret
- Add http://localhost:8888/callback as a Redirect URI
- Get a refresh token:
- Use a tool like spotify-refresh-token-generator or the Spotipy documentation
- Required scopes: user-library-read, user-follow-read, playlist-read-private, playlist-read-collaborative
Add to config.json:
"spotify": {
"client_id": "your_spotify_client_id",
"client_secret": "your_spotify_client_secret",
"refresh_token": "your_spotify_refresh_token"
}
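For context on how these three values are typically used, the sketch below builds (but does not send) a refresh-token request for Spotify's token endpoint, following the standard OAuth 2.0 refresh grant. The function name `build_refresh_request` is hypothetical, and whether the tool issues this request itself or delegates it to spotipy is not stated here.

```python
import base64
from urllib.parse import urlencode

TOKEN_URL = "https://accounts.spotify.com/api/token"


def build_refresh_request(client_id, client_secret, refresh_token):
    """Return (url, headers, body) for a refresh-token grant; sends nothing."""
    # Client credentials go in an HTTP Basic Authorization header.
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    headers = {
        "Authorization": f"Basic {basic}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urlencode({"grant_type": "refresh_token", "refresh_token": refresh_token})
    return TOKEN_URL, headers, body
```

Posting the returned body to the URL with those headers should yield a short-lived access token, provided the refresh token is still valid.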
YouTube Music Configuration
Note: As of November 2025, the YouTube Music integration uses the YouTube Data API v3 for better reliability. See YOUTUBE_MUSIC_SETUP.md for detailed setup instructions and troubleshooting.
Quick Setup
- Create a Google Cloud Project and enable YouTube Data API v3
- Create OAuth 2.0 credentials (Desktop app type)
- Run the OAuth flow:
poetry run ytmusicapi oauth --client-id YOUR_CLIENT_ID --client-secret YOUR_CLIENT_SECRET
- Update config.json:
"youtube_music": {
"auth_file": "ytmusic_auth.json",
"client_id": "your_google_client_id.apps.googleusercontent.com",
"client_secret": "your_google_client_secret"
}
For detailed instructions, troubleshooting, and information about the recent API changes, see YOUTUBE_MUSIC_SETUP.md.
MusicBrainz Configuration
Set the user agent so that it includes your contact email:
"musicbrainz": {
"user_agent": "artistscraper/0.1.0 (your-email@example.com)"
}
Lidarr Configuration (Optional)
If you plan to use the --lidarr flag:
- Open your Lidarr instance
- Go to Settings → General
- Copy your API Key
Add to config.json:
"lidarr": {
"url": "http://localhost:8686",
"api_key": "your_lidarr_api_key"
}
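Before running the tool, it can help to check that config.json no longer contains placeholder values. The sketch below is one way to do that, using the section and key names from the snippets above. Treating the first three sections as always required and flagging values that start with `your_` are assumptions for this example, and `load_config` and `missing_keys` are hypothetical helpers, not part of the tool.

```python
import json

# Section and key names come from the configuration snippets above.
REQUIRED_KEYS = {
    "spotify": ("client_id", "client_secret", "refresh_token"),
    "youtube_music": ("auth_file", "client_id", "client_secret"),
    "musicbrainz": ("user_agent",),
}
LIDARR_KEYS = ("url", "api_key")  # only checked when a "lidarr" section exists


def load_config(path="config.json"):
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def missing_keys(config: dict) -> list[str]:
    """List 'section.key' entries that are absent, empty, or placeholders."""
    problems = []
    for section, keys in REQUIRED_KEYS.items():
        values = config.get(section, {})
        for key in keys:
            value = values.get(key, "")
            if not value or str(value).startswith("your_"):
                problems.append(f"{section}.{key}")
    if "lidarr" in config:
        for key in LIDARR_KEYS:
            if not config["lidarr"].get(key):
                problems.append(f"lidarr.{key}")
    return problems
```

Calling `missing_keys(load_config())` returns an empty list when every checked value has been filled in.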
Usage
Scrape Command
Fetch artists from both Spotify and YouTube Music:
poetry run artistscraper scrape
Or if installed globally:
artistscraper scrape
Scrape Command Options
Options:
--config, -c PATH Path to configuration file (default: config.json)
--spotify-only Fetch artists from Spotify only
--youtube-only Fetch artists from YouTube Music only
--skip-musicbrainz Skip MusicBrainz ID lookup
--lidarr Add artists to Lidarr after export
--output, -o PATH Output CSV file path (overrides config)
--verbose, -v Enable verbose output
--help Show this message and exit
Scrape Examples
Fetch from Spotify only:
artistscraper scrape --spotify-only
Fetch from YouTube Music only:
artistscraper scrape --youtube-only
Fetch and add to Lidarr:
artistscraper scrape --lidarr
Custom output file:
artistscraper scrape --output my_artists.csv
Skip MusicBrainz lookup (faster, just export artist names):
artistscraper scrape --skip-musicbrainz
Import Command
Import artists from a CSV file to Lidarr:
poetry run artistscraper import artists.csv
Import Command Options
Options:
--config, -c PATH Path to configuration file (default: config.json)
--min-plays INTEGER Only import artists with at least this many plays
--verbose, -v Enable verbose output
--help Show this message and exit
Import Examples
Import all artists from CSV:
artistscraper import artists.csv
Import only artists with at least 10 plays:
artistscraper import artists.csv --min-plays 10
Import only artists with at least 50 plays (with verbose output):
artistscraper import artists.csv --min-plays 50 --verbose
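The effect of --min-plays can be pictured as a simple filter over the CSV rows. The sketch below assumes the column header "Play Count" from the tool's CSV output and uses a hypothetical helper name; it is not the import command's actual code.

```python
import csv
import io


def filter_by_min_plays(csv_text: str, min_plays: int) -> list[dict]:
    """Return rows whose 'Play Count' column is at least min_plays."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # An empty play count is treated as zero (an assumption).
    return [row for row in reader if int(row["Play Count"] or 0) >= min_plays]
```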
Output
The tool generates two files:
- artists.csv (or custom name): Main output with four columns:
- Artist Name
- MusicBrainz ID (format: lidarr:ID)
- Source (Spotify, YouTube Music, or both)
- Play Count (number of tracks by this artist)
- skipped_artists.log: List of artists without MusicBrainz IDs
Example CSV output:
Artist Name,MusicBrainz ID,Source,Play Count
Taylor Swift,lidarr:20244d07-534f-4eff-b4d4-930878889970,Spotify,45
The Beatles,lidarr:b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d,"Spotify, YouTube Music",23
Radiohead,lidarr:a74b1b7f-71a5-4011-9441-d0b5e4122711,YouTube Music,12
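The quoting in the example above (a multi-source value wrapped in double quotes) is what Python's csv module produces by default for fields containing commas. The sketch below writes rows in the same shape. The `to_csv` function and its tuple layout are assumptions for illustration and may differ from how the tool builds its output.

```python
import csv
import io

HEADER = ["Artist Name", "MusicBrainz ID", "Source", "Play Count"]


def to_csv(artists) -> str:
    """Serialize (name, mbid, sources, play_count) tuples to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(HEADER)
    for name, mbid, sources, plays in artists:
        # Multiple sources are joined with ", ", so the writer quotes the field.
        writer.writerow([name, f"lidarr:{mbid}", ", ".join(sorted(sources)), plays])
    return buffer.getvalue()
```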
Lidarr Integration
When using the --lidarr flag:
- The tool connects to your local Lidarr instance
- Artists with MusicBrainz IDs are searched in Lidarr
- New artists are added with monitoring enabled
- Existing artists are skipped
- Uses your Lidarr default settings for:
- Root folder
- Quality profile
- Metadata profile
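The add-or-skip behavior described above can be sketched as two small steps: pick the artists Lidarr does not already have, then build a request body for each. The helper names are hypothetical, and the payload field names reflect my understanding of Lidarr's v1 artist API (POST /api/v1/artist, authenticated with an X-Api-Key header). Treat them as assumptions and check them against your Lidarr version's API documentation.

```python
def select_new(candidates, existing_ids):
    """Keep (name, mbid) pairs whose MusicBrainz ID is not already in Lidarr."""
    return [(name, mbid) for name, mbid in candidates if mbid and mbid not in existing_ids]


def build_add_artist_payload(name, mbid, root_folder, quality_profile_id, metadata_profile_id):
    """Build a request body for adding one monitored artist (assumed field names)."""
    return {
        "artistName": name,
        "foreignArtistId": mbid,  # the MusicBrainz artist ID
        "rootFolderPath": root_folder,
        "qualityProfileId": quality_profile_id,
        "metadataProfileId": metadata_profile_id,
        "monitored": True,
        "addOptions": {"monitor": "all", "searchForMissingAlbums": False},
    }


def lidarr_headers(api_key):
    """Headers carrying the API key from config.json."""
    return {"X-Api-Key": api_key}
```

The root folder and profile IDs would come from the Lidarr defaults mentioned in the list above.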
Troubleshooting
"Configuration file not found"
Make sure you've copied config.example.json to config.json and filled in your credentials.
"Failed to authenticate with Spotify"
- Verify your client_id, client_secret, and refresh_token are correct
- Make sure your refresh token hasn't expired (regenerate if needed)
"Failed to authenticate with YouTube Music"
- Run poetry run ytmusicapi oauth --client-id YOUR_ID --client-secret YOUR_SECRET to regenerate authentication
- Make sure the auth file path in config.json matches the generated file
- Ensure YouTube Data API v3 is enabled in your Google Cloud project
- See YOUTUBE_MUSIC_SETUP.md for detailed troubleshooting
"HTTP 400: Bad Request" from YouTube Music
This error was fixed in the November 2025 update. If you're still seeing it:
- Make sure you have the latest version of the code
- Regenerate your OAuth token with your Google Cloud credentials
- See CHANGELOG_YOUTUBE_FIX.md for migration instructions
"HTTP 403: Forbidden" from YouTube Music
Likely causes:
- YouTube Data API v3 is not enabled in your Google Cloud project
- Your OAuth token doesn't have the correct scopes
In either case, regenerate your token after enabling the API.
"Failed to connect to Lidarr"
- Verify Lidarr is running
- Check the URL and API key in config.json
- Ensure the URL starts with http:// or https://
MusicBrainz rate limiting
The tool automatically respects MusicBrainz's rate limit (1 request per second). For large libraries, this process may take a while.
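A limit of one request per second can be enforced with a small helper that sleeps only for whatever time remains since the previous call. The class below is a sketch of that idea and not the tool's implementation. The `RateLimiter` name and the injectable `clock` and `sleep` parameters (which make it testable) are assumptions.

```python
import time


class RateLimiter:
    """Block so that successive wait() calls are at least min_interval apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `limiter.wait()` before each lookup means a library of N artists takes roughly N seconds, which is why large libraries are slow to process.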
Low MusicBrainz match rate
- MusicBrainz matching uses a 90% similarity threshold
- Some artists may not be in the MusicBrainz database
- Check skipped_artists.log for artists without matches
- You can manually add MusicBrainz IDs for important artists
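To see why a 90% threshold rejects some plausible matches, here is a sketch using the standard library's `difflib.SequenceMatcher`. The similarity metric the tool actually uses is not stated here, so the metric, the helper names, and the case-folding step are all assumptions.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.90  # the 90% similarity threshold mentioned above


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.strip().casefold(), b.strip().casefold()).ratio()


def best_match(query: str, candidates: list[str], threshold: float = THRESHOLD):
    """Return the most similar candidate, or None if none reaches the threshold."""
    scored = [(similarity(query, candidate), candidate) for candidate in candidates]
    if not scored:
        return None
    score, name = max(scored)
    return name if score >= threshold else None
```

Under this metric, "Beatles" against "The Beatles" scores about 0.78 and would be skipped, which is the kind of case that ends up in skipped_artists.log.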
Contributing
This is a personal project, but suggestions and improvements are welcome!
Acknowledgments
- ytmusicapi - YouTube Music API
- spotipy - Spotify API
- musicbrainzngs - MusicBrainz API
- typer - CLI framework
- rich - Terminal formatting
File details
Details for the file artistscraper-1.1.1.tar.gz.
File metadata
- Download URL: artistscraper-1.1.1.tar.gz
- Upload date:
- Size: 20.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6a0cd72062cd3546aed4ed4cfe862a0013693ce1cad85bbd3e2cc7757056dfd6 |
| MD5 | 1e9f7587c069e07d8c271c9d4a4eff48 |
| BLAKE2b-256 | 77004b881a37b6acbe36d665ce76c479a599b10c6ef445cf1028f2926ced3384 |
File details
Details for the file artistscraper-1.1.1-py3-none-any.whl.
File metadata
- Download URL: artistscraper-1.1.1-py3-none-any.whl
- Upload date:
- Size: 21.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 59fd119d7b175e744711070c47c3f3d18f1e15cf0d9da668888da6ead1340357 |
| MD5 | 13aeda4f6a3ac5b60c1c551861fb805d |
| BLAKE2b-256 | e3e657efc2a204db415b2b74454643b8b5a1fdbfac35139fbe20e8b7bf53abd3 |