Extract cross-platform media identifiers from Wikidata (IMDb, Trakt, TMDB, etc.)
Wikidata Identifier Extractor
A powerful Python library for extracting cross-platform media identifiers from Wikidata. Find IMDb, Trakt, TMDB, Rotten Tomatoes IDs and more for movies, TV shows, and episodes.
Features
✨ Cross-Platform Mapping: Get identifiers for IMDb, Trakt, TMDB, Rotten Tomatoes, and more
🔗 Relationship Data: Automatically fetch sequels, prequels, and series information
💾 Built-in Caching: Query results are cached in memory so repeated lookups don't hit the SPARQL endpoint again
🆓 No API Keys Required: Uses Wikidata's free SPARQL endpoint
📊 Comprehensive Coverage: Access millions of movies, TV shows, and episodes
🔄 Automatic URL Generation: Get ready-to-use URLs for all platforms
Installation
pip install wikidata-identifier-extractor
Quick Start
from wikidata_identifier_extractor import WikidataIdentifierExtractor
# Initialize the extractor
extractor = WikidataIdentifierExtractor()
# Search by IMDb ID
result = extractor.get_identifiers(imdb_id="tt1375666")
print(f"Title: {result['title']}") # Inception
print(f"Trakt: {result['trakt']}") # movies/inception-2010
print(f"TMDB: {result['tmdb_movie']}") # 27205
print(f"IMDb URL: {result['urls']['imdb']}") # https://www.imdb.com/title/tt1375666
Usage Examples
Search by Trakt Slug
result = extractor.get_identifiers(trakt_slug="movies/inception-2010")
print(result['imdb']) # tt1375666
print(result['wikidata_id']) # Q25188
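The sample response carries both `trakt` (`movies/inception-2010`) and `trakt_film` (`inception-2010`). The second looks like the first with its `movies/` prefix removed. The helper below is a hypothetical sketch of that relationship. It assumes every Trakt path has the `<kind>/<slug>` shape and is not part of the library's API.

```python
from typing import Tuple


def split_trakt_slug(trakt_slug: str) -> Tuple[str, str]:
    """Split a Trakt path such as 'movies/inception-2010' into (kind, slug).

    Hypothetical helper: assumes the '<kind>/<slug>' shape seen in the
    sample response, where kind is e.g. 'movies' or 'shows'.
    """
    kind, sep, slug = trakt_slug.strip("/").partition("/")
    if not sep or not slug:
        raise ValueError(f"not a '<kind>/<slug>' Trakt path: {trakt_slug!r}")
    return kind, slug


kind, slug = split_trakt_slug("movies/inception-2010")
print(kind)  # movies
print(slug)  # inception-2010
```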
Get Movie Sequels/Prequels
# Lord of the Rings: The Two Towers
result = extractor.get_identifiers(imdb_id="tt0167261")
# Get previous movie
if result.get('follows'):
    print(result['follows']['title'])  # The Fellowship of the Ring
    print(result['follows']['imdb'])  # tt0120737
# Get next movie
if result.get('followed_by'):
    print(result['followed_by']['title'])  # The Return of the King
    print(result['followed_by']['imdb'])  # tt0167260
# Get series information
if result.get('series'):
    print(result['series']['title'])  # The Lord of the Rings trilogy
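Because each result exposes `follows` and `followed_by`, a whole series can be put in order by walking those links. The sketch below runs on a hypothetical in-memory stub keyed by IMDb ID (titles and IDs taken from the example above) instead of live queries. In real use, `lookup` would wrap `extractor.get_identifiers(imdb_id=...)`. The assumption that each `follows`/`followed_by` dict carries an `imdb` key comes from the sample output, not from a documented guarantee.

```python
from typing import Any, Callable, Dict, List, Optional

Result = Dict[str, Any]


def series_in_order(start_imdb: str, lookup: Callable[[str], Optional[Result]]) -> List[str]:
    """Return titles first-to-last: walk 'follows' back, then 'followed_by' forward."""
    first = lookup(start_imdb)
    if first is None:
        return []
    seen = {start_imdb}
    # Walk backwards until there is no earlier item (or the data loops).
    while first.get("follows"):
        prev_id = first["follows"]["imdb"]
        prev = lookup(prev_id)
        if prev is None or prev_id in seen:
            break
        seen.add(prev_id)
        first = prev
    # Walk forwards from the first item, collecting titles.
    titles: List[str] = []
    visited = set()
    current: Optional[Result] = first
    while current is not None and current["imdb"] not in visited:
        visited.add(current["imdb"])
        titles.append(current["title"])
        nxt = current.get("followed_by")
        current = lookup(nxt["imdb"]) if nxt else None
    return titles


# Hypothetical stub standing in for real extractor results.
STUB = {
    "tt0120737": {"imdb": "tt0120737", "title": "The Fellowship of the Ring",
                  "follows": None, "followed_by": {"imdb": "tt0167261"}},
    "tt0167261": {"imdb": "tt0167261", "title": "The Two Towers",
                  "follows": {"imdb": "tt0120737"}, "followed_by": {"imdb": "tt0167260"}},
    "tt0167260": {"imdb": "tt0167260", "title": "The Return of the King",
                  "follows": {"imdb": "tt0167261"}, "followed_by": None},
}

print(series_in_order("tt0167261", STUB.get))
# ['The Fellowship of the Ring', 'The Two Towers', 'The Return of the King']
```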
Disable Relation Fetching
For faster queries when you don't need related items:
result = extractor.get_identifiers(
    imdb_id="tt0167261",
    fetch_relations=False  # Skip fetching series/follows/followed_by
)
Response Structure
{
    'wikidata_id': 'Q25188',
    'title': 'Inception',
    'imdb': 'tt1375666',
    'trakt': 'movies/inception-2010',
    'trakt_film': 'inception-2010',
    'tmdb_movie': '27205',
    'rotten_tomatoes': 'm/inception',
    'google_kg': '/g/11b6vxwpkm',
    'fandom_wiki': 'inception',
    'part_of_series_id': None,
    'follows_id': None,
    'followed_by_id': None,
    'urls': {
        'wikidata': 'https://www.wikidata.org/wiki/Q25188',
        'imdb': 'https://www.imdb.com/title/tt1375666',
        'trakt': 'https://trakt.tv/movies/inception-2010',
        'tmdb_movie': 'https://www.themoviedb.org/movie/27205',
        # ... more URLs
    },
    'series': None,  # Populated if part of a series
    'follows': None,  # Populated if there's a previous item
    'followed_by': None  # Populated if there's a next item
}
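Each entry in `urls` follows a fixed pattern per platform. Here is a minimal sketch of rebuilding those URLs from the identifier fields, using the URL shapes shown in the sample response. `build_urls` is hypothetical. The Rotten Tomatoes and TMDB TV patterns follow the sites' public URL layout and are not taken from the library.

```python
from typing import Dict, Optional

# result key -> (url key, template). The first four match the sample 'urls'
# block; the last two are assumptions based on the sites' public URLs.
URL_TEMPLATES = {
    "wikidata_id": ("wikidata", "https://www.wikidata.org/wiki/{}"),
    "imdb": ("imdb", "https://www.imdb.com/title/{}"),
    "trakt": ("trakt", "https://trakt.tv/{}"),
    "tmdb_movie": ("tmdb_movie", "https://www.themoviedb.org/movie/{}"),
    "tmdb_series": ("tmdb_series", "https://www.themoviedb.org/tv/{}"),
    "rotten_tomatoes": ("rotten_tomatoes", "https://www.rottentomatoes.com/{}"),
}


def build_urls(result: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Return {url_key: url} for every identifier that is present and non-empty."""
    urls = {}
    for field, (url_key, template) in URL_TEMPLATES.items():
        value = result.get(field)
        if value:
            urls[url_key] = template.format(value)
    return urls


inception = {"wikidata_id": "Q25188", "imdb": "tt1375666",
             "trakt": "movies/inception-2010", "tmdb_movie": "27205",
             "rotten_tomatoes": "m/inception", "tmdb_series": None}
print(build_urls(inception)["trakt"])  # https://trakt.tv/movies/inception-2010
```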
Supported Identifiers
| Platform | Result key | Example |
|---|---|---|
| Wikidata | wikidata_id | Q25188 |
| IMDb | imdb | tt1375666 |
| Trakt.tv | trakt | movies/inception-2010 |
| Trakt Film | trakt_film | inception-2010 |
| TMDB Movie | tmdb_movie | 27205 |
| TMDB Series | tmdb_series | 1399 |
| TMDB Episode | tmdb_episode | 63056 |
| Rotten Tomatoes | rotten_tomatoes | m/inception |
| Fandom Wiki | fandom_wiki | lotr |
| Google Knowledge Graph | google_kg | /g/11b6vxwpkm |
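The example values in the table have recognizable shapes. That makes it cheap to reject malformed input before spending a SPARQL query on it. The patterns below are assumptions inferred from the examples (for instance, IMDb title IDs being `tt` followed by at least seven digits). They are not rules enforced by the library, and `looks_valid` is a hypothetical helper.

```python
import re

# Hypothetical format checks inferred from the example values in the table.
ID_PATTERNS = {
    "wikidata_id": re.compile(r"Q[1-9]\d*"),
    "imdb": re.compile(r"tt\d{7,}"),
    "trakt": re.compile(r"(movies|shows)/[a-z0-9-]+"),
    "trakt_film": re.compile(r"[a-z0-9-]+"),
    "tmdb_movie": re.compile(r"\d+"),
    "tmdb_series": re.compile(r"\d+"),
    "tmdb_episode": re.compile(r"\d+"),
    "rotten_tomatoes": re.compile(r"(m|tv)/[a-z0-9_-]+"),
    "google_kg": re.compile(r"/[gm]/[0-9a-z_]+"),
}


def looks_valid(key: str, value: str) -> bool:
    """True if value fully matches the expected shape for key; unknown keys pass."""
    pattern = ID_PATTERNS.get(key)
    return pattern is None or pattern.fullmatch(value) is not None


print(looks_valid("imdb", "tt1375666"))  # True
print(looks_valid("imdb", "1375666"))  # False
```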
Advanced Usage
Batch Processing
from wikidata_identifier_extractor import WikidataIdentifierExtractor

def process_multiple_movies(imdb_ids):
    extractor = WikidataIdentifierExtractor()
    results = []
    for imdb_id in imdb_ids:
        result = extractor.get_identifiers(imdb_id=imdb_id)
        if result:
            results.append(result)
    return results
movies = ["tt1375666", "tt0468569", "tt0816692"]
results = process_multiple_movies(movies)
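For larger lists, a variant that drops duplicate IDs, pauses between lookups to stay gentle on the public endpoint, and keeps going when a single lookup fails may be useful. `batch_lookup`, its `delay` and `sleep` parameters, and the injected `lookup` callable are hypothetical; passing `lambda i: extractor.get_identifiers(imdb_id=i)` as `lookup` would plug in the real extractor. The one-second default delay is an arbitrary choice, not a documented Wikidata limit.

```python
import time
from typing import Callable, Dict, Iterable, Optional


def batch_lookup(
    imdb_ids: Iterable[str],
    lookup: Callable[[str], Optional[dict]],
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Optional[dict]]:
    """Look up each distinct IMDb ID once, in input order.

    Returns {imdb_id: result_or_None}; a failed lookup is recorded as None
    instead of aborting the whole batch.
    """
    results: Dict[str, Optional[dict]] = {}
    for index, imdb_id in enumerate(dict.fromkeys(imdb_ids)):  # de-duplicate, keep order
        if index:
            sleep(delay)  # pause between requests, not before the first one
        try:
            results[imdb_id] = lookup(imdb_id)
        except Exception as exc:  # the library's exception types aren't documented
            print(f"{imdb_id}: {exc}")
            results[imdb_id] = None
    return results


# Usage with a stand-in lookup instead of the real extractor.
def fake_lookup(imdb_id):
    if imdb_id == "tt0000000":
        raise RuntimeError("not found")
    return {"imdb": imdb_id}


found = batch_lookup(
    ["tt1375666", "tt0468569", "tt1375666", "tt0000000"],
    fake_lookup,
    sleep=lambda seconds: None,  # skip real pauses in this demo
)
print(list(found))  # ['tt1375666', 'tt0468569', 'tt0000000']
```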
Error Handling
try:
    result = extractor.get_identifiers(imdb_id="tt1375666")
    if result:
        print(f"Found: {result['title']}")
    else:
        print("No results found")
except Exception as e:
    print(f"Error: {e}")
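Wikidata's public SPARQL endpoint can time out or throttle clients that send too many requests (HTTP 429), so transient failures are worth retrying with a growing delay. The wrapper below is a generic exponential-backoff sketch. `with_retries` and its parameters are hypothetical, and catching `Exception` by default mirrors the example above, since the library's exception types aren't documented here.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(); on a retryable error wait base_delay * 2**n and try again."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... by default
    raise ValueError("attempts must be at least 1")
```

With the extractor from above, a call would look like `with_retries(lambda: extractor.get_identifiers(imdb_id="tt1375666"))`.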
How It Works
This library uses Wikidata's SPARQL endpoint to query structured data about media content. Wikidata is a free, collaborative knowledge base that links various platform-specific identifiers together.
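To make this concrete, here is a sketch of the kind of query involved: find the item whose IMDb ID (property P345) matches, then read other identifier properties off it. The property selection (P4947 TMDB movie ID, P1258 Rotten Tomatoes ID, P2671 Google Knowledge Graph ID, and P179/P155/P156 for part of the series, follows, and followed by) and the helper names `build_query` and `build_lookup_url` are illustrative assumptions, not necessarily the library's actual query. The code only builds the request URL and does not send it.

```python
import re
from urllib.parse import urlencode

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

# Result variable -> Wikidata property read from the matched item.
PROPERTIES = {
    "tmdb_movie": "P4947",       # TMDB movie ID
    "rotten_tomatoes": "P1258",  # Rotten Tomatoes ID
    "google_kg": "P2671",        # Google Knowledge Graph ID
    "series": "P179",            # part of the series
    "follows": "P155",           # follows
    "followed_by": "P156",       # followed by
}


def build_query(imdb_id: str) -> str:
    """SPARQL that finds the item with this IMDb ID (P345) and its other IDs."""
    if not re.fullmatch(r"tt\d+", imdb_id):
        # Validate before interpolating into the query text.
        raise ValueError(f"unexpected IMDb ID: {imdb_id!r}")
    variables = " ".join(f"?{var}" for var in PROPERTIES)
    optionals = "\n".join(
        f"  OPTIONAL {{ ?item wdt:{pid} ?{var} . }}" for var, pid in PROPERTIES.items()
    )
    return (
        f"SELECT ?item ?itemLabel {variables} WHERE {{\n"
        f'  ?item wdt:P345 "{imdb_id}" .\n'
        f"{optionals}\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        "LIMIT 1"
    )


def build_lookup_url(imdb_id: str) -> str:
    """GET URL asking the endpoint for SPARQL results as JSON."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": build_query(imdb_id), "format": "json"})


print(build_query("tt1375666"))
```

When actually sending such a request, Wikimedia asks clients to identify themselves with a descriptive `User-Agent` header.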
Key Benefits:
- 🆓 Free and open - no API keys required
- 🌐 Community-maintained and constantly updated
- 🔗 Comprehensive cross-platform linking
- 📈 Covers millions of movies, TV shows, and episodes
Performance
- Caching: Built-in memory cache prevents redundant API calls
- Configurable Depth: Control relationship fetching to balance speed vs data completeness
- Rate Limiting Friendly: Respectful of Wikidata's SPARQL endpoint limits
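The built-in cache is described only as an in-memory cache. One plausible shape, sketched here as an assumption rather than the library's code, is memoizing results by the lookup arguments, so a repeated call with the same IDs never reaches the endpoint twice. `CachedLookup` is hypothetical and wraps any fetch function. It also caches `None`, so known misses aren't re-queried.

```python
from typing import Any, Callable, Dict, Hashable, Tuple


class CachedLookup:
    """Memoize a fetch function by its keyword arguments (in-memory, per instance)."""

    def __init__(self, fetch: Callable[..., Any]) -> None:
        self._fetch = fetch
        self._cache: Dict[Tuple[Tuple[str, Hashable], ...], Any] = {}
        self.misses = 0  # how many times the underlying fetch actually ran

    def __call__(self, **kwargs: Hashable) -> Any:
        key = tuple(sorted(kwargs.items()))  # argument order doesn't matter
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._fetch(**kwargs)
        return self._cache[key]

    def clear(self) -> None:
        self._cache.clear()


# Usage with a stand-in fetch function.
def fake_fetch(imdb_id=None, fetch_relations=True):
    return {"imdb": imdb_id, "relations": fetch_relations}


lookup = CachedLookup(fake_fetch)
lookup(imdb_id="tt1375666")
lookup(imdb_id="tt1375666")                         # served from the cache
lookup(imdb_id="tt1375666", fetch_relations=False)  # different arguments, new fetch
print(lookup.misses)  # 2
```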
Requirements
- Python 3.7+
- requests >= 2.25.0
Documentation
Full documentation is available in the docs folder:
- Complete Guide: Detailed usage examples and API reference
- SPARQL Tutorial: Learn how to modify and extend queries
- Contributing: How to contribute to the project
Development
# Clone the repository
git clone https://github.com/wa8eem/wikidata-identifier-extractor.git
cd wikidata-identifier-extractor
# Install in development mode
pip install -e ".[dev]"
# Run tests
pytest
# Format code
black .
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Wikidata for providing free access to structured data
- The Wikidata community for maintaining and updating the database
Support
- 📫 Issues: GitHub Issues
- 📖 Documentation: Full Guide
- 💬 Discussions: GitHub Discussions
Changelog
See CHANGELOG.md for a list of changes in each version.
Made with ❤️ using Wikidata
Download files
File details
Details for the file wikidata_identifier_extractor-0.1.0.tar.gz.
File metadata
- Download URL: wikidata_identifier_extractor-0.1.0.tar.gz
- Upload date:
- Size: 15.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3095fe305d9d7a59abe22e2ba5bc9d9db3c593a397388806b12bbf6deb2e38c2 |
| MD5 | 0ec832016ebed0f332ce45e3953fd7e2 |
| BLAKE2b-256 | c1b48641d1ccdf439ac8ed6f802d64cd60554a3f8623b825b4da8af34187fb73 |
File details
Details for the file wikidata_identifier_extractor-0.1.0-py3-none-any.whl.
File metadata
- Download URL: wikidata_identifier_extractor-0.1.0-py3-none-any.whl
- Upload date:
- Size: 8.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ef4741385e853833354c9adfdd84ea57298facc3d808c12d3abf2a5a2bfbd0e1 |
| MD5 | cdf350cd972183af3f6835347ce14757 |
| BLAKE2b-256 | 69c7486e639dfa5a453db552da7fa439f9faf538fe777eba718cceaf1c888a72 |