How to scrape the Rotten Tomatoes website using an easy interface.
Rotten Tomatoes Scraper
You can use this module to extract information about movies and actors listed on the Rotten Tomatoes website. Each movie has metadata such as Rating, Genre, Box Office, Studio, and Scores. The Genre field has more than 20 subcategories, which give you more granular information about a movie. This metadata can be helpful for many data science projects. For actors, you can extract the movies listed in either the highest-rated or the filmography section, depending on your needs. This module uses the BeautifulSoup package to parse HTML documents.
Install
The module requires the following libraries:
- bs4
- requests
- lxml
Then, it can be installed using pip:
```shell
pip3 install rotten_tomatoes_scraper
```
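Because the libraries above are listed as prerequisites, it can help to check which of them are already importable before installing. The following is a minimal sketch using only the standard library; `missing_modules` is a hypothetical helper name, and it assumes the import names match the list above (`bs4` is the import name provided by the beautifulsoup4 distribution).

```python
import importlib.util
from typing import Iterable, List

# Top-level import names of the prerequisites listed above.
REQUIRED_MODULES = ('bs4', 'requests', 'lxml')


def missing_modules(names: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the module names that cannot be found in the current environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == '__main__':
    missing = missing_modules()
    if missing:
        print('Missing:', ', '.join(missing))
    else:
        print('All prerequisites are importable.')
```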
Usage
This module contains three classes: `CelebrityScraper`, `MovieScraper`, and `DirectorScraper`.
- `CelebrityScraper`: Use this class to extract the complete list of movies a celebrity has appeared in by calling the `extract_metadata` method with `section='filmography'`. You can also extract the list of their top-ranked movies by calling the same method with `section='highest'`.
```python
from rotten_tomatoes_scraper.rt_scraper import CelebrityScraper

celebrity_scraper = CelebrityScraper(celebrity_name='jack nicholson')
# section='highest' returns the top-ranked movies; use section='filmography' for the full list
celebrity_scraper.extract_metadata(section='highest')
movie_titles = celebrity_scraper.metadata['movie_titles']
print(movie_titles)
# ['On a Clear Day You Can See Forever', 'The Shooting', 'Chinatown', 'Broadcast News']
```
- `MovieScraper`: Use this class to extract movie metadata. You can pass either `movie_url` or `movie_title` when creating the scraper. To find out which genres an actor's movies belong to, first extract the list of movies they appeared in with `CelebrityScraper`. Then instantiate `MovieScraper` with each `movie_title` and call its `extract_metadata` method, as in the code below.
```python
from rotten_tomatoes_scraper.rt_scraper import MovieScraper

# Look up a movie by its title
movie_scraper = MovieScraper(movie_title='Vicky Cristina Barcelona')
movie_scraper.extract_metadata()
print(movie_scraper.metadata)
# {'Score_Rotten': '81', 'Score_Audience': '74', 'Genre': ['comedy', 'drama', 'romance']}
```
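The genre workflow described for `MovieScraper` (collect a celebrity's titles, then scrape each title) can be sketched as follows. `scrape_movie_metadata` and `genre_counts` are hypothetical helper names; the scraper calls mirror the example above, and the only assumption about the result is the `'Genre'` list shown in its output. The fetch function is a parameter so the counting logic can run without network access.

```python
from collections import Counter
from typing import Callable, Dict, Iterable


def scrape_movie_metadata(title: str) -> Dict:
    """Scrape one movie with MovieScraper, as in the example above (needs network access)."""
    # Imported inside the function so genre_counts can be reused or tested without the package.
    from rotten_tomatoes_scraper.rt_scraper import MovieScraper

    movie_scraper = MovieScraper(movie_title=title)
    movie_scraper.extract_metadata()
    return movie_scraper.metadata


def genre_counts(titles: Iterable[str],
                 fetch: Callable[[str], Dict] = scrape_movie_metadata) -> Counter:
    """Count how often each genre appears across the given movie titles."""
    counts = Counter()
    for title in titles:
        metadata = fetch(title)
        # 'Genre' holds a list such as ['comedy', 'drama']; titles without it add nothing.
        counts.update(metadata.get('Genre', []))
    return counts
```

For example, after running the `CelebrityScraper` example with `section='filmography'`, `genre_counts(celebrity_scraper.metadata['movie_titles']).most_common(3)` would list the three most frequent genres. Each title is scraped separately, so long filmographies take a while.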
```python
from rotten_tomatoes_scraper.rt_scraper import MovieScraper

# Or look up a movie by its Rotten Tomatoes page URL
movie_url = 'https://www.rottentomatoes.com/m/marriage_story_2019'
movie_scraper = MovieScraper(movie_url=movie_url)
movie_scraper.extract_metadata()
print(movie_scraper.metadata)
# {'Score_Rotten': '94', 'Score_Audience': '85', 'Genre': ['comedy', 'drama']}
```
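In both outputs the scores are strings (`'81'`, `'94'`). For analysis you may want typed values. The sketch below converts the documented keys into a small dataclass. `MovieRecord`, `to_record`, and `score_gap` are hypothetical names; the code assumes only the three keys shown above and treats missing or non-numeric scores as `None`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def _to_int(value) -> Optional[int]:
    """Convert a score string such as '81' to an int, or None if it is missing or not numeric."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


@dataclass
class MovieRecord:
    title: str
    critics_score: Optional[int]
    audience_score: Optional[int]
    genres: List[str] = field(default_factory=list)

    @property
    def score_gap(self) -> Optional[int]:
        """Critics score minus audience score, or None if either is unknown."""
        if self.critics_score is None or self.audience_score is None:
            return None
        return self.critics_score - self.audience_score


def to_record(title: str, metadata: Dict) -> MovieRecord:
    """Build a typed record from a MovieScraper-style metadata dict."""
    return MovieRecord(
        title=title,
        critics_score=_to_int(metadata.get('Score_Rotten')),
        audience_score=_to_int(metadata.get('Score_Audience')),
        genres=list(metadata.get('Genre', [])),
    )
```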
- `DirectorScraper`: Use this class to extract director metadata. You can pass either `director_url` or `director_name` to extract the director metadata.
```python
from rotten_tomatoes_scraper.rt_scraper import DirectorScraper

director_url = 'https://www.rottentomatoes.com/celebrity/steven_spielberg'
director_scraper = DirectorScraper(director_url=director_url)
director_scraper.extract_metadata()
# metadata is keyed by movie title
print(director_scraper.metadata['Jaws'])
# {'Year': '1975', 'Score_Rotten': '98', 'Box_Office': '260870000'}
```
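Since `DirectorScraper.metadata` maps each movie title to a dict like the `'Jaws'` entry above, you can post-process it with plain Python. The sketch below ranks a director's movies by box office. `rank_by_box_office` is a hypothetical helper; it assumes `'Box_Office'` is a digit string as in the output above (stripping commas defensively) and skips movies without a numeric figure.

```python
from typing import Dict, List, Tuple


def rank_by_box_office(director_metadata: Dict[str, Dict[str, str]]) -> List[Tuple[str, int]]:
    """Return (title, box_office) pairs sorted from highest to lowest box office."""
    ranked = []
    for title, info in director_metadata.items():
        raw = str(info.get('Box_Office') or '').replace(',', '')
        if not raw.isdigit():
            continue  # no usable box-office figure for this movie
        ranked.append((title, int(raw)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

For example, `rank_by_box_office(director_scraper.metadata)[:5]` would give the five highest-grossing entries after running the example above.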
```python
from rotten_tomatoes_scraper.rt_scraper import DirectorScraper

director_scraper = DirectorScraper(director_name='stanley kubrick')
director_scraper.extract_metadata()
# The metadata keys are the director's movie titles
movie_titles = list(director_scraper.metadata.keys())
print(movie_titles)
# ['Eyes Wide Shut', 'Full Metal Jacket', 'The Shining', 'Barry Lyndon', 'A Clockwork Orange', '2001: A Space Odyssey',
#  'Dr. Strangelove Or: How I Learned to Stop Worrying and Love the Bomb', 'Lolita', 'Spartacus', 'Paths of Glory',
#  'The Killing', "Killer's Kiss", 'The Seafarers', 'Fear and Desire', 'Day of the Fight', 'Flying Padre']
```
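The two `DirectorScraper` examples identify a director either by name (`'stanley kubrick'`) or by page URL (`'https://www.rottentomatoes.com/celebrity/steven_spielberg'`). If you prefer to work with URLs, the hypothetical helper below builds one from a name. It assumes the slug is the lower-cased name with whitespace replaced by underscores, inferred only from that single example URL; this is not necessarily how the package resolves `director_name`, and some real slugs may differ.

```python
import re

# Base URL taken from the DirectorScraper example above.
CELEBRITY_BASE_URL = 'https://www.rottentomatoes.com/celebrity/'


def celebrity_url(name: str) -> str:
    """Hypothetical helper: 'Steven Spielberg' -> .../celebrity/steven_spielberg."""
    # Assumed slug rule: trim, lower-case, and turn each run of whitespace into one underscore.
    slug = re.sub(r'\s+', '_', name.strip().lower())
    return CELEBRITY_BASE_URL + slug
```

The result could then be passed as `DirectorScraper(director_url=celebrity_url('steven spielberg'))`.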
This module doesn't give you access to all the metadata available on the Rotten Tomatoes website. However, you can easily use it to extract the most important fields.
And, that's pretty much it!
Download files
File details
Details for the file rotten_tomatoes_scraper-1.4.0.tar.gz.
File metadata
- Download URL: rotten_tomatoes_scraper-1.4.0.tar.gz
- Upload date:
- Size: 6.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.8.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | 934307c42c8dcf9e0bc5766a0299cbf1e9b9cdbe165766d9001946ee4ae05ffb
MD5 | 8e0e7857712769efdc37dcc939e6a55f
BLAKE2b-256 | ccb4fe3ed279bb91e06f4134162728f7eb39794b5df1d5d4c4cbff499af02703
File details
Details for the file rotten_tomatoes_scraper-1.4.0-py3-none-any.whl.
File metadata
- Download URL: rotten_tomatoes_scraper-1.4.0-py3-none-any.whl
- Upload date:
- Size: 7.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.8.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | e5a7bad8ceae609ca782e96364be4287e173c0b8fe0e6c4f4cd719cc1a65932f
MD5 | 30be2bc36e9d91af25bf308cf7be26f6
BLAKE2b-256 | a95aa64d8989a8f122eb423daec79a75444f0cf854fde7d41d2b6de5201c88c3
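To check a downloaded file against the published SHA256 digests, you can hash it with Python's standard `hashlib` module. This is a minimal sketch; `sha256_of`, `verify`, and `EXPECTED_SHA256` are hypothetical names, and the digests are copied from the file details above.

```python
import hashlib
from pathlib import Path

# Published SHA256 digests for the 1.4.0 release files listed above.
EXPECTED_SHA256 = {
    'rotten_tomatoes_scraper-1.4.0.tar.gz':
        '934307c42c8dcf9e0bc5766a0299cbf1e9b9cdbe165766d9001946ee4ae05ffb',
    'rotten_tomatoes_scraper-1.4.0-py3-none-any.whl':
        'e5a7bad8ceae609ca782e96364be4287e173c0b8fe0e6c4f4cd719cc1a65932f',
}


def sha256_of(path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, 'rb') as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path) -> bool:
    """Return True if the file's SHA256 matches the published digest for its file name."""
    path = Path(path)
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f'no published digest for {path.name}')
    return sha256_of(path) == expected
```

pip can also enforce digests at install time through a requirements file entry with `--hash=sha256:...` options, which switches pip into hash-checking mode (every dependency must then be pinned and hashed as well).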