How to extract movie genres from the Rotten Tomatoes website
Rotten Tomatoes Scraper
You can use this library to extract information about movies and actors listed on the Rotten Tomatoes website. Each movie has metadata such as Rating, Genre, Box Office, and Studio. Note that Genre has 20+ subcategories, which give you more granular information about a movie. This metadata can be useful for many purposes; however, I could not find a clean API that provides all of it. For an actor, you can extract the movies listed in either the highest-rated or the filmography section, depending on your needs. Under the hood, the library uses the BeautifulSoup package to parse the HTML documents returned by its HTTP requests.
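As a rough illustration of the parsing step described above, the sketch below pulls label/value metadata (Rating, Genre, Studio) out of an HTML fragment. Two assumptions are worth stating plainly: the markup is a hypothetical, simplified fragment, not Rotten Tomatoes' real page structure, and the sketch uses the standard library's html.parser instead of BeautifulSoup so it runs without extra packages. In practice the HTML would come from an HTTP request (for example via urllib.request.urlopen), which is omitted here. MetadataParser is a hypothetical name.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collect label/value pairs from <li> items shaped like
    <li><b>Genre:</b> <span>Drama, Romance</span></li>.

    This markup is a hypothetical simplification, not Rotten Tomatoes' real HTML.
    """

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_label = False
        self._in_value = False
        self._label = None

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self._in_label = True
        elif tag == "span" and self._label is not None:
            # Only treat a <span> as a value once we have seen its label.
            self._in_value = True

    def handle_endtag(self, tag):
        if tag == "b":
            self._in_label = False
        elif tag == "span":
            self._in_value = False
        elif tag == "li":
            # Each <li> holds one label/value pair; reset for the next item.
            self._label = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_label:
            self._label = text.rstrip(":")
        elif self._in_value:
            self.metadata[self._label] = text


# Hypothetical HTML fragment standing in for a fetched movie page.
html = """
<ul>
  <li><b>Rating:</b> <span>R</span></li>
  <li><b>Genre:</b> <span>Drama, Romance</span></li>
  <li><b>Studio:</b> <span>Example Studio</span></li>
</ul>
"""

parser = MetadataParser()
parser.feed(html)
print(parser.metadata)
# {'Rating': 'R', 'Genre': 'Drama, Romance', 'Studio': 'Example Studio'}
```

Splitting the Genre value on commas, e.g. `[g.strip() for g in parser.metadata["Genre"].split(",")]`, would turn it into a per-movie list of genres.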
Library
The library depends on the following packages (re and urllib are part of the Python standard library and need no separate installation):
- rotten_tomatoes_client
- bs4
- re
- urllib
Install
It can be installed using pip:
pip install rotten_tomatoes_scraper
Usage
You can use this library to extract the complete list of movies an actor has played in by calling the extract_movies method with section='filmography'. You can also extract the actor's list of top-ranked movies by calling the same method with section='highest'.
from rotten_tomatoes_scraper.rtscraper import RTScraper
rts = RTScraper()
movie_titles = rts.extract_movies('jack nicholson', section='highest')
print(movie_titles)
['Kubrick by Kubrick (Kubrick par Kubrick)', 'On a Clear Day You Can See Forever', 'The Shooting']
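Since both sections go through the same extract_movies call, a small helper can fetch them side by side, for example to see which titles in the full filmography are not in the highest-rated list. This is a sketch under stated assumptions: movies_by_section, titles_outside_highest and FakeScraper are hypothetical names, and FakeScraper is an offline stand-in with made-up titles so the code runs without network access. With the real library you would pass an RTScraper() instance instead.

```python
def movies_by_section(scraper, actor, sections=("highest", "filmography")):
    """Call extract_movies once per section and key the results by section name.

    Hypothetical helper. `scraper` is assumed to expose
    extract_movies(name, section=...), as RTScraper does in the example above.
    """
    return {s: scraper.extract_movies(actor, section=s) for s in sections}


def titles_outside_highest(by_section):
    """Filmography titles that do not appear in the highest-rated list (hypothetical helper)."""
    highest = set(by_section["highest"])
    return [title for title in by_section["filmography"] if title not in highest]


class FakeScraper:
    """Offline stand-in for RTScraper with made-up data, for demonstration only."""

    _DATA = {
        "highest": ["Movie A"],
        "filmography": ["Movie A", "Movie B", "Movie C"],
    }

    def extract_movies(self, actor, section):
        return list(self._DATA[section])


# With the real library: by_section = movies_by_section(RTScraper(), 'jack nicholson')
by_section = movies_by_section(FakeScraper(), "jack nicholson")
print(titles_outside_highest(by_section))  # ['Movie B', 'Movie C']
```

Keeping the scraper as a parameter rather than constructing it inside the helper is what allows the offline stand-in to be swapped in for testing.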
If you want to find out which movie genres an actor has played in, first extract the list of movies they played in using the extract_movies method. Then pass that list to the extract_genre method, which returns a dictionary whose keys are movie genres and whose values are the number of the actor's movies in each genre. For example:
from rotten_tomatoes_scraper.rtscraper import RTScraper
rts = RTScraper()
movie_titles = rts.extract_movies('meryl streep', section='highest')
movie_genres = rts.extract_genre(movie_titles)
print(movie_genres)
{'Documentary': 2, 'Comedy': 1, 'ScienceFiction&Fantasy': 1, 'Drama': 1, 'Romance': 1}
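To make the shape of that dictionary concrete, here is a minimal sketch of how per-movie genre labels could be tallied into it with collections.Counter. It is not necessarily how extract_genre works internally: it assumes each movie's genres are already available as a list of strings (the sample titles and genres are made up), and the whitespace removal is inferred from the 'ScienceFiction&Fantasy' key in the output above. count_genres is a hypothetical name.

```python
from collections import Counter


def count_genres(genres_per_movie):
    """Tally how many movies carry each genre label (hypothetical helper).

    genres_per_movie maps a movie title to the list of genre strings
    scraped for it; this input format is an assumption.
    """
    counts = Counter()
    for genres in genres_per_movie.values():
        # Strip whitespace so "Science Fiction & Fantasy" becomes
        # "ScienceFiction&Fantasy"; dict.fromkeys drops duplicates while
        # keeping order, so a genre listed twice for one movie counts once.
        unique = list(dict.fromkeys("".join(g.split()) for g in genres))
        counts.update(unique)
    return dict(counts)


# Made-up sample data for illustration only.
sample = {
    "Movie A": ["Documentary"],
    "Movie B": ["Documentary"],
    "Movie C": ["Comedy", "Drama"],
    "Movie D": ["Science Fiction & Fantasy", "Romance"],
}
print(count_genres(sample))
# {'Documentary': 2, 'Comedy': 1, 'Drama': 1, 'ScienceFiction&Fantasy': 1, 'Romance': 1}
```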
This library does not give you full access to all the metadata available on the Rotten Tomatoes website. However, you can easily use it to extract the most important fields.
And that's pretty much it!
Download files
Source Distribution: rotten_tomatoes_scraper-1.0.0.tar.gz
Built Distribution: rotten_tomatoes_scraper-1.0.0-py3-none-any.whl
Hashes for rotten_tomatoes_scraper-1.0.0.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 1acf2df0377b54ba2ba7383f0c571f5a0dd78ff403b5e8a9bcce26d550872081
MD5 | be6dba11ec1dabf74ec7b84e89edd632
BLAKE2b-256 | 343c08e1bb0a3d433f8a9b56e7401b4f0e2bdd710ea845824c3b1d5bb48d566c
Hashes for rotten_tomatoes_scraper-1.0.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 0532cb7bcf213bbd6fc3ad1ba9b84d42e8ac2875c297a431ac6e240650187138
MD5 | 73c57472e4f62d29f32530ae9cbac022
BLAKE2b-256 | c26e5de6786120460e3dd7109319b69a4d48fa5694bbef65fad1ad854be63df0