
Scraper and heatmap plotter for episode ratings of series on IMDB


IMDB Series Rating Scraper

Introduction

This tool scrapes https://www.imdb.com for the ratings of the individual episodes of a series. The ratings are cached in a CSV file. Using matplotlib, the tool then generates a heatmap of all episodes in the series. Because the tool relies on scraping the HTML tree of the IMDB page, it may break at any time. If the scraper stops working, feel free to message me or open a pull request with adjusted XPaths.
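The caching step described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `load_or_scrape`, the `scrape` callable, and the one-file-per-series layout are assumptions, and the column names are taken from the example table in this README.

```python
import csv
from pathlib import Path

# Column names as shown in the example data table (assumed to match the CSV header).
FIELDS = ["season", "episode", "name", "rating"]


def load_or_scrape(series, data_dir, scrape):
    """Return the episode rows for `series`, scraping only when no cache exists.

    `scrape` is a hypothetical callable standing in for the real scraper; it
    must return a list of dicts with the keys listed in FIELDS.
    """
    cache_file = Path(data_dir) / f"{series}.csv"

    if not cache_file.exists():
        # No cache yet: scrape once and write the result to disk.
        rows = scrape(series)
        cache_file.parent.mkdir(parents=True, exist_ok=True)
        with cache_file.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=FIELDS)
            writer.writeheader()
            writer.writerows(rows)

    # Always read back from the file so callers get the same (string) values
    # whether or not the cache already existed.
    with cache_file.open(newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))
```

With this shape, a second call for the same series never touches the website, which also keeps the number of requests to IMDB low.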

Examples

Data output

The following table shows data that is generated by the scraper for the first season of Breaking Bad. For the full data output see examples/data/Breaking Bad.csv.

season  episode  name                           rating
1       1        Pilot                          9.0
1       2        Cat's in the Bag...            8.6
1       3        ...And the Bag's in the River  8.7
1       4        Cancer Man                     8.2
1       5        Gray Matter                    8.3
1       6        Crazy Handful of Nothin'       9.3
1       7        A No-Rough-Stuff-Type Deal     8.8
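To show how this data can be consumed, here is a small sketch that parses the rows above with Python's standard csv module and groups the ratings by season. The comma-separated layout and the header names are assumptions based on the table; `ratings_by_season` is a hypothetical helper, not part of the package.

```python
import csv
import io

# The first-season rows from the table above, written as CSV text.
# The header names are assumed to match the table's column titles.
SAMPLE_CSV = """season,episode,name,rating
1,1,Pilot,9.0
1,2,Cat's in the Bag...,8.6
1,3,...And the Bag's in the River,8.7
1,4,Cancer Man,8.2
1,5,Gray Matter,8.3
1,6,Crazy Handful of Nothin',9.3
1,7,A No-Rough-Stuff-Type Deal,8.8
"""


def ratings_by_season(csv_text):
    """Group ratings as {season: {episode: rating}} with numeric keys and values."""
    grouped = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        season = int(row["season"])
        episode = int(row["episode"])
        grouped.setdefault(season, {})[episode] = float(row["rating"])
    return grouped


seasons = ratings_by_season(SAMPLE_CSV)

# Find the highest-rated episode of season 1.
best_episode = max(seasons[1], key=seasons[1].get)
print(best_episode, seasons[1][best_episode])  # prints: 6 9.3
```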

Heatmap output

The following image shows an example of the heatmap that can be generated. Heatmaps of some example series can be found under examples/img/.
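As a rough idea of how such a heatmap can be drawn, the sketch below arranges ratings in a season-by-episode grid and renders it with matplotlib's `imshow`. It is not the package's own plotting code: the helper names, the `RdYlGn` colormap, the 0 to 10 color scale, and the assumption that seasons are numbered consecutively from 1 are all choices made for illustration, and the ratings are toy values rather than real data.

```python
def to_grid(seasons):
    """Turn {season: {episode: rating}} into a rectangular list of rows.

    Rows are seasons and columns are episodes. Missing episodes become NaN
    so that seasons of different lengths still form a rectangle.
    """
    width = max(max(episodes) for episodes in seasons.values())
    nan = float("nan")
    return [
        [seasons[s].get(e, nan) for e in range(1, width + 1)]
        for s in sorted(seasons)
    ]


def plot_heatmap(grid, title, dark=False):
    """Draw the grid as a heatmap and return the matplotlib figure."""
    import matplotlib.pyplot as plt  # imported lazily; requires matplotlib

    plt.style.use("dark_background" if dark else "default")
    fig, ax = plt.subplots()
    # Colormap and value range are illustrative assumptions.
    image = ax.imshow(grid, cmap="RdYlGn", vmin=0, vmax=10)
    # Tick labels assume seasons and episodes are numbered from 1 without gaps.
    ax.set_xticks(range(len(grid[0])))
    ax.set_xticklabels([str(i) for i in range(1, len(grid[0]) + 1)])
    ax.set_yticks(range(len(grid)))
    ax.set_yticklabels([str(i) for i in range(1, len(grid) + 1)])
    ax.set_xlabel("episode")
    ax.set_ylabel("season")
    ax.set_title(title)
    fig.colorbar(image, ax=ax, label="rating")
    return fig


# Toy values for illustration only, not real ratings.
toy_seasons = {1: {1: 9.0, 2: 8.6, 3: 8.7}, 2: {1: 7.5, 2: 8.0}}
grid = to_grid(toy_seasons)
```

Calling `plot_heatmap(grid, "Toy series").savefig("toy.png")` would then write the image to disk, while passing `dark=True` switches to matplotlib's built-in dark style.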

Quickstart

Dependencies

  • Python version: Python 3.9.13
  • Python packages: see requirements.txt

Setup

  1. Clone this repository.
     HTTPS: $ git clone https://github.com/trflorian/imdb-scraper-heatmap.git
     SSH:   $ git clone git@github.com:trflorian/imdb-scraper-heatmap.git
  2. (Optional) Create a virtual environment for this project.
  3. Install the required Python packages in your Python environment.

     $ python -m pip install -r requirements.txt

  4. Run $ python scraper.py to scrape the IMDB website for a specific series.
  5. Run $ python heatmap.py to create a plot for the scraped series.

Usage

$ python .\examples\heatmap.py --help

usage: heatmap.py [-h] [-s] [-d] [-o] [-n NAME]

optional arguments:
  -h, --help            show this help message and exit
  -s, --show            show the heatmap plot instead of saving it
  -d, --dark            use dark mode for the plot style
  -o, --override        override existing plots, only used if show flag is not set
  -n NAME, --name NAME  name of the series, if not set the whole data directory will be scanned
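For readers who want to script around these options, the following sketch shows an argparse parser that accepts the same flags as the help text above. It is reconstructed from that help output only; the real heatmap.py may define its arguments differently, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags listed in the usage output."""
    parser = argparse.ArgumentParser(prog="heatmap.py")
    parser.add_argument("-s", "--show", action="store_true",
                        help="show the heatmap plot instead of saving it")
    parser.add_argument("-d", "--dark", action="store_true",
                        help="use dark mode for the plot style")
    parser.add_argument("-o", "--override", action="store_true",
                        help="override existing plots, only used if show flag is not set")
    parser.add_argument("-n", "--name",
                        help="name of the series, if not set the whole data directory will be scanned")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["--dark", "--name", "Breaking Bad"])
print(args.show, args.dark, args.override, args.name)  # prints: False True False Breaking Bad
```

Leaving out `--name` yields `args.name is None`, which matches the documented behavior of scanning the whole data directory.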

Development

Upload to PyPI

python -m build

python -m twine upload --skip-existing dist/*
