
Package for parsing IMDb datasets and scraping web pages

Project description

PyMDb


PyMDb is a package for both parsing the datasets provided by IMDb and scraping information from their web pages.

It gathers information on the people, titles, and companies listed by IMDb, and is split into two modules: one for parsing the IMDb datasets and one for scraping web pages on imdb.com.

Installation

The latest release of PyMDb can be installed from PyPI with:

pip install py-mdb

If downloading the source from GitHub, PyMDb requires the following packages:

Usage

from collections import defaultdict

import pymdb

# Parse the IMDb datasets and count how many titles fall under each genre
parser = pymdb.PyMDbParser(gunzip_files=True)
genre_count = defaultdict(int)
for title in parser.get_title_basics("path/to/files"):
    for genre in title.genres:
        genre_count[genre] += 1
for genre, count in genre_count.items():
    print(f"{genre}: {count}")

# Scrape a single title page from imdb.com by its IMDb ID
scraper = pymdb.PyMDbScraper()
title = scraper.get_title("tt0076759")
print(f"{title.title_text} came out in {title.release_date.year}!")
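For readers curious what the genre tally above involves at the file level, here is a minimal standard-library sketch. It is not PyMDb's implementation. It assumes the input is a gzipped, tab-separated file with a header row and a comma-separated `genres` column in which `\N` marks a missing value, as in IMDb's `title.basics` dataset. The function name `count_genres` and the sample rows are hypothetical and exist only for illustration.

```python
import csv
import gzip
import io
from collections import Counter


def count_genres(tsv_gz_bytes: bytes) -> Counter:
    """Tally genres from gzipped, tab-separated title data (hypothetical helper)."""
    counts = Counter()
    # gzip.open accepts a file object; text mode decodes the stream for csv.
    with gzip.open(io.BytesIO(tsv_gz_bytes), mode="rt",
                   encoding="utf-8", newline="") as handle:
        # QUOTE_NONE keeps quote characters inside titles from confusing the parser.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            genres = row["genres"]
            if genres == r"\N":  # assumed missing-value marker
                continue
            counts.update(genres.split(","))
    return counts


# Made-up sample rows, not real IMDb data.
SAMPLE = (
    "tconst\tprimaryTitle\tgenres\n"
    "tt-example-1\tExample One\tAction,Adventure\n"
    "tt-example-2\tExample Two\tAction\n"
    "tt-example-3\tExample Three\t\\N\n"
)

if __name__ == "__main__":
    compressed = gzip.compress(SAMPLE.encode("utf-8"))
    for genre, count in count_genres(compressed).most_common():
        print(f"{genre}: {count}")
```

A `Counter` is used here instead of `defaultdict(int)` only because `update` and `most_common` keep the sketch short; either structure works for the tally.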

Documentation

Full documentation can be found at the PyMDb Read the Docs page.

License

This project is licensed under the MIT License. Please see the LICENSE file for details.

Download files

Files for py-mdb, version 0.1.0:

Filename                        Size      File type   Python version
py_mdb-0.1.0-py3-none-any.whl   38.5 kB   Wheel       py3
py-mdb-0.1.0.tar.gz             33.7 kB   Source      None
