
Package for parsing IMDb datasets and scraping web pages


PyMDb


PyMDb is a package for both parsing the datasets provided by IMDb and scraping information from their web pages.

It gathers information on the people, titles, and companies listed by IMDb, and is split into two separate modules: one for parsing the IMDb datasets, and one for scraping web pages on imdb.com.

Installation

The latest release of PyMDb can be installed from PyPI with:

pip install py-mdb

If downloading the source from GitHub, PyMDb requires the following packages:

Usage

>>> import pymdb
>>> from collections import defaultdict
>>>
>>> parser = pymdb.PyMDbParser(gunzip_files=True)
>>> genre_count = defaultdict(int)
>>> for title in parser.get_title_basics("path/to/files"):
...     for genre in title.genres:
...         genre_count[genre] += 1
...
>>> for genre in genre_count:
...     print(f"{genre}: {genre_count[genre]}")
...
Documentary: 600184
Short: 837912
Animation: 312227
    ...
Talk-Show: 584252
Reality-TV: 307037
Adult: 178493
>>>
>>> scraper = pymdb.PyMDbScraper(rate_limit=500)
>>> title = scraper.get_title("tt0076759")
>>> print(f"{title.display_title} came out in {title.release_date.year}!")
Star Wars: Episode IV - A New Hope came out in 1977!

Documentation

Full documentation can be found at the PyMDb Read the Docs page.

Disclaimer

PyMDb is still in a pre-release state and has only been tested with a small amount of the data found on imdb.com. The web scraper has a rate limit value that you can customize, so please be kind to IMDb. If you find any bugs or issues, please do not hesitate to create an issue or open a pull request on GitHub. Suggestions for features to add to PyMDb in future releases are also welcome!
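To illustrate the general idea behind a scraper rate limit, the sketch below enforces a minimum delay between successive requests. It is not PyMDb's code, and this README does not state the unit of `rate_limit`. The class name `MinIntervalLimiter` and its millisecond parameter are assumptions made for this example only.

```python
import time


class MinIntervalLimiter:
    """Enforce a minimum delay between successive calls to wait()."""

    def __init__(self, interval_ms, clock=time.monotonic, sleep=time.sleep):
        # Assumption: the interval is given in milliseconds.
        self.interval = interval_ms / 1000.0
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        """Block until at least `interval` seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A scraper built this way would call `wait()` immediately before each HTTP request, so a larger interval means fewer requests per second against imdb.com. The `clock` and `sleep` parameters are injectable only so the behavior can be checked without real delays.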

License

This project is licensed under the MIT License. Please see the LICENSE file for details.
