Package for parsing IMDb datasets and scraping web pages
PyMDb
PyMDb is a package for both parsing the datasets provided by IMDb and scraping information from their web pages.
It gathers information on people, titles, and companies, and is split into two separate modules: one for parsing the IMDb datasets and one for scraping web pages on imdb.com.
Installation
The latest release of PyMDb can be installed from PyPI with:
pip install py-mdb
If downloading the source from GitHub, PyMDb requires the following packages:
Usage
>>> import pymdb
>>> from collections import defaultdict
>>>
>>> parser = pymdb.PyMDbParser(gunzip_files=True)
>>> genre_count = defaultdict(int)
>>> for title in parser.get_title_basics("path/to/files"):
...     for genre in title.genres:
...         genre_count[genre] += 1
...
>>> for genre in genre_count:
...     print(f"{genre}: {genre_count[genre]}")
...
Documentary: 600184
Short: 837912
Animation: 312227
...
Talk-Show: 584252
Reality-TV: 307037
Adult: 178493
>>>
>>> scraper = pymdb.PyMDbScraper(rate_limit=500)
>>> title = scraper.get_title("tt0076759")
>>> print(f"{title.display_title} came out in {title.release_date.year}!")
Star Wars: Episode IV - A New Hope came out in 1977!
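To make the genre count above easier to follow, here is a minimal standard-library sketch of the same idea that does not use PyMDb at all. It assumes the layout IMDb publishes for title.basics.tsv (tab-separated columns, comma-separated genres, and `\N` for missing values). The sample rows and the `count_genres` helper are hypothetical and only illustrate the idea; they are not how PyMDb itself is implemented.

```python
import csv
import io
from collections import Counter

# Header follows IMDb's published title.basics.tsv layout; the rows are invented sample data.
SAMPLE_TSV = (
    "tconst\ttitleType\tprimaryTitle\toriginalTitle\tisAdult\t"
    "startYear\tendYear\truntimeMinutes\tgenres\n"
    "tt0000001\tshort\tExample One\tExample One\t0\t1894\t\\N\t1\tDocumentary,Short\n"
    "tt0000002\tshort\tExample Two\tExample Two\t0\t1892\t\\N\t5\tAnimation,Short\n"
    "tt0000003\tmovie\tExample Three\tExample Three\t0\t1900\t\\N\t\\N\t\\N\n"
)


def count_genres(lines):
    """Count genre occurrences in an iterable of title.basics TSV lines."""
    # QUOTE_NONE: the datasets are plain TSV, so quote characters are literal text.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    counts = Counter()
    for row in reader:
        genres = row["genres"]
        if genres == r"\N":  # IMDb marks missing values with \N
            continue
        counts.update(genres.split(","))
    return counts


genre_count = count_genres(io.StringIO(SAMPLE_TSV))
for genre, count in sorted(genre_count.items()):
    print(f"{genre}: {count}")
```

For a real dataset file, the same helper could be fed a handle from `gzip.open(path, "rt", encoding="utf-8", newline="")` instead of the in-memory sample.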
Documentation
Full documentation can be found at the PyMDb Read the Docs page.
Disclaimer
PyMDb is still in a pre-release state and has only been tested with a small amount of data found on imdb.com. The web scraper has a rate limiter value you can customize; please be kind to IMDb. If you find any bugs or issues, please do not hesitate to create an issue or open a pull request on GitHub. Suggestions for features to add to PyMDb in future releases are also welcome!
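As an illustration of what a customizable rate limiter can look like, the sketch below enforces a minimum interval between consecutive requests. It is not PyMDb's implementation: the `MinIntervalLimiter` class is hypothetical, and it assumes the limit is expressed in milliseconds, which the text above does not state.

```python
import time


class MinIntervalLimiter:
    """Block so that successive wait() calls are at least interval_ms apart."""

    def __init__(self, interval_ms, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval_ms / 1000.0  # assumption: limit given in milliseconds
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


limiter = MinIntervalLimiter(interval_ms=50)
start = time.monotonic()
for _ in range(3):
    limiter.wait()  # a real scraper would issue its HTTP request after this call
elapsed = time.monotonic() - start
print(f"3 calls took {elapsed:.2f}s")
```

The first call returns immediately and each later call sleeps only for the time still owed, so a slow request does not add extra delay on top of the interval. The clock and sleep functions are injectable so the behavior can be checked without real waiting.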
License
This project is licensed under the MIT License. Please see the LICENSE file for details.
File details
Details for the file py-mdb-0.2.3.tar.gz.
File metadata
- Download URL: py-mdb-0.2.3.tar.gz
- Upload date:
- Size: 41.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0fe9d0364f95a1e7c30b68aafe9bc74a47ddeca1420f4916908452244c45f624
MD5 | 3c0f14da8de78e13b73a6f357bede026
BLAKE2b-256 | 62b96fe47feeb52dc478c9c77a8985d3a3198d120cfc154645476f4edd88bbaa
File details
Details for the file py_mdb-0.2.3-py3-none-any.whl.
File metadata
- Download URL: py_mdb-0.2.3-py3-none-any.whl
- Upload date:
- Size: 54.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | d6ec82b34a4a1cbde07b4a0158ddd605befbf65c7e27fdec1ed023b528f3ccb3
MD5 | 9473fec56cf3e3d9f6d2e280bfb03307
BLAKE2b-256 | 4c7dc087f6d0d72e29c17d1275e144c7f7c33fcf6f69b02a1a1c8f97eac10201