
Board game data scraping and processing from BoardGameGeek and more!


🎲 Board Game Scraper 🕸

Scraping data about board games from the web. View the data live at Recommend.Games! Install via:

pip install board-game-scraper

Sources

Run scrapers

Requires Python 3. Make sure Pipenv is installed, then create the virtual environment and activate it:

python3 -m pip install --upgrade pipenv
pipenv install --dev
pipenv shell

Run a spider like so:

JOBDIR="jobs/${SPIDER}/$(date -u +'%Y-%m-%dT%H-%M-%S')"
scrapy crawl "${SPIDER}" \
    --output 'feeds/%(name)s/%(time)s/%(class)s.csv' \
    --set "JOBDIR=${JOBDIR}"

where $SPIDER is one of the spider IDs listed under Sources; scrapy list prints the IDs of all spiders in the project.
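The two paths in that command follow simple patterns. The Python sketch below rebuilds them by hand to show how a run is laid out on disk. The JOBDIR timestamp uses the same strftime pattern as the date call, and the --output template is expanded with printf-style %(key)s substitution, which is how Scrapy fills feed URIs. Scrapy documents %(name)s (the spider name) and %(time)s (a timestamp) and takes any other key from the spider attribute of that name; what %(class)s holds in this project is an assumption here (an item class name), as are the helper names job_dir and feed_path, the spider ID example_spider and the item class GameItem.

```python
from datetime import datetime, timezone

# The --output template from the command above. Scrapy expands it with
# Python's printf-style formatting (template % params).
FEED_TEMPLATE = "feeds/%(name)s/%(time)s/%(class)s.csv"


def job_dir(spider: str, now: datetime) -> str:
    """Hypothetical helper: rebuild jobs/<spider>/<UTC timestamp> like the shell JOBDIR."""
    # Same pattern as `date -u +'%Y-%m-%dT%H-%M-%S'`; converts aware datetimes to UTC first.
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H-%M-%S")
    return f"jobs/{spider}/{stamp}"


def feed_path(name: str, time: str, item_class: str) -> str:
    """Hypothetical helper: expand the feed template for one run and one item class.

    Assumption: %(class)s stands for an item class name in this project.
    """
    return FEED_TEMPLATE % {"name": name, "time": time, "class": item_class}


if __name__ == "__main__":
    now = datetime(2023, 5, 1, 12, 30, 0, tzinfo=timezone.utc)
    print(job_dir("example_spider", now))
    # jobs/example_spider/2023-05-01T12-30-00
    print(feed_path("example_spider", "2023-05-01T12-30-00", "GameItem"))
    # feeds/example_spider/2023-05-01T12-30-00/GameItem.csv
```

Because the timestamp is fixed-width and contains no colons, the directory names are valid on Windows and sort chronologically as plain strings.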

Run all the spiders with the run_scrapers.sh script, and list the PIDs of the running scrapers with the processes.sh script. You can stop all running scrapers via

./processes.sh stop

and resume them later; Scrapy continues a paused crawl when it is started again with the same JOBDIR.
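The single-spider command above gives every run its own timestamped directory under jobs/<spider>/, so resuming by hand means finding that directory again. A minimal Python sketch, assuming the jobs/ layout from that command; the helper latest_job_dir and the spider ID example_spider are hypothetical, and this is not necessarily how run_scrapers.sh or processes.sh pick up a crawl:

```python
from pathlib import Path
from typing import Optional


def latest_job_dir(spider: str, root: str = "jobs") -> Optional[Path]:
    """Hypothetical helper: return the newest jobs/<spider>/<timestamp> directory, or None.

    The fixed-width %Y-%m-%dT%H-%M-%S names sort lexicographically in
    chronological order, so the last name in sorted order is the newest run.
    """
    base = Path(root) / spider
    if not base.is_dir():
        return None
    # Ignore stray files; only run directories hold crawl state.
    runs = sorted((p for p in base.iterdir() if p.is_dir()), key=lambda p: p.name)
    return runs[-1] if runs else None


if __name__ == "__main__":
    # "example_spider" is a placeholder; use one of the real spider IDs.
    print(latest_job_dir("example_spider"))
```

Passing the returned path back as --set "JOBDIR=..." in the scrapy crawl command should let Scrapy continue that crawl, provided it was shut down cleanly.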

Tests

You can run scrapy check to run the contract tests of all spiders, or scrapy check $SPIDER to test a single spider. If the tests fail, the website has most likely changed and the spider needs updating.

Board game datasets

If you are interested in using any of the datasets produced by this scraper, take a look at the BoardGameGeek guild. A subset of the data can also be found on Kaggle.



Download files


Source Distribution

board-game-scraper-2.21.0.tar.gz (58.8 kB)

Built Distribution

board_game_scraper-2.21.0-py2.py3-none-any.whl (72.7 kB; tagged for Python 2 and Python 3)

File details

Details for the file board-game-scraper-2.21.0.tar.gz.

File metadata

  • Download URL: board-game-scraper-2.21.0.tar.gz
  • Size: 58.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.16

File hashes

Hashes for board-game-scraper-2.21.0.tar.gz

Algorithm    Hash digest
SHA256       db7342204db1acd07f41dc8bd9efadc16ed653a04d3b00cfce18e8425268276c
MD5          4b9f36ad4e885674a6b70f2e2451877f
BLAKE2b-256  a77da9d5c1635280d4ae6952433d38ae9feb8bd9074a8ef8e2568af6a8cae198

These digests let you check that a download is intact; pip can enforce them in its hash-checking mode.
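To check a download against the published digests without pip, hash the file yourself. A minimal Python sketch using only the standard library; the helper names sha256_of and matches and the script name verify_download.py are hypothetical, and the default digest is the SHA256 value published for board-game-scraper-2.21.0.tar.gz (pass the wheel's SHA256 instead when checking the .whl):

```python
import hashlib
import sys

# SHA256 digest published for board-game-scraper-2.21.0.tar.gz.
SDIST_SHA256 = "db7342204db1acd07f41dc8bd9efadc16ed653a04d3b00cfce18e8425268276c"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so it never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected: str = SDIST_SHA256) -> bool:
    """True if the file's SHA256 equals the published hex digest (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    # Usage: python verify_download.py board-game-scraper-2.21.0.tar.gz
    for path in sys.argv[1:]:
        print(path, "OK" if matches(path) else "MISMATCH")
```

pip can run the same check during installation: give the requirement a --hash=sha256:<digest> option in a requirements file and install it with pip install --require-hashes -r requirements.txt.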

File details

Details for the file board_game_scraper-2.21.0-py2.py3-none-any.whl.


File hashes

Hashes for board_game_scraper-2.21.0-py2.py3-none-any.whl

Algorithm    Hash digest
SHA256       b3346fc2ab9fb9679ae36020c64b5d2627e826189b6684cd4050f321ffa5f4d3
MD5          8f8f51bc2d22bbce70bb0005a86ea870
BLAKE2b-256  ac3b64c44fce672c15b8b2a9037eae3df0e38ed320d3515520f72c106a373292

