
A universal scraping tool to acquire CS:GO demofiles from professional esports events provided by hltv.org


GoScrape 🐙: Universal hltv.org demofile scraper


GoScrape is a small open-source project I created to make it easy to bulk-download demofiles of the FPS game CS:GO from the popular CS:GO fan site hltv.org.

Installation (PyPI release)

GoScrape is on PyPI, so you can install it with pip.

  pip install goscrape

TL;DR

GoScrape consists of two main commands.

command | description
--- | ---
events | Used as the first step: creates a JSON lookup file containing structured information about CS:GO esports events in a given timeframe and, if requested, links to the associated matches and demofiles.
fetch | Builds on top of the events command: bulk-downloads the demofiles listed in the JSON output of the events command, or alternatively downloads the demofiles of a single event specified by its event ID.


Getting Started

Events 🎮


argument | datatype | description | notes
--- | --- | --- | ---
STARTDATE | string | the date from which event data should be gathered, formatted as 'YYYY-MM-DD' | required
ENDDATE | string | the date up to which event data should be gathered, formatted as 'YYYY-MM-DD' | required
STORAGEPATH | string | the directory or filepath where the resulting JSON should be stored | optional (default is cwd)
MATCHES | boolean | whether match information and demofile URLs should be scraped as well; this flag is required if the resulting JSON file is to be used with the fetch command | optional (True if present)
EVENT TYPE | enum | which type of event data should be pulled (Online, Lan ...) | optional (default is online)
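Both date arguments must follow the 'YYYY-MM-DD' format. The following is a minimal sketch of how such input could be validated before scraping; `parse_date_range` is a hypothetical helper, not part of GoScrape's API, and GoScrape's own validation may differ (for example, in whether a reversed timeframe is rejected).

```python
from datetime import date, datetime
from typing import Tuple

DATE_FORMAT = "%Y-%m-%d"  # the 'YYYY-MM-DD' format required for STARTDATE/ENDDATE


def parse_date_range(start: str, end: str) -> Tuple[date, date]:
    """Parse STARTDATE and ENDDATE; raise ValueError on bad input."""
    # strptime raises ValueError for strings that do not match DATE_FORMAT
    start_date = datetime.strptime(start, DATE_FORMAT).date()
    end_date = datetime.strptime(end, DATE_FORMAT).date()
    # assumption: a reversed timeframe is treated as an error
    if start_date > end_date:
        raise ValueError(f"STARTDATE {start} lies after ENDDATE {end}")
    return start_date, end_date
```

For example, parse_date_range("2022-04-20", "2022-04-21") returns (date(2022, 4, 20), date(2022, 4, 21)), while "21.04.2022" raises ValueError.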

The objects in the resulting JSON are keyed by their event ID and look something like this:

{
  "6475": {
    "event_data": {
      "entity": "event",
      "event_id": "6475",
      "event_url": "https://www.hltv.org/events/6475/iem-dallas-2022-oceania-open-qualifier-2",
      "event_name_encoded": "iem-dallas-2022-oceania-open-qualifier-2",
      "event_name_full": "IEM Dallas 2022 Oceania Open Qualifier 2",
      "nr_of_teams": "8+",
      "prize": "Other",
      "event_type": "Online",
      "location": "Oceania (Online)",
      "event_start": "2022-04-20",
      "event_end": "2022-04-21"
    },
    "matches": [
      {
        "entity": "match",
        "teams": ["Paradox", "Aftershock"],
        "date_time": "2022-04-21 10:00:00",
        "match_url": "https://www.hltv.org//matches/2355881/paradox-vs-aftershock-iem-dallas-2022-oceania-open-qualifier-2",
        "demo_id": "71497",
        "demo_url": "https://www.hltv.org/download/demo/71497"
      }
    ]
  }
}
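Because the lookup file is plain JSON keyed by event ID, it can also be inspected outside GoScrape. The sketch below collects every demo URL from such a file; `load_lookup` and `collect_demos` are hypothetical helpers, and the sketch assumes, without confirmation from GoScrape's source, that events scraped without MATCHES simply lack a "matches" list.

```python
import json
from typing import Dict, List, Tuple


def load_lookup(path: str) -> Dict[str, dict]:
    """Read a lookup file produced by the events command."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def collect_demos(lookup: Dict[str, dict]) -> List[Tuple[str, str, str]]:
    """Return (event_id, demo_id, demo_url) triples in file order."""
    demos = []
    for event_id, event in lookup.items():
        # assumption: events scraped without MATCHES have no "matches" key
        for match in event.get("matches", []):
            demo_url = match.get("demo_url")
            if demo_url:  # skip matches without a demo link, if any occur
                demos.append((event_id, match.get("demo_id"), demo_url))
    return demos
```

For the example above, collect_demos returns a single entry: ("6475", "71497", "https://www.hltv.org/download/demo/71497").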

Fetch 💾


argument | datatype | description | notes
--- | --- | --- | ---
EVENT ID | string or int | the hltv.org ID of the event whose demofiles should be downloaded | required; mutually exclusive with LOOKUP FILE (only one can be used)
LOOKUP FILE | string | the filepath of the lookup file generated by the events command, used for downloading demofiles | required; mutually exclusive with EVENT ID (only one can be used)
STORAGEPATH | string | the directory to which the demofiles should be written | optional (default is cwd)
MULTIPROCESSING | boolean | whether multiprocessing should be used to speed up downloading | optional (True if present)
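The rules in this table can be sketched in Python as follows. This is an illustration under assumptions, not GoScrape's implementation: `resolve_fetch_source`, `download_all` and the caller-supplied `download_one` are hypothetical names, and although the MULTIPROCESSING flag suggests process-based parallelism, this sketch swaps in a thread pool (concurrent.futures.ThreadPoolExecutor), a common choice for I/O-bound downloads.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable, List, Optional, Tuple, Union


def resolve_fetch_source(
    event_id: Optional[Union[str, int]] = None,
    lookup_file: Optional[str] = None,
) -> Tuple[str, str]:
    """Return ("event", id) or ("lookup", path); exactly one input is allowed."""
    # EVENT ID and LOOKUP FILE are mutually exclusive, but one of them is required
    if (event_id is None) == (lookup_file is None):
        raise ValueError("specify exactly one of EVENT ID or LOOKUP FILE")
    if event_id is not None:
        return ("event", str(event_id))
    return ("lookup", str(lookup_file))


def download_all(
    demos: Iterable[Tuple[str, str]],
    download_one: Callable[[str, str, Path], None],
    storage_path: str = ".",
    parallel: bool = False,
    workers: int = 4,
) -> List[str]:
    """Download (demo_id, demo_url) pairs into storage_path; return demo IDs in input order.

    download_one(demo_id, demo_url, target_dir) is a hypothetical,
    caller-supplied function that performs the actual HTTP request.
    """
    target_dir = Path(storage_path)  # "." mirrors the cwd default of STORAGEPATH

    def run(job: Tuple[str, str]) -> str:
        demo_id, demo_url = job
        download_one(demo_id, demo_url, target_dir)
        return demo_id

    jobs = list(demos)
    if not parallel:
        # sequential path, as when the MULTIPROCESSING flag is absent
        return [run(job) for job in jobs]
    # threads stand in for processes here; downloads are I/O-bound
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order of the input jobs
        return list(pool.map(run, jobs))
```

Each (demo_id, demo_url) pair corresponds to the demo_id and demo_url fields of a match entry in the lookup file. Because Executor.map preserves input order, the returned list has the same order whether or not parallel downloading is enabled.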

Changelog

Version 0.1.1 (2022.04.29)

  • Bug fixes for multiprocessed downloading

Version 0.1.0 (2022.04.24)

  • Initial release

Contributing

Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request



Download files


Source Distribution

goscrape-0.1.1.tar.gz (10.2 kB)

Uploaded Source

Built Distribution


goscrape-0.1.1-py3-none-any.whl (12.8 kB)

Uploaded Python 3

File details

Details for the file goscrape-0.1.1.tar.gz.

File metadata

  • Download URL: goscrape-0.1.1.tar.gz
  • Size: 10.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.12

File hashes

Hashes for goscrape-0.1.1.tar.gz
Algorithm Hash digest
SHA256 518905f676a5ae93625988e53d5b8f813f9f422232fa6a90aebc0fec92733dd4
MD5 42cb1112dabb8f8e11b024566bdfe5de
BLAKE2b-256 ee3d7679cfdd31252aa97ce480d675de34e8c315957bb42ed63afb6179eccfe1
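To check a downloaded copy of the source distribution against the SHA256 digest listed above, Python's standard hashlib module is enough. A minimal sketch; `sha256_matches` is a hypothetical helper name, and the archive path in the usage comment is an assumption:

```python
import hashlib

# SHA256 digest listed above for goscrape-0.1.1.tar.gz
SDIST_SHA256 = "518905f676a5ae93625988e53d5b8f813f9f422232fa6a90aebc0fec92733dd4"


def sha256_matches(data: bytes, expected_hex: str) -> bool:
    """Compare the SHA256 digest of data with an expected hex digest."""
    return hashlib.sha256(data).hexdigest() == expected_hex.lower()


# Usage (assuming the archive sits in the current directory):
# with open("goscrape-0.1.1.tar.gz", "rb") as fh:
#     print(sha256_matches(fh.read(), SDIST_SHA256))
```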


File details

Details for the file goscrape-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: goscrape-0.1.1-py3-none-any.whl
  • Size: 12.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.12

File hashes

Hashes for goscrape-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 b938433c34ab0a6e626e48e9de32695dc2910fdb4a954a896f8f5d802b9653bd
MD5 348a701b3b6a6de49e663b327a2cd36a
BLAKE2b-256 5391cd7e9ba917f4f7fc5e85d85a796a668ca88190cf9b9cc7135489fca1ef7e

