A universal scraping tool to acquire CS:GO demofiles from professional esports events provided by hltv.org

Project description

GoScrape 🐙: Universal hltv.org demofile scraper

GoScrape is a small open-source project I created to make it easy to bulk download demofiles for the FPS CS:GO from the popular CS:GO fansite hltv.org.

Installation in Python - PyPI release

GoScrape is on PyPI, so you can use pip to install it.

  pip install goscrape

TL;DR

GoScrape consists of two main commands.

command | description
events | Used in the first step to create a JSON lookup file containing structured information about CS:GO esports events in a given timeframe and, if specified, links to the associated matches and demofiles.
fetch | Builds on the events command and can bulk download the demofiles listed in its JSON output. Alternatively, a single event ID can be specified to download only the demofiles for that event.

Getting Started

Events 🎮

argument | datatype | description | notes
STARTDATE | string | the start date from which event data should be gathered, formatted as 'YYYY-MM-DD' | required
ENDDATE | string | the end date up to which event data should be gathered, formatted as 'YYYY-MM-DD' | required
STORAGEPATH | string | the directory or filepath where the resulting JSON should be stored | optional (default is cwd)
MATCHES | boolean | whether match information and demofile URLs should be scraped as well; this flag is required if the resulting JSON file should be used for the fetch command | optional (True if present)
EVENT TYPE | enum | which type of event data should be pulled (Online, Lan ...) | optional (default is online)
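
Since both dates are expected as 'YYYY-MM-DD' strings, it can help to check them before starting a long scrape. The following is a minimal sketch using only the standard library; the helper names `parse_event_date` and `validate_timeframe` are hypothetical and not part of GoScrape, and rejecting a start date that lies after the end date is an assumption about what a sensible timeframe looks like, not documented GoScrape behaviour.

```python
from datetime import date, datetime
from typing import Tuple


def parse_event_date(value: str) -> date:
    """Parse a year-month-day string such as '2022-04-20'.

    Raises ValueError if the string does not match the pattern.
    """
    return datetime.strptime(value, "%Y-%m-%d").date()


def validate_timeframe(start: str, end: str) -> Tuple[date, date]:
    """Return (start, end) as dates.

    Assumption: a start date after the end date is treated as an error.
    """
    start_date = parse_event_date(start)
    end_date = parse_event_date(end)
    if start_date > end_date:
        raise ValueError(f"start date {start} is after end date {end}")
    return start_date, end_date


start_date, end_date = validate_timeframe("2022-04-20", "2022-04-21")
print(start_date, end_date)
```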

The objects in the resulting JSON are keyed by their event ID and look something like this:

{
  "6475": {
    "event_data": {
      "entity": "event",
      "event_id": "6475",
      "event_url": "https://www.hltv.org/events/6475/iem-dallas-2022-oceania-open-qualifier-2",
      "event_name_encoded": "iem-dallas-2022-oceania-open-qualifier-2",
      "event_name_full": "IEM Dallas 2022 Oceania Open Qualifier 2",
      "nr_of_teams": "8+",
      "prize": "Other",
      "event_type": "Online",
      "location": "Oceania (Online)",
      "event_start": "2022-04-20",
      "event_end": "2022-04-21"
    },
    "matches": [
      {
        "entity": "match",
        "teams": ["Paradox", "Aftershock"],
        "date_time": "2022-04-21 10:00:00",
        "match_url": "https://www.hltv.org//matches/2355881/paradox-vs-aftershock-iem-dallas-2022-oceania-open-qualifier-2",
        "demo_id": "71497",
        "demo_url": "https://www.hltv.org/download/demo/71497"
      }
    ]
  }
}
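
To illustrate how the lookup structure can be consumed, here is a minimal sketch that collects the demo URLs per event. The function names `load_lookup` and `collect_demo_urls` are hypothetical helpers, not part of GoScrape, and the sketch assumes that the "matches" key may be missing when match scraping was not requested.

```python
import json
from pathlib import Path
from typing import Dict, List


def load_lookup(path: str) -> dict:
    """Read a lookup file written by the events command."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def collect_demo_urls(lookup: dict) -> Dict[str, List[str]]:
    """Map each event id to the demo URLs found in its matches."""
    urls = {}
    for event_id, event in lookup.items():
        # Assumption: "matches" is absent if match scraping was not enabled.
        matches = event.get("matches", [])
        urls[event_id] = [m["demo_url"] for m in matches if m.get("demo_url")]
    return urls


# Inline sample mirroring the structure shown above (fields trimmed).
sample = {
    "6475": {
        "event_data": {"entity": "event", "event_id": "6475"},
        "matches": [
            {
                "entity": "match",
                "teams": ["Paradox", "Aftershock"],
                "demo_id": "71497",
                "demo_url": "https://www.hltv.org/download/demo/71497",
            }
        ],
    }
}

print(collect_demo_urls(sample))
# → {'6475': ['https://www.hltv.org/download/demo/71497']}
```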

Fetch 💾

argument | datatype | description | notes
EVENT ID | string or int | the ID of the single event whose demofiles should be downloaded | required; LOOKUP FILE & EVENT ID are mutually exclusive, only one can be used
LOOKUP FILE | string | the filepath of the lookup file generated by the events command that should be used for demo downloading | required; LOOKUP FILE & EVENT ID are mutually exclusive, only one can be used
STORAGEPATH | string | the directory to which the demofiles should be written | optional (default is cwd)
MULTIPROCESSING | boolean | whether multiprocessing should be utilized to speed up downloading | optional (True if present)
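
The argument rules in the table (exactly one of EVENT ID or LOOKUP FILE, a storage path that defaults to the current working directory, and a boolean that is True when present) can be sketched with `argparse`. This is only an illustration of those rules: the flag names `--event-id`, `--lookup-file`, `--storage-path` and `--multiprocessing` are hypothetical and are not taken from GoScrape's actual command line interface.

```python
import argparse
import os


def build_fetch_parser() -> argparse.ArgumentParser:
    """Build an illustrative parser; all flag names here are hypothetical."""
    parser = argparse.ArgumentParser(prog="fetch-sketch")

    # Exactly one of the two sources must be given.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--event-id", help="single event id to download")
    source.add_argument("--lookup-file", help="lookup JSON from the events step")

    # Optional, defaults to the current working directory.
    parser.add_argument("--storage-path", default=os.getcwd())

    # Boolean flag: True if present, False otherwise.
    parser.add_argument("--multiprocessing", action="store_true")
    return parser


args = build_fetch_parser().parse_args(["--event-id", "6475"])
print(args.event_id, args.multiprocessing)
# → 6475 False
```

Passing both sources, or neither, makes `argparse` exit with a usage error, which mirrors the "only one can be used" note in the table.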

Changelog

Version 0.1.3 (2022.09.22)

  • Fixed a bug where the package failed to gather the file name of the provided demo file while using the fetch command

Version 0.1.2 (2022.05.30)

  • Bug fixes and improvements

Version 0.1.1 (2022.04.29)

  • Bug fixes for multiprocessed downloading

Version 0.1.0 (2022.04.24)

  • Initial release

Contributing

Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Download files

Source Distribution

goscrape-0.1.3.tar.gz (10.3 kB)

Uploaded Source

Built Distribution

goscrape-0.1.3-py3-none-any.whl (13.0 kB)

Uploaded Python 3

File details

Details for the file goscrape-0.1.3.tar.gz.

File metadata

  • Download URL: goscrape-0.1.3.tar.gz
  • Size: 10.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.14

File hashes

Hashes for goscrape-0.1.3.tar.gz
Algorithm Hash digest
SHA256 0200ffb046033c614f401575e2d8f65e58b1212e3623ff39f8c7e3cc53cece41
MD5 3e3123175136e6a2bccc9db8ec4ec0d9
BLAKE2b-256 6b61719598fa0beefeb4c8f95c10b64c3782c556861c9e053b6ee6a5971c5c18

File details

Details for the file goscrape-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: goscrape-0.1.3-py3-none-any.whl
  • Size: 13.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.14

File hashes

Hashes for goscrape-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 51f4dec61ac712d6e1d91ce3fad3e2a4361e3d30b8d7b798b0d4720098f5a688
MD5 1106e0491f62a17c77a529f27fdf978f
BLAKE2b-256 e90e7f575b356cace4cc3c5a2f01a0075f8ce37333e51f292945574e1172c2cb
