A universal scraping tool to acquire CS:GO demofiles from professional esports events provided by hltv.org
GoScrape 🐙: Universal hltv.org demofile scraper
GoScrape is a small open-source project I created to make it easy to bulk-download demofiles for the FPS CS:GO from the popular CS:GO fansite hltv.org.
Installation in Python - PyPI release
GoScrape is on PyPI, so you can install it with pip.
pip install goscrape
TL;DR
GoScrape consists of two main commands.
| command | description |
|---|---|
| events | Used in the first step to create a JSON lookup file containing structured information about CS:GO esports events in a given timeframe and, if specified, links to the associated matches and demofiles. |
| fetch | Builds on the events command: it bulk-downloads the demofiles listed in the JSON output of the events command. Alternatively, a single event id can be specified to download only the demofiles for that event. |
Getting Started
Events 🎮
| argument | datatype | description | notes | required / optional |
|---|---|---|---|---|
| STARTDATE | string | the start date from which event data should be gathered | formatted as string 'YYYY-MM-DD' | required |
| ENDDATE | string | the date up to which event data should be gathered | formatted as string 'YYYY-MM-DD' | required |
| STORAGEPATH | string | the directory or filepath where the resulting JSON should be stored | | optional (default is cwd) |
| MATCHES | boolean | whether match information and demofile URLs should be scraped as well | this flag is required if the resulting JSON file should be used for the fetch command | optional (True if present) |
| EVENT TYPE | enum | which type of event data should be pulled (Online, Lan ...) | | optional (default is online) |
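Since STARTDATE and ENDDATE are both expected as 'YYYY-MM-DD' strings, it can help to validate them before starting a scrape. The following is a minimal sketch, not part of GoScrape itself: the helper name `parse_event_window` is hypothetical, and the check that the end date must not precede the start date is an assumption about sensible input rather than documented behaviour.

```python
from datetime import date, datetime

# The date layout named in the table above: 'YYYY-MM-DD'.
DATE_FORMAT = "%Y-%m-%d"


def parse_event_window(start: str, end: str) -> tuple[date, date]:
    """Parse both dates and return them as (start, end).

    Hypothetical helper; raises ValueError if a string does not match
    DATE_FORMAT or if the end date lies before the start date (assumed rule).
    """
    start_date = datetime.strptime(start, DATE_FORMAT).date()
    end_date = datetime.strptime(end, DATE_FORMAT).date()
    if end_date < start_date:
        raise ValueError(f"ENDDATE {end} is before STARTDATE {start}")
    return start_date, end_date


if __name__ == "__main__":
    # Dates taken from the sample event shown in this README.
    print(parse_event_window("2022-04-20", "2022-04-21"))
```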
The objects in the resulting JSON are keyed by their event id and look something like this:
{
"6475": {
"event_data": {
"entity": "event",
"event_id": "6475",
"event_url": "https://www.hltv.org/events/6475/iem-dallas-2022-oceania-open-qualifier-2",
"event_name_encoded": "iem-dallas-2022-oceania-open-qualifier-2",
"event_name_full": "IEM Dallas 2022 Oceania Open Qualifier 2",
"nr_of_teams": "8+",
"prize": "Other",
"event_type": "Online",
"location": "Oceania (Online)",
"event_start": "2022-04-20",
"event_end": "2022-04-21"
},
"matches": [
{
"entity": "match",
"teams": ["Paradox", "Aftershock"],
"date_time": "2022-04-21 10:00:00",
"match_url": "https://www.hltv.org//matches/2355881/paradox-vs-aftershock-iem-dallas-2022-oceania-open-qualifier-2",
"demo_id": "71497",
"demo_url": "https://www.hltv.org/download/demo/71497"
}
]
}
}
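To illustrate how a lookup file with this shape could be consumed, here is a small sketch that collects the demo URLs per event. It relies only on the keys visible in the sample above (`matches`, `demo_url`); the function name `collect_demo_urls` is hypothetical, and treating a missing `matches` key as "no demos" is an assumption for events scraped without the MATCHES flag.

```python
import json


def collect_demo_urls(lookup: dict) -> dict[str, list[str]]:
    """Map each event id to the demo URLs listed under its matches."""
    urls_by_event = {}
    for event_id, event in lookup.items():
        # Assumption: "matches" may be absent when MATCHES was not requested.
        matches = event.get("matches", [])
        urls_by_event[event_id] = [
            match["demo_url"] for match in matches if match.get("demo_url")
        ]
    return urls_by_event


# Trimmed-down copy of the sample object shown above.
sample = json.loads("""
{
  "6475": {
    "event_data": {"entity": "event", "event_id": "6475"},
    "matches": [
      {
        "entity": "match",
        "teams": ["Paradox", "Aftershock"],
        "demo_id": "71497",
        "demo_url": "https://www.hltv.org/download/demo/71497"
      }
    ]
  }
}
""")

if __name__ == "__main__":
    print(collect_demo_urls(sample))
```

For a file written by the events command, the same function could be applied to the result of `json.load` on that file.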
Fetch 💾
| argument | datatype | description | notes | required / optional |
|---|---|---|---|---|
| EVENT ID | string or int | the id of the event whose demofiles should be downloaded | LOOKUP FILE & EVENT ID are mutually exclusive; only one can be used | required |
| LOOKUP FILE | string | the filepath of the lookup file generated by the events command that should be used for demo downloading | LOOKUP FILE & EVENT ID are mutually exclusive; only one can be used | required |
| STORAGEPATH | string | the directory to which the demofiles should be written | | optional (default is cwd) |
| MULTIPROCESSING | boolean | whether multiprocessing should be used to speed up downloading | | optional (True if present) |
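The "mutually exclusive, but one is required" relationship between EVENT ID and LOOKUP FILE can be sketched with Python's `argparse`. This is only an illustration of the rule in the table, not GoScrape's actual command-line interface: every flag name below (`--event-id`, `--lookup-file`, `--storage-path`, `--multiprocessing`) is a hypothetical placeholder, so check the tool's own help output for the real options.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Build an illustrative parser; all flag names are hypothetical."""
    parser = argparse.ArgumentParser(prog="fetch-sketch")

    # Exactly one of the two sources must be given, never both.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--event-id", help="id of a single event")
    source.add_argument("--lookup-file", help="JSON file from the events step")

    # Optional arguments with the defaults described in the table.
    parser.add_argument("--storage-path", default=os.getcwd())
    parser.add_argument("--multiprocessing", action="store_true")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--event-id", "6475"])
    print(args.event_id, args.lookup_file, args.multiprocessing)
```

Passing both source flags at once, or neither, makes `argparse` exit with a usage error, which mirrors the constraint stated in the notes column.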
Changelog
Version 0.1.3 (2022.09.22)
- Fixed a bug where the package failed to gather the file name of the provided demo file while using the fetch command
Version 0.1.2 (2022.05.30)
- Bug fixes and improvements
Version 0.1.1 (2022.04.29)
- Bug fixes for multiprocessed downloading
Version 0.1.0 (2022.04.24)
- Initial release
Contributing
Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Download files
File details
Details for the file goscrape-0.1.3.tar.gz.
File metadata
- Download URL: goscrape-0.1.3.tar.gz
- Upload date:
- Size: 10.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0200ffb046033c614f401575e2d8f65e58b1212e3623ff39f8c7e3cc53cece41 |
| MD5 | 3e3123175136e6a2bccc9db8ec4ec0d9 |
| BLAKE2b-256 | 6b61719598fa0beefeb4c8f95c10b64c3782c556861c9e053b6ee6a5971c5c18 |
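A downloaded copy of the source distribution can be checked against the published SHA256 digest with Python's standard `hashlib` module. The sketch below assumes the archive was saved locally as `goscrape-0.1.3.tar.gz`; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for goscrape-0.1.3.tar.gz.
EXPECTED_SHA256 = "0200ffb046033c614f401575e2d8f65e58b1212e3623ff39f8c7e3cc53cece41"


def sha256_of_file(path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare a file's digest with the expected hex string."""
    return sha256_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("goscrape-0.1.3.tar.gz")  # assumed local file name
    if archive.exists():
        print("hash OK" if matches_published_hash(archive) else "hash MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```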
File details
Details for the file goscrape-0.1.3-py3-none-any.whl.
File metadata
- Download URL: goscrape-0.1.3-py3-none-any.whl
- Upload date:
- Size: 13.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 51f4dec61ac712d6e1d91ce3fad3e2a4361e3d30b8d7b798b0d4720098f5a688 |
| MD5 | 1106e0491f62a17c77a529f27fdf978f |
| BLAKE2b-256 | e90e7f575b356cace4cc3c5a2f01a0075f8ce37333e51f292945574e1172c2cb |