HLS-scraper
This project is a web scraper for the Historical Dictionary of Switzerland (HDS; German: Historisches Lexikon der Schweiz, HLS).
Installation
```shell
pip install hlsscraper
```
Usage
Please use the already scraped hls_base.csv (scraped on 12.12.2022) as the basis, so that only updated and new records need to be fetched. This avoids putting unnecessary load on the HLS servers.
```python
import os

import hlsscraper

hlsscraper.scrape(
    base_csv_path=f"{os.getcwd()}/data/hls_base.csv",  # previously scraped snapshot
    update_csv_path=f"{os.getcwd()}/data/hls_updates.csv",
    new_csv_path=f"{os.getcwd()}/data/hls_new.csv",
    last_scraping="12.12.2022",  # date the base CSV was scraped
    crawl_delay=20,  # as per https://hls-dhs-dss.ch/robots.txt
)
```
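The `crawl_delay=20` value above is taken from the site's robots.txt. If you want to check that value before a run, the standard library's `urllib.robotparser` can read the `Crawl-delay` directive. The following is a minimal sketch; the helper `crawl_delay_from_robots` is hypothetical and not part of hlsscraper, and `SAMPLE_ROBOTS_TXT` is illustrative content, not a copy of the live file.

```python
import urllib.robotparser
from typing import Optional


def crawl_delay_from_robots(robots_txt: str, user_agent: str = "*") -> Optional[float]:
    """Return the Crawl-delay (seconds) declared for user_agent, or None if absent."""
    parser = urllib.robotparser.RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made here
    parser.parse(robots_txt.splitlines())
    # mark the rules as loaded; crawl_delay() returns None for a parser never marked modified
    parser.modified()
    delay = parser.crawl_delay(user_agent)
    return None if delay is None else float(delay)


# Illustrative robots.txt content (an assumption, not the live file)
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 20
"""

if __name__ == "__main__":
    print(crawl_delay_from_robots(SAMPLE_ROBOTS_TXT))  # 20.0
```

To check the live file instead, call `parser.set_url("https://hls-dhs-dss.ch/robots.txt")` followed by `parser.read()` in place of `parse()`. `crawl_delay()` is available since Python 3.6 and returns `None` when no delay applies to the given user agent.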
Development
```shell
# create a fresh Python 3.9 environment with conda
conda create -n py39 python=3.9
# create a .venv inside the project and link it against the Python 3.9 installed through conda
# (adjust the path if your conda environments live elsewhere)
poetry env use ~/.conda/envs/py39/bin/python
# install the packages defined in pyproject.toml into the .venv
poetry install
# set up git hooks for autoformatting and linting (black, isort, flake8), configured in .pre-commit-config.yaml
pre-commit install
```
Download files
Source Distribution
hlsscraper-0.0.1.tar.gz (7.6 kB)
Built Distribution
hlsscraper-0.0.1-py3-none-any.whl (10.4 kB)
File details
Details for the file hlsscraper-0.0.1.tar.gz.
File metadata
- Download URL: hlsscraper-0.0.1.tar.gz
- Upload date:
- Size: 7.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.2.2 CPython/3.9.7 Darwin/21.5.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 923d4dc9da28e19e9076d60d1680d10cd939ec8dc33887136baf965519a52cee |
| MD5 | d91b149ed0de90545f397c7df29d9b77 |
| BLAKE2b-256 | 42611d1192617b19bf1be87d64908a59e4448945159c54a6d457549557e36b29 |
File details
Details for the file hlsscraper-0.0.1-py3-none-any.whl.
File metadata
- Download URL: hlsscraper-0.0.1-py3-none-any.whl
- Upload date:
- Size: 10.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.2.2 CPython/3.9.7 Darwin/21.5.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6c97ff07970cfceb1aff08311e01b2726b5f40563567bfa5adbd26a31f21db43 |
| MD5 | 4dab1a215c6d5e7560090be1da0330af |
| BLAKE2b-256 | 7ef029455a3460b2e33ce7b0ff77ae7853367875c57574522978f220786042ae |
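The published SHA256 digests can be used to check a downloaded file before installing it. The following is a minimal standard-library sketch; the helpers `sha256_of_file` and `verify_download` are hypothetical and not part of hlsscraper.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of_file(path: Union[str, Path], chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file)
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Union[str, Path], expected_sha256: str) -> bool:
    """Compare a file's SHA256 digest with a published one (case-insensitive)."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

For example, `verify_download("hlsscraper-0.0.1.tar.gz", "923d4dc9da28e19e9076d60d1680d10cd939ec8dc33887136baf965519a52cee")` should return `True` for an unmodified source distribution. pip can also enforce this itself in hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file).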