HLS-scraper
This project is a web scraper for the Historical Dictionary of Switzerland (HDS), known in German as the Historisches Lexikon der Schweiz (HLS).
Installation
pip install hlsscraper
Usage
Please use the already scraped hls_base.csv from 12.12.2022 as the basis, so that only updates and new records need to be fetched. This helps avoid putting too much load on the HLS servers.
import os  # needed for os.getcwd() below

import hlsscraper

hlsscraper.scrape(
    base_csv_path=f"{os.getcwd()}/data/hls_base.csv",
    update_csv_path=f"{os.getcwd()}/data/hls_updates.csv",
    new_csv_path=f"{os.getcwd()}/data/hls_new.csv",
    last_scraping="12.12.2022",
    crawl_delay=20,  # as per https://hls-dhs-dss.ch/robots.txt
)
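The crawl_delay value above is taken from the site's robots.txt. As a minimal sketch of how such a value could be read rather than hard-coded, the standard library's urllib.robotparser can parse robots.txt content and report the crawl delay. The helper name crawl_delay_from_robots and the sample robots.txt text are illustrative assumptions; they are not part of hlsscraper and do not reproduce the real file.

```python
from urllib.robotparser import RobotFileParser


def crawl_delay_from_robots(robots_txt, user_agent="*"):
    """Return the Crawl-delay (in seconds) for user_agent, or None if unset.

    Hypothetical helper for illustration; not part of hlsscraper.
    """
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


# Stand-in robots.txt content (an assumption, not the real file).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 20
Disallow: /private/
"""

delay = crawl_delay_from_robots(SAMPLE_ROBOTS_TXT)
print(delay)
```

To check the live file instead, RobotFileParser also offers set_url() and read(), which fetch and parse the robots.txt at a given URL. The resulting number could then be passed as crawl_delay.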
Development
# create a conda environment with a fresh Python 3.9
conda create -n py39 python=3.9
# create a .venv inside the project and link it against the Python 3.9 installed through conda
poetry env use ~/.conda/envs/py39/bin/python
# install the required packages defined in pyproject.toml into .venv
poetry install
# set up git hooks for autoformatting and linting (black, isort, flake8), configured in .pre-commit-config.yaml
pre-commit install
Source Distribution
hlsscraper-0.0.1.tar.gz (7.6 kB)
Built Distribution
hlsscraper-0.0.1-py3-none-any.whl (10.4 kB)
Hashes for hlsscraper-0.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 6c97ff07970cfceb1aff08311e01b2726b5f40563567bfa5adbd26a31f21db43
MD5 | 4dab1a215c6d5e7560090be1da0330af
BLAKE2b-256 | 7ef029455a3460b2e33ce7b0ff77ae7853367875c57574522978f220786042ae
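The SHA256 digest listed for the wheel can be compared against a locally downloaded copy. The following is a minimal sketch using only the standard library's hashlib. The function names sha256_of_file and matches_published_hash are illustrative, and the location of the downloaded file is left to the caller.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# SHA256 digest published for hlsscraper-0.0.1-py3-none-any.whl.
EXPECTED_WHEEL_SHA256 = (
    "6c97ff07970cfceb1aff08311e01b2726b5f40563567bfa5adbd26a31f21db43"
)
```

Calling matches_published_hash("hlsscraper-0.0.1-py3-none-any.whl", EXPECTED_WHEEL_SHA256) on a downloaded wheel should return True if the file is intact. pip can perform a similar check itself when requirements are pinned with --hash entries.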