Project description

HLS-scraper

This project is a web scraper for the Historical Dictionary of Switzerland (HDS; German: Historisches Lexikon der Schweiz, HLS).

Installation

pip install hlsscraper

Usage

Please use the already scraped hls_base.csv from 12.12.2022 as a basis, so that only updated and new records need to be fetched. This avoids putting unnecessary load on the HLS servers.

import os

import hlsscraper

hlsscraper.scrape(
    base_csv_path=f"{os.getcwd()}/data/hls_base.csv",
    update_csv_path=f"{os.getcwd()}/data/hls_updates.csv",
    new_csv_path=f"{os.getcwd()}/data/hls_new.csv",
    last_scraping="12.12.2022",
    crawl_delay=20,  # as per https://hls-dhs-dss.ch/robots.txt
)
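The crawl_delay of 20 seconds in the example above is taken from the site's robots.txt. As a minimal sketch, assuming you want to read that value at run time instead of hard-coding it, the standard library's urllib.robotparser can parse the Crawl-delay directive. The helper names crawl_delay_from_lines and fetch_crawl_delay are hypothetical and are not part of hlsscraper; the fallback of 20 simply mirrors the example.

```python
from typing import Iterable, Optional
from urllib.robotparser import RobotFileParser

# URL taken from the comment in the usage example.
ROBOTS_URL = "https://hls-dhs-dss.ch/robots.txt"
# Hypothetical fallback, mirroring crawl_delay=20 in the usage example.
FALLBACK_DELAY = 20.0


def _delay_or_default(parser: RobotFileParser, user_agent: str, default: float) -> float:
    # crawl_delay() returns None when no Crawl-delay applies to this user agent.
    delay: Optional[int] = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else float(default)


def crawl_delay_from_lines(
    lines: Iterable[str], user_agent: str = "*", default: float = FALLBACK_DELAY
) -> float:
    """Parse robots.txt content that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(lines)
    return _delay_or_default(parser, user_agent, default)


def fetch_crawl_delay(
    url: str = ROBOTS_URL, user_agent: str = "*", default: float = FALLBACK_DELAY
) -> float:
    """Download and parse robots.txt (this performs an HTTP request)."""
    parser = RobotFileParser(url)
    parser.read()
    return _delay_or_default(parser, user_agent, default)
```

The result of fetch_crawl_delay() could then be passed as crawl_delay to hlsscraper.scrape. Falling back to a fixed value keeps the scraper polite even if robots.txt cannot be read or omits the directive.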

Development

# create a conda environment named py39 with a fresh Python 3.9
conda create -n py39 python=3.9
# create a .venv inside the project and link against the Python 3.9 version installed through conda
poetry env use ~/.conda/envs/py39/bin/python
# install required packages defined in pyproject.toml into .venv
poetry install
# set up git hooks for autoformatting and linting (black, isort, flake8) --> .pre-commit-config.yaml
pre-commit install

Release history

Version 0.0.1, released Dec 12, 2022

Download files


Source Distribution

hlsscraper-0.0.1.tar.gz (7.6 kB), uploaded Dec 12, 2022

Built Distribution

hlsscraper-0.0.1-py3-none-any.whl (10.4 kB, Python 3), uploaded Dec 12, 2022

Hashes for hlsscraper-0.0.1.tar.gz

Algorithm	Hash digest
SHA256	`923d4dc9da28e19e9076d60d1680d10cd939ec8dc33887136baf965519a52cee`
MD5	`d91b149ed0de90545f397c7df29d9b77`
BLAKE2b-256	`42611d1192617b19bf1be87d64908a59e4448945159c54a6d457549557e36b29`

Hashes for hlsscraper-0.0.1-py3-none-any.whl

Algorithm	Hash digest
SHA256	`6c97ff07970cfceb1aff08311e01b2726b5f40563567bfa5adbd26a31f21db43`
MD5	`4dab1a215c6d5e7560090be1da0330af`
BLAKE2b-256	`7ef029455a3460b2e33ce7b0ff77ae7853367875c57574522978f220786042ae`
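To check a downloaded file against the digests listed above, a minimal sketch using only hashlib is shown below. It assumes that "BLAKE2b-256" means BLAKE2b with a 32-byte digest. The function names file_digests and verify_sha256 are hypothetical helpers, and only the SHA256 values from the tables are copied in.

```python
import hashlib
from pathlib import Path
from typing import Dict, Optional, Union

# SHA256 digests copied from the hash tables above.
EXPECTED_SHA256 = {
    "hlsscraper-0.0.1.tar.gz": "923d4dc9da28e19e9076d60d1680d10cd939ec8dc33887136baf965519a52cee",
    "hlsscraper-0.0.1-py3-none-any.whl": "6c97ff07970cfceb1aff08311e01b2726b5f40563567bfa5adbd26a31f21db43",
}


def file_digests(path: Union[str, Path], chunk_size: int = 65536) -> Dict[str, str]:
    """Compute the three digests listed above, reading the file in chunks."""
    sha256 = hashlib.sha256()
    md5 = hashlib.md5()
    blake2b_256 = hashlib.blake2b(digest_size=32)  # 32 bytes = 256 bits
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for h in (sha256, md5, blake2b_256):
                h.update(chunk)
    return {
        "SHA256": sha256.hexdigest(),
        "MD5": md5.hexdigest(),
        "BLAKE2b-256": blake2b_256.hexdigest(),
    }


def verify_sha256(path: Union[str, Path], expected: Optional[str] = None) -> bool:
    """Compare a file's SHA256 with an expected digest (looked up by file name if omitted)."""
    path = Path(path)
    if expected is None:
        expected = EXPECTED_SHA256[path.name]
    return file_digests(path)["SHA256"] == expected.lower()
```

For example, verify_sha256("hlsscraper-0.0.1.tar.gz") returns True only if the file in the current directory matches the published SHA256 digest.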