HLS-scraper

This project is a web scraper for the Historical Dictionary of Switzerland (HDS; in German, Historisches Lexikon der Schweiz, HLS).

Installation

pip install hlsscraper

Usage

Please use the already scraped hls_base.csv from 12.12.2022 as a basis, so that only updates and new records need to be fetched. This helps avoid putting too much load on the HLS servers.

import os

import hlsscraper

hlsscraper.scrape(
    base_csv_path=f"{os.getcwd()}/data/hls_base.csv",
    update_csv_path=f"{os.getcwd()}/data/hls_updates.csv",
    new_csv_path=f"{os.getcwd()}/data/hls_new.csv",
    last_scraping="12.12.2022",
    crawl_delay=20,  # as per https://hls-dhs-dss.ch/robots.txt
)
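The crawl_delay value above is taken from the site's robots.txt. As a minimal sketch of how such a value could be read rather than hard-coded, the standard library's urllib.robotparser can parse a robots.txt body and report its Crawl-delay. The helper name crawl_delay_from_robots, the sample robots.txt text, and the fallback of 20 seconds are assumptions made for illustration; they are not part of hlsscraper.

```python
from urllib.robotparser import RobotFileParser


def crawl_delay_from_robots(
    robots_txt: str, user_agent: str = "*", default: float = 20.0
) -> float:
    """Return the Crawl-delay declared for user_agent, or default if none is set.

    Hypothetical helper for illustration; not part of hlsscraper.
    """
    parser = RobotFileParser()
    # parse() takes the robots.txt content as an iterable of lines
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


# Made-up robots.txt content, used only to demonstrate the helper
sample_robots = "User-agent: *\nCrawl-delay: 20\n"
delay_seconds = crawl_delay_from_robots(sample_robots)
```

The resulting number could then be passed as crawl_delay to hlsscraper.scrape. In a real run, the robots.txt text would first have to be downloaded from the URL mentioned in the comment above.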

Development

# create a fresh conda environment with Python 3.9
conda create -n py39 python=3.9
# create a .venv inside the project and link against the Python 3.9 version installed through conda
poetry env use ~/.conda/envs/py39/bin/python
# install required packages defined in pyproject.toml into .venv
poetry install
# set up git hooks for autoformatting and linting (black, isort, flake8) --> .pre-commit-config.yaml
pre-commit install

Download files

Source Distribution: hlsscraper-0.0.1.tar.gz (7.6 kB)

Built Distribution: hlsscraper-0.0.1-py3-none-any.whl (10.4 kB, Python 3)