HLS-scraper

This project is a web scraper for the Historical Dictionary of Switzerland (HDS; in German, Historisches Lexikon der Schweiz, HLS).

Installation

pip install hlsscraper

Usage

Please use the already scraped hls_base.csv from 12.12.2022 as the basis, so that only updated and new records need to be fetched. This avoids putting unnecessary load on the HLS servers.

import os

import hlsscraper

hlsscraper.scrape(
    base_csv_path=f"{os.getcwd()}/data/hls_base.csv",
    update_csv_path=f"{os.getcwd()}/data/hls_updates.csv",
    new_csv_path=f"{os.getcwd()}/data/hls_new.csv",
    last_scraping="12.12.2022",
    crawl_delay=20,  # as per https://hls-dhs-dss.ch/robots.txt
)
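
The crawl_delay value above is taken from the site's robots.txt. As a minimal sketch of how such a value could be read programmatically, the snippet below uses Python's standard urllib.robotparser. The helper name crawl_delay_from_robots and the sample robots.txt text are illustrative assumptions; they are not part of hlsscraper and not a copy of the live HLS file.

```python
from urllib.robotparser import RobotFileParser


def crawl_delay_from_robots(robots_txt: str, user_agent: str = "*"):
    """Return the Crawl-delay (in seconds) declared for user_agent, or None.

    Hypothetical helper for illustration; hlsscraper itself does not
    necessarily work this way.
    """
    parser = RobotFileParser()
    # parse() accepts the robots.txt content as an iterable of lines
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


# Illustrative robots.txt content (an assumption, not the real HLS file)
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 20
Disallow: /private/
"""

delay = crawl_delay_from_robots(SAMPLE_ROBOTS_TXT)
print(delay)
```

To check the live file instead, RobotFileParser also offers set_url() followed by read(), which downloads and parses robots.txt in one step; the result could then be passed as crawl_delay to hlsscraper.scrape.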

Development

# create a fresh conda environment with Python 3.9
conda create -n py39 python=3.9
# create a .venv inside the project and link against the Python 3.9 version installed through conda
poetry env use ~/.conda/envs/py39/bin/python
# install required packages defined in pyproject.toml into .venv
poetry install
# set up git hooks for autoformatting and linting (black, isort, flake8) --> .pre-commit-config.yaml
pre-commit install
