
Scrape feedback and statistics from the European Commission's 'Have your Say' platform.

Project description

'Have your Say' scraper


A small utility to scrape the European Commission's 'Have your Say' platform (https://ec.europa.eu/info/law/better-regulation/have-your-say). It can scrape an initiative's feedback submissions, the attachments of these submissions, and the by-country and by-category statistics.

Installation

pip3 install hys_scraper

Tested with Python 3.9 on Linux and in Google Colab notebooks.

Getting started

To get started, you will need the publication id of the initiative you want to scrape. To get it, simply navigate to the initiative on 'Have your Say' and look at the URL: the number at the end is the publication id you will use in the next step. For example, for the AI Act Commission adoption initiative, the publication id is 24212003.
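
If you want to extract the id programmatically, here is a minimal sketch (not part of hys_scraper). It assumes the id appears either as a p_id= query parameter or as the trailing number in the URL path; the example URL at the bottom is hypothetical.

import re
from urllib.parse import parse_qs, urlparse

def publication_id(url):
    """Pull the publication id out of a 'Have your Say' URL."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    if "p_id" in query:  # e.g. ...?p_id=24212003
        return query["p_id"][0]
    match = re.search(r"(\d+)/?$", parsed.path)  # trailing number in the path
    if match is None:
        raise ValueError(f"no publication id found in {url!r}")
    return match.group(1)

print(publication_id("https://example.com/initiative?p_id=24212003"))  # hypothetical URL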

To scrape an initiative, the following is sufficient (replace 24212003 with the publication id of the initiative you want to scrape):

python3 -m hys_scraper 24212003

This will create a new folder in your current working directory with the following layout:

24212003_requirements_for_artificial_intelligence/
├── attachments
│   ├── 2488672.pdf
│   ├── 2596917.pdf
│   └── ...
├── attachments.csv
├── categories.csv
├── countries.csv
└── feedbacks.csv

1 directory, 263 files
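
The CSV files are plain comma-separated tables, so any CSV reader will do. A minimal sketch using pandas (assuming it is installed; the column names are whatever hys_scraper wrote, so inspect them first):

import pandas as pd

folder = "24212003_requirements_for_artificial_intelligence"
feedbacks = pd.read_csv(f"{folder}/feedbacks.csv")
countries = pd.read_csv(f"{folder}/countries.csv")
categories = pd.read_csv(f"{folder}/categories.csv")

print(feedbacks.shape)    # submissions x columns
print(feedbacks.columns)  # which fields were scraped
print(countries.head())   # top of the by-country statistics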

Advanced usage

The command line interface has a few more arguments. For example, instead of having hys_scraper create a folder in the current working directory to save results into, you can specify a target directory manually.

$ python3 -m hys_scraper -h
Scrape feedback and statistics from the European Commission's 'Have your Say' platform.

positional arguments:
  PID                   The publication id - what comes after 'p_id=' in the initiative's URL.

optional arguments:
  -h, --help            show this help message and exit
  --dir target_dir, --target_dir target_dir
                        Directory to save the feedback and statistics dataframes to. Defaults to creating a new
                        folder in the current working directory.
  --no_attachments      Whether to skip the download of attachments.
  --sleep_time t        Minimum time between consecutive HTTP requests (in seconds).
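
Putting the documented flags together, the following scrapes the same initiative into a directory of your choosing (./ai_act here is an arbitrary example name), skips attachments, and waits at least two seconds between requests:

python3 -m hys_scraper --dir ./ai_act --no_attachments --sleep_time 2 24212003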

Alternatively, you can also access hys_scraper from Python:

from hys_scraper import HYS_Scraper
feedbacks, countries, categories = HYS_Scraper("24212003").scrape()

The same options as for the command line interface are available; check out help(HYS_Scraper) for details.
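
The returned objects can then be processed directly. A short follow-up sketch, assuming they are pandas DataFrames (the CLI help's wording, "feedback and statistics dataframes", suggests as much):

print(len(feedbacks), "feedback submissions")
print(countries.head())  # by-country statistics
feedbacks.to_csv("feedbacks.csv", index=False)  # persist them yourself if desired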

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hys_scraper-0.1.33.tar.gz (20.6 kB)


Built Distribution

hys_scraper-0.1.33-py3-none-any.whl (20.8 kB)


File details

Details for the file hys_scraper-0.1.33.tar.gz.

File metadata

  • Download URL: hys_scraper-0.1.33.tar.gz
  • Size: 20.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.12

File hashes

Hashes for hys_scraper-0.1.33.tar.gz
  • SHA256: 7f624accecf120dc3ec45d8d8e821d35c71ed26dfea4b3c4b958e71a5fff47aa
  • MD5: 901611cf447b55619904fceb39dc7a21
  • BLAKE2b-256: 43fea39e380132cfeed0b52784cad5f02a62f7dddddefdef4d3762badfbb0f33
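
To check a downloaded file against these values, a minimal sketch using only the standard library (assuming the sdist sits in the current directory):

import hashlib

with open("hys_scraper-0.1.33.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

expected = "7f624accecf120dc3ec45d8d8e821d35c71ed26dfea4b3c4b958e71a5fff47aa"
print("OK" if digest == expected else "MISMATCH")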


File details

Details for the file hys_scraper-0.1.33-py3-none-any.whl.

File metadata

  • Download URL: hys_scraper-0.1.33-py3-none-any.whl
  • Size: 20.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.12

File hashes

Hashes for hys_scraper-0.1.33-py3-none-any.whl
  • SHA256: 3424a9c20aae019fe4b2ae086521660e9905e6786c8bf3fbff9845f1856a73c1
  • MD5: 2564fc5a30886c5c96e235241767eb0b
  • BLAKE2b-256: c6ec6af921ac0c82bbf04ae999769fe7e4081ae35cc8c3c3a6d9d2170d0a7d96

