Council Meeting Agenda Scraper
Check, download, and parse local council agendas for relevant housing and planning matters.
Users can easily set up notification functionality to be alerted by email (or, once implemented, Discord) when new agendas are released.
This enables YIMBY Melbourne and other organisations to keep easy track of relevant Council activities.
List of functioning scrapers
Melbourne: 13/31
Sydney: 18/30
Scraper details, including links and current status, can be found in the docs (docs/councils.md).
Write a Scraper! (Instructions)
Setup
Development
- Set up and activate the Python environment of your choosing.
- Ensure you have poetry installed (e.g. with pip install poetry).
- Run poetry shell to ensure you've activated the correct virtual env.
- Run poetry install to install dependencies.
The preferred code formatter is Black.
Testing
poetry run pytest will run all the tests, including on any new scrapers added to the scrapers/ directory. These tests are also run through GitHub Actions on each pull request.
Running the application
Within your environment, run: python council_scrapers/main.py
Logs will print to your terminal and are also saved to /logs/, with key results written to agendas.db.
You can run an individual scraper by running python council_scrapers/main.py --council council_string. For instance, python council_scrapers/main.py --council yarra will run the Yarra Council scraper.
A list of councils and their strings can be found in docs/councils.md.
.env
Optional functionality you can configure to extend the application's utility.
Email config
In the .env.example file, there is the basic variable GMAIL_FUNCTIONALITY.
This functionality is turned off by default. If you want to use the email-sending features, you'll need to include your Gmail authentication details in a .env file.
This may require setting up an App-specific password, for which you can find setup instructions here.
This functionality is optional, and the app should work fine without this setup.
Discord config
Instructions for setting up Discord can be found in docs/discord.md.
Writing a scraper
Australia has many, many councils! As such, we need many, many scrapers!
You can find a full list of active scrapers at docs/councils.md. Additionally, you can find a starting file at docs/scraper_template.py.
How scrapers work
Scrapers for each council are contained within the scrapers/[state]/ directory.
A scraper should be able to reliably find the most recent agenda on a Council's website. Once that link is found, it is checked against an existing database—if the link is new, then the agenda is downloaded, scanned, and a notification can be sent.
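For orientation, here is a rough, hypothetical sketch of that check-download-notify flow. None of these helper names are the project's real API (the actual logic lives in council_scrapers/main.py); the sketch only illustrates the pipeline a scraper plugs into.

def process_council(scraper, db):
    # Hypothetical helpers for illustration only.
    result = scraper.scraper()                 # find the most recent agenda (a ScraperReturn)
    if db.has_seen(result.download_url):       # link already recorded in agendas.db, nothing to do
        return
    pdf_bytes = download(result.download_url)  # fetch the agenda PDF
    matches = scan_for_keywords(pdf_bytes)     # look for relevant housing and planning terms
    db.record(result)                          # remember the link so it isn't processed twice
    if matches:
        send_notifications(result, matches)    # email (and, in future, Discord)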
In addition to the link, the scraper function should return an object of the following shape, outlined in base.py:
@dataclass
class ScraperReturn:
    name: str  # The name of the meeting (e.g. City Development Delegated Committee).
    date: str  # The date of the meeting (e.g. 2021-08-01).
    time: str  # The time of the meeting (e.g. 18:00).
    webpage_url: str  # The URL of the webpage where the agenda is found.
    download_url: str  # The URL of the PDF of the agenda.
It is not always possible to scrape the date and time of meetings from Council websites. In these cases, these values should be returned as empty strings.
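For example, a ScraperReturn for a meeting page where neither the date nor the time could be parsed might look like this (the import path is an assumption based on the package layout; ScraperReturn itself is defined in base.py, and the URLs are placeholders):

from council_scrapers.base import ScraperReturn  # assumed import path

result = ScraperReturn(
    name="Council Meeting",
    date="",   # date could not be scraped, so return an empty string
    time="",   # time could not be scraped, so return an empty string
    webpage_url="https://www.example.vic.gov.au/agendas",              # placeholder URL
    download_url="https://www.example.vic.gov.au/agendas/latest.pdf",  # placeholder URL
)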
The scraper function is then included within a Scraper class, which extends the BaseScraper class.
Easy scraping
Thanks to the phenomenal work of @catatonicChimp, a lot of the scraping can now be done by extending the BaseScraper class.
1. Duplicate the scraper template
For writing a new scraper, you can refer to and duplicate the template at docs/scraper_template.py. The Yarra scraper in scrapers/vic/yarra.py is a good, straightforward, functional example.
2. Get the agenda page HTML
In the case of most councils, you will will be able to use the self.fetcher.fetch_with_requests(url)
method to return the agenda page html as output.
For more complex Javascript pages, you may need to use self.fetcher.fetch_with_selenium(url)
.
For pages requiring interactivity using a headless browser, you may need to write a Selenium script using the driver returned by self.fetcher.get_selenium_driver()
, and then utilise the Selenium library to navigate the page effectively.
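As an illustration of the interactive case, the sketch below drives the Selenium browser to expand an accordion before reading the page HTML. The URL and CSS selector are placeholders; for most councils, fetch_with_requests or fetch_with_selenium will be enough and none of this is needed.

from selenium.webdriver.common.by import By

def get_agenda_html(self):
    # Hypothetical example: the URL and selector below are placeholders.
    driver = self.fetcher.get_selenium_driver()
    driver.get("https://www.example.vic.gov.au/council-meetings")
    # Expand the "Past meetings" accordion so the agenda links are present in the DOM.
    driver.find_element(By.CSS_SELECTOR, "button.accordion-toggle").click()
    return driver.page_source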
3. Use BeautifulSoup to get the agenda details
Load the HTML into BeautifulSoup like this:
soup = BeautifulSoup(output, 'html.parser')
And then use the BeautifulSoup documentation to navigate the HTML and grab the relevant elements and information.
You may also need to use regular expressions (regexes) to parse dates etc.
Luckily, ChatGPT is quite good at both BeautifulSoup and regexes, so you can save a great deal of time by feeding your HTML into ChatGPT, GitHub Copilot, or the shockingly reliable Phind.com and iterating from there.
Once you have the agenda download link and all other available, scrapeable information, return a ScraperReturn object.
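Putting steps 2 and 3 together, a simple scraper method might look roughly like the sketch below. Everything council-specific (the URL, CSS classes, and date format) is invented for illustration; only BaseScraper, ScraperReturn, and the fetcher methods come from the codebase, and the exact class structure should be checked against docs/scraper_template.py.

import re

from bs4 import BeautifulSoup

from council_scrapers.base import BaseScraper, ScraperReturn  # assumed import path


class ExampleScraper(BaseScraper):
    # Rough sketch only; see docs/scraper_template.py for the required structure.

    def scraper(self) -> ScraperReturn:
        url = "https://www.example.vic.gov.au/agendas"  # placeholder agenda page
        html = self.fetcher.fetch_with_requests(url)
        soup = BeautifulSoup(html, "html.parser")

        # Grab the most recent agenda entry on the page (selectors are hypothetical).
        latest = soup.find("div", class_="agenda-item")
        name = latest.find("h3").get_text(strip=True)
        download_url = latest.find("a", href=re.compile(r"\.pdf$"))["href"]

        # Dates are often embedded in free text, e.g. "Council Meeting - 12 March 2024".
        date_match = re.search(r"\d{1,2} \w+ \d{4}", latest.get_text())
        date = date_match.group(0) if date_match else ""  # empty string when not scrapeable

        return ScraperReturn(
            name=name,
            date=date,
            time="",  # time not published on this page
            webpage_url=url,
            download_url=download_url,
        )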
4. Add the scraper class to the folder's __init__.py file
To register the Scraper, import the scraper in the relevant folder's __init__.py file.
As an example, to add the scraper for the Yarra council, open council_scrapers/scrapers/vic/__init__.py, and add:
from council_scrapers.scrapers.vic.yarra import YarraScraper
5. Run tests and save the cached page
Once you have your scraper working locally, run pytest in the root directory (council-meeting-agenda-scraper/) and add the cached results to the commit when successful.
This is done to prevent spamming requests to council pages during the development of scrapers.
File details
Details for the file aus_council_scrapers-0.1.0.tar.gz.
File metadata
- Download URL: aus_council_scrapers-0.1.0.tar.gz
- Upload date:
- Size: 25.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.0.0 CPython/3.12.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | ad27d992f11f8c453b81bd144f10b6d62abd61a756a583b5feb1e7433aaddee2
MD5 | 74d7952e0b938af28369f249946f61e9
BLAKE2b-256 | 3138831e93de0186ac1ff25606f32f517ea626087de77f195deface0f48ef183
File details
Details for the file aus_council_scrapers-0.1.0-py3-none-any.whl.
File metadata
- Download URL: aus_council_scrapers-0.1.0-py3-none-any.whl
- Upload date:
- Size: 45.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.0.0 CPython/3.12.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 85d723ea5a8c05e036421e7c58328c020f93c18c09bf8a3358221be35b5aa3c8
MD5 | 47bef2b7ae8da6cf4a4ca78f622e38cf
BLAKE2b-256 | 84bb3cde6da3be1e898c0bb84b88669ade1b3d27645d91a1608d99d7271b36cb