
An agenda scraper framework for municipalities


Engage Scraper

Installation

pip install engage-scraper

About

The Engage Scraper is a standalone library that can be included in any service. The purpose of the scraper is to catalog a municipality's council meeting agendas in a usable format for such things as the engage-client and engage-backend.

To extend this library for your municipality, subclass the base class from the scraper_core/ directory, override its methods, and put the new module in scraper_logics/, prefixing its name with your municipality's name. For an example, see the Santa Monica, CA scraper in the scraper_logics/ directory. It uses htmlutils.py because its sources require HTML scraping. Feel free to open PRs with new utilities (for example, PDF scraping, RSS scraping, or JSON parsing). The Santa Monica example also uses SQLAlchemy for its models, which is the preferred approach for dbutils.py, although you can use anything. An ORM is preferred over raw psycopg2 or the like.
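The extension pattern described above can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions: BaseScraper is a stand-in defined in the snippet, not the library's actual base class (check scraper_core/ for the real name and method list), and ExampleTownScraper with its hard-coded data is hypothetical. Only the method names get_available_agendas and scrape come from the API described in this document.

```python
from abc import ABC, abstractmethod


class BaseScraper(ABC):
    """Stand-in for the base class in scraper_core/ (name is an assumption)."""

    def __init__(self, committee, years=None):
        self.committee = committee
        self.years = years or ["2019"]
        self.agendas = []

    @abstractmethod
    def get_available_agendas(self):
        """Discover which agendas exist for the configured years."""

    @abstractmethod
    def scrape(self):
        """Process the discovered agendas and return their contents."""


class ExampleTownScraper(BaseScraper):
    """Hypothetical municipality logic, as it might live in scraper_logics/."""

    # Hard-coded sample data standing in for a real HTML, PDF, or RSS source.
    _SOURCE = {"2019": ["2019-01-08", "2019-01-22"], "2020": ["2020-01-14"]}

    def get_available_agendas(self):
        # Collect the meeting dates for every configured year.
        self.agendas = [
            date for year in self.years for date in self._SOURCE.get(year, [])
        ]
        return self.agendas

    def scrape(self):
        # A real implementation would fetch, parse, and store each agenda.
        return [
            {"committee": self.committee, "meeting_date": date}
            for date in self.agendas
        ]


scraper = ExampleTownScraper(committee="Example Town Council")
scraper.get_available_agendas()
records = scraper.scrape()
```

Keeping the two public methods on the subclass means calling code can treat every municipality the same way, whatever the underlying source format is.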

To use the Postgres dbutils.py, make sure to set these five environment variables (see dev.env and the docker-compose usage below):

  • POSTGRES_HOST (optional): a resolvable host or hostname. Defaults to localhost.
  • POSTGRES_USER (required)
  • POSTGRES_PASSWORD (required)
  • POSTGRES_PORT (optional): defaults to 5432.
  • POSTGRES_DB (required): the database used for cataloging your municipality's agendas.
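As an illustration of how these variables and their defaults fit together, here is a minimal sketch that builds a Postgres connection URL of the kind SQLAlchemy accepts. The helper name postgres_url_from_env is hypothetical and is not part of engage-scraper. The library's own dbutils.py may assemble its connection differently.

```python
import os
from urllib.parse import quote_plus

REQUIRED = ("POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_DB")


def postgres_url_from_env(env=None):
    """Build a connection URL from the five variables (hypothetical helper)."""
    env = os.environ if env is None else env

    # Fail early with a clear message if a required variable is absent or empty.
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError("Missing required environment variables: " + ", ".join(missing))

    # Optional variables fall back to the documented defaults.
    host = env.get("POSTGRES_HOST", "localhost")
    port = env.get("POSTGRES_PORT", "5432")

    # Escape the credentials so characters such as "@" do not break the URL.
    user = quote_plus(env["POSTGRES_USER"])
    password = quote_plus(env["POSTGRES_PASSWORD"])
    return "postgresql://{}:{}@{}:{}/{}".format(
        user, password, host, port, env["POSTGRES_DB"]
    )
```

Passing a plain dict instead of relying on os.environ makes the helper easy to exercise in tests without touching the real environment.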

An example of using the Santa Monica scraper library

from engage_scraper.scraper_logics import santamonica_scraper_logic

scraper = santamonica_scraper_logic.SantaMonicaScraper(committee="Santa Monica City Council")
scraper.get_available_agendas()
scraper.scrape()

For SantaMonicaScraper instantiation

For Twitter utils used in SantaMonicaScraper

To use the Santa Monica logic, you must create an app on Twitter (work is planned to make this optional). After creating the app, use the dev.env file as a template for the required parameters. Do not edit the repository's copy: copy the file up one directory and edit it there. After editing, use docker-compose.yml for testing. You can add examples to examples/ and run them with the script in scripts/ using the Docker container.

For the SantaMonicaScraper class, the init accepts these options

  • tz_string="America/Los_Angeles" # defaulted string
  • years=["2019"] # defaulted array of strings of years
  • committee="Santa Monica City Council" # defaulted string of council name

The exposed API methods for scraper are

  • .get_available_agendas() # To get available agendas, no arguments
  • .scrape() # To process agendas and store contents

Feel free to expose more

  • Write wrappers for internal functions if you want to expose them
  • Write extra functions to handle more complex municipality-specific tasks
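The two suggestions above might look like the following sketch. Everything in it is hypothetical: ExampleScraper, _normalize_title, normalize_title, and count_items_by_section are illustrative names, not part of engage-scraper.

```python
from collections import Counter


class ExampleScraper:
    """Hypothetical scraper class used only to illustrate the pattern."""

    def _normalize_title(self, raw_title):
        # Internal helper: collapse runs of whitespace into single spaces.
        return " ".join(raw_title.split())

    def normalize_title(self, raw_title):
        """Public wrapper that exposes the internal helper with input checking."""
        if not isinstance(raw_title, str):
            raise TypeError("raw_title must be a string")
        return self._normalize_title(raw_title)

    def count_items_by_section(self, items):
        """Extra municipality-specific helper: count agenda items per section.

        `items` is assumed to be an iterable of (section, title) pairs.
        """
        return dict(Counter(section for section, _title in items))
```

A thin public wrapper keeps the internal helper free to change while giving callers a stable, validated entry point.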
