cdp-scrapers

Scratchpad for scraper development and general utilities.


Installation

Stable Release: pip install cdp-scrapers
Development Head: pip install git+https://github.com/CouncilDataProject/cdp-scrapers.git

Council Data Project

Council Data Project is an open-source project dedicated to providing journalists, activists, researchers, and all members of each community we serve with the tools they need to stay informed and hold their Council Members accountable.

For more information about Council Data Project, please visit our website.

About

cdp-scrapers is a collection of utilities and CDP instance event scrapers, some in progress and some actively maintained. The purpose of this library is to give new CDP instance maintainers plenty of examples to draw on when they start developing their own event scraper functions.

Quick Start

Legistar

General Legistar utility functions.

from cdp_scrapers.legistar_utils import get_legistar_events_for_timespan
from cdp_scrapers.instances import get_seattle_events
from datetime import datetime

# Get all events (and minutes item and voting details)
# for a provided timespan for a legistar client
# Returns List[Dict]
seattle_legistar_events = get_legistar_events_for_timespan(
    client="seattle",
    timezone="America/Los_Angeles",
    start=datetime(2021, 7, 12),
    end=datetime(2021, 7, 14),
)

# Or parse and convert to CDP EventIngestionModel
seattle_cdp_parsed_events = get_seattle_events(
    from_dt=datetime(2021, 7, 12),
    to_dt=datetime(2021, 7, 14),
)
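Because get_legistar_events_for_timespan returns plain dictionaries, the results can be inspected with ordinary Python. The following is a minimal sketch that groups events by meeting body. It assumes each dictionary carries the Legistar Web API fields EventId and EventBodyName; the helper name group_event_ids_by_body and the sample data are made up for illustration and are not part of cdp-scrapers.

```python
from collections import defaultdict
from typing import Dict, List


def group_event_ids_by_body(events: List[Dict]) -> Dict[str, List[int]]:
    """Map each meeting body name to the IDs of its events."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for event in events:
        # Assumption: "EventBodyName" and "EventId" are present on each event,
        # as they are in the Legistar Web API.
        body = event.get("EventBodyName", "Unknown")
        grouped[body].append(event["EventId"])
    return dict(grouped)


# Made-up sample data standing in for real scraper output.
sample_events = [
    {"EventId": 1, "EventBodyName": "City Council"},
    {"EventId": 2, "EventBodyName": "Finance Committee"},
    {"EventId": 3, "EventBodyName": "City Council"},
]
events_by_body = group_event_ids_by_body(sample_events)
```

With real data, the list returned by get_legistar_events_for_timespan would be passed in place of sample_events.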

Scrapers

Event Scraper Structure

Our current event scraper structure is as follows. The main function, get_events, gathers all the required event data and calls the get_content_uris function to retrieve the required video data.

If your city uses Legistar and the Legistar data is publicly available:

  • You may be able to reuse our scraper with minimal modifications, such as providing the correct Legistar client ID for your municipality.
  • If the returned Legistar data does not include the EventVideoPath field used to populate Session.video_uri, you will only need to implement get_content_uris.
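The second case above can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's implementation: ContentURIs here is a simplified stand-in type, the fallback_videos mapping is hypothetical (a real scraper would build it by scraping the municipality's video page), and the function is written standalone rather than as a method on the scraper class.

```python
from typing import Dict, List, NamedTuple, Optional


class ContentURIs(NamedTuple):
    """Simplified stand-in for the video/caption pair a scraper returns."""

    video_uri: str
    caption_uri: Optional[str] = None


def get_content_uris(
    legistar_ev: Dict, fallback_videos: Dict[int, str]
) -> List[ContentURIs]:
    """Return video URIs for one Legistar event dictionary."""
    # Prefer the video path Legistar already provides, when it is set.
    video_path = legistar_ev.get("EventVideoPath")
    if video_path:
        return [ContentURIs(video_uri=video_path)]

    # Hypothetical fallback: look the event up in data gathered elsewhere.
    fallback = fallback_videos.get(legistar_ev.get("EventId"))
    return [ContentURIs(video_uri=fallback)] if fallback else []
```

Returning an empty list when no video is found leaves it to the caller to decide whether an event without video should be skipped.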

If your city does not use Legistar:

  • You will need to build your own event scraper.

Example of a completed scraper: cdp_scrapers.instances.seattle.SeattleScraper

For more details about creating a custom scraper for your municipality's Legistar data, please visit here.

If you would like to deploy a CDP instance or would like to use this library as a method for retrieving formatted legislative data, please feel free to contribute a new custom municipality scraper!

Creating a Custom Scraper

If it isn't possible to use our generalized Legistar tooling to write your scraper, you will need to create your own event scraper to proceed with the deployment.

  1. Please see our documentation on the minimum data required for CDP event ingestion to understand what data your scraper should return.

  2. From there, begin with our empty custom scraper function template and fill in your scraper.

  3. Once your scraper is complete, create a pull request to add it to the cdp-scrapers repo so it can be used by the final repo for your CDP instance.

  4. Our automated action will run your scraper to verify that it returns the correct data. If it succeeds, you may proceed to the next deployment step. If it fails, we will automatically share the error message so you can fix the issue and have the scraper tested again.
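To make the steps above more concrete, here is a minimal sketch of the general shape a custom get_events function might take. The dataclasses are simplified stand-ins written for this example so that it runs on its own; the real ingestion models and their full set of fields are described in the documentation on the minimum data required. The meeting data and example.com URLs are made up.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Session:
    session_datetime: datetime
    video_uri: str
    session_index: int


@dataclass
class Body:
    name: str


@dataclass
class EventIngestionModel:
    body: Body
    sessions: List[Session]


def get_events(from_dt: datetime, to_dt: datetime) -> List[EventIngestionModel]:
    """Return events whose start time falls in [from_dt, to_dt)."""
    # A real scraper would fetch and parse the municipality's site here.
    # These hard-coded meetings are placeholders for that scraped data.
    raw_meetings = [
        {
            "body": "City Council",
            "start": datetime(2021, 7, 12, 14, 0),
            "video": "https://example.com/council-2021-07-12.mp4",
        },
        {
            "body": "City Council",
            "start": datetime(2021, 8, 2, 14, 0),
            "video": "https://example.com/council-2021-08-02.mp4",
        },
    ]
    return [
        EventIngestionModel(
            body=Body(name=meeting["body"]),
            sessions=[
                Session(
                    session_datetime=meeting["start"],
                    video_uri=meeting["video"],
                    session_index=0,
                )
            ],
        )
        for meeting in raw_meetings
        if from_dt <= meeting["start"] < to_dt
    ]


events = get_events(datetime(2021, 7, 12), datetime(2021, 7, 14))
```

Filtering on the requested timespan inside get_events keeps the function's contract the same as the Legistar-based scrapers, which also take a start and end datetime.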

Documentation

For full package documentation please visit councildataproject.org/cdp-scrapers.

Development

Refer to CONTRIBUTING.md for information related to developing the code.

MPLv2 License
