A library that synchronously scrapes salary data from [seethroughny.net](https://seethroughny.net/) and saves it as a CSV file.

Project description

NY Salary Scanner

Example

from ny_salary_scanner import scraper, parser

html_output = "output.html"

# Run the search and save the fully loaded results page as HTML.
scraper.scrape(
    names=["Johnson, Phd Candace S"],
    years=[2025],
    branches=["Public Authorities"],
    agencies=["Roswell Park Cancer Institute Corporation"],
    sub_agencies=["Roswell Park Cancer Institute Corporation"],
    titles=["President & Ceo"],
    # min_pay=100,      # not currently supported
    # max_pay=3000000,  # not currently supported
    sort_by="Name",
    timeout=1000,  # milliseconds
    outputHTML=html_output,
)

# Parse the saved HTML. Afterwards the current working directory
# contains salaries.csv and output.html.
parser.parse(html_output)
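
Once the run finishes, the CSV can be inspected with the standard library. The following is a minimal sketch, assuming `salaries.csv` starts with a header row. The column names are not documented here, so the helper (`preview_csv`, a hypothetical name that is not part of ny_salary_scanner) reports whatever headers it finds.

```python
import csv
from pathlib import Path


def preview_csv(path, limit=5):
    """Return (header, first `limit` rows) from a CSV file with a header row."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = []
        for i, row in enumerate(reader):
            if i >= limit:
                break
            rows.append(row)
        # fieldnames is None for an empty file, so fall back to an empty list.
        return reader.fieldnames or [], rows


if __name__ == "__main__":
    csv_path = Path("salaries.csv")  # file name taken from the example above
    if csv_path.exists():
        header, rows = preview_csv(csv_path)
        print(header)
        for row in rows:
            print(row)
```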

Workflow

Playwright opens Chrome and runs a search with the provided parameters. It then clicks "Load More Results" repeatedly until there is no additional data to load. This step is throttled to be respectful to the website. The fully loaded page is saved as an HTML file, which BeautifulSoup4 then parses and writes out as a CSV.
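
The throttled "click until exhausted" step can be sketched independently of the browser. This is not the library's actual implementation. `load_all_results`, `click_load_more`, `delay_s`, and `max_clicks` are hypothetical names, and the callback stands in for whatever Playwright call finds and clicks the button.

```python
import time
from typing import Callable


def load_all_results(
    click_load_more: Callable[[], bool],
    delay_s: float = 1.0,
    max_clicks: int = 10_000,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call click_load_more until it returns False; return the click count.

    click_load_more is a hypothetical callback: it should click the button
    and return True, or return False once the button is no longer available.
    """
    clicks = 0
    while clicks < max_clicks and click_load_more():
        clicks += 1
        sleep(delay_s)  # throttle: pause before asking for the next batch
    return clicks


if __name__ == "__main__":
    # Stand-in for a real page: pretend three more batches of results exist.
    remaining = [True, True, True, False]
    print(load_all_results(lambda: remaining.pop(0), delay_s=0.0))
```

The `max_clicks` cap is a safety net so that a page that never stops offering more results cannot loop forever. The `sleep` parameter is injectable so the delay can be replaced in tests.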

Technology

  • Playwright
  • BeautifulSoup4
  • Python
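
The HTML-to-CSV stage can be approximated without third-party dependencies. The sketch below deliberately swaps BeautifulSoup4 for the standard library's `html.parser`. It assumes the results sit in ordinary `<tr>`/`<td>` table rows, which may not match the real page structure. `TableRows` and `html_table_to_csv` are hypothetical names, not part of ny_salary_scanner.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_csv(html: str) -> str:
    """Return the table rows found in `html` as CSV text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


if __name__ == "__main__":
    sample = "<table><tr><th>Name</th><th>Pay</th></tr><tr><td>Doe, Jane</td><td>$1,000</td></tr></table>"
    print(html_table_to_csv(sample))
```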

Download files

Source Distribution

ny_salary_scanner-0.0.1.tar.gz (39.4 MB)

Uploaded Source

Built Distribution

ny_salary_scanner-0.0.1-py3-none-any.whl (5.1 kB)

Uploaded Python 3

File details

Details for the file ny_salary_scanner-0.0.1.tar.gz.

File metadata

  • Download URL: ny_salary_scanner-0.0.1.tar.gz
  • Upload date:
  • Size: 39.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for ny_salary_scanner-0.0.1.tar.gz
Algorithm Hash digest
SHA256 1d5889d86c7925585fc3e9cef3aa7335d66d2fa0d8d5207d95a196700398d539
MD5 66260622273bdefcb03ab567640f44aa
BLAKE2b-256 22b5eba2af5921c146b736c35b644d3a097176b5bcd9af9af7d9b19072d06d7a
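
A downloaded file can be checked against the SHA256 digest listed above with Python's `hashlib`. This is a minimal sketch. The helper names `sha256_of` and `matches_published_hash` are hypothetical, and the file is assumed to be in the current working directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    archive = Path("ny_salary_scanner-0.0.1.tar.gz")  # assumed download location
    expected = "1d5889d86c7925585fc3e9cef3aa7335d66d2fa0d8d5207d95a196700398d539"
    if archive.exists():
        print("OK" if matches_published_hash(archive, expected) else "MISMATCH")
```

Reading in chunks keeps memory use flat, which matters for a 39.4 MB archive less than for larger ones but costs nothing.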

File details

Details for the file ny_salary_scanner-0.0.1-py3-none-any.whl.

File hashes

Hashes for ny_salary_scanner-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 4e35ee575c151718744bf93b828069987485f2c099b6a5d204165b50c23de9e9
MD5 6a3ab8a8234ae1565e6e972930b41796
BLAKE2b-256 5a1d27eb78bc357b487dfdbbe1174b6e516e49d9d4fb36cf3079edb34e9326ba
