Python real estate scraping library

Project description

PY-Realty

This is a library designed to scrape publicly available Real Estate listing information.

Specifically, this library sends only the minimum (or near-minimum) set of requests to the public web servers of supported listing websites. Using it is effectively the same as visiting a website such as Zillow manually in a web browser, except that PY-Realty avoids loading unnecessary JavaScript and media (images, etc.).

That said, scraping a large number of listings with no delays may result in the source (Zillow, Realtor.com, etc.) blocking your IP address, although this has not been tested. Zillow can probably tell that you are not browsing listings manually if you are querying thousands a minute.
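One way to space out requests is to wrap whatever is being iterated over in a small throttling generator. The sketch below is independent of PY-Realty: the name `throttled` and the default delay bounds are assumptions chosen for illustration, not limits published by any listing website.

```python
import random
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def throttled(
    items: Iterable[T],
    min_delay: float = 2.0,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Yield items one at a time, pausing a random interval between them.

    The delay bounds are illustrative assumptions only.
    """
    if min_delay < 0 or max_delay < min_delay:
        raise ValueError("expected 0 <= min_delay <= max_delay")
    for index, item in enumerate(items):
        if index > 0:
            # Pause before every item except the first.
            sleep(random.uniform(min_delay, max_delay))
        yield item
```

Looping with `for listing in throttled(listings): ...` then waits between consecutive iterations. The `sleep` parameter exists only so the pause can be replaced in tests.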

Status

VERSION: 0.1.0

Early development. The library is missing key features and may not work entirely as expected.

This library is now somewhat usable (mileage may vary) and can be built and installed via pip; see the installation instructions below.

Road Map

  • Zillow
    • Zillow Search Query
      • Implementation
      • Manual Testing
      • Unit Testing
    • Zillow Listing Details Parsing
      • Property Sale Listing
        • Implementation
        • Manual Testing
        • Unit Testing
      • Property Rental Listing
        • Implementation
        • Manual Testing
        • Unit Testing
      • Apartment Rental Listing
        • Implementation
        • Manual Testing
        • Unit Testing
  • Realtor.com
    • Realtor.com Sale Search Query
      • Implementation
      • Manual Testing
      • Unit Testing
    • Realtor.com Rental Search Query
      • Implementation
      • Manual Testing
      • Unit Testing
    • Property Sale Listing
      • Implementation
      • Manual Testing
      • Unit Testing
    • Property Rental Listing
      • Implementation
      • Manual Testing
      • Unit Testing
  • LandWatch
    • LandWatch Search Query
      • Implementation
      • Manual Testing
      • Unit Testing
    • LandWatch Listing Details Parsing
      • Implementation
      • Manual Testing
      • Unit Testing
  • Apartments.com
    • Apartments.com Search Query
      • Implementation
      • Manual Testing
      • Unit Testing
    • Apartments.com Listing Details Parsing
      • Implementation
      • Manual Testing
      • Unit Testing

Regarding Unit Tests

Due to the nature of the data, it is impractical for unit tests to check whether data is extracted correctly. A test against a source of truth, such as a specific listing, can break because that listing may change over time. Additionally, even though Zillow and Realtor.com may carry similar data, a mismatch between them does not necessarily mean that the scraping is incorrect; the two websites may simply disagree.

Therefore, unit tests will check for unexpected behavior rather than extraction correctness: they will check that no exceptions are raised and that value types are correct. That said, the unit test criteria still need to ensure reasonable robustness.
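A type-focused test of this kind could look roughly like the sketch below. The field names, the allowed types, and the helper `type_errors` are hypothetical and chosen only for illustration; the attributes PY-Realty actually returns may differ.

```python
import unittest
from typing import Any, Dict, List, Tuple

# Hypothetical field names and types, used only for illustration.
EXPECTED_TYPES: Dict[str, Tuple[type, ...]] = {
    "price": (int, float, type(None)),
    "bedrooms": (int, type(None)),
    "address": (str,),
}


def type_errors(record: Dict[str, Any]) -> List[str]:
    """Return one message per missing or wrongly typed field."""
    errors = []
    for field, allowed in EXPECTED_TYPES.items():
        if field not in record:
            errors.append(f"missing field: {field}")
            continue
        value = record[field]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, allowed):
            errors.append(f"{field}: unexpected type {type(value).__name__}")
    return errors


class ListingTypeTest(unittest.TestCase):
    def test_value_types(self):
        # Stand-in for a scraped listing; a real test would call the scraper.
        record = {"price": 350000, "bedrooms": None, "address": "123 Example St"}
        self.assertEqual(type_errors(record), [])
```

Because the test asserts only on types and on the absence of exceptions, it keeps passing when the listing's price or description changes.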

Usage

More detailed documentation will follow as development continues. In the meantime, attributes and functions are well documented within the class and function docstrings.

Zillow

Inside zillow, the Query class is used to construct and send query requests to Zillow's public API. Once the Query is configured as desired, the get_response method returns the search results.

The search results can then be scraped via the functions details.scrape_listing, details.scrape_listings, and details.lazy_scrape_listings.
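The function names suggest an eager variant and a lazy one, although this README does not confirm how they behave. The sketch below shows that general distinction using stand-in functions; `scrape_all`, `scrape_lazily`, and `fake_fetch` are hypothetical names, not part of PY-Realty, and no network request is made.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# `Fetch` stands in for whatever retrieves and parses a single listing.
Fetch = Callable[[str], Dict[str, str]]


def scrape_all(ids: Iterable[str], fetch: Fetch) -> List[Dict[str, str]]:
    """Eager: fetch every listing up front and return a list."""
    return [fetch(listing_id) for listing_id in ids]


def scrape_lazily(ids: Iterable[str], fetch: Fetch) -> Iterator[Dict[str, str]]:
    """Lazy: fetch each listing only when the caller asks for the next one."""
    for listing_id in ids:
        yield fetch(listing_id)


def fake_fetch(listing_id: str) -> Dict[str, str]:
    """Hypothetical fetcher that makes no network request."""
    return {"id": listing_id}


# Only the first listing is fetched here; "b" and "c" are never requested.
first = next(scrape_lazily(["a", "b", "c"], fake_fetch))
```

A lazy approach of this shape lets a caller stop early or add delays between listings without fetching everything first. Consult the docstrings for the library's actual behavior.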

Installation

  1. Clone this repository
  2. Within the repository folder, run pip install .

Download files

Source Distribution

py-realty-0.1.0.tar.gz (73.9 kB)

Uploaded Source

Built Distribution

py_realty-0.1.0-py3-none-any.whl (67.6 kB)

Uploaded Python 3

File details

Details for the file py-realty-0.1.0.tar.gz.

File metadata

  • Download URL: py-realty-0.1.0.tar.gz
  • Upload date:
  • Size: 73.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.9

File hashes

Hashes for py-realty-0.1.0.tar.gz
Algorithm Hash digest
SHA256 6a0ec2927975fc5482ff6d35fab9d795209b8d9dd0aeed923a8fa711037b3a39
MD5 c92422b1a451676361d8a26cefbeacbe
BLAKE2b-256 98b8231b5ce183d344dff1ea7468694a9466b74213da58e679b17b6bed0ca614

File details

Details for the file py_realty-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: py_realty-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 67.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.9

File hashes

Hashes for py_realty-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c0b7fdd7a01034b56549a11c3cfd7d77136c92d978e8f27fe48dd1ace62ad7e2
MD5 c73089b6306bf29ee9729fd3233c6d92
BLAKE2b-256 c8ca47c0e510242a9379560cfa74947ae2df99373d02c8f785624b11489303f3
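A downloaded file can be checked against the SHA256 digests listed above using only the Python standard library. This is a minimal sketch; the function name `sha256_matches` and the chunk size are assumptions made for illustration.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_matches(path: Path, expected_hex: str, chunk_size: int = 65536) -> bool:
    """Return True if the file's SHA256 digest equals expected_hex."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return hmac.compare_digest(digest.hexdigest(), expected_hex.strip().lower())
```

For example, `sha256_matches(Path("py-realty-0.1.0.tar.gz"), "<SHA256 digest from the table>")` should return True for an unmodified download.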
