
Python real estate scraping library

Project description

PY-Realty

PY-Realty is a library for scraping publicly available real estate listing information.

Specifically, the library sends only the minimum (or near-minimum) number of requests to the public web servers of supported listing websites. This is effectively the same as visiting a website such as Zillow manually in a web browser, except that PY-Realty avoids loading unnecessary JavaScript and media (images, etc.).

That said, scraping a large number of listings with no delays may result in the source (Zillow, Realtor.com, etc.) blocking your IP address, although this has not been tested. Zillow can probably tell that you are not browsing listings manually if you are querying thousands per minute.
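One way to space out requests is to wrap whatever function fetches a listing in a small throttling helper. The following is a minimal sketch, not part of PY-Realty: the name throttled, its parameters, and the default delays are all assumptions chosen for illustration, and the delay values have not been tested against any listing site.

```python
import random
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def throttled(
    items: Iterable[T],
    fetch: Callable[[T], R],
    min_delay: float = 2.0,   # hypothetical default, in seconds
    max_delay: float = 5.0,   # hypothetical default, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[R]:
    """Call fetch on each item, pausing a random interval between calls."""
    if min_delay < 0 or max_delay < min_delay:
        raise ValueError("require 0 <= min_delay <= max_delay")
    for index, item in enumerate(items):
        if index > 0:
            # Pause before every request except the first one.
            sleep(random.uniform(min_delay, max_delay))
        yield fetch(item)
```

Here fetch would be whatever callable retrieves a single listing. The sleep parameter exists only so the pause can be replaced in tests. Randomizing the interval avoids a perfectly regular request pattern, though whether that matters to any particular site is unknown.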

Status

VERSION: 0.1.0

Early development. The library is missing key features and may not work entirely as expected.

This library is now somewhat usable (mileage may vary) and can be built and installed via pip; see the installation instructions below.

Road Map

  • Zillow
    • Zillow Search Query
      • Implementation
      • Manual Testing
      • Unit Testing
    • Zillow Listing Details Parsing
      • Property Sale Listing
        • Implementation
        • Manual Testing
        • Unit Testing
      • Property Rental Listing
        • Implementation
        • Manual Testing
        • Unit Testing
      • Apartment Rental Listing
        • Implementation
        • Manual Testing
        • Unit Testing
  • Realtor.com
    • Realtor.com Sale Search Query
      • Implementation
      • Manual Testing
      • Unit Testing
    • Realtor.com Rental Search Query
      • Implementation
      • Manual Testing
      • Unit Testing
    • Property Sale Listing
      • Implementation
      • Manual Testing
      • Unit Testing
    • Property Rental Listing
      • Implementation
      • Manual Testing
      • Unit Testing
  • LandWatch
    • LandWatch Search Query
      • Implementation
      • Manual Testing
      • Unit Testing
    • LandWatch Listing Details Parsing
      • Implementation
      • Manual Testing
      • Unit Testing
  • Apartments.com
    • Apartments.com Search Query
      • Implementation
      • Manual Testing
      • Unit Testing
    • Apartments.com Listing Details Parsing
      • Implementation
      • Manual Testing
      • Unit Testing

Regarding Unit Tests

Due to the nature of the data, it is impractical to have unit tests check whether data is extracted correctly. A source of truth, such as a specific listing, may change over time. Additionally, even though Zillow and Realtor.com may carry similar data, a mismatch between them does not necessarily mean that the scraping is incorrect; the two websites may simply disagree.

Therefore, unit tests will check for unexpected behavior rather than extraction correctness. That is, they will check that no exceptions are raised and that value types are correct. Even so, the unit test criteria still need to ensure reasonable robustness.
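A test in this style asserts on the shape of a result rather than on its values. The sketch below is only an illustration under assumptions: the field names (address, price, bedrooms), their allowed types, and the helper check_listing_types are hypothetical and are not taken from the library.

```python
import unittest
from typing import Any, Dict, List, Tuple

# Hypothetical field names and allowed types; the real library may differ.
EXPECTED_TYPES: Dict[str, Tuple[type, ...]] = {
    "address": (str,),
    "price": (int, float, type(None)),
    "bedrooms": (int, float, type(None)),
}


def check_listing_types(listing: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the shape is fine."""
    problems = []
    for field, allowed in EXPECTED_TYPES.items():
        if field not in listing:
            problems.append(f"missing field: {field}")
        elif not isinstance(listing[field], allowed):
            problems.append(f"{field} has type {type(listing[field]).__name__}")
    return problems


class ListingShapeTest(unittest.TestCase):
    def test_types_not_values(self):
        # Stand-in for a scraped result; a real test would call the scraper.
        listing = {"address": "123 Example St", "price": 250000, "bedrooms": None}
        self.assertEqual(check_listing_types(listing), [])
```

Saved as a module, this can be run with python -m unittest. Because the test never compares against a specific price or address, it keeps passing when the underlying listing changes, which is the property described above.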

Usage

More detailed documentation will follow as development continues. In the meantime, attributes and functions are documented within the class and function docstrings.

Zillow

Inside zillow, the Query class is used to construct and send query requests to Zillow's public API. Once the Query is configured as desired, its get_response method returns the search results.

The search results can then be scraped via the functions details.scrape_listing, details.scrape_listings, and details.lazy_scrape_listings.
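The function names suggest both an eager and a lazy way to scrape several listings, but the actual signatures and behavior are not described here and should be checked in the docstrings. The sketch below does not call PY-Realty at all. It only illustrates the general eager versus lazy pattern, using hypothetical stand-ins (scrape_all, scrape_lazily, fake_scrape) that contact no server.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Listing = Dict[str, str]


def scrape_all(urls: Iterable[str], scrape_one: Callable[[str], Listing]) -> List[Listing]:
    """Eager: scrape every URL before returning anything."""
    return [scrape_one(url) for url in urls]


def scrape_lazily(urls: Iterable[str], scrape_one: Callable[[str], Listing]) -> Iterator[Listing]:
    """Lazy: scrape one URL each time the caller asks for the next result."""
    for url in urls:
        yield scrape_one(url)


def fake_scrape(url: str) -> Listing:
    # Hypothetical stand-in for a network request; it contacts no server.
    return {"url": url}
```

With the lazy form, no work happens until the caller iterates, so a loop can stop early or pause between items without having fetched everything up front.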

Installation

  1. Clone this repository
  2. Within the repository folder, run pip install .
