
Real estate scraping library supporting Zillow, Realtor.com & Redfin.

Project description

HomeHarvest is a simple, yet comprehensive, real estate scraping library.


Looking to build a data-focused software product? Book a call to work with us.

Features

  • Scrapes properties from Zillow, Realtor.com & Redfin simultaneously
  • Aggregates the properties in a Pandas DataFrame


Installation

pip install --upgrade homeharvest

Python version >= 3.10 required

Usage

from homeharvest import scrape_property
import pandas as pd

properties: pd.DataFrame = scrape_property(
    site_name=["zillow", "realtor.com", "redfin"],
    location="85281",
    listing_type="for_rent",  # or "for_sale" / "sold"
)

# Note: to export to CSV or Excel, use properties.to_csv() or properties.to_excel().
print(properties)

Output

>>> properties.head()
                           street   city  ... mls_id description
0                 420 N  Scottsdale Rd  Tempe  ...    NaN         NaN
1                1255 E  University Dr  Tempe  ...    NaN         NaN
2              1979 E  Rio Salado Pkwy  Tempe  ...    NaN         NaN
3                      548 S Wilson St  Tempe  ...   None        None
4  945 E  Playa Del Norte Dr Unit 4027  Tempe  ...    NaN         NaN
[5 rows x 23 columns]

Parameters for scrape_property()

Required
├── location (str): address in various formats e.g. just zip, full address, city/state, etc.
└── listing_type (enum): for_rent, for_sale, sold
Optional
└── site_name (List[enum], default=all three sites): zillow, realtor.com, redfin

Property Schema

Property
├── Basic Information:
│   ├── property_url (str)
│   ├── site_name (enum): zillow, redfin, realtor.com
│   ├── listing_type (enum): for_sale, for_rent, sold
│   └── property_type (enum): house, apartment, condo, townhouse, single_family, multi_family, building

├── Address Details:
│   ├── street_address (str)
│   ├── city (str)
│   ├── state (str)
│   ├── zip_code (str)
│   ├── unit (str)
│   └── country (str)

├── Property Features:
│   ├── price (int)
│   ├── tax_assessed_value (int)
│   ├── currency (str)
│   ├── square_feet (int)
│   ├── beds (int)
│   ├── baths (float)
│   ├── lot_area_value (float)
│   ├── lot_area_unit (str)
│   ├── stories (int)
│   └── year_built (int)

├── Miscellaneous Details:
│   ├── price_per_sqft (int)
│   ├── mls_id (str)
│   ├── agent_name (str)
│   ├── img_src (str)
│   ├── description (str)
│   ├── status_text (str)
│   ├── latitude (float)
│   ├── longitude (float)
│   └── posted_time (str) [Only for Zillow]

├── Building Details (for property_type: building):
│   ├── bldg_name (str)
│   ├── bldg_unit_count (int)
│   ├── bldg_min_beds (int)
│   ├── bldg_min_baths (float)
│   └── bldg_min_area (int)

└── Apartment Details (for property_type: apartment):
    └── apt_min_price (int)
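
As a rough illustration of working with these fields, the sketch below filters plain dictionaries that reuse a few field names from the schema. The records are made up, `within_budget` is a hypothetical helper rather than part of HomeHarvest, and `None` stands in for missing values. In the real DataFrame, missing values may also show up as NaN, and the column names should be checked against the actual output.

```python
def within_budget(rows, max_price, min_beds=0):
    """Keep rows whose price and beds are present and satisfy the limits."""
    selected = []
    for row in rows:
        price = row.get("price")
        beds = row.get("beds")
        if price is None or beds is None:
            continue  # skip listings with missing values
        if price <= max_price and beds >= min_beds:
            selected.append(row)
    return selected


# Made-up sample records using field names from the schema above.
sample = [
    {"city": "Tempe", "price": 1800, "beds": 2, "baths": 1.5},
    {"city": "Tempe", "price": 2600, "beds": 3, "baths": 2.0},
    {"city": "Tempe", "price": None, "beds": 1, "baths": 1.0},
]
matches = within_budget(sample, max_price=2000, min_beds=2)
print(matches)
```

Skipping rows with missing values is one possible choice; depending on the use case, keeping them and flagging the gap may be preferable.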

Supported Countries for Property Scraping

  • Zillow: listings in the US & Canada
  • Realtor.com: mainly US listings, plus some international listings
  • Redfin: mainly US & Canada listings, plus some areas in Mexico

Exceptions

The following exceptions may be raised when using HomeHarvest:

  • InvalidSite - valid options: zillow, redfin, realtor.com
  • InvalidListingType - valid options: for_sale, for_rent, sold
  • NoResultsFound - no properties found from your input
  • GeoCoordsNotFound - the Zillow scraper could not derive geo-coordinates from the location you provided
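
The valid options listed for InvalidSite and InvalidListingType can also be checked before any request is made. The sketch below is a hypothetical pre-flight helper, not part of HomeHarvest. It raises the built-in `ValueError` as a stand-in for the library's own exceptions, and it assumes the option strings are matched exactly as written above.

```python
# Hypothetical pre-flight check; not part of HomeHarvest's API.
VALID_SITES = {"zillow", "redfin", "realtor.com"}
VALID_LISTING_TYPES = {"for_sale", "for_rent", "sold"}


def check_inputs(site_name, listing_type):
    """Raise ValueError if any argument is outside the documented options."""
    unknown_sites = sorted(set(site_name) - VALID_SITES)
    if unknown_sites:
        raise ValueError(f"unsupported site(s): {unknown_sites}")
    if listing_type not in VALID_LISTING_TYPES:
        raise ValueError(f"unsupported listing type: {listing_type!r}")


check_inputs(["zillow", "redfin"], "for_rent")  # passes silently
```

Calling `check_inputs` first lets a typo in a site name or listing type fail fast, before any site is contacted.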

Frequently Asked Questions


Q: Encountering issues with your queries?
A: Try a single site and/or broaden the location. If problems persist, submit an issue.


Q: Received a 403 Forbidden response code?
A: This indicates that the real estate site has blocked you for sending too many requests. Zillow is currently particularly aggressive about blocking. We recommend:

  • Waiting a few seconds between requests.
  • Trying a VPN to change your IP address.
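
One way to apply the first recommendation is to run requests one at a time with a fixed pause in between. The sketch below is a minimal illustration under stated assumptions: `run_with_pauses` is a hypothetical helper, the 5-second default is an arbitrary choice, and each task is any zero-argument callable (for example, a `functools.partial` wrapping `scrape_property` with a fixed location and listing type).

```python
import time


def run_with_pauses(tasks, pause_seconds=5.0, sleep=time.sleep):
    """Call each zero-argument task in order, sleeping between calls."""
    results = []
    for index, task in enumerate(tasks):
        if index > 0:
            sleep(pause_seconds)  # wait before every call except the first
        results.append(task())
    return results


# Stand-in tasks; the no-op sleep keeps this demo instant.
demo = run_with_pauses(
    [lambda: "first", lambda: "second"],
    pause_seconds=5.0,
    sleep=lambda seconds: None,
)
print(demo)
```

The `sleep` parameter is injectable only so the pause can be replaced in tests; in normal use the default `time.sleep` applies.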


Download files


Source Distribution

homeharvest-0.2.0.tar.gz (15.5 kB view details)

Uploaded Source

Built Distribution


homeharvest-0.2.0-py3-none-any.whl (17.2 kB view details)

Uploaded Python 3

File details

Details for the file homeharvest-0.2.0.tar.gz.

File metadata

  • Download URL: homeharvest-0.2.0.tar.gz
  • Upload date:
  • Size: 15.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.5

File hashes

Hashes for homeharvest-0.2.0.tar.gz
Algorithm    Hash digest
SHA256       726a3656d6d4457037c10fac421e6f482909eb94bbc6d21fad9cd24a963ad67f
MD5          5b8d298b874b8aef35a02dfdb4986096
BLAKE2b-256  899dbfabfc7ccfacc402768cbe21058d8d02307b8945c8271bd94aae7f0647aa


File details

Details for the file homeharvest-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: homeharvest-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 17.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.5

File hashes

Hashes for homeharvest-0.2.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       a1c24255820499a575b1679e9aca8281ee0c2ea62201e34aee487f03b7e46b23
MD5          476b2b878e7e8ad3cee383f25b43d0a0
BLAKE2b-256  b51d980b6aa6b405581385cc4f2ec429b541371682474e070553711aa39aa208

