Real estate scraping library supporting Zillow, Realtor.com & Redfin.
Project description
HomeHarvest is a simple, yet comprehensive, real estate scraping library.
Features
- Scrapes properties from Zillow, Realtor.com & Redfin simultaneously
- Aggregates the properties in a Pandas DataFrame
Installation
pip install --force-reinstall homeharvest
Python version >= 3.10 required
Usage
CLI
homeharvest "San Francisco, CA" -s zillow realtor.com redfin -l for_rent -o excel -f HomeHarvest
This will scrape properties from the specified sites for the given location and listing type, and save the results to an Excel file named HomeHarvest.xlsx.
By default:
- If -s or --site_name is not provided, it will scrape from all available sites.
- If -l or --listing_type is left blank, the default is for_sale. Other options are for_rent or sold.
- The -o or --output default format is excel. Options are csv or excel.
- If -f or --filename is left blank, the default is HomeHarvest_<current_timestamp>.
- If -p or --proxy is not provided, the scraper uses the local IP.
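The defaults listed above can be pictured as an argparse parser. This is only a sketch for illustration, not HomeHarvest's actual CLI source: build_parser and resolve_filename are hypothetical names, and the timestamp format is an assumption, since the README only says <current_timestamp>.

```python
import argparse
from datetime import datetime
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="homeharvest")
    parser.add_argument("location", help="e.g. 'San Francisco, CA'")
    # None stands for "scrape all available sites".
    parser.add_argument("-s", "--site_name", nargs="*", default=None)
    parser.add_argument(
        "-l", "--listing_type",
        default="for_sale",
        choices=["for_sale", "for_rent", "sold"],
    )
    parser.add_argument("-o", "--output", default="excel", choices=["excel", "csv"])
    # None stands for "HomeHarvest_<current_timestamp>".
    parser.add_argument("-f", "--filename", default=None)
    # None stands for "use the local IP".
    parser.add_argument("-p", "--proxy", default=None)
    return parser


def resolve_filename(filename: Optional[str], output: str) -> str:
    """Apply the filename default and pick the extension for the output format."""
    if filename is None:
        # Assumed timestamp format; the README does not specify one.
        filename = "HomeHarvest_" + datetime.now().strftime("%Y%m%d_%H%M%S")
    extension = "xlsx" if output == "excel" else "csv"
    return f"{filename}.{extension}"


args = build_parser().parse_args(["San Francisco, CA"])
print(args.listing_type, args.output, resolve_filename(args.filename, args.output))
```

With only a location given, the parsed arguments fall back to for_sale, excel, no site filter, and no proxy, matching the list above.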
Python
from homeharvest import scrape_property
import pandas as pd

properties: pd.DataFrame = scrape_property(
    site_name=["zillow", "realtor.com", "redfin"],
    location="85281",
    listing_type="for_rent",  # for_sale / sold
)

# Note: to export to CSV or Excel, use properties.to_csv() or properties.to_excel().
print(properties)
Output
>>> properties.head()
property_url site_name listing_type apt_min_price apt_max_price ...
0 https://www.redfin.com/AZ/Tempe/1003-W-Washing... redfin for_rent 1666.0 2750.0 ...
1 https://www.redfin.com/AZ/Tempe/VELA-at-Town-L... redfin for_rent 1665.0 3763.0 ...
2 https://www.redfin.com/AZ/Tempe/Camden-Tempe/a... redfin for_rent 1939.0 3109.0 ...
3 https://www.redfin.com/AZ/Tempe/Emerson-Park/a... redfin for_rent 1185.0 1817.0 ...
4 https://www.redfin.com/AZ/Tempe/Rio-Paradiso-A... redfin for_rent 1470.0 2235.0 ...
[5 rows x 41 columns]
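Because the result is an ordinary Pandas DataFrame, the usual filtering and export tools apply. The sketch below uses a small stand-in DataFrame built from the sample rows above (only four of the 41 columns) so that it runs without any network access; the 1500 price threshold is an arbitrary example value.

```python
import io

import pandas as pd

# Stand-in for the DataFrame returned by scrape_property();
# values are copied from the sample output above.
properties = pd.DataFrame(
    {
        "site_name": ["redfin"] * 5,
        "listing_type": ["for_rent"] * 5,
        "apt_min_price": [1666.0, 1665.0, 1939.0, 1185.0, 1470.0],
        "apt_max_price": [2750.0, 3763.0, 3109.0, 1817.0, 2235.0],
    }
)

# Keep apartments whose cheapest unit is at or below the threshold,
# ordered from cheapest to most expensive.
affordable = properties[properties["apt_min_price"] <= 1500].sort_values("apt_min_price")

# Export to CSV; an in-memory buffer is used here instead of a file path.
buffer = io.StringIO()
affordable.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

print(affordable)
print(csv_text)
```

Passing a file path to to_csv() writes to disk instead. to_excel() works the same way but needs an Excel writer engine such as openpyxl installed.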
Parameters for scrape_property()
Required
├── location (str): address in various formats e.g. just zip, full address, city/state, etc.
└── listing_type (enum): for_rent, for_sale, sold
Optional
├── site_name (List[enum], default=all three sites): zillow, realtor.com, redfin
└── proxy (str): proxy URL in the format 'http://user:pass@host:port' (https and socks schemes are also accepted)
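When the proxy credentials contain characters such as @ or :, they need to be percent-encoded before being placed in the URL. The helper below is a hypothetical convenience function, not part of HomeHarvest, and proxy.example.com is a placeholder host. It only assembles a string in the documented 'scheme://user:pass@host:port' shape.

```python
from typing import Optional
from urllib.parse import quote


def build_proxy_url(
    host: str,
    port: int,
    user: Optional[str] = None,
    password: Optional[str] = None,
    scheme: str = "http",
) -> str:
    """Assemble a proxy URL, percent-encoding the credentials if present."""
    if user is None:
        return f"{scheme}://{host}:{port}"
    credentials = quote(user, safe="")
    if password is not None:
        credentials += ":" + quote(password, safe="")
    return f"{scheme}://{credentials}@{host}:{port}"


proxy = build_proxy_url("proxy.example.com", 8080, user="user", password="p@ss:word")
print(proxy)  # http://user:p%40ss%3Aword@proxy.example.com:8080
```

The resulting string can then be passed as the proxy argument of scrape_property().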
Property Schema
Property
├── Basic Information:
│ ├── property_url (str)
│ ├── site_name (enum): zillow, redfin, realtor.com
│ ├── listing_type (enum: ListingType)
│ └── property_type (enum): house, apartment, condo, townhouse, single_family, multi_family, building
├── Address Details:
│ ├── street_address (str)
│ ├── city (str)
│ ├── state (str)
│ ├── zip_code (str)
│ ├── unit (str)
│ └── country (str)
├── Property Features:
│ ├── price (int)
│ ├── tax_assessed_value (int)
│ ├── currency (str)
│ ├── square_feet (int)
│ ├── beds (int)
│ ├── baths (float)
│ ├── lot_area_value (float)
│ ├── lot_area_unit (str)
│ ├── stories (int)
│ └── year_built (int)
├── Miscellaneous Details:
│ ├── price_per_sqft (int)
│ ├── mls_id (str)
│ ├── agent_name (str)
│ ├── img_src (str)
│ ├── description (str)
│ ├── status_text (str)
│ ├── latitude (float)
│ ├── longitude (float)
│ └── posted_time (str) [Only for Zillow]
├── Building Details (for property_type: building):
│ ├── bldg_name (str)
│ ├── bldg_unit_count (int)
│ ├── bldg_min_beds (int)
│ ├── bldg_min_baths (float)
│ └── bldg_min_area (int)
└── Apartment Details (for property_type: apartment):
  ├── apt_min_beds (int)
  ├── apt_max_beds (int)
  ├── apt_min_baths (float)
  ├── apt_max_baths (float)
  ├── apt_min_price (int)
  ├── apt_max_price (int)
  ├── apt_min_sqft (int)
  └── apt_max_sqft (int)
Supported Countries for Property Scraping
- Zillow: contains listings in the US & Canada
- Realtor.com: mainly from the US but also has international listings
- Redfin: listings mainly in the US, Canada, & has expanded to some areas in Mexico
Exceptions
The following exceptions may be raised when using HomeHarvest:
- InvalidSite - valid options: zillow, redfin, realtor.com
- InvalidListingType - valid options: for_sale, for_rent, sold
- NoResultsFound - no properties found from your input
- GeoCoordsNotFound - if the Zillow scraper is not able to create geo-coordinates from the location you input
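One way to handle these exceptions is to separate input errors from empty or unresolvable searches. The sketch below defines local stand-in exception classes with the same names so that it runs on its own; in real use you would import the library's exception classes instead (their import path is not stated here, so it is left out). try_scrape is a hypothetical wrapper that takes the scraping function as an argument.

```python
from typing import Any, Callable, Optional, Tuple


# Stand-ins with the same names as the documented exceptions.
class InvalidSite(Exception): ...
class InvalidListingType(Exception): ...
class NoResultsFound(Exception): ...
class GeoCoordsNotFound(Exception): ...


def try_scrape(scrape_fn: Callable[..., Any], **kwargs: Any) -> Tuple[Optional[Any], Optional[str]]:
    """Call scrape_fn and return (result, None) or (None, hint) on a known failure."""
    try:
        return scrape_fn(**kwargs), None
    except (InvalidSite, InvalidListingType) as exc:
        # Caller passed an unsupported site name or listing type.
        return None, f"invalid argument: {exc}"
    except NoResultsFound:
        return None, "no properties found; try broadening the location"
    except GeoCoordsNotFound:
        return None, "location could not be geocoded; try a different address format"


def fake_scrape(location: str, listing_type: str) -> list:
    """Test double that mimics an empty search."""
    raise NoResultsFound(location)


result, hint = try_scrape(fake_scrape, location="00000", listing_type="for_sale")
print(result, hint)
```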
Frequently Asked Questions
Q: Encountering issues with your queries?
A: Try a single site and/or broaden the location. If problems persist, submit an issue.
Q: Received a Forbidden 403 response code?
A: This indicates that you have been blocked by the real estate site for sending too many requests. Currently, Zillow is particularly aggressive with blocking. We recommend:
- Waiting a few seconds between requests.
- Trying a VPN to change your IP address.
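The first recommendation can be sketched as a small loop that pauses between consecutive requests. scrape_with_delay is a hypothetical helper, the 3-second default is an arbitrary choice, and the scraping function is passed in as a parameter, so in real use you might pass a lambda that calls scrape_property for each location.

```python
import time
from typing import Any, Callable, Dict, Iterable


def scrape_with_delay(
    locations: Iterable[str],
    scrape_fn: Callable[[str], Any],
    delay_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call scrape_fn once per location, sleeping between consecutive calls."""
    results: Dict[str, Any] = {}
    for index, location in enumerate(locations):
        if index > 0:
            # Pause before every request except the first.
            sleep(delay_seconds)
        results[location] = scrape_fn(location)
    return results


# Demonstration with a stand-in scraper and no real waiting.
demo = scrape_with_delay(
    ["85281", "San Francisco, CA"],
    scrape_fn=lambda location: f"results for {location}",
    delay_seconds=0.0,
)
print(demo)
```

Injecting the sleep function keeps the helper easy to test; production code can leave it at the default time.sleep.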