
Real estate scraping library

Project description

HomeHarvest is a real estate scraping library that extracts and formats data in the style of MLS listings.

Not technical? Try out the web scraping tool on our site at tryhomeharvest.com.

Looking to build a data-focused software product? Book a call to work with us.

HomeHarvest Features

  • Source: Fetches properties directly from Realtor.com.
  • Data Format: Structures data to resemble MLS listings.
  • Export Flexibility: Options to save as either CSV or Excel.

Video Guide for HomeHarvest - updated for release v0.3.4


Installation

pip install -U homeharvest

Python version >= 3.9 required

Usage

Python

from homeharvest import scrape_property
from datetime import datetime

# Generate filename based on current timestamp
current_timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
filename = f"HomeHarvest_{current_timestamp}.csv"

properties = scrape_property(
    location="San Diego, CA",
    listing_type="sold",  # or "for_sale", "for_rent", "pending"
    property_type=["single_family"],  # a list, see Parameters below
    past_days=30,  # sold in the last 30 days; for "for_sale"/"for_rent", listed in the last 30 days

    # date_from="2023-05-01",  # alternative to past_days
    # date_to="2023-05-28",
    # foreclosure=True,
    # mls_only=True,  # only fetch MLS listings
)
print(f"Number of properties: {len(properties)}")

# Export to csv
properties.to_csv(filename, index=False)
print(properties.head())

Output

>>> properties.head()
    MLS       MLS # Status          Style  ...     COEDate LotSFApx PrcSqft Stories
0  SDCA   230018348   SOLD         CONDOS  ...  2023-10-03   290110     803       2
1  SDCA   230016614   SOLD      TOWNHOMES  ...  2023-10-03     None     838       3
2  SDCA   230016367   SOLD         CONDOS  ...  2023-10-03    30056     649       1
3  MRCA  NDP2306335   SOLD  SINGLE_FAMILY  ...  2023-10-03     7519     661       2
4  SDCA   230014532   SOLD         CONDOS  ...  2023-10-03     None     752       1
[5 rows x 22 columns]

Parameters for scrape_property()

Required
├── location (str): The address in various formats - this could be just a zip code, a full address, or city/state, etc.
└── listing_type (option): Choose the type of listing.
    - 'for_rent'
    - 'for_sale'
    - 'sold'
    - 'pending' (for pending/contingent sales)

Optional
├── property_type (list): Choose the type of properties.
    - 'single_family'
    - 'multi_family'
    - 'condos'
    - 'condo_townhome_rowhome_coop'
    - 'condo_townhome'
    - 'townhomes'
    - 'duplex_triplex'
    - 'farm'
    - 'land'
    - 'mobile'

├── radius (decimal): Radius in miles to find comparable properties based on individual addresses.
│    Example: 5.5 (fetches properties within a 5.5-mile radius if location is set to a specific address; otherwise, ignored)
│
├── past_days (integer): Number of past days to filter properties by. Uses 'last_sold_date' for 'sold' listings and 'list_date' for the others (for_rent, for_sale).
│    Example: 30 (fetches properties listed/sold in the last 30 days)
│
├── date_from, date_to (string): Start and end dates for filtering properties by listed or sold date. Both dates are required.
│    (Use these to fetch properties in chunks, since there is a 10k result limit.)
│    Format for both must be "YYYY-MM-DD".
│    Example: "2023-05-01", "2023-05-15" (fetches properties listed/sold between these dates)
│
├── mls_only (True/False): If set, fetches only MLS listings (mainly applicable to 'sold' listings)
│
├── foreclosure (True/False): If set, fetches only foreclosures
│
├── proxy (string): In format 'http://user:pass@host:port'
│
├── extra_property_data (True/False): If set, fetches additional property data for general searches (e.g. schools, tax appraisals). This increases the number of requests by O(n).
│
├── exclude_pending (True/False): If set, excludes 'pending' properties from the 'for_sale' results unless listing_type is 'pending'
│
└── limit (integer): Limit the number of properties to fetch. Max & default is 10000.
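
The note on date_from/date_to suggests fetching in chunks to stay under the 10k result limit. A minimal sketch of that idea follows, using only the standard library. The helper names date_windows and scrape_in_chunks are hypothetical and not part of HomeHarvest; the scrape argument stands in for scrape_property so the windowing logic can be tried without network access.

```python
from datetime import date, timedelta


def date_windows(start: str, end: str, days: int = 7):
    """Yield (date_from, date_to) "YYYY-MM-DD" pairs covering start..end inclusive."""
    if days < 1:
        raise ValueError("days must be at least 1")
    current = date.fromisoformat(start)
    last = date.fromisoformat(end)
    while current <= last:
        # Clamp the final window so it never runs past the requested end date.
        window_end = min(current + timedelta(days=days - 1), last)
        yield current.isoformat(), window_end.isoformat()
        current = window_end + timedelta(days=1)


def scrape_in_chunks(scrape, location, listing_type, start, end, days=7):
    """Call `scrape` once per date window and return the results as a list.

    `scrape` is any callable that accepts the keyword arguments documented
    above (hypothetical wiring; pass scrape_property in real use).
    """
    results = []
    for date_from, date_to in date_windows(start, end, days):
        results.append(
            scrape(
                location=location,
                listing_type=listing_type,
                date_from=date_from,
                date_to=date_to,
            )
        )
    return results
```

In real use, scrape_in_chunks(scrape_property, "San Diego, CA", "sold", "2023-05-01", "2023-05-28") would return one result per window, which could then be combined with pandas.concat. Whether the library treats both boundary dates as inclusive is not stated here, so dropping duplicates on property_id after combining is a reasonable precaution.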

Property Schema

Property
├── Basic Information:
│ ├── property_url
│ ├── property_id
│ ├── listing_id
│ ├── mls
│ ├── mls_id
│ └── status

├── Address Details:
│ ├── street
│ ├── unit
│ ├── city
│ ├── state
│ └── zip_code

├── Property Description:
│ ├── style
│ ├── beds
│ ├── full_baths
│ ├── half_baths
│ ├── sqft
│ ├── year_built
│ ├── stories
│ ├── garage
│ └── lot_sqft

├── Property Listing Details:
│ ├── days_on_mls
│ ├── list_price
│ ├── list_price_min
│ ├── list_price_max
│ ├── list_date
│ ├── pending_date
│ ├── sold_price
│ ├── last_sold_date
│ ├── price_per_sqft
│ ├── new_construction
│ └── hoa_fee

├── Location Details:
│ ├── latitude
│ ├── longitude
│ └── nearby_schools

├── Agent Info:
│ ├── agent_id
│ ├── agent_name
│ ├── agent_email
│ └── agent_phone

├── Broker Info:
│ ├── broker_id
│ └── broker_name

├── Builder Info:
│ ├── builder_id
│ └── builder_name

├── Office Info:
│ ├── office_id
│ ├── office_name
│ ├── office_phones
│ └── office_email
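
As an illustration of working with these fields after export, the sketch below reads CSV text and computes a median sold_price per zip_code using only the standard library. It assumes the exported CSV uses the schema field names above as column headers, which may not hold for every release. The function name and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict
from statistics import median


def median_sold_price_by_zip(csv_text: str) -> dict:
    """Return {zip_code: median sold_price} for rows that have a sold_price."""
    prices = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get("sold_price") or "").strip()
        if not raw:
            continue  # skip listings without a recorded sale price
        prices[row["zip_code"]].append(float(raw))
    return {zip_code: median(values) for zip_code, values in prices.items()}


# Made-up sample rows for illustration only; not real listing data.
SAMPLE = """property_id,zip_code,sold_price
1,92101,500000
2,92101,700000
3,92103,
4,92103,900000
"""

summary = median_sold_price_by_zip(SAMPLE)
```

Zip codes are kept as strings on purpose, so leading zeros survive the round trip through CSV.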

Exceptions

The following exceptions may be raised when using HomeHarvest:

  • InvalidListingType - valid options: for_sale, for_rent, sold, pending.
  • InvalidDate - date_from or date_to is not in the format YYYY-MM-DD.
  • AuthenticationError - Realtor.com token request failed.
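
Two of these conditions can be checked locally before any request is sent. The sketch below mirrors only the rules documented above (the four listing types and the YYYY-MM-DD format). check_inputs and VALID_LISTING_TYPES are hypothetical names, not part of HomeHarvest, and the library's own validation may be stricter or more lenient.

```python
import re
from datetime import datetime

# Listing types as documented above (hypothetical local copy).
VALID_LISTING_TYPES = {"for_sale", "for_rent", "sold", "pending"}
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def check_inputs(listing_type, date_from=None, date_to=None):
    """Return a list of problems found; an empty list means the inputs look valid."""
    problems = []
    if listing_type not in VALID_LISTING_TYPES:
        problems.append(
            f"listing_type {listing_type!r} is not one of {sorted(VALID_LISTING_TYPES)}"
        )
    # The parameter notes say both dates are required when filtering by date.
    if (date_from is None) != (date_to is None):
        problems.append("date_from and date_to must be given together")
    for name, value in (("date_from", date_from), ("date_to", date_to)):
        if value is None:
            continue
        try:
            # The regex enforces zero-padded digits; strptime rejects
            # impossible calendar dates such as 2023-02-30.
            if not DATE_PATTERN.fullmatch(value):
                raise ValueError("wrong shape")
            datetime.strptime(value, "%Y-%m-%d")
        except ValueError:
            problems.append(f"{name} {value!r} is not a valid YYYY-MM-DD date")
    return problems
```

A caller could log the returned problems and skip the scrape when the list is non-empty, leaving AuthenticationError as the case to handle around the call itself.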
