
A class for scraping data from rightmove.co.uk


# rightmove-webscraper

<a href="http://www.rightmove.co.uk/" target="_blank">rightmove.co.uk</a> is one of the UK’s largest property listings websites, hosting thousands of listings of properties for sale and to rent.

<code>rightmove_webscraper.py</code> is a simple Python interface for scraping property listings from the website and loading them into a pandas DataFrame for analysis.

## Installation

Version 1.1 is available to install via pip:

<code>pip install -U rightmove-webscraper</code>

## Scraping property listings

  1. Go to <a href="http://www.rightmove.co.uk/">rightmove.co.uk</a> and search for whatever region, postcode, city, etc. you are interested in. You can also add any additional filters, e.g. property type, price, number of bedrooms, etc.

<img src="./docs/images/rightmove_search_screen.PNG">

  2. Run the search on the rightmove website and copy the URL of the first results page.

  3. Create an instance of the class with the URL as the init argument.

```python
from rightmove_webscraper import RightmoveData

url = "https://www.rightmove.co.uk/property-for-sale/find.html?searchType=SALE&locationIdentifier=REGION%5E94346"
rm = RightmoveData(url)
```
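If you prefer to build the search URL in code rather than copying it from the browser, a minimal standard-library sketch is below. It reuses the base path, `searchType` and `locationIdentifier` from the example URL above; the helper name `build_search_url` is hypothetical, and any extra filter name you pass (such as `maxPrice` in the usage note) is an assumption that should be checked against a URL produced by the website itself.

```python
from urllib.parse import urlencode

# Base path taken from the example search URL above.
BASE_URL = "https://www.rightmove.co.uk/property-for-sale/find.html"


def build_search_url(location_identifier: str, **filters: object) -> str:
    """Return a results-page URL for a sale search in the given location.

    Hypothetical helper: searchType and locationIdentifier come from the
    example URL; any extra keyword filters are ASSUMED parameter names and
    should be compared against a URL copied from the website.
    """
    params = {
        "searchType": "SALE",
        "locationIdentifier": location_identifier,
    }
    # Extra filters are appended in the order they are passed.
    params.update({name: str(value) for name, value in filters.items()})
    # urlencode percent-encodes reserved characters, so "^" becomes "%5E".
    return f"{BASE_URL}?{urlencode(params)}"
```

`build_search_url("REGION^94346")` reproduces the example URL exactly, because `urlencode` encodes `^` as `%5E`; `build_search_url("REGION^94346", maxPrice=500000)` appends `&maxPrice=500000`. Either string can then be passed to `RightmoveData` as before.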

## What will be scraped?

When a RightmoveData instance is created, it automatically scrapes every page of results available from the search URL. Note, however, that rightmove caps the number of results pages at 42. So if you perform a search that could in theory return many thousands of results (e.g. “all rental properties in London”), in practice you can only scrape the first 1050 results (42 pages × 25 listings per page = 1050 listings). A couple of suggested workarounds for this limitation are:

  • Reduce the search area and perform multiple scrapes, e.g. perform a search for each London borough instead of one search for all of London.

  • Add a search filter to shorten the timeframe in which listings were posted, e.g. search for all listings posted in the past 24 hours, and schedule the scrape to run daily.
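The page cap and the first workaround can be sketched as follows. The arithmetic uses the 42-page and 25-listings-per-page figures quoted above. `merge_scrapes` is a hypothetical helper, not part of this package's documented API: it assumes each scrape's results have been converted to a list of dicts (for example with pandas' `DataFrame.to_dict("records")`) and that a `url` column identifies a listing, which is an assumed column name.

```python
import math

MAX_PAGES = 42          # results-page cap described above
LISTINGS_PER_PAGE = 25  # listings per results page, per the figure above


def reachable_listings(total_results: int) -> int:
    """Listings a single search can actually return once the cap applies."""
    return min(total_results, MAX_PAGES * LISTINGS_PER_PAGE)


def pages_needed(total_results: int) -> int:
    """Results pages a search spans, limited to the 42-page cap."""
    return min(math.ceil(total_results / LISTINGS_PER_PAGE), MAX_PAGES)


def merge_scrapes(batches, key="url"):
    """Combine records from several smaller searches, dropping duplicates.

    Each batch is a list of dicts; `key` (ASSUMED to be "url") identifies a
    listing, so one that appears in two overlapping searches or two daily
    runs is kept once, in the order it was first seen.
    """
    seen = set()
    merged = []
    for batch in batches:
        for record in batch:
            if record[key] not in seen:
                seen.add(record[key])
                merged.append(record)
    return merged
```

For example, `reachable_listings(5000)` is 1050 while `reachable_listings(195)` is 195. Deduplicating matters because neighbouring search areas can overlap and a daily scrape can pick up a listing that was already collected the day before.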

Finally, note that not every piece of data shown on the rightmove website is scraped; only a subset of the most useful features is collected, such as price, address, number of bedrooms and listing agent. If there are additional data items you think should be scraped, please submit an issue or, even better, find the XPath for the item and submit a pull request with the changes.
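To illustrate what finding the XPath for a new field involves, here is a minimal sketch using the standard library's `xml.etree.ElementTree`, which supports only a limited subset of XPath. The listing markup, class names and the `extract_field` helper are hypothetical placeholders: rightmove's real pages are HTML rather than well-formed XML, their structure changes over time, and the package's own parsing code may use a different library.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# HYPOTHETICAL, simplified listing card; real rightmove markup differs.
SNIPPET = """
<div class="card">
  <span class="price">£450,000</span>
  <address class="address">1 Example Street, London</address>
  <span class="agent">Example Estates</span>
</div>
"""


def extract_field(markup: str, path: str) -> list[str]:
    """Return the stripped text of every element matching an XPath-style path.

    ElementTree understands expressions such as .//tag[@attr='value'],
    which is enough to show how one more field could be pulled from a card.
    """
    root = ET.fromstring(markup.strip())
    return [el.text.strip() for el in root.findall(path) if el.text]
```

With the placeholder snippet, `extract_field(SNIPPET, ".//span[@class='price']")` returns `["£450,000"]`; adding a field amounts to finding a path like this that reliably selects it on the real page.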

## Accessing data

The following instance methods and properties are available to access the scraped data.

Full results as a pandas.DataFrame

```python
rm.get_results.head()
```

Average price of all listings scraped

```python
rm.average_price
```

> `1650065.841025641`

Total number of listings scraped

```python
rm.results_count
```

> `195`
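As a rough guide to what the two properties above report, the sketch below computes an average price and a listing count from plain records. It assumes the results have been converted to a list of dicts (e.g. with `rm.get_results.to_dict("records")`) with a `price` key, which is an assumed column name, and it skips missing prices; the package's own implementation may treat missing values differently.

```python
from statistics import mean


def average_price(records, column="price"):
    """Mean of the non-missing values in `column`, analogous to rm.average_price.

    Raises statistics.StatisticsError if no record has a price.
    """
    prices = [r[column] for r in records if r.get(column) is not None]
    return mean(prices)


def results_count(records):
    """Number of listings, analogous to rm.results_count."""
    return len(records)
```

For three records priced 300000, 500000 and missing, `average_price` gives 400000 while `results_count` gives 3, since the count includes every listing.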

Summary statistics

By default, this shows the number of listings and the average price grouped by the number of bedrooms:

```python
rm.summary()
```

Alternatively group the results by any other column from the <code>.get_results</code> DataFrame, for example by postcode:

```python
rm.summary(by="postcode")
```
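The grouping behind `summary()` can be sketched with the standard library as below. This is an illustration only: the helper is hypothetical, the default column names `number_bedrooms` and `price` are assumptions, and unlike the real method it returns a plain dict rather than a DataFrame. In this sketch, listings with a missing price are counted but left out of the mean.

```python
from collections import defaultdict
from statistics import mean


def summary(records, by="number_bedrooms", value="price"):
    """Count listings and average `value` for each distinct value of `by`.

    Column names are ASSUMED; groups appear in first-seen order.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[by]].append(record.get(value))
    table = {}
    for key, values in groups.items():
        present = [v for v in values if v is not None]
        table[key] = {
            "count": len(values),
            "mean": mean(present) if present else None,
        }
    return table
```

Calling `summary(records, by="postcode")` groups by postcode instead, mirroring the `by` argument of `rm.summary()` shown above.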

## Legal

<a href="https://github.com/toddy86">@toddy86</a> has pointed out that, per rightmove’s <a href="https://www.rightmove.co.uk/this-site/terms-of-use.html">terms and conditions</a>, the use of web scrapers is not authorised by rightmove. So please don’t use this package!



Download files


Source Distribution

rightmove_webscraper-1.1.2.tar.gz (1.0 MB)

Uploaded Source

Built Distribution

rightmove_webscraper-1.1.2-py3-none-any.whl (7.6 kB)

Uploaded Python 3

File details

Details for the file rightmove_webscraper-1.1.2.tar.gz.

File metadata

  • Download URL: rightmove_webscraper-1.1.2.tar.gz
  • Upload date:
  • Size: 1.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.7 tqdm/4.63.1 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.2

File hashes

Hashes for rightmove_webscraper-1.1.2.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 087d86c107cd0e2f834daf8772cfb686edd03a036a3846300c16112666eabb1d |
| MD5 | fe993f32535d240db11a498e759cc589 |
| BLAKE2b-256 | 8e34a4d5b15e1ad57a688cb9603d2f9190e4f18302a8d802355459154b1ff0c3 |


File details

Details for the file rightmove_webscraper-1.1.2-py3-none-any.whl.

File metadata

  • Download URL: rightmove_webscraper-1.1.2-py3-none-any.whl
  • Upload date:
  • Size: 7.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.7 tqdm/4.63.1 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.2

File hashes

Hashes for rightmove_webscraper-1.1.2-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6bfc196b0c8b4454f14633f9c5aa14959634f98d2ea7911c5a7a875d5ca42224 |
| MD5 | 71ada4a2a14eabe9ceaa25483f2749f0 |
| BLAKE2b-256 | ef8ab22a897339bc91af15e09db5b0d07a50d6fa27a6685ad211ae557289b196 |

