Skip to main content

A unified collection of web data premises eg Apartment for sale, apartment for rent, house for sale, house for rent

Project description

A comprehensive web scraping pipeline for extracting and storing real estate data

Purpose of the package

  • The primary objective of this package is to provide an efficient solution for web scraping tasks. It has essentially functionalities including link extraction, data extraction, data cleaning, and data storage to database.

Features

- Date                      - Bathrooms  
- Build Year                - Car Parking
- Floors                    - Ancillary
- Sitting Rooms             - LandSize
- Dining Rooms              - Price
- Bedrooms                  - District
- Wardrobes                 - Sector

Installation

To install the package, run the following command:

!pip install WebScrapeX

Contribution

Contributions are welcome. If you encounter any bugs or have suggestions for improvements, please let me know at inyangel@yahoo.com. Thanks

Author

License

The package is released under the MIT license. (https://choosealicense.com/licenses/mit/)

Dependencies

The package has the following dependencies:

  • Python Decouple: Used for managing settings and configuration.
  • Python Dotenv: Used for loading environment variables from a .env file

Scraping URLs

The package supports scraping the following types of real estate listings from the Imali.biz website:

Apartment for Sale: https://imali.biz/category/1/125/search?pg=
Apartment for Rent: https://imali.biz/category/0/91/search?pg=
House for Rent: https://imali.biz/category/0/27/search?pg=
House for Sale: https://imali.biz/category/0/24/search?pg=

Usage example

Here's an example of how to use the WebScrapeX package to scrape, clean, and save real estate data:

from WebScrapeX import scrape_clean_save_data
import os 

env_path = os.path.abspath('.env')
# Specify the link of the real estate type to scrape
url = "https://imali.biz/category/1/125/search?pg="

# Specify the name of the file to save the data (in lowercase)
file_name = "real_estate_data.csv"

# Scrape, clean, and save the data
scrape_clean_save_data(url, file_name, env_path)
Note: File name should be either "house_sale" or "house_for_rent" or "apartment_for_sale" or "apartment_for_rent".

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

webscrapex-0.0.8.tar.gz (6.5 kB view hashes)

Uploaded Source

Built Distribution

webscrapex-0.0.8-py3-none-any.whl (6.9 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page