A unified collection of web-scraped real estate data: apartments for sale, apartments for rent, houses for sale, and houses for rent.
Project description
A comprehensive web scraping pipeline for extracting and storing real estate data
Purpose of the package
- The primary objective of this package is to provide an efficient solution for web scraping tasks. Its core functionality covers link extraction, data extraction, data cleaning, and storage of the cleaned data in a database.
Features
- Date
- Build Year
- Floors
- Sitting Rooms
- Dining Rooms
- Bedrooms
- Wardrobes
- Bathrooms
- Car Parking
- Ancillary
- Land Size
- Price
- District
- Sector
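As a rough illustration, the fields listed above could be modelled as a single record per listing. This is only a sketch: the class name `Listing`, the snake_case attribute names, and the chosen types are assumptions made for this example and are not taken from the WebScrapeX source.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Listing:
    """Hypothetical record for one scraped listing (names and types are assumed)."""
    date: Optional[str] = None
    build_year: Optional[int] = None
    floors: Optional[int] = None
    sitting_rooms: Optional[int] = None
    dining_rooms: Optional[int] = None
    bedrooms: Optional[int] = None
    wardrobes: Optional[int] = None
    bathrooms: Optional[int] = None
    car_parking: Optional[int] = None
    ancillary: Optional[str] = None
    land_size: Optional[str] = None
    price: Optional[str] = None
    district: Optional[str] = None
    sector: Optional[str] = None


# Column names derived from the record, e.g. for a CSV header or a table schema
COLUMNS = [f.name for f in fields(Listing)]
```

Every field defaults to `None` because a listing page may not report all of them.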
Installation
To install the package, run the following command (the leading ! is only needed inside a Jupyter notebook cell):
!pip install WebScrapeX
Contribution
Contributions are welcome. If you encounter any bugs or have suggestions for improvements, please email inyangel@yahoo.com. Thanks.
Author
- This package was developed by Lisa Yvette INYANGE (https://github.com/ILisa250)
License
The package is released under the MIT License (https://choosealicense.com/licenses/mit/).
Dependencies
The package has the following dependencies:
- Python Decouple: Used for managing settings and configuration.
- Python Dotenv: Used for loading environment variables from a .env file.
Scraping URLs
The package supports scraping the following types of real estate listings from the Imali.biz website:
Apartment for Sale: https://imali.biz/category/1/125/search?pg=
Apartment for Rent: https://imali.biz/category/0/91/search?pg=
House for Rent: https://imali.biz/category/0/27/search?pg=
House for Sale: https://imali.biz/category/0/24/search?pg=
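Each URL above ends in `pg=`, which suggests that a page number is appended to reach a given results page. The sketch below builds such page URLs under that assumption. The dictionary keys and the `page_url` helper are hypothetical names chosen for this example, and the page numbering scheme of the site has not been verified here.

```python
# Base URLs copied from the list above; the keys are illustrative labels only
BASE_URLS = {
    "apartment_for_sale": "https://imali.biz/category/1/125/search?pg=",
    "apartment_for_rent": "https://imali.biz/category/0/91/search?pg=",
    "house_for_rent": "https://imali.biz/category/0/27/search?pg=",
    "house_for_sale": "https://imali.biz/category/0/24/search?pg=",
}


def page_url(listing_type: str, page: int) -> str:
    """Return the URL of one results page (assumes the page number follows 'pg=')."""
    if page < 0:
        raise ValueError("page must not be negative")
    return f"{BASE_URLS[listing_type]}{page}"


# Example: the first three result pages for apartments on sale
first_pages = [page_url("apartment_for_sale", n) for n in range(1, 4)]
```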
Usage example
Here's an example of how to use the WebScrapeX package to scrape, clean, and save real estate data:
import os
from WebScrapeX import scrape_clean_save_data
# Absolute path to the .env file that holds the package's settings
env_path = os.path.abspath('.env')
# Specify the link of the real estate type to scrape
url = "https://imali.biz/category/1/125/search?pg="
# Specify the name under which the data is saved (lowercase; see the note below)
file_name = "apartment_for_sale"
# Scrape, clean, and save the data
scrape_clean_save_data(url, file_name, env_path)
Note: The file name must be one of "house_sale", "house_for_rent", "apartment_for_sale", or "apartment_for_rent".
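Since only four file names are accepted, it can help to check the name before starting a scrape. The following is a minimal sketch of such a check. `ALLOWED_FILE_NAMES` and `check_file_name` are hypothetical helpers written for this example and are not part of WebScrapeX; the allowed values are copied from the note above.

```python
# Allowed names, copied verbatim from the note above
ALLOWED_FILE_NAMES = (
    "house_sale",
    "house_for_rent",
    "apartment_for_sale",
    "apartment_for_rent",
)


def check_file_name(file_name: str) -> str:
    """Return file_name unchanged if it is allowed, otherwise raise ValueError."""
    if file_name not in ALLOWED_FILE_NAMES:
        allowed = ", ".join(ALLOWED_FILE_NAMES)
        raise ValueError(f"file_name must be one of: {allowed}")
    return file_name


# A valid name passes through and can then be handed to scrape_clean_save_data
file_name = check_file_name("apartment_for_sale")
```

Failing early with a clear message avoids running a long scrape that cannot be saved under the expected name.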
Built Distribution
Hashes for webscrapex-0.0.8-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 86c8db039f34c971d7594dcbc97a272b07be1ea289f4c2ac21c15080b0a84601
MD5 | 5c96122464e31ab2aa30f8483cfed2d7
BLAKE2b-256 | 16ae6ed4b1b422f9b0bee113345189c214a85f2bad05fa936c1766514d0dfc06