A dynamic and scalable data pipeline from Airbnb's commercial site to your local system or cloud storage.
Project description
Commercial Scraper
A fully dynamic and scalable data pipeline, written in Python, dedicated to scraping commercial websites that don't offer APIs. It can yield structured and unstructured data, and can save data locally and/or to the cloud via the data processing module.
Currently, the scraper is only built to scrape Airbnb's website, but support for more websites is in the works to generalise the package.
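To illustrate how the package could generalise beyond Airbnb, the site-specific logic might sit behind a common scraper interface. This is a hypothetical sketch, not the package's current API; the `SiteScraper` base class and the stubbed method bodies below are assumptions:

```python
from abc import ABC, abstractmethod

class SiteScraper(ABC):
    """Hypothetical interface a generalised package might expose."""

    @abstractmethod
    def scrape_product_data(self, url: str, product_id, category: str):
        """Return (structured_data_dict, image_source_list) for one product page."""

    @abstractmethod
    def scrape_all(self):
        """Return (all_product_entries, image_sources_per_entry)."""

class AirbnbScraper(SiteScraper):
    """Stubbed example implementation; the real scraper drives a browser."""

    def scrape_product_data(self, url, product_id, category):
        # Real code would fetch and parse the page; here we just echo inputs.
        return {"id": product_id, "category": category, "url": url}, []

    def scrape_all(self):
        return [], {}
```

New sites would then plug in as further `SiteScraper` subclasses without changing the data processing module.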
Installation
Use the package manager pip to install CommercialScraper.
pip install CommercialScraper
Usage
from CommercialScraper.pipeline import AirbnbScraper
from CommercialScraper import data_processing
scraper = AirbnbScraper()
# Returns a dictionary of structured data and a list of image sources for a single product page
product_dict, imgs = scraper.scrape_product_data('https://any/airbnb/product/page', any_ID_you_wish, 'Any Category Label you wish')
# Returns a dataframe of product entries as well as a dictionary of image sources pertaining to each product entry
df, imgs = scraper.scrape_all()
# Saves the dataframe to a csv in your local directory inside a created 'data/' folder.
data_processing.df_to_csv(df, 'any_filename')
# Saves images locally
data_processing.images_to_local(imgs)
# Saves structured data to sql database
data_processing.df_to_sql(df, table_name, username, password, hostname, port, database)
# Saves structured data to AWS cloud services s3 bucket
data_processing.df_to_s3(df, aws_access_key_id, region_name, aws_secret_access_key, bucket_name, upload_name)
# Saves images to AWS cloud services s3 bucket
data_processing.images_to_s3(source_links, aws_access_key_id,region_name, aws_secret_access_key, bucket_name, upload_name)
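As a rough sketch of what the local-save step does (an assumption based on the comment above, not the package's actual implementation), `df_to_csv` presumably creates a `data/` folder if needed and writes the dataframe out as CSV. A simplified, stdlib-only stand-in with a hypothetical `save_rows_to_csv` helper:

```python
import csv
from pathlib import Path

def save_rows_to_csv(rows, filename, out_dir="data"):
    """Simplified stand-in for data_processing.df_to_csv:
    ensure the output folder exists, then write a list of
    dicts (one per scraped product) as a CSV file."""
    path = Path(out_dir)
    path.mkdir(parents=True, exist_ok=True)
    target = path / f"{filename}.csv"
    with open(target, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)
    return target

# Example: two scraped product entries
rows = [
    {"id": 1, "title": "Cosy flat", "price": 80},
    {"id": 2, "title": "Beach house", "price": 200},
]
save_rows_to_csv(rows, "listings")
```

The real helper accepts a pandas DataFrame rather than a list of dicts, but the folder-creation and CSV-writing behaviour should be analogous.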
Docker Image
This package has been containerised as a Docker image and can be run as a standalone application. Please note that when run this way, data can only be stored on the cloud, not locally.
docker pull docker4ldrich/airbnb-scraper
docker run -it docker4ldrich/airbnb-scraper
Follow the prompts and enter credentials carefully; there won't be a chance to correct typing errors. It's recommended that you paste credentials in where applicable.
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.
License
Download files
Source Distribution
Built Distribution
File details
Details for the file CommercialScraper-1.0.0.tar.gz.
File metadata
- Download URL: CommercialScraper-1.0.0.tar.gz
- Upload date:
- Size: 12.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.62.0 CPython/3.9.1
File hashes
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 20f5f9d9e07655e75348c11eae5b737f686b433b88eec67f35fff63175205e61 |
| MD5 | 33d940bc5de6e873e98789796460deef |
| BLAKE2b-256 | 3ce3f5c8b3098e1d50fd6d8c203e0512b550021099018da74c4ca5005263fb42 |
File details
Details for the file CommercialScraper-1.0.0-py3-none-any.whl.
File metadata
- Download URL: CommercialScraper-1.0.0-py3-none-any.whl
- Upload date:
- Size: 14.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.62.0 CPython/3.9.1
File hashes
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 771f7468eea5752817b42fcd7f7e16fd2072589ee85838d22f000d7cce910390 |
| MD5 | de718143e8ad75422f64cfec2a841d9b |
| BLAKE2b-256 | 298f29e11bfc9d0d2b1ea6472ff3401e5b24419231c41150f956a7f9fc7c51ee |