Python web crawler / scraper for WG-Gesucht. Crawls the WG-Gesucht site for new apartment listings and sends a message to the poster, based on your saved filters and saved text template.
Installation
$ pip install wg-gesucht-crawler-cli
Or, if you have virtualenvwrapper installed:
$ mkvirtualenv wg-gesucht-crawler-cli
$ pip install wg-gesucht-crawler-cli
Use
Can be run directly from the command line with:
$ wg-gesucht-crawler-cli --help
Or if you want to use it in your own project:
from wg_gesucht.crawler import WgGesuchtCrawler
Just make sure to save at least one search filter as well as a template text on your wg-gesucht account.
Free software: MIT license
Documentation: https://wg-gesucht-crawler-cli.readthedocs.org.
Features
Searches https://wg-gesucht.de for new WG ads based on your saved filters
Applies to every matching listing by sending your saved template message
Reruns every ~5 minutes
Run it 24/7 on a Raspberry Pi or a free EC2 micro instance to always be one of the first to apply for new listings
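The "reruns every ~5 minutes" behaviour can be pictured as a simple polling loop. The sketch below is only an illustration, not the project's actual code: run_forever, crawl_once, max_runs and the injectable sleep parameter are hypothetical names introduced here, and crawl_once stands in for whatever performs one pass over your saved filters.

```python
import time
from typing import Callable, Optional

# "Reruns every ~5 minutes", per the feature list above.
RERUN_INTERVAL_SECONDS = 5 * 60


def run_forever(
    crawl_once: Callable[[], None],
    interval: float = RERUN_INTERVAL_SECONDS,
    max_runs: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call crawl_once repeatedly, pausing `interval` seconds between runs.

    crawl_once is a hypothetical callable (an assumption of this sketch).
    max_runs=None loops until interrupted; a number stops after that many
    runs, which is handy for testing. Returns the number of runs performed.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            crawl_once()
        except Exception as exc:
            # One failed pass should not stop a long-running process.
            print(f"crawl failed, retrying after the pause: {exc}")
        runs += 1
        # Skip the pause after the final run.
        if max_runs is None or runs < max_runs:
            sleep(interval)
    return runs
```

Catching the exception inside the loop is a design choice for an unattended machine such as a Raspberry Pi: a temporary network error costs one pass rather than the whole session.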
Getting Caught with reCAPTCHA
I’ve made the crawler sleep for 5-8 seconds between requests to try to avoid their reCAPTCHA. If the crawler does get caught, sign into your wg-gesucht account manually through the browser, solve the reCAPTCHA, then start the crawler again.
If it continues to happen, you can also increase the sleep time in the get_page() function in wg_gesucht.py.
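For reference, a randomized 5-8 second pause of the kind described above might look like the following. This is a minimal sketch, not the contents of get_page(): polite_delay and its parameters are hypothetical names, and the bounds simply mirror the numbers quoted in the text.

```python
import random
import time
from typing import Callable

# Bounds taken from the "5-8 seconds" mentioned above; raise them if the
# reCAPTCHA keeps appearing.
MIN_DELAY_SECONDS = 5.0
MAX_DELAY_SECONDS = 8.0


def polite_delay(
    min_seconds: float = MIN_DELAY_SECONDS,
    max_seconds: float = MAX_DELAY_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Pause for a random duration in [min_seconds, max_seconds].

    Returns the delay that was used so callers can log it.
    """
    if min_seconds > max_seconds:
        raise ValueError("min_seconds must not exceed max_seconds")
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

Calling polite_delay() before each request gives an irregular request rhythm. Increasing the sleep time then amounts to passing larger bounds, for example polite_delay(10, 15).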
History
Pre-release
Download files
Source Distribution
Hashes for wg-gesucht-crawler-cli-0.1.5.tar.gz
Algorithm | Hash digest
---|---
SHA256 | e2b21bfac9a23c0ddb1b5688e07b138dbaa487af27f739883c7368cc4e84fe48
MD5 | c1a89b2ac6e924e848944630d9489346
BLAKE2b-256 | 4dcd03b0ffe0d2f198ab28129333db5acfc6c0ae2a54e6de74076110c68648d8
Built Distribution
Hashes for wg_gesucht_crawler_cli-0.1.5-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 189f0b4b5ef16e9e7efb410e97312dad0727f2e3e3e9c092d4aecfd70b0a7716
MD5 | 83f3cfa1bc2c997f107b1c89de0cf4b2
BLAKE2b-256 | 87def44489e4c732fc2eae374a0813726c67346e2b83a6d330d9b8eb96c3ad89
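If you download one of these files by hand, you can compare it against the published SHA256 digest using only the Python standard library. The helper names sha256_of and matches_published_digest are hypothetical, introduced for this sketch.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, assuming the source distribution was saved to the current directory, matches_published_digest("wg-gesucht-crawler-cli-0.1.5.tar.gz", "e2b21bfac9a23c0ddb1b5688e07b138dbaa487af27f739883c7368cc4e84fe48") should return True for an intact download.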