Trulia Crawler Tool Set
Project description
Welcome to crawl_trulia Documentation
This is a small project that provides URL routing and HTML parsing tools for crawling www.trulia.com.
Usage
A real example:
>>> from crawl_trulia.urlencoder import urlencoder
>>> from crawl_trulia.htmlparser import htmlparser
>>> from crawlib.spider import spider  # install crawlib first

# use address, city and zipcode
>>> address = "22 Yew Rd"
>>> city = "Baltimore"
>>> zipcode = "21221"
>>> url = urlencoder.by_address_city_and_zipcode(address, city, zipcode)
>>> html = spider.get_html(url)
>>> house_detail_data = htmlparser.get_house_detail(html)
>>> house_detail_data
{
    "features": {},
    "public_records": {
        "AC": "a/c",
        "basement_type": "improved basement (finished)",
        "bathroom": 2,
        "build_year": 1986,
        "county": "baltimore county",
        "exterior_walls": "siding (alum/vinyl)",
        "heating": "heat pump",
        "lot_size": 7505,
        "lot_size_unit": "sqft",
        "partial_bathroom": 1,
        "roof": "composition shingle",
        "sqft": 998
    }
}

# usually the combination of address and zipcode is enough
>>> address = "2004 Birch Rd"
>>> zipcode = "21221"
>>> url = urlencoder.by_address_and_zipcode(address, zipcode)
>>> html = spider.get_html(url)
>>> house_detail_data = htmlparser.get_house_detail(html)
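The urlencoder calls above only build a Trulia URL from address components. To illustrate the idea, here is a hypothetical sketch of such a builder; the slug format, path layout, and function names are assumptions for illustration, not crawl_trulia's actual implementation:

```python
def slugify(text):
    # Hyphenate a free-form address component,
    # e.g. "2004 Birch Rd" -> "2004-Birch-Rd"
    return "-".join(text.strip().split())

def by_address_and_zipcode(address, zipcode, domain="https://www.trulia.com"):
    # Hypothetical URL layout; the real urlencoder may construct
    # the path differently.
    return "%s/property/%s-%s" % (domain, slugify(address), zipcode)

print(by_address_and_zipcode("2004 Birch Rd", "21221"))
```

Any HTTP client (crawlib's spider, requests, and so on) can then fetch the resulting URL and pass the HTML to a parser.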
Install
crawl_trulia is released on PyPI, so all you need is:
$ pip install crawl_trulia
To upgrade to the latest version:
$ pip install --upgrade crawl_trulia
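For a reproducible environment, you can also pin the release shown on this page (assuming 0.0.4 is the version you want):

```shell
$ pip install crawl_trulia==0.0.4
```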
Project details
Download files
Filename, size | File type | Python version | Upload date | Hashes
---|---|---|---|---
crawl_trulia-0.0.4.zip (22.5 kB) | Source | None | | View