Trulia Crawler Tool Set
Welcome to crawl_trulia Documentation
This is a small project that provides URL routing and HTML parsing tools for crawling www.trulia.com.
Usage
A real example:
>>> from crawl_trulia.urlencoder import urlencoder
>>> from crawl_trulia.htmlparser import htmlparser
>>> from crawlib.spider import spider # install crawlib first
# use address, city and zipcode
>>> address = "22 Yew Rd"
>>> city = "Baltimore"
>>> zipcode = "21221"
>>> url = urlencoder.by_address_city_and_zipcode(address, city, zipcode)
>>> html = spider.get_html(url)
>>> house_detail_data = htmlparser.get_house_detail(html)
>>> house_detail_data
{
"features": {},
"public_records": {
"AC": "a/c",
"basement_type": "improved basement (finished)",
"bathroom": 2,
"build_year": 1986,
"county": "baltimore county",
"exterior_walls": "siding (alum/vinyl)",
"heating": "heat pump",
"lot_size": 7505,
"lot_size_unit": "sqft",
"partial_bathroom": 1,
"roof": "composition shingle",
"sqft": 998
}
}
# usually, the combination of address and zipcode is enough
>>> address = "2004 Birch Rd"
>>> zipcode = "21221"
>>> url = urlencoder.by_address_and_zipcode(address, zipcode)
>>> html = spider.get_html(url)
>>> house_detail_data = htmlparser.get_house_detail(html)
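The parsed result is a plain nested dict, so it can be post-processed with ordinary Python. The sketch below is a hypothetical helper (`summarize_public_records` is not part of crawl_trulia) that flattens a few of the `public_records` keys shown in the sample output above. Counting a partial bathroom as half a bath is an assumption made for illustration, not something the parser does.

```python
SQFT_PER_ACRE = 43560


def summarize_public_records(house_detail_data):
    """Return a flat summary of the 'public_records' section.

    Hypothetical helper: keys are taken from the sample output above;
    missing keys yield None instead of raising KeyError.
    """
    records = house_detail_data.get("public_records", {})
    full = records.get("bathroom")
    partial = records.get("partial_bathroom")
    lot_size = records.get("lot_size")
    unit = records.get("lot_size_unit")

    # Only convert when the unit is the one seen in the sample output.
    lot_acres = None
    if lot_size is not None and unit == "sqft":
        lot_acres = round(lot_size / SQFT_PER_ACRE, 3)

    return {
        "build_year": records.get("build_year"),
        "sqft": records.get("sqft"),
        # Assumption: a partial bathroom counts as 0.5 of a full one.
        "bathrooms": None if full is None else full + 0.5 * (partial or 0),
        "lot_acres": lot_acres,
    }


# Subset of the sample house_detail_data output shown above.
sample = {
    "features": {},
    "public_records": {
        "bathroom": 2,
        "build_year": 1986,
        "lot_size": 7505,
        "lot_size_unit": "sqft",
        "partial_bathroom": 1,
        "sqft": 998,
    },
}

print(summarize_public_records(sample))
# {'build_year': 1986, 'sqft': 998, 'bathrooms': 2.5, 'lot_acres': 0.172}
```

Using `dict.get` throughout keeps the helper tolerant of listings where Trulia shows fewer public-record fields.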
Install
crawl_trulia is released on PyPI, so all you need is:
$ pip install crawl_trulia
To upgrade to the latest version:
$ pip install --upgrade crawl_trulia
Download files
Source Distribution
crawl_trulia-0.0.4.zip (22.5 kB)
File details
Details for the file crawl_trulia-0.0.4.zip
File metadata
- Download URL: crawl_trulia-0.0.4.zip
- Upload date:
- Size: 22.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | eb11b1c974b52fcc6330e543338c6051f534d61e3c891dc4841646696b3f9124
MD5 | d9ef2488f8c372d8d590bfb604130c28
BLAKE2b-256 | d009234199a82d99a59c3eea0ba5d644bd7279ca65d5fd144ac3ab47f5ef1ceb
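Before installing from a manually downloaded archive, its integrity can be checked against the published SHA256 digest. The following is a minimal sketch using only the standard library's `hashlib`. The function names `sha256_of_file` and `verify_download` are illustrative, not part of crawl_trulia.

```python
import hashlib

# SHA256 digest published for crawl_trulia-0.0.4.zip.
EXPECTED_SHA256 = "eb11b1c974b52fcc6330e543338c6051f534d61e3c891dc4841646696b3f9124"


def sha256_of_file(path, chunk_size=65536):
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until f.read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_SHA256):
    """Return True if the file at `path` matches the expected digest."""
    return sha256_of_file(path) == expected.lower()
```

For example, assuming the archive was saved to the current directory, `verify_download("crawl_trulia-0.0.4.zip")` should return `True` for an intact download. When installing with pip, the same digest can instead be pinned in a requirements file (`crawl_trulia==0.0.4 --hash=sha256:<digest>`). This enables pip's hash-checking mode, which rejects an archive whose digest does not match.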