patul
A tiny spider based on asyncio and requests.
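The core idea behind a spider "based on asyncio and requests" — running blocking HTTP calls in a thread pool so many fetches can be in flight at once — can be sketched as below. This is a conceptual illustration only, not patul's actual implementation; `blocking_fetch` is a stub standing in for a real `requests.get` call so the example runs without network access.

```python
import asyncio
import concurrent.futures

def blocking_fetch(url):
    # Stand-in for a blocking call like requests.get(url).
    return f"<Response 200 {url}>"

async def crawl(urls, concurrency=10):
    loop = asyncio.get_running_loop()
    sem = asyncio.Semaphore(concurrency)  # cap concurrent requests
    with concurrent.futures.ThreadPoolExecutor() as pool:
        async def fetch_one(url):
            async with sem:
                # Run the blocking fetch in a worker thread without
                # blocking the event loop.
                return await loop.run_in_executor(pool, blocking_fetch, url)
        # gather preserves input order in its result list
        return await asyncio.gather(*(fetch_one(u) for u in urls))

results = asyncio.run(crawl(['https://cn.bing.com/', 'https://github.com/']))
print(results)
```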
install
```shell
pip install patul
```
usage
Just like scrapy:
```python
import patul


class MySpider(patul.Spider):

    start_urls = ['https://cn.bing.com/']

    async def parse(self, response):
        print(response.xpath('//a/@href').get())
        yield patul.Request('https://github.com/financialfly/async-request', callback=self.parse_github)

    def parse_github(self, response):
        yield {'hello': 'github'}

    async def process_result(self, result):
        # Process the result here.
        print(result)


if __name__ == '__main__':
    # Run the spider
    patul.crawl(MySpider)
```
For more detailed control (such as cookie handling, download delay, concurrent requests, max retries, and log settings), refer to the constructor of the Crawler class:
```python
import patul


class MySpider(patul.Spider):
    ...


if __name__ == '__main__':
    patul.crawl(
        MySpider,
        handle_cookies=True,
        download_delay=0,
        concurrent_requests=10,
        max_retries=3,
        log_settings={'fp': 'spider.log'}
    )
```
test
Use the fetch function to get a response immediately:
```python
from patul import fetch


def parse():
    response = fetch('https://www.bing.com')
    print(response)


parse()
```
The output will look like this:

```
<Response 200 https://cn.bing.com/>
```
The fetch function can also be used like this:
```python
import patul


def parse(response):
    print(response)
    yield patul.Request(response.url, callback=parse_next)


def parse_next(response):
    print(response.status_code)
    yield {'hello': 'world'}


patul.fetch('http://www.baidu.com', callback=parse)
```
Then run the script, and you will see the result:

```
<Response 200 http://www.baidu.com/>
200
```