patul

A tiny spider based on asyncio and requests.

install

pip install patul

usage

Usage is similar to Scrapy:

import patul


class MySpider(patul.Spider):
    
    start_urls = ['https://cn.bing.com/']
    
    async def parse(self, response):
        print(response.xpath('//a/@href').get())
        yield patul.Request('https://github.com/financialfly/async-request', callback=self.parse_github)

    def parse_github(self, response):
        yield {'hello': 'github'}
    
    async def process_result(self, result):
        # Process the result here.
        print(result)


if __name__ == '__main__':
    # Run spider
    patul.crawl(MySpider)
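
The spider above mixes an `async def` callback (`parse`) with a plain generator callback (`parse_github`). How patul handles this internally is not shown here. The following is only a minimal standard-library sketch of one way a crawler could consume both kinds of callback uniformly. The names `iterate_callback`, `collect`, `async_parse` and `sync_parse` are hypothetical and are not part of patul.

```python
import asyncio
import inspect


async def iterate_callback(callback, response):
    """Yield items from a callback that is a sync or an async generator.

    Hypothetical helper for illustration only; not part of patul.
    """
    produced = callback(response)
    if inspect.isasyncgen(produced):
        # `async def` callback that contains `yield`
        async for item in produced:
            yield item
    elif inspect.isgenerator(produced):
        # plain `def` callback that contains `yield`
        for item in produced:
            yield item


async def async_parse(response):
    yield {'kind': 'async', 'response': response}


def sync_parse(response):
    yield {'kind': 'sync', 'response': response}


async def collect(callback, response):
    # Gather everything the callback yields into a list.
    return [item async for item in iterate_callback(callback, response)]


if __name__ == '__main__':
    print(asyncio.run(collect(async_parse, 'fake response')))
    print(asyncio.run(collect(sync_parse, 'fake response')))
```

Checking the returned object with `inspect.isasyncgen` and `inspect.isgenerator` lets the caller use one code path for both callback styles.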

For more detailed control (handling cookies, download delay, concurrent requests, maximum retries, log settings, and so on), pass keyword arguments to patul.crawl. Refer to the constructor of the Crawler class for the available options:

import patul

class MySpider(patul.Spider):
    ...

if __name__ == '__main__':
    patul.crawl(
        MySpider, 
        handle_cookies=True, 
        download_delay=0, 
        concurrent_requests=10, 
        max_retries=3, 
        log_settings={'fp': 'spider.log'}
    )
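
To make the meaning of settings such as `concurrent_requests`, `download_delay` and `max_retries` more concrete, here is a rough sketch of how such limits are commonly implemented with plain asyncio. It is not patul's implementation. `fetch_with_limits` and `flaky_download` are hypothetical names, and the sketch assumes that `max_retries` counts the attempts made after the first failure.

```python
import asyncio


async def fetch_with_limits(url, download, semaphore,
                            download_delay=0, max_retries=3):
    """Run `download(url)` under a concurrency cap, with a delay and retries.

    Hypothetical helper for illustration only; not part of patul.
    """
    async with semaphore:  # caps how many downloads run at the same time
        last_error = None
        for _ in range(1 + max_retries):  # first attempt plus retries
            if download_delay:
                await asyncio.sleep(download_delay)
            try:
                return await download(url)
            except Exception as exc:
                last_error = exc
        raise last_error


async def main():
    attempts = {}

    async def flaky_download(url):
        # Stand-in for a real HTTP call: fails once per URL, then succeeds.
        attempts[url] = attempts.get(url, 0) + 1
        if attempts[url] == 1:
            raise ConnectionError(url)
        return 'body of ' + url

    semaphore = asyncio.Semaphore(10)  # plays the role of concurrent_requests=10
    urls = ['https://example.com/%d' % i for i in range(3)]
    results = await asyncio.gather(
        *(fetch_with_limits(u, flaky_download, semaphore) for u in urls)
    )
    return results, attempts


if __name__ == '__main__':
    print(asyncio.run(main()))
```

The semaphore bounds concurrency, the sleep spaces out requests, and the bounded loop gives up after the configured number of retries by re-raising the last error.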

test

Use the fetch function to get a response immediately:

from patul import fetch


def parse():
    response = fetch('https://www.bing.com')
    print(response)
    
   
parse()

The output will look like this:

<Response 200 https://cn.bing.com/>

The fetch function can also be used with a callback:

import patul


def parse(response):
    print(response)
    yield patul.Request(response.url, callback=parse_next)


def parse_next(response):
    print(response.status_code)
    yield {'hello': 'world'}


patul.fetch('http://www.baidu.com', callback=parse)

Run the script and you will see the result:

<Response 200 http://www.baidu.com/>
200
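
The callback chain above follows a common pattern: each callback yields either a follow-up request or a result. As a rough illustration of that pattern only, the sketch below uses a plain queue and a faked download so that it runs without network access. The `Request` class, `run` function and `download` parameter here are hypothetical stand-ins, not patul's own objects.

```python
from collections import deque


class Request:
    """Hypothetical stand-in for a request: a URL plus a callback."""

    def __init__(self, url, callback):
        self.url = url
        self.callback = callback


def run(start_url, callback, download):
    """Follow yielded requests in FIFO order and collect every other item."""
    queue = deque([Request(start_url, callback)])
    results = []
    while queue:
        request = queue.popleft()
        response = download(request.url)
        for item in request.callback(response):
            if isinstance(item, Request):
                queue.append(item)    # follow-up request: schedule it
            else:
                results.append(item)  # anything else is treated as a result
    return results


def parse(response):
    yield Request(response + '/next', callback=parse_next)


def parse_next(response):
    yield {'hello': 'world', 'from': response}


if __name__ == '__main__':
    # `download` is faked: it simply returns the URL as the "response".
    print(run('http://example.com', parse, download=lambda url: url))
```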

