pAsynCrawler
Installation
pip install pAsynCrawler
Features
- Fetch data - asynchronously
- Parse data - with multiprocessing
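The combination of the two features above - asynchronous fetching followed by multiprocess parsing - can be sketched with the standard library alone. This is a minimal illustration of the pattern, not pAsynCrawler's actual implementation; the `fetch` stub stands in for a real HTTP request so the sketch is self-contained.

```python
import asyncio
from multiprocessing import Pool

async def fetch(url):
    # Stand-in for a real async HTTP request (e.g. via aiohttp).
    await asyncio.sleep(0)
    return f"<html>{url}</html>"

async def fetch_all(urls):
    # Fetch every URL concurrently on the event loop.
    return await asyncio.gather(*(fetch(u) for u in urls))

def parse(text):
    # CPU-bound parsing step; here just a trivial transformation.
    return text.upper()

def crawl(urls):
    # Stage 1: gather all responses asynchronously.
    pages = asyncio.run(fetch_all(urls))
    # Stage 2: parse the responses in a process pool.
    with Pool(2) as pool:
        return pool.map(parse, pages)
```

pAsynCrawler exposes the same two-stage idea through `AsynCrawler(asy_fetch=..., mp_parse=...)`, where the parameters bound the concurrency of each stage.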
Example
from bs4 import BeautifulSoup
from pAsynCrawler import AsynCrawler, flattener

def parser_0(response_text):
    soup = BeautifulSoup(response_text, 'html.parser')
    menus = soup.select('ul > li > span > a')
    datas = tuple(x.text for x in menus)
    urls = tuple(x.attrs['href'] for x in menus)
    return (datas, urls)

def parser_1(response_text):
    soup = BeautifulSoup(response_text, 'html.parser')
    menus = soup.select('ul > li > a')
    datas = tuple(x.text for x in menus)
    urls = tuple(x.attrs['href'] for x in menus)
    return (datas, urls)

if __name__ == '__main__':
    ac = AsynCrawler(asy_fetch=20, mp_parse=8)
    datas_1, urls_1 = ac.fetch_and_parse(parser_0, ['https://www.example.com'])
    datas_2, urls_2 = ac.fetch_and_parse(parser_1, flattener(urls_1))
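In the example, each parser returns a tuple of URL tuples (one per fetched page), so the second crawl step needs them joined into a single flat sequence. A minimal sketch of what `flattener` is assumed to do - the stand-in `flatten` below is illustrative, not pAsynCrawler's actual implementation:

```python
from itertools import chain

def flatten(nested):
    # Join an iterable of tuples into one flat tuple,
    # e.g. (("a", "b"), ("c",)) -> ("a", "b", "c").
    return tuple(chain.from_iterable(nested))
```

With nested per-page results flattened this way, `fetch_and_parse` can treat the combined URLs as one batch for the next crawl depth.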