pAsynCrawler
Installation
pip install pAsynCrawler
Features
- Fetch data - asynchronously
- Parse data - with multiprocessing
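The combination of the two features above - asynchronous fetching followed by multiprocess parsing - can be sketched with the standard library alone. This is a minimal illustration of the pattern, not pAsynCrawler's actual implementation; the `fetch` stub stands in for a real HTTP request so the sketch is self-contained.

```python
import asyncio
from multiprocessing import Pool

async def fetch(url):
    # Stand-in for a real async HTTP request (e.g. via aiohttp).
    await asyncio.sleep(0)
    return f"<html>{url}</html>"

async def fetch_all(urls):
    # Fetch every URL concurrently on the event loop.
    return await asyncio.gather(*(fetch(u) for u in urls))

def parse(text):
    # CPU-bound parsing step; here just a trivial transformation.
    return text.upper()

def crawl(urls):
    # Stage 1: gather all responses asynchronously.
    pages = asyncio.run(fetch_all(urls))
    # Stage 2: parse the responses in a process pool.
    with Pool(2) as pool:
        return pool.map(parse, pages)
```

pAsynCrawler exposes the same two-stage idea through `AsynCrawler(asy_fetch=..., mp_parse=...)`, where the parameters bound the concurrency of each stage.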
Example
from bs4 import BeautifulSoup
from pAsynCrawler import AsynCrawler, flattener

def parser_0(response_text):
    soup = BeautifulSoup(response_text, 'html.parser')
    menus = soup.select('ul > li > span > a')
    datas = tuple(x.text for x in menus)
    urls = tuple(x.attrs['href'] for x in menus)
    return (datas, urls)

def parser_1(response_text):
    soup = BeautifulSoup(response_text, 'html.parser')
    menus = soup.select('ul > li > a')
    datas = tuple(x.text for x in menus)
    urls = tuple(x.attrs['href'] for x in menus)
    return (datas, urls)

if __name__ == '__main__':
    ac = AsynCrawler(asy_fetch=20, mp_parse=8)
    datas_1, urls_1 = ac.fetch_and_parse(parser_0, ['https://www.example.com'])
    datas_2, urls_2 = ac.fetch_and_parse(parser_1, flattener(urls_1))
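In the example, each parser returns a tuple of URL tuples (one per fetched page), so the second crawl step needs them joined into a single flat sequence. A minimal sketch of what `flattener` is assumed to do - the stand-in `flatten` below is illustrative, not pAsynCrawler's actual implementation:

```python
from itertools import chain

def flatten(nested):
    # Join an iterable of tuples into one flat tuple,
    # e.g. (("a", "b"), ("c",)) -> ("a", "b", "c").
    return tuple(chain.from_iterable(nested))
```

With nested per-page results flattened this way, `fetch_and_parse` can treat the combined URLs as one batch for the next crawl depth.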