aio-scrapy

A high-level Web Crawling and Web Scraping framework based on Asyncio

These details have not been verified by PyPI

Project links

Homepage

Project description

aio-scrapy

An asyncio + aiolibs crawler framework modeled on Scrapy

Overview

The aio-scrapy framework is based on the open-source projects Scrapy and scrapy_redis.
aio-scrapy is compatible with scrapyd.
aio-scrapy implements Redis and RabbitMQ request queues.
aio-scrapy is a fast, high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.
aio-scrapy supports distributed crawling and scraping.

Requirements

Python 3.9+
Works on Linux, Windows, macOS, BSD

Install

The quick way:

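# Install the latest development version from GitHub
pip install git+https://github.com/ConlinH/aio-scrapy

# Install the default release from PyPI
pip install aio-scrapy

# Install all optional dependencies (quoted so shells such as zsh do not expand the brackets)
pip install "aio-scrapy[all]"

# Install only the extras you need, e.g. for MySQL, httpx, RabbitMQ, and MongoDB
pip install "aio-scrapy[aiomysql,httpx,aio-pika,mongo]"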
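To see which optional backends are importable in the current environment after installing, a small check like the following can help. It is only a sketch: the import names `aiomysql`, `httpx`, and `aio_pika` are assumed to correspond to the extras of the same name, and `optional_backends` and `python_is_supported` are hypothetical helpers, not part of aio-scrapy.

```python
import sys
from importlib.util import find_spec

# Assumed import names for the optional extras; adjust if they differ.
OPTIONAL_MODULES = ("aiomysql", "httpx", "aio_pika")


def optional_backends(modules=OPTIONAL_MODULES):
    """Return {module_name: bool} telling whether each module can be found."""
    return {name: find_spec(name) is not None for name in modules}


def python_is_supported(minimum=(3, 9)):
    """The Requirements section lists Python 3.9+ as the minimum version."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    for name, available in optional_backends().items():
        print(f"{name}: {'installed' if available else 'missing'}")
```

Because `find_spec` only locates a module without importing it, the check has no side effects and stays fast even when the backends are large.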

Usage

Create a project spider:

aioscrapy startproject project_quotes

cd project_quotes
aioscrapy genspider quotes

quotes.py:

from aioscrapy.spiders import Spider


class QuotesMemorySpider(Spider):
    name = 'QuotesMemorySpider'

    start_urls = ['https://quotes.toscrape.com']

    async def parse(self, response):
        for quote in response.css('div.quote'):
            yield {
                'author': quote.xpath('span/small/text()').get(),
                'text': quote.css('span.text::text').get(),
            }

        next_page = response.css('li.next a::attr("href")').get()
        if next_page is not None:
            yield response.follow(next_page, self.parse)


if __name__ == '__main__':
    QuotesMemorySpider.start()
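The `parse` callback above is an async generator that yields two kinds of objects: scraped items (dicts) and follow-up requests. The sketch below models that pattern with the standard library only, so it runs without aio-scrapy installed. `FakeRequest`, `parse_page`, and `collect` are hypothetical names, and resolving the next-page link with `urljoin` is an assumption about what `response.follow` does, based on how Scrapy behaves.

```python
import asyncio
from dataclasses import dataclass
from urllib.parse import urljoin


@dataclass
class FakeRequest:
    """Stand-in for a framework request object (hypothetical)."""
    url: str


async def parse_page(page_url, quotes, next_href):
    """Yield one dict per quote, then a request for the next page, if any."""
    for author, text in quotes:
        yield {"author": author, "text": text}
    if next_href is not None:
        # Relative links are resolved against the current page URL.
        yield FakeRequest(urljoin(page_url, next_href))


async def collect(page_url, quotes, next_href):
    """Split the callback output into items and follow-up requests."""
    items, requests = [], []
    async for output in parse_page(page_url, quotes, next_href):
        if isinstance(output, FakeRequest):
            requests.append(output)
        else:
            items.append(output)
    return items, requests


if __name__ == "__main__":
    items, requests = asyncio.run(
        collect(
            "https://quotes.toscrape.com",
            [("Albert Einstein", "An example quote.")],
            "/page/2/",
        )
    )
    print(items)
    print(requests)
```

In a real crawl the framework, not user code, plays the role of `collect`: items would go to the item pipeline and requests back to the scheduler.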

Run the spider:

aioscrapy crawl quotes

Create a single-script spider:

aioscrapy genspider single_quotes -t single

single_quotes.py:

from aioscrapy.spiders import Spider


class QuotesMemorySpider(Spider):
    name = 'QuotesMemorySpider'
    custom_settings = {
        "USER_AGENT": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36",
        'CLOSE_SPIDER_ON_IDLE': True,
        # 'DOWNLOAD_DELAY': 3,
        # 'RANDOMIZE_DOWNLOAD_DELAY': True,
        # 'CONCURRENT_REQUESTS': 1,
        # 'LOG_LEVEL': 'INFO'
    }

    start_urls = ['https://quotes.toscrape.com']

    @staticmethod
    async def process_request(request, spider):
        """ request middleware """
        pass

    @staticmethod
    async def process_response(request, response, spider):
        """ response middleware """
        return response

    @staticmethod
    async def process_exception(request, exception, spider):
        """ exception middleware """
        pass

    async def parse(self, response):
        for quote in response.css('div.quote'):
            yield {
                'author': quote.xpath('span/small/text()').get(),
                'text': quote.css('span.text::text').get(),
            }

        next_page = response.css('li.next a::attr("href")').get()
        if next_page is not None:
            yield response.follow(next_page, self.parse)

    async def process_item(self, item):
        print(item)


if __name__ == '__main__':
    QuotesMemorySpider.start()
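The three static methods in the script above act as request, response, and exception middleware. The following is a simplified, standard-library model of how such hooks are typically called around a download in Scrapy-style frameworks; it is not aio-scrapy's actual engine code. `Hooks`, `download`, and `fetch` are hypothetical names, and the rule that a non-None return value from `process_exception` replaces the failed response is an assumption borrowed from Scrapy's downloader middleware contract.

```python
import asyncio


class Hooks:
    """Hypothetical middleware hooks mirroring the spider's static methods."""

    @staticmethod
    async def process_request(request, spider):
        # Mutate the outgoing request in place, e.g. to add a header.
        request.setdefault("headers", {})["X-Demo"] = "1"

    @staticmethod
    async def process_response(request, response, spider):
        return response

    @staticmethod
    async def process_exception(request, exception, spider):
        # A returned value is treated as a fallback response in this model.
        return {"status": 599, "error": str(exception)}


async def download(request):
    """Fake downloader: fails for URLs containing 'bad'."""
    if "bad" in request["url"]:
        raise ConnectionError("download failed")
    return {"status": 200, "url": request["url"]}


async def fetch(request, hooks, spider=None):
    """Run request hook, download, then response hook; route errors to the exception hook."""
    await hooks.process_request(request, spider)
    try:
        response = await download(request)
    except Exception as exc:
        fallback = await hooks.process_exception(request, exc, spider)
        if fallback is None:
            raise
        return fallback
    return await hooks.process_response(request, response, spider)


if __name__ == "__main__":
    print(asyncio.run(fetch({"url": "https://quotes.toscrape.com"}, Hooks)))
    print(asyncio.run(fetch({"url": "https://bad.example"}, Hooks)))
```

Keeping the hooks as plain async functions, as the single-script spider does, means they can be exercised in isolation like this before being wired into a crawl.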

Run the spider:

aioscrapy runspider single_quotes.py

More commands:

aioscrapy -h

For more examples, see the project homepage.

Documentation

See the documentation linked from the project homepage.

Ready

Please submit your suggestions to the owner by opening an issue.

Thanks

aiohttp

scrapy

Release history

Version	Release date
2.1.4 (this version)	Aug 21, 2024
2.1.3	Jul 22, 2024
2.1.2	Jul 18, 2024
2.1.0	Apr 17, 2024
2.0.10	Apr 3, 2024
2.0.9	Apr 3, 2024
2.0.8	Jan 17, 2024
2.0.7	Jan 16, 2024
2.0.6	Dec 28, 2023
2.0.5	Oct 20, 2023
2.0.4	Oct 12, 2023
2.0.3	Oct 7, 2023
2.0.2	Oct 7, 2023
2.0.1	Sep 27, 2023
2.0.0	Sep 27, 2023
1.3.1	Sep 22, 2023
1.2.17	Jul 31, 2023
1.2.16	Jul 28, 2023
1.2.15	May 30, 2023
1.2.14	May 12, 2023
1.2.13	May 6, 2023
1.2.12	Apr 12, 2023
1.2.11	Apr 8, 2023
1.2.10	Apr 3, 2023
1.2.9	Mar 29, 2023
1.2.8	Mar 28, 2023
1.2.7	Mar 28, 2023
1.2.6	Dec 15, 2022
1.2.5	Nov 23, 2022
1.2.4	Aug 20, 2022
1.2.3	Aug 11, 2022
1.2.1	Jul 29, 2022
1.2.0	Jul 22, 2022
1.1.0	Jul 6, 2022
1.0.2	Jun 30, 2022
1.0.1	Jun 27, 2022
1.0.0	Jun 14, 2022
0.0.4	Apr 22, 2022
0.0.3	Jan 22, 2022
0.0.2	Oct 14, 2021
0.0.1	Oct 13, 2021

Download files

Download the file for your platform.

Source Distribution

aio-scrapy-2.1.4.tar.gz (98.2 kB), uploaded Aug 21, 2024, Source

Built Distribution

aio_scrapy-2.1.4-py3-none-any.whl (141.4 kB), uploaded Aug 21, 2024, Python 3

File details

Details for the file aio-scrapy-2.1.4.tar.gz.

File metadata

Download URL: aio-scrapy-2.1.4.tar.gz
Upload date: Aug 21, 2024
Size: 98.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for aio-scrapy-2.1.4.tar.gz
Algorithm	Hash digest
SHA256	`63fb8a13290c687b09b1f116c352801e475667a0b37c85d50e9bf3b97e493af3`
MD5	`4d18d18123764c70b5682154dd1e35d8`
BLAKE2b-256	`3c91f0b67a29997f6250ec2a6db6a9d129f2110ac08e90ec1cfac7229547a8e6`


File details

Details for the file aio_scrapy-2.1.4-py3-none-any.whl.

File metadata

Download URL: aio_scrapy-2.1.4-py3-none-any.whl
Upload date: Aug 21, 2024
Size: 141.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for aio_scrapy-2.1.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`df147146d81315ea887f79b34c499cf200ad3ef6d3afa597ec5b4dbcd1afe24f`
MD5	`c93687d2cdae93e9107e8e184968e44a`
BLAKE2b-256	`cacad6c1880722c7937752e528e7bb7f6d8647288c716cf248ed8718bbd2468e`
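The published digests can be used to check a downloaded file before installing it. The sketch below compares a file's SHA256 digest with the value listed for the source distribution; `file_digest` and `matches` are hypothetical helper names, and the local file path is an assumption about where the archive was saved.

```python
import hashlib
import hmac
from pathlib import Path

# SHA256 digest listed above for aio-scrapy-2.1.4.tar.gz.
EXPECTED_SHA256 = "63fb8a13290c687b09b1f116c352801e475667a0b37c85d50e9bf3b97e493af3"


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in chunks so large archives are not loaded into memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex, algorithm="sha256"):
    """Compare the computed digest with the published one, ignoring case."""
    return hmac.compare_digest(file_digest(path, algorithm), expected_hex.lower())


if __name__ == "__main__":
    # Assumes the sdist was downloaded to the current directory.
    archive = Path("aio-scrapy-2.1.4.tar.gz")
    if archive.exists():
        print("digest matches:", matches(archive, EXPECTED_SHA256))
    else:
        print(f"{archive} not found; download it first")
```

pip can perform the same check itself when a requirements file pins hashes with `--hash=sha256:...`, which may be preferable for repeatable installs.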