Silene is an open source web crawler framework built upon Pyppeteer.
Project description
Silene
Silene is an open source web crawler framework built upon Pyppeteer.
Requirements
You must have at least Python 3.7 installed.
Installation
To install the latest release run pip install silene
.
Quickstart guide
Each crawler must subclass the Crawler
class and implement the abstract configure
method. The CrawlerConfiguration
specifies the initial requests to make and other properties of the crawler. Once a request is processed, the appropriate
callback will be invoked. By default, in case of a successful request the
on_response_success
callback will be executed. This is where you can interact with the page content. You can also
specify custom callbacks for your requests.
Below you can find a very simple implementation.
Example code snippet
from silene.crawl_request import CrawlRequest
from silene.crawl_response import CrawlResponse
from silene.crawler import Crawler
from silene.crawler_configuration import CrawlerConfiguration
class MyCrawler(Crawler):
def configure(self) -> CrawlerConfiguration:
return CrawlerConfiguration([CrawlRequest('https://example.com')])
def on_response_success(self, response: CrawlResponse) -> None:
# Do something with the response...
pass
Development instructions
Prerequisite
This project requires Pipenv to be installed.
Create environment
Run pipenv install --dev
to create a new virtual environment and install the necessary packages.
Run tests
Run pytest
in the project root folder.
Run tests with coverage
Run pytest --cov=silene
in the project root folder.
License
The source code of Silene is made available under the Apache License, Version 2.0.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.