estela_requests is a request wrapper for estela.

Estela Requests

Introduction

Estela Requests is a Python library that provides enhanced functionality for making HTTP requests and integrates with estela, an open-source platform for running spiders in-house or in the cloud. This documentation gives an overview of the library, installation instructions, and usage examples.

Installation

To install Estela Requests, you can use pip, the Python package manager. Open your terminal or command prompt and run the following command:

pip install estela-requests@git+https://github.com/bitmakerla/estela-requests.git

Alternatively, clone the repository and install it in editable mode with the following commands:

git clone git@github.com:bitmakerla/estela-requests.git
cd estela-requests
pip install -e .
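
After installing, one quick way to confirm that the package is visible to the interpreter is to query its installed version. This is a minimal sketch that uses only the standard library; the distribution name estela-requests is taken from the pip command above, and the helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("estela-requests")
    if found is None:
        print("estela-requests is not installed in this environment")
    else:
        print(f"estela-requests {found} is installed")
```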

Usage

Basic Usage

Here's an example that uses Estela Requests to scrape the site http://quotes.toscrape.com and send the extracted items to estela:

from bs4 import BeautifulSoup

from estela_requests import EstelaRequests
from estela_requests.estela_hub import EstelaHub
from urllib.parse import urljoin

with EstelaRequests.from_estela_hub(EstelaHub.create_from_settings()) as requests:
    spider_name = "quotes_toscrape"
    def parse_quotes(url):
        # Send a GET request to the page
        response = requests.get(url)
        # Parse the HTML content using BeautifulSoup
        soup = BeautifulSoup(response.content, "html.parser")

        # Extract the desired information from the parsed HTML
        quotes = []
        for quote in soup.find_all("div", class_="quote"):
            text = quote.find("span", class_="text").text
            author = quote.find("small", class_="author").text
            tags = [tag.text for tag in quote.find_all("a", class_="tag")]
            quotes.append({"text": text, "author": author, "tags": tags})

        # Send each extracted quote to estela as an item
        for quote in quotes:
            item = {
                "quote": quote["text"],
                "author": quote["author"],
                "tags": ','.join(quote["tags"]),
            }
            requests.send_item(item)
        # Follow the pagination link, if there is one
        try:
            next_href = soup.find("li", class_="next").find("a").get("href")
        except AttributeError:
            next_href = None
        if next_href:
            parse_quotes(urljoin(url, next_href))
    
    if __name__ == "__main__":
        parse_quotes("http://quotes.toscrape.com/")

First, import the EstelaRequests and EstelaHub classes:

from estela_requests import EstelaRequests
from estela_requests.estela_hub import EstelaHub

Once imported, you can create an EstelaRequests context manager:

with EstelaRequests.from_estela_hub(EstelaHub.create_from_settings()) as requests:

To assign a name to the spider in estela, declare the spider_name variable with the desired name, e.g.

spider_name = "quotes_toscrape"

Finally, to yield items, use the send_item method:

requests.send_item(item)
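
The flow described above (fetch a page, send each item, follow the "next" link) can be exercised offline before running it against estela. The sketch below is an illustration under stated assumptions, not part of the library: RecordingClient is a hypothetical stand-in that only mimics the send_item method used above, and PAGES is made-up placeholder data replacing real HTTP responses. It also swaps the recursive call for a loop, which is a design variation rather than the approach of the original example.

```python
from typing import Dict, List, Optional
from urllib.parse import urljoin

# Made-up pages standing in for parsed HTTP responses (placeholder data).
PAGES: Dict[str, dict] = {
    "http://quotes.toscrape.com/": {
        "quotes": [{"text": "A", "author": "X", "tags": ["t1", "t2"]}],
        "next": "/page/2/",
    },
    "http://quotes.toscrape.com/page/2/": {
        "quotes": [{"text": "B", "author": "Y", "tags": []}],
        "next": None,
    },
}


class RecordingClient:
    """Hypothetical stub: collects items instead of sending them to estela."""

    def __init__(self) -> None:
        self.items: List[dict] = []

    def send_item(self, item: dict) -> None:
        self.items.append(item)


def crawl(client: RecordingClient, start_url: str) -> None:
    """Visit pages iteratively, sending one item per quote."""
    url: Optional[str] = start_url
    while url is not None:
        page = PAGES[url]
        for quote in page["quotes"]:
            client.send_item(
                {
                    "quote": quote["text"],
                    "author": quote["author"],
                    "tags": ",".join(quote["tags"]),
                }
            )
        # urljoin resolves the relative "next" link against the current URL.
        url = urljoin(url, page["next"]) if page["next"] else None


client = RecordingClient()
crawl(client, "http://quotes.toscrape.com/")
print(client.items)
```

A loop also avoids deep recursion on sites with many pages; whether that matters depends on how many pages the target site has.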

Extend Estela Requests (BETA)

Estela Requests can be easily customized by creating a settings.py file in the directory where you run your code:

import logging

from estela_requests.request_interfaces import RequestsInterface
from estela_queue_adapter.get_interface import get_producer_interface
from estela_requests.middlewares.requests_history import RequestsHistoryMiddleware
from estela_requests.middlewares.spider_status import SpiderStatusMiddleware
from estela_requests.middlewares.stats import StatsMiddleware
from estela_requests.log_helpers.handlers import KafkaLogHandler
from estela_requests.item_pipeline.exporter import KafkaItemExporter, StdoutItemExporter

# Queue producer (e.g. a Kafka producer) that estela-requests uses to communicate with estela.
ESTELA_PRODUCER = get_producer_interface()
ESTELA_PRODUCER.get_connection()
# HTTP interface to use. At the moment only RequestsInterface (the requests library) is available.
HTTP_CLIENT = RequestsInterface()
# This value is set by estela. Don't change it unless you want to test things.
ESTELA_API_HOST = ""
# Same as above.
ESTELA_SPIDER_JOB = ""
# Same as above. At the moment estela-requests doesn't support arguments.
ESTELA_SPIDER_ARGS = ""
# Item pipelines to use, e.g. a DateItemPipeline that adds a timestamp to the item.
# Check ItemPipelineInterface to create a new item pipeline.
ESTELA_ITEM_PIPELINES = []
# Item exporters to use, i.e. where the data is sent.
# Check ItemExporterInterface to create a new exporter.
ESTELA_ITEM_EXPORTERS = [KafkaItemExporter]
ESTELA_LOG_LEVEL = logging.DEBUG            # Logging level.
ESTELA_LOG_FLAG = 'kafka'                   # This will be removed in future releases.
ESTELA_NOISY_LIBRARIES = []                 # Noisy libraries that you want to turn off.
# Middlewares to use. Check MiddlewareInterface to create a new one.
ESTELA_MIDDLEWARES = [RequestsHistoryMiddleware, StatsMiddleware, SpiderStatusMiddleware]
JOB_STATS_TOPIC = "job_stats"               # Topic name for job stats.
JOB_ITEMS_TOPIC = "job_items"               # Topic name for job items.
JOB_REQUESTS_TOPIC = "job_requests"         # Topic name for job requests.
JOB_LOGS_TOPIC = "job_logs"                 # Topic name for job logs.
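
The comment on ESTELA_ITEM_PIPELINES mentions a DateItemPipeline that adds a timestamp to each item. The sketch below shows what such a pipeline might look like. It is a standalone illustration: the method name process_item and its signature are assumptions, and the class deliberately does not subclass ItemPipelineInterface, whose actual contract should be checked in the library source.

```python
from datetime import datetime, timezone
from typing import Any, Dict


class DateItemPipeline:
    """Hypothetical pipeline: returns a copy of the item with a UTC timestamp."""

    def process_item(self, item: Dict[str, Any]) -> Dict[str, Any]:
        # Copy so the caller's dictionary is left untouched.
        stamped = dict(item)
        stamped["timestamp"] = datetime.now(timezone.utc).isoformat()
        return stamped


pipeline = DateItemPipeline()
original = {"quote": "example", "author": "someone"}
result = pipeline.process_item(original)
print(result)
```

If the real interface turns out to match this shape, the class would presumably be registered the same way the exporters and middlewares are, by listing it in ESTELA_ITEM_PIPELINES.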

More

For more details about the estela project and its functionality, please refer to the estela documentation.
