# Rambot: Versatile Web Scraping Framework

## Description

Rambot is a versatile, configurable web scraping framework designed to automate data extraction from web pages. It provides an intuitive structure for:
- Managing different scraping modes.
- Automating browser navigation.
- Handling logs and errors.
- Performing advanced HTTP requests to interact with APIs.
## Installation

```bash
pip install rambot
```
### ChromeDriver Dependency

Rambot uses ChromeDriver for automated browsing. Install it for your operating system:

- **Windows**: Download ChromeDriver and add it to your `PATH`.
- **macOS**: Install via Homebrew:

  ```bash
  brew install chromedriver
  ```

- **Linux**: Install via APT:

  ```bash
  sudo apt install chromium-chromedriver
  ```
## Key Features

### 1. Mode-Based Execution
- Supports multiple scraping modes via `ScraperModeManager`.
- Use the `@bind` decorator or `self.mode_manager.register()` to associate functions with specific modes.

### 2. Headless Browser Control
- Integrates with `botasaurus` for automation.
- Advanced proxy management, image blocking, and extension loading.
- Uses ChromeDriver to navigate and extract content.

### 3. Optimized Data Handling
- Saves extracted data in JSON format.
- Reads and processes existing data files as input.
- Models structured data using `Document`.

### 4. Error Management & Logging
- Centralized error handling with `ErrorConfig`.
- Uses `loguru` for detailed, structured logging.

### 5. Scraping Throttling & Delays
- Introduces randomized delays via `wait()` to mimic human behavior.
- Helps ensure compliance with website rate limits.

### 6. Useful Decorators
- `@errors`: Structured error handling.
- `@no_print`: Suppresses unwanted output.
- `@scrape`: Enforces function structure in scraping processes.
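The mode-based dispatch described above can be illustrated with a minimal stand-in registry. This is a hypothetical sketch in plain Python, not Rambot's actual `ScraperModeManager` implementation, but it shows the register/bind/dispatch pattern the feature list names:

```python
# Minimal stand-in for a mode registry: maps mode names to handler
# functions, mimicking what a ScraperModeManager-style class might do.
class ModeManager:
    def __init__(self):
        self._modes = {}

    def register(self, name, func):
        self._modes[name] = func

    def bind(self, mode):
        # Decorator form, analogous to @bind(mode="cities")
        def decorator(func):
            self.register(mode, func)
            return func
        return decorator

    def run(self, mode):
        if mode not in self._modes:
            raise ValueError(f"Unknown mode: {mode}")
        return self._modes[mode]()


manager = ModeManager()

@manager.bind(mode="cities")
def cities():
    return ["calgary", "brandon"]

print(manager.run("cities"))  # ['calgary', 'brandon']
```

The registry is just a dictionary from mode names to callables; the decorator form and the explicit `register()` call are two routes to the same mapping.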
## Basic Usage

### 1. Create a Scraper

```python
import typing

from rambot.scraper import Scraper, bind
from rambot.scraper.models import Document


class App(Scraper):
    BASE_URL: str = "https://www.skipthedishes.com"

    @bind(mode="cities")
    def available_cities(self) -> typing.List[Document]:
        self.get("https://www.skipthedishes.com/canada-food-delivery")
        elements = self.find_all("h4 div a")
        return [
            Document(link=self.BASE_URL + href)
            for element in elements
            if (href := element.get_attribute("href"))
        ]
```
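`Document` in the example above is Rambot's structured-data model. Its exact definition isn't shown here, but a minimal stand-in that supports the `Document(link=...)` construction used throughout could be sketched as a dataclass (hypothetical; the real class may carry more fields and validation):

```python
from dataclasses import dataclass

# Hypothetical stand-in for rambot's Document model, for illustration only.
@dataclass
class Document:
    link: str

doc = Document(link="https://www.skipthedishes.com/cities/calgary")
print(doc.link)  # https://www.skipthedishes.com/cities/calgary
```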
### 2. Run the Scraper

```python
if __name__ == "__main__":
    app = App()
    app.run()  # Executes the mode registered in launch.json
```
### 3. Configure launch.json in VSCode

```json
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "cities",
            "type": "python",
            "request": "launch",
            "program": "main.py",
            "justMyCode": false,
            "args": ["--mode", "cities"]
        }
    ]
}
```
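Outside VSCode, the same configuration corresponds to running `python main.py --mode cities` from a shell. How Rambot parses the flag internally isn't documented here, but a minimal equivalent with the standard library's `argparse` looks like this:

```python
import argparse

# Minimal sketch of parsing the --mode flag, as passed via launch.json args.
parser = argparse.ArgumentParser()
parser.add_argument("--mode", required=True, help="Scraping mode to execute")

# Equivalent to: python main.py --mode cities
args = parser.parse_args(["--mode", "cities"])
print(args.mode)  # cities
```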
### 4. Retrieve Results

Extracted data is saved in `{mode}.json`:

```json
{
    "data": [
        {"link": "https://www.skipthedishes.com/cities/calgary"},
        {"link": "https://www.skipthedishes.com/cities/brandon"},
        {"link": "https://www.skipthedishes.com/cities/welland"}
    ],
    "run_stats": {"status": "success", "message": null}
}
```
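Because the output is plain JSON, downstream code can consume it with the standard library alone. The sketch below uses an inline string in the shape shown above; in practice you would `open("cities.json")` following the `{mode}.json` naming convention:

```python
import json

# Sample payload mirroring the {mode}.json structure shown above.
raw = """
{
  "data": [{"link": "https://www.skipthedishes.com/cities/calgary"}],
  "run_stats": {"status": "success", "message": null}
}
"""

result = json.loads(raw)
links = [item["link"] for item in result["data"]]
print(result["run_stats"]["status"], links)
```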
## HTTP Request Module

This module sends HTTP requests with automatic error handling, logging, and retry attempts.

### Example Usage

```python
from module_name import request

response = request(
    method="GET",
    url="http://example.com",
    options={"headers": {"User-Agent": "CustomAgent"}, "timeout": 10},
    max_retry=3,
    retry_wait=2
)
```
### Using Proxies and Custom Headers

```python
response = request(
    method="POST",
    url="http://example.com/api",
    options={
        "proxies": {"http": "http://my-proxy.com:{port}", "https": "http://my-proxy.com:{port}"},
        "json": {"key": "value"},
        "headers": {"Authorization": "Bearer TOKEN"}
    },
    max_retry=5,
    retry_wait=3
)
```
### Usage in a Scraper

```python
import typing

from rambot.requests import requests
from rambot.scraper import Scraper, bind
from rambot.models import Document


class App(Scraper):
    BASE_URL: str = "https://www.skipthedishes.com"

    def open(self, wait=True):
        if self.mode in ["cities"]:
            return  # Prevents the browser from opening for this mode
        return super().open(wait)

    @bind(mode="cities")
    def cities(self) -> typing.List[Document]:
        response = requests.send(
            method="GET",
            url="https://www.skipthedishes.com/canada-food-delivery",
            options={"timeout": 15},
            max_retry=5,
            retry_wait=1.25
        )
        elements = response.select("h4 div a")
        return [
            Document(link=self.BASE_URL + href)
            for element in elements
            if (href := element.get("href"))
        ]
```
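Browserless scraping ultimately comes down to parsing the response HTML. The `response.select(...)` call above implies a CSS-selector API; with nothing but the standard library, the core idea of pulling anchor hrefs out of markup can be sketched with `html.parser` (a simplified stand-in, not Rambot's parsing machinery):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href attributes from <a> tags in an HTML document."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

html = '<h4><div><a href="/cities/calgary">Calgary</a></div></h4>'
parser = LinkExtractor()
parser.feed(html)
print(parser.links)  # ['/cities/calgary']
```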
## Advantages

- **Scraping without a browser**: Reduces resource consumption.
- **Retry mechanism**: Minimizes transient failures.
- **Fast data extraction**: Parses HTML directly with `requests`.

With Rambot, automate and optimize your data extractions efficiently! 🚀
## File Details

### rambot-0.1.1.tar.gz

- Size: 23.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9

| Algorithm | Hash digest |
|---|---|
| SHA256 | `da936df4d474bffcff92aea6095457d5df85c4ab2589e6c4a4eeaee0353beeee` |
| MD5 | `fea1b9d8d351f99a4592c73e7eb044f1` |
| BLAKE2b-256 | `fe35d9f896d106a28bf5ba2bb52d3facffe9239600e545f85e33ce24ba38bfc7` |
### Provenance

The following attestation bundle was made for rambot-0.1.1.tar.gz:

- Publisher: python-publish.yml on AlexVachon/rambot
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: rambot-0.1.1.tar.gz
- Subject digest: `da936df4d474bffcff92aea6095457d5df85c4ab2589e6c4a4eeaee0353beeee`
- Sigstore transparency entry: 178898107
- Permalink: AlexVachon/rambot@1a29077ec7c24159a8127aad04799039f213bbba
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/AlexVachon
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@1a29077ec7c24159a8127aad04799039f213bbba
- Trigger Event: release
### rambot-0.1.1-py3-none-any.whl

- Size: 26.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9

| Algorithm | Hash digest |
|---|---|
| SHA256 | `4485be242548b6f7c93b8db5caa6b3335580c9b7ce21205609b098da28f08b36` |
| MD5 | `12ecce58ded20a1f216512a1029af5a3` |
| BLAKE2b-256 | `34106848114cc1dc23275be7895c5c82731d8b26e400af3dac9b89a0b9501954` |
### Provenance

The following attestation bundle was made for rambot-0.1.1-py3-none-any.whl:

- Publisher: python-publish.yml on AlexVachon/rambot
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: rambot-0.1.1-py3-none-any.whl
- Subject digest: `4485be242548b6f7c93b8db5caa6b3335580c9b7ce21205609b098da28f08b36`
- Sigstore transparency entry: 178898115
- Permalink: AlexVachon/rambot@1a29077ec7c24159a8127aad04799039f213bbba
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/AlexVachon
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@1a29077ec7c24159a8127aad04799039f213bbba
- Trigger Event: release