Web Scraping Tool for Swarmauri

Project description


Swarmauri Tool · Web Scraping

A Swarmauri-compatible scraper that fetches HTML with requests, parses it via BeautifulSoup, and extracts content with CSS selectors. Ideal for lightweight data collection, compliance checks, or enriching agent answers with live webpage snippets.

  • Accepts any valid URL and CSS selector; returns joined text content from the matching nodes.
  • Handles HTTP/network failures gracefully by surfacing structured error messages.
  • Integrates with Swarmauri agents so scraping can be triggered through natural-language prompts.

Requirements

  • Python 3.10 – 3.13.
  • requests and beautifulsoup4 (installed automatically with the package).
  • Respect site terms of service, robots.txt directives, and rate limits when scraping.
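Robots.txt compliance can be checked before scraping using only the standard library. A minimal sketch, assuming you have already fetched the site's robots.txt body (the helper name `allowed_by_robots` is ours, not part of this package):

```python
from urllib.robotparser import RobotFileParser

def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Check a URL against an already-fetched robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

robots = "User-agent: *\nDisallow: /private/\n"
print(allowed_by_robots(robots, "https://example.com/public/page"))   # True
print(allowed_by_robots(robots, "https://example.com/private/page"))  # False
```

Skipping disallowed URLs before calling the tool keeps your scraping well-behaved without any extra dependencies.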

Installation

Use your preferred packaging workflow—each command installs the dependencies above.

pip

pip install swarmauri_tool_webscraping

Poetry

poetry add swarmauri_tool_webscraping

uv

# Add to the current project and update uv.lock
uv add swarmauri_tool_webscraping

# or install into the active environment without editing pyproject.toml
uv pip install swarmauri_tool_webscraping

Tip: In containerized or restricted environments, ensure outbound HTTPS traffic is permitted; requests needs network access to reach target sites.

Quick Start

from swarmauri_tool_webscraping import WebScrapingTool

scraper = WebScrapingTool()
result = scraper(url="https://example.com", selector="h1")

if "extracted_text" in result:
    print(result["extracted_text"])
else:
    print(result["error"])

extracted_text joins the text content of all matching nodes, separated by newlines. When no elements match the selector, the tool returns an empty string.
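Conceptually, the tool collects each matching element's text and joins the pieces with newlines. An illustrative, stdlib-only sketch of that behavior (the real tool uses BeautifulSoup for CSS selection; `TagTextExtractor` here is a simplified stand-in that matches a bare tag name):

```python
from html.parser import HTMLParser

class TagTextExtractor(HTMLParser):
    """Collects the text inside every occurrence of one tag name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0       # >0 while inside a matching tag
        self.matches = []    # one text entry per matched element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1
            self.matches.append("")

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.matches[-1] += data

html = "<h1>First</h1><p>skip</p><h1>Second</h1>"
parser = TagTextExtractor("h1")
parser.feed(html)
extracted_text = "\n".join(parser.matches)
print(extracted_text)  # prints "First" and "Second" on separate lines
```

If no element matches, `matches` stays empty and the join yields an empty string, mirroring the tool's documented behavior.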

Usage Scenarios

Monitor Site Copy for Compliance

from swarmauri_tool_webscraping import WebScrapingTool

scraper = WebScrapingTool()
result = scraper(
    url="https://status.vendor.com",
    selector=".uptime-banner"
)

if "error" in result:
    raise RuntimeError(result["error"])

if "maintenance" in result["extracted_text"].lower():
    print("Maintenance notice detected – alert the ops team.")

Inject Live Data Into a Swarmauri Agent Response

from swarmauri_core.agent.Agent import Agent
from swarmauri_core.messages.HumanMessage import HumanMessage
from swarmauri_standard.tools.registry import ToolRegistry
from swarmauri_tool_webscraping import WebScrapingTool

registry = ToolRegistry()
registry.register(WebScrapingTool())
agent = Agent(tool_registry=registry)

message = HumanMessage(content="Check the headline on https://example.com")
response = agent.run(message)
print(response)

Batch Collect Headlines From Multiple Pages

from swarmauri_tool_webscraping import WebScrapingTool

scraper = WebScrapingTool()
urls = [
    "https://news.example.com/tech",
    "https://news.example.com/business",
]

for url in urls:
    result = scraper(url=url, selector="h2.article-title")
    print(url)
    print(result.get("extracted_text", result.get("error")))
    print("---")
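The loop above issues requests back to back. To respect rate limits (see Requirements), it helps to pause between calls. A hedged sketch of a throttled wrapper; `scrape_politely` and `delay_seconds` are our names, not part of the package, and `scraper` can be any callable that returns a dict with `extracted_text` or `error`:

```python
import time

def scrape_politely(scraper, urls, selector, delay_seconds=1.0):
    """Scrape each URL in turn, pausing between requests to respect rate limits."""
    results = {}
    for i, url in enumerate(urls):
        if i:
            time.sleep(delay_seconds)  # polite gap between consecutive requests
        result = scraper(url=url, selector=selector)
        results[url] = result.get("extracted_text", result.get("error"))
    return results
```

A WebScrapingTool instance fits the `scraper` parameter directly, since the tool is called with `url=` and `selector=` keyword arguments.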

Troubleshooting

  • Request error – Network failures, DNS issues, or HTTP 4xx/5xx responses produce Request error messages. Verify connectivity, headers, or authentication if required by the site.
  • Empty extracted_text – The selector may not match any nodes. Use browser dev tools to confirm the CSS selector or adjust the parser to target the correct element.
  • SSL certificate problems – Update the CA certificates on the host first. Only disable verification (verify=False, by forking or extending the tool) when you fully trust the target site.

License

swarmauri_tool_webscraping is released under the Apache 2.0 License. See LICENSE for full details.

Project details



Download files

Download the file for your platform.

Source Distribution

swarmauri_tool_webscraping-0.9.0.tar.gz (8.1 kB)

Uploaded Source

Built Distribution


swarmauri_tool_webscraping-0.9.0-py3-none-any.whl (9.1 kB)

Uploaded Python 3

File details

Details for the file swarmauri_tool_webscraping-0.9.0.tar.gz.

File metadata

  • Download URL: swarmauri_tool_webscraping-0.9.0.tar.gz
  • Upload date:
  • Size: 8.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.27 (Ubuntu 24.04, CI)

File hashes

Hashes for swarmauri_tool_webscraping-0.9.0.tar.gz

  • SHA256: 8b0a27db981fa98ac81c0ede9de605c6dc61c59514c94e3d89f6df3547f26175
  • MD5: 728edbf5960f7f4c4d3d943dc43bd5b7
  • BLAKE2b-256: f277b295f7606eba5b00af27d97a6af64b40ab9a6aefce40b4937e25e45a9bb1


File details

Details for the file swarmauri_tool_webscraping-0.9.0-py3-none-any.whl.

File metadata

  • Download URL: swarmauri_tool_webscraping-0.9.0-py3-none-any.whl
  • Upload date:
  • Size: 9.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.27 (Ubuntu 24.04, CI)

File hashes

Hashes for swarmauri_tool_webscraping-0.9.0-py3-none-any.whl

  • SHA256: fcb0738ab21c29a284c06ed533a3eddc33f87fce51e897a56a088c38169c605e
  • MD5: 57d8d02b4246020582fcfb7158595979
  • BLAKE2b-256: a8973890030246449e96e0edcb408a62ca6ac0949f062da4a1440e797e1ea435

