Web Scraping Tool for Swarmauri
Swarmauri Tool · Web Scraping
A Swarmauri-compatible scraper that fetches HTML with requests, parses it via BeautifulSoup, and extracts content with CSS selectors. Ideal for lightweight data collection, compliance checks, or enriching agent answers with live webpage snippets.
- Accepts any valid URL and CSS selector; returns joined text content from the matching nodes.
- Handles HTTP/network failures gracefully by surfacing structured error messages.
- Integrates with Swarmauri agents so scraping can be triggered through natural-language prompts.
Requirements
- Python 3.10 – 3.13.
- requests and beautifulsoup4 (installed automatically with the package).
- Respect site terms of service, robots.txt directives, and rate limits when scraping.
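The robots.txt requirement can be checked before a page is ever requested. The following is a minimal sketch using only the standard library's urllib.robotparser; the helper name is_allowed and the sample robots.txt text are assumptions made for illustration and are not part of this package.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(SAMPLE_ROBOTS, "https://example.com/index.html"))
print(is_allowed(SAMPLE_ROBOTS, "https://example.com/private/report"))
```

In practice the robots.txt text would be downloaded once per site and reused; only URLs for which the check passes would then be handed to the scraper.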
Installation
Use your preferred packaging workflow; each command installs the dependencies above.
pip
pip install swarmauri_tool_webscraping
Poetry
poetry add swarmauri_tool_webscraping
uv
# Add to the current project and update uv.lock
uv add swarmauri_tool_webscraping
# or install into the active environment without editing pyproject.toml
uv pip install swarmauri_tool_webscraping
Tip: In containerized or restricted environments, ensure outbound HTTPS traffic is permitted; requests needs network access to reach target sites.
Quick Start
from swarmauri_tool_webscraping import WebScrapingTool

scraper = WebScrapingTool()
result = scraper(url="https://example.com", selector="h1")

if "extracted_text" in result:
    print(result["extracted_text"])
else:
    print(result["error"])
extracted_text concatenates matches separated by newlines. When no elements match the selector, the tool returns an empty string.
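Because matches are joined with newlines, it is often convenient to turn a result back into a list. The helper below is a sketch that assumes only the result shape described above (an extracted_text key on success, an error key on failure); the name matches_from_result is hypothetical. Note that an element whose own text contains newlines would be split into several entries, so the list is an approximation of "one entry per match".

```python
def matches_from_result(result: dict) -> list[str]:
    """Split a tool result into a list of non-empty text lines.

    Assumes the result holds either "extracted_text" (matches joined by
    newlines) or "error", as described in the Quick Start.
    """
    if "error" in result:
        raise RuntimeError(result["error"])
    text = result.get("extracted_text", "")
    # An empty string means no element matched the selector, giving [].
    return [line.strip() for line in text.split("\n") if line.strip()]


# Hand-written sample results, for illustration only.
print(matches_from_result({"extracted_text": "First headline\nSecond headline"}))
print(matches_from_result({"extracted_text": ""}))
```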
Usage Scenarios
Monitor Site Copy for Compliance
from swarmauri_tool_webscraping import WebScrapingTool
scraper = WebScrapingTool()
result = scraper(
    url="https://status.vendor.com",
    selector=".uptime-banner"
)

if "error" in result:
    raise RuntimeError(result["error"])

if "maintenance" in result["extracted_text"].lower():
    print("Maintenance notice detected – alert the ops team.")
Inject Live Data Into a Swarmauri Agent Response
from swarmauri_core.agent.Agent import Agent
from swarmauri_core.messages.HumanMessage import HumanMessage
from swarmauri_standard.tools.registry import ToolRegistry
from swarmauri_tool_webscraping import WebScrapingTool
registry = ToolRegistry()
registry.register(WebScrapingTool())
agent = Agent(tool_registry=registry)
message = HumanMessage(content="Check the headline on https://example.com")
response = agent.run(message)
print(response)
Batch Collect Headlines From Multiple Pages
from swarmauri_tool_webscraping import WebScrapingTool
scraper = WebScrapingTool()
urls = [
    "https://news.example.com/tech",
    "https://news.example.com/business",
]

for url in urls:
    result = scraper(url=url, selector="h2.article-title")
    print(url)
    print(result.get("extracted_text", result.get("error")))
    print("---")
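When a batch targets the same host, a pause between requests helps stay within the rate limits mentioned under Requirements. The sketch below wraps any callable that accepts url and selector keyword arguments and returns a dict, which is how the tool is called in the examples above. The function scrape_politely and the stand-in fake_scraper are hypothetical and exist only so the example runs without network access.

```python
import time
from typing import Callable


def scrape_politely(
    scraper: Callable[..., dict],
    urls: list[str],
    selector: str,
    delay_seconds: float = 1.0,
) -> dict[str, dict]:
    """Call `scraper` once per unique URL, pausing between requests."""
    results: dict[str, dict] = {}
    unique_urls = list(dict.fromkeys(urls))  # drop duplicates, keep order
    for index, url in enumerate(unique_urls):
        if index > 0:
            time.sleep(delay_seconds)  # fixed pause before each later request
        results[url] = scraper(url=url, selector=selector)
    return results


def fake_scraper(url: str, selector: str) -> dict:
    """Stand-in for WebScrapingTool() so the sketch runs offline."""
    return {"extracted_text": f"{selector} from {url}"}


batch = scrape_politely(
    fake_scraper,
    ["https://news.example.com/tech", "https://news.example.com/tech"],
    selector="h2.article-title",
    delay_seconds=0.0,
)
print(batch)
```

Passing WebScrapingTool() in place of fake_scraper should work the same way, assuming the instance is called with url and selector as in the loop above.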
Troubleshooting
- Request error – Network failures, DNS issues, or HTTP 4xx/5xx responses produce Request error messages. Verify connectivity, headers, or authentication if required by the site.
- Empty extracted_text – The selector may not match any nodes. Use browser dev tools to confirm the CSS selector or adjust the parser to target the correct element.
- SSL certificate problems – Pass verify=False by forking/extending the tool only when you trust the target; otherwise update CA certificates on the host.
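Transient network failures sometimes clear up on a second attempt. The sketch below retries a call while the result contains an error key, doubling the wait each time. It assumes only the result shape shown in the Quick Start; scrape_with_retry and the flaky stub are hypothetical names. Retrying is unlikely to help with persistent problems such as HTTP 4xx responses or a wrong selector.

```python
import time
from typing import Callable


def scrape_with_retry(
    scraper: Callable[..., dict],
    url: str,
    selector: str,
    attempts: int = 3,
    base_delay: float = 1.0,
) -> dict:
    """Retry while the result contains "error", with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    result: dict = {}
    for attempt in range(attempts):
        result = scraper(url=url, selector=selector)
        if "error" not in result:
            return result
        if attempt < attempts - 1:
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            time.sleep(base_delay * (2 ** attempt))
    return result  # last error result after all attempts failed


def make_flaky_scraper(failures: int) -> Callable[..., dict]:
    """Stand-in scraper that fails `failures` times, then succeeds."""
    state = {"remaining": failures}

    def flaky(url: str, selector: str) -> dict:
        if state["remaining"] > 0:
            state["remaining"] -= 1
            return {"error": "Request error: simulated failure"}
        return {"extracted_text": "ok"}

    return flaky


print(scrape_with_retry(make_flaky_scraper(2), "https://example.com", "h1", base_delay=0.0))
```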
License
swarmauri_tool_webscraping is released under the Apache 2.0 License. See LICENSE for full details.