Swarmauri web-content extraction tool for scraping HTML pages with CSS selectors.
Swarmauri Tool Web Scraping
swarmauri_tool_webscraping is a Swarmauri web-content extraction tool that
fetches a page with httpx, parses HTML with BeautifulSoup, and extracts
text using a CSS selector. It is useful for headline capture, policy checks,
lightweight data extraction, and agent workflows that need webpage content on
demand.
Why Use Swarmauri Tool Web Scraping
- Extract targeted text from webpages using CSS selectors.
- Add lightweight HTML scraping to Swarmauri agents and automation flows.
- Pull site copy, headlines, notices, or metadata for downstream analysis.
- Return structured extraction or error results without custom scraping glue.
FAQ
What inputs does the tool expect?
A url string and a CSS selector string.
What does the tool return?
Either {"extracted_text": ...} or {"error": ...}.
What happens when no elements match?
The tool returns an empty extracted_text string.
Does it render JavaScript-driven pages?
No. It only fetches raw HTTP content and parses returned HTML.
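The answers above imply three outcomes a caller has to tell apart: an error, an empty match, and real text. A minimal sketch of that branching follows. The names `ScrapeError` and `unwrap` are hypothetical helpers invented for illustration and are not part of the package; the only things taken from the tool are the two result keys described above.

```python
from typing import Any, Mapping, Optional


class ScrapeError(RuntimeError):
    """Hypothetical exception for illustration; not defined by the package."""


def unwrap(result: Mapping[str, Any]) -> Optional[str]:
    """Return the extracted text, None for an empty match, or raise on error."""
    if "error" in result:
        # The tool reported a request or parsing failure.
        raise ScrapeError(str(result["error"]))
    text = result.get("extracted_text", "")
    # An empty string is what the tool returns when no elements match.
    return text if text else None
```

Used as `unwrap(tool(url="https://example.com", selector="h1"))`, this lets calling code handle "nothing matched" and "request failed" on separate paths instead of inspecting the dictionary each time.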
Features
- Swarmauri ToolBase implementation registered as WebScrapingTool.
- Uses standard CSS selectors to target page elements.
- Returns joined text content across all selector matches.
- Handles request and parsing failures with structured error output.
- Supports Python 3.10, 3.11, 3.12, 3.13, and 3.14.
Installation
uv add swarmauri_tool_webscraping
pip install swarmauri_tool_webscraping
Usage
from swarmauri_tool_webscraping import WebScrapingTool
tool = WebScrapingTool()
result = tool(url="https://example.com", selector="h1")
print(result)
Examples
Extract a page headline
from swarmauri_tool_webscraping import WebScrapingTool
tool = WebScrapingTool()
result = tool("https://example.com", "h1")
print(result.get("extracted_text"))
Inspect a status banner
from swarmauri_tool_webscraping import WebScrapingTool
tool = WebScrapingTool()
result = tool("https://status.example.com", ".banner")
# Only print when the tool did not report an error.
if "error" not in result:
    print(result["extracted_text"])
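Banner markup tends to change, so a caller may want to try several selectors in order. The following is a sketch under assumptions, not a feature of the tool: `first_match` is a hypothetical helper, and `tool` stands for any callable that accepts a URL and a selector and returns the result dictionary described in the FAQ.

```python
from typing import Any, Callable, Iterable, Mapping, Optional

# Any callable shaped like tool(url, selector) -> result dict (assumed interface).
Scraper = Callable[[str, str], Mapping[str, Any]]


def first_match(tool: Scraper, url: str, selectors: Iterable[str]) -> Optional[str]:
    """Return text from the first selector that yields non-empty output."""
    for selector in selectors:
        result = tool(url, selector)
        if "error" in result:
            # Stop at the first reported error rather than repeating the request.
            return None
        text = result.get("extracted_text", "")
        if text:
            return text
    return None
```

Each selector triggers a separate call, and therefore a separate page fetch, so keep the candidate list short.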
Register the tool in a Swarmauri collection
from swarmauri_standard.tools.ToolCollection import ToolCollection
from swarmauri_tool_webscraping import WebScrapingTool
tools = ToolCollection(tools=[WebScrapingTool()])
print(tools)
Related Packages
Swarmauri Foundations
More Documentation
Best Practices
- Respect site terms, rate limits, and robots rules before scraping.
- Use stable selectors and expect sites to change their markup over time.
- Prefer dedicated APIs when a provider offers one.
- Extend the tool if you need headers, retries, or authenticated requests.
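Two of the practices above, checking robots rules and retrying failed requests, can be layered around the tool without modifying it. The sketch below uses only the standard library. `allowed_by_robots` and `scrape_with_retries` are hypothetical helper names, the robots.txt text is assumed to have been fetched separately, and `tool` is assumed to be any callable taking a URL and a selector and returning the documented result dictionary.

```python
import time
from typing import Any, Callable, Mapping
from urllib.robotparser import RobotFileParser

# Any callable shaped like tool(url, selector) -> result dict (assumed interface).
Scraper = Callable[[str, str], Mapping[str, Any]]


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Check a URL against robots.txt content that was already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def scrape_with_retries(
    tool: Scraper,
    url: str,
    selector: str,
    attempts: int = 3,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Mapping[str, Any]:
    """Call the tool until it returns no error or the attempts run out."""
    result: Mapping[str, Any] = {"error": "no attempts made"}
    for attempt in range(attempts):
        result = tool(url, selector)
        if "error" not in result:
            return result
        if attempt < attempts - 1:
            # Exponential backoff: delay, 2 * delay, 4 * delay, ...
            sleep(delay * 2 ** attempt)
    return result
```

This version retries on every reported error, including ones that will never succeed, such as an invalid selector. A production wrapper would likely inspect the error message first. The `sleep` parameter exists only so the waiting behaviour can be replaced in tests.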
License
This project is licensed under the Apache-2.0 License.
File details
Details for the file swarmauri_tool_webscraping-0.11.0.dev1.tar.gz.
File metadata
- Download URL: swarmauri_tool_webscraping-0.11.0.dev1.tar.gz
- Upload date:
- Size: 8.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.26
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e67f2a8eb001874bc83cf82369a18791410a54b99614baa727d99afce18d9d1e |
| MD5 | 25534b61b781d20b0647885b27d626a5 |
| BLAKE2b-256 | b6c550967dd32af694f73cdfd61815a9e970efa734f36ae308376ebc0b99f737 |
File details
Details for the file swarmauri_tool_webscraping-0.11.0.dev1-py3-none-any.whl.
File metadata
- Download URL: swarmauri_tool_webscraping-0.11.0.dev1-py3-none-any.whl
- Upload date:
- Size: 9.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.26
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 36dc9d5f7f09400201ec991f425cc80d6c3f41990a1499c89478229c6fa1f41d |
| MD5 | dc30b19f58af0ccb7f6ccd7209c694a8 |
| BLAKE2b-256 | 103c7381a8090c63a2380f2e4fecf8c744af0364725ad6b43e9311ae72f2618f |