Swarmauri web-content extraction tool for scraping HTML pages with CSS selectors.

Swarmauri Tool Web Scraping

swarmauri_tool_webscraping is a Swarmauri web-content extraction tool that fetches a page with httpx, parses HTML with BeautifulSoup, and extracts text using a CSS selector. It is useful for headline capture, policy checks, lightweight data extraction, and agent workflows that need webpage content on demand.

Why Use Swarmauri Tool Web Scraping

  • Extract targeted text from webpages using CSS selectors.
  • Add lightweight HTML scraping to Swarmauri agents and automation flows.
  • Pull site copy, headlines, notices, or metadata for downstream analysis.
  • Get a structured result, either extracted text or an error, without writing custom scraping glue.

FAQ

What inputs does the tool expect?
A url string and a CSS selector string.

What does the tool return?
Either {"extracted_text": ...} or {"error": ...}.

What happens when no elements match?
The tool returns an empty extracted_text string.

Does it render JavaScript-driven pages?
No. It only fetches raw HTTP content and parses returned HTML.
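
The answers above imply three outcomes a caller may want to tell apart: an error, a successful fetch where nothing matched, and extracted text. A minimal sketch of that branching follows. It operates on hand-written dictionaries shaped like the documented return values; the helper name classify_result and its labels are hypothetical and not part of the package.

```python
def classify_result(result: dict) -> str:
    """Label a result dict shaped like the tool's documented return values.

    classify_result is a hypothetical helper, not part of the package.
    """
    if "error" in result:
        return "error"
    # An empty extracted_text means the selector matched nothing,
    # or matched only elements that carry no text.
    if not (result.get("extracted_text") or "").strip():
        return "no_match"
    return "ok"


# Hand-written dictionaries stand in for real tool output here.
print(classify_result({"extracted_text": "Example Domain"}))
print(classify_result({"extracted_text": ""}))
print(classify_result({"error": "request failed"}))
```

Checking for the "error" key first keeps a failed request from being mistaken for a page with no matches.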

Features

  • Swarmauri ToolBase implementation registered as WebScrapingTool.
  • Uses standard CSS selectors to target page elements.
  • Returns joined text content across all selector matches.
  • Handles request and parsing failures with structured error output.
  • Supports Python 3.10, 3.11, 3.12, 3.13, and 3.14.

Installation

uv add swarmauri_tool_webscraping
pip install swarmauri_tool_webscraping

Usage

from swarmauri_tool_webscraping import WebScrapingTool

tool = WebScrapingTool()
result = tool(url="https://example.com", selector="h1")

print(result)

Examples

Extract a page headline

from swarmauri_tool_webscraping import WebScrapingTool

tool = WebScrapingTool()
result = tool("https://example.com", "h1")

print(result.get("extracted_text"))

Inspect a status banner

from swarmauri_tool_webscraping import WebScrapingTool

tool = WebScrapingTool()
result = tool("https://status.example.com", ".banner")

if "error" not in result:
    print(result["extracted_text"])

Register the tool in a Swarmauri collection

from swarmauri_standard.tools.ToolCollection import ToolCollection
from swarmauri_tool_webscraping import WebScrapingTool

tools = ToolCollection(tools=[WebScrapingTool()])
print(tools)

Related Packages

Swarmauri Foundations

More Documentation

Best Practices

  • Respect site terms, rate limits, and robots.txt rules before scraping.
  • Use stable selectors and expect sites to change their markup over time.
  • Prefer dedicated APIs when a provider offers one.
  • Extend the tool if you need headers, retries, or authenticated requests.
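
Two of these practices can be sketched with the standard library alone: checking a URL against robots.txt content, and retrying a call that returned an error result. The helper names allowed_by_robots and scrape_with_retries, the backoff schedule, and the flaky_tool stand-in are all hypothetical; the sketch assumes only that the tool is callable with a URL and a selector and returns the documented dictionaries. Custom headers and authenticated requests are not covered here.

```python
import time
from typing import Callable
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Check a URL against robots.txt content that was fetched separately."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def scrape_with_retries(
    tool: Callable[..., dict],
    url: str,
    selector: str,
    attempts: int = 3,
    delay: float = 1.0,
) -> dict:
    """Call a scraping callable again while it returns an error result."""
    result: dict = {"error": "no attempts made"}
    for attempt in range(attempts):
        result = tool(url, selector)
        if "error" not in result:
            return result
        if attempt < attempts - 1:
            # Exponential backoff between tries: delay, 2*delay, 4*delay, ...
            time.sleep(delay * 2**attempt)
    return result


# A stand-in for the real tool that fails twice, then succeeds.
calls = {"count": 0}


def flaky_tool(url: str, selector: str) -> dict:
    calls["count"] += 1
    if calls["count"] < 3:
        return {"error": "temporary failure"}
    return {"extracted_text": "All systems operational"}


robots = "User-agent: *\nDisallow: /private/\n"
print(allowed_by_robots(robots, "https://example.com/private/page"))
print(allowed_by_robots(robots, "https://example.com/status"))
print(scrape_with_retries(flaky_tool, "https://example.com/status", ".banner", delay=0.0))
```

Because the wrapper only needs a callable, a WebScrapingTool instance could be passed in place of flaky_tool, using the positional call form shown in the examples above.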

License

This project is licensed under the Apache-2.0 License.
