Treat websites as programmable objects (Wikipedia-Locked Beta)

WebC – Treat Websites as Python Objects

Version: 0.1.1
Author: Ashwin Prasanth

Overview

webc is a Python library that allows you to treat websites as programmable Python objects.

Instead of manually handling HTTP requests, parsing HTML, and writing repetitive scraping logic, WebC provides a structured, object-oriented interface to access semantic content, query elements, and perform intent-driven tasks.

The goal is simple:

Make web data feel native to Python
Provide meaningful abstractions over raw HTML
Encourage ethical and secure usage by default

⚠️ Developer Preview / Secure Beta

WebC v0.1.1 is a developer preview release intended for testing and feedback.

This version prioritizes security, architecture stability, and controlled usage.

APIs may change during the beta phase.

Installation

Install via pip:

pip install webc

Dependencies

requests
beautifulsoup4

Core Architecture

WebC is organized into four conceptual layers.

1. Resource Layer

Access a webpage as a Resource object:

from webc import web

site = web["https://en.wikipedia.org/wiki/Python_(programming_language)"]

Represents a single webpage
Uses lazy loading (fetches HTML only when needed)
Caches parsed content internally
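
The lazy loading and caching described above can be sketched roughly as follows. This is a minimal illustration, not WebC's actual implementation: the `LazyResource` and `Registry` classes and the injected `fetch` callable are hypothetical names chosen so the example runs without network access.

```python
from typing import Callable, Dict, List, Optional


class LazyResource:
    """Hypothetical stand-in for a page object (not WebC's real class)."""

    def __init__(self, url: str, fetch: Callable[[str], str]) -> None:
        self.url = url
        self._fetch = fetch  # injected so the sketch needs no network
        self._html: Optional[str] = None

    @property
    def html(self) -> str:
        # Fetch on first access only; later reads reuse the cached text.
        if self._html is None:
            self._html = self._fetch(self.url)
        return self._html


class Registry:
    """Maps URLs to LazyResource objects via subscript access."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch
        self._resources: Dict[str, LazyResource] = {}

    def __getitem__(self, url: str) -> LazyResource:
        # Creating the object does not trigger a fetch.
        if url not in self._resources:
            self._resources[url] = LazyResource(url, self._fetch)
        return self._resources[url]


calls: List[str] = []


def fake_fetch(url: str) -> str:
    """Records each call so the caching behaviour is observable."""
    calls.append(url)
    return "<html><title>Example</title></html>"


registry = Registry(fake_fetch)
page = registry["https://en.wikipedia.org/wiki/Example"]
```

In this sketch nothing is fetched when `page` is created; the first read of `page.html` calls `fake_fetch` once, and every later read returns the cached string.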

2. Structure Layer

Provides semantic, high-level content extracted from the page:

site.structure.title
site.structure.links
site.structure.images
site.structure.tables

Image Handling

Extracts image URLs from the src, srcset, and data-src attributes and from <noscript> fallbacks
Filters UI icons and SVG assets
Resolves relative URLs automatically
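
As a rough illustration of the URL handling listed above, the sketch below pulls candidate URLs out of a `srcset` value, resolves them against the page URL, and drops SVG assets. The helper names `urls_from_srcset` and `collect_image_urls` are assumptions made for this example, and the comma split is a simplification that would mis-handle URLs containing commas; WebC's own filtering rules may differ.

```python
from typing import List
from urllib.parse import urljoin, urlsplit


def urls_from_srcset(srcset: str) -> List[str]:
    """Return the URL part of each comma-separated srcset candidate."""
    urls = []
    for candidate in srcset.split(","):
        parts = candidate.strip().split()
        if parts:
            urls.append(parts[0])  # parts[1:] holds descriptors such as "2x"
    return urls


def collect_image_urls(page_url: str, raw_urls: List[str]) -> List[str]:
    """Resolve URLs against the page, skip .svg files, and remove duplicates."""
    seen = set()
    result = []
    for raw in raw_urls:
        absolute = urljoin(page_url, raw)  # handles "/path" and "//host/path"
        path = urlsplit(absolute).path.lower()
        if path.endswith(".svg") or absolute in seen:
            continue
        seen.add(absolute)
        result.append(absolute)
    return result


page_url = "https://en.wikipedia.org/wiki/Python_(programming_language)"
srcset = "//upload.wikimedia.org/a/330px-x.png 1.5x, /static/images/y.png 2x"
raw = ["/static/logo.svg"] + urls_from_srcset(srcset) + ["/static/images/y.png"]
images = collect_image_urls(page_url, raw)
```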

Download images:

site.structure.save_images(folder="python_images")

Table Extraction

Detects Wikipedia tables that use the wikitable class
Handles rowspan and colspan alignment
Removes citation brackets (e.g., [1])
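
One way to picture the rowspan/colspan alignment and citation cleanup is the sketch below. It assumes each cell has already been read from the HTML as a `(text, rowspan, colspan)` tuple; that input shape and the function names are hypothetical, and the citation pattern only covers numeric markers such as `[1]`.

```python
import re
from typing import Dict, List, Tuple

Cell = Tuple[str, int, int]  # (text, rowspan, colspan)

CITATION = re.compile(r"\[\d+\]")


def strip_citations(text: str) -> str:
    """Remove numeric citation markers such as [1] and trim whitespace."""
    return CITATION.sub("", text).strip()


def expand_spans(rows: List[List[Cell]]) -> List[List[str]]:
    """Copy spanned cells into every grid position they cover."""
    grid: Dict[Tuple[int, int], str] = {}
    for r, row in enumerate(rows):
        c = 0
        for text, rowspan, colspan in row:
            # Skip columns already filled by a rowspan from an earlier row.
            while (r, c) in grid:
                c += 1
            value = strip_citations(text)
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = value
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    # Positions no cell covers become empty strings so rows stay rectangular.
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


rows = [
    [("Name", 1, 1), ("Released[1]", 1, 1), ("Notes", 1, 1)],
    [("Python 2", 1, 1), ("2000", 1, 1), ("Legacy[2]", 2, 1)],
    [("Python 3", 1, 1), ("2008", 1, 1)],
]
table = expand_spans(rows)
```

Here the "Legacy" cell spans two rows, so it is repeated in the third row, which keeps every row the same width when written to CSV.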

Save tables as CSV:

site.structure.save_tables(folder="wiki_data")

3. Query Layer

Provides direct DOM access via CSS selectors:

headings = site.query["h1, h2"]

for h in headings:
    print(h.get_text(strip=True))

Returns BeautifulSoup elements
Useful for custom extraction logic
Acts as an advanced access layer

4. Task Layer

Provides intent-driven actions:

summary = site.task.summarize(max_chars=500)
print(summary)

Currently supported:

summarize(max_chars=500)

More tasks will be introduced in future releases.
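
The README does not say how `summarize` chooses its text, so the following is only a guess at what a `max_chars` limit could mean in practice: collapse whitespace and cut at a word boundary. The function `summarize_text` is a hypothetical helper, not the library's method.

```python
def summarize_text(text: str, max_chars: int = 500) -> str:
    """Collapse whitespace, then truncate at a word boundary within max_chars."""
    collapsed = " ".join(text.split())
    if len(collapsed) <= max_chars:
        return collapsed
    # Reserve three characters for the trailing "..." marker.
    budget = max(max_chars - 3, 0)
    cut = collapsed[:budget]
    # Back up to the last space so a word is not split in half.
    if " " in cut:
        cut = cut[: cut.rindex(" ")]
    return cut.rstrip(" ,;:") + "..."


long_text = "Python is a high-level language. " * 50
short = summarize_text(long_text, max_chars=80)
```

For `max_chars` of 3 or more, the returned string never exceeds the limit, including the trailing marker.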

Security & Usage Policy

This secure beta is intentionally restricted.

Platform Restrictions

Locked to Wikipedia.org only
Only HTTPS URLs are allowed
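
A check in the spirit of these two restrictions might look like the sketch below. The name `is_allowed_url` and the exact matching rule (the host is `wikipedia.org` or one of its subdomains) are assumptions; the library's real validation is not shown in this README.

```python
from urllib.parse import urlsplit

ALLOWED_DOMAIN = "wikipedia.org"


def is_allowed_url(url: str) -> bool:
    """Accept only https URLs whose host is wikipedia.org or a subdomain."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL, e.g. an unterminated IPv6 literal
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower()
    # Compare the whole host, not a substring, so look-alike hosts are rejected.
    return host == ALLOWED_DOMAIN or host.endswith("." + ALLOWED_DOMAIN)
```

Matching on the parsed hostname rather than on the raw string is what rejects inputs such as `https://wikipedia.org.evil.example/` or `https://en.wikipedia.org@evil.example/`.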

Built-in Protections

WebC includes safeguards against:

SSRF attacks
Path traversal
Unsafe file writes
Excessive downloads

Requests are controlled and content is cached to prevent unnecessary repeated fetching.
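
To make the path traversal and unsafe file write protections concrete, here is one common way to confine writes to a target folder. It is a sketch under assumptions: `safe_target` is a hypothetical helper, and WebC may enforce this differently.

```python
from pathlib import Path


def safe_target(folder: str, filename: str) -> Path:
    """Return a path inside folder, refusing names that would escape it."""
    base = Path(folder).resolve()
    # Keep only the final component so "../x" or "/etc/x" cannot climb out.
    name = Path(filename).name
    if not name or name == "..":
        raise ValueError(f"unusable file name: {filename!r}")
    target = (base / name).resolve()
    # After resolving symlinks, the target must still sit under base.
    if base not in target.parents:
        raise ValueError(f"path escapes {base}: {filename!r}")
    return target


target = safe_target("python_images", "../../etc/passwd")
```

In this example `target` ends up as `passwd` inside the resolved `python_images` folder, because the directory parts of the supplied name are discarded.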

Responsible Use

WebC is designed for:

✔ Educational purposes
✔ Research
✔ Personal automation
✔ Ethical data access

It must not be used for:

Mass scraping
Circumventing website policies
Service disruption
Data abuse

Users are responsible for complying with website Terms of Service.

Full Usage Example

from webc import web

url = "https://en.wikipedia.org/wiki/Python_(programming_language)"
site = web[url]

print("=== STRUCTURE ===")
print(f"Title: {site.structure.title}")
print(f"Total Links: {len(site.structure.links)}")
print(f"First 5 links: {site.structure.links[:5]}")

print("\n--- Downloading Resources ---")
site.structure.save_images(folder="python_images")
site.structure.save_tables(folder="python_data")

print("\n=== QUERY ===")
headings = site.query["h1, h2"]
print(f"Found {len(headings)} headings:")

for h in headings[:3]:
    print(f" - {h.get_text(strip=True)}")

print("\n=== TASK ===")
summary = site.task.summarize(max_chars=500)
print(summary)

Roadmap

Planned future improvements:

Multi-domain support
Advanced rate limiting
Enhanced security layers
Plugin-based task extensions
Dataset export helpers
Cloud-safe scraping mode

License

This project is licensed under the MIT License. See the LICENSE file for the full license text.

Release history

0.2.1 (Mar 7, 2026)
0.2.0 (Feb 23, 2026)
0.1.2 (Feb 18, 2026), this version
0.1.1 (Feb 17, 2026)

Download files

Source Distribution

webc-0.1.2.tar.gz (8.2 kB), uploaded Feb 18, 2026, Source

Built Distribution

webc-0.1.2-py3-none-any.whl (8.2 kB), uploaded Feb 18, 2026, Python 3

File details

Details for the file webc-0.1.2.tar.gz.

File metadata

Download URL: webc-0.1.2.tar.gz
Upload date: Feb 18, 2026
Size: 8.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.1

File hashes

Hashes for webc-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`39fb625f4547bf80ced5c0a2a1ae8ba6656b4971f3f22956f1edb308ccaa5ee8`
MD5	`c5909506497965e238542bb47068ba5d`
BLAKE2b-256	`f7faf0d00be3d3e1baa4d7c9de1a8ccb852f1f7c61dd8080bf31e96f96660e84`
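
A downloaded file can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. The helper names `sha256_of` and `matches_expected` are illustrative, and the sketch assumes the file has already been saved locally.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of(path: Union[str, Path], chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Union[str, Path], expected_hex: str) -> bool:
    """Compare a file's digest with a published value, ignoring case and spaces."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_expected("webc-0.1.2.tar.gz", "39fb625f4547bf80ced5c0a2a1ae8ba6656b4971f3f22956f1edb308ccaa5ee8")` should return `True` only if the local file is byte-for-byte identical to the published source distribution.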

File details

Details for the file webc-0.1.2-py3-none-any.whl.

File metadata

Download URL: webc-0.1.2-py3-none-any.whl
Upload date: Feb 18, 2026
Size: 8.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.1

File hashes

Hashes for webc-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`69df532a06ae78b8b30f369be90c0324f8bbe2220ceabd964b9e1a860c5cf318`
MD5	`ad31155ade35a902af06d4d1c8cfd823`
BLAKE2b-256	`1f67bb74b00578361e829b870391bc6b8da7fcc16f60aa05d82d7de629d916da`
