
A powerful and advanced web scraping and automation library for Python


jwebs


jwebs is a high‑performance Python library for web scraping, crawling, automation, and content analysis. It supports HTTP/1.1 and HTTP/2 (user selectable) and includes built‑in caching, rate limiting, robots.txt handling, dynamic proxy rotation, distributed crawling (via Redis), data extraction, content differencing, uptime monitoring, Sitemap/RSS generation, and optional AI‑powered extraction.


Quick Start – Simple GET Request

from jwebs import JWebs

j = JWebs()
resp = j.GET("https://example.com")
print(f"Status: {resp.status}")
print(f"Content length: {len(resp.text)}")

Main Capabilities

· HTTP – HTTP/1.1 and HTTP/2 (user selectable), Keep‑Alive, automatic redirects, batch concurrent requests.

· Request Management – Two‑layer cache (memory + SQLite), rate limiting (Token Bucket), robots.txt respect, session management.

· Security & Flexibility – User‑Agent rotation, dynamic proxy rotation, client certificates (mTLS), SSL and security headers checking.

· Crawling & Automation – Simple crawler and distributed crawler (Redis) that can run across multiple machines.

· Data Extraction – Extract text, links, emails, phone numbers, prices, JSON‑LD, meta tags, images, social media links.

· Content Analysis – Sentiment analysis, automatic translation, content differencing (diff).

· Monitoring – Uptime monitoring, performance testing (TTFB, page size), SEO and security audits.

· Utilities – Sitemap.xml generator, RSS feed generator, GraphQL client, async client.

· AI (optional) – Intelligent data extraction via natural language instructions (DeepSeek/OpenAI) and text summarization.
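
The token bucket named under Request Management can be illustrated with a short sketch. This is the general algorithm only, not jwebs's internal implementation; the class name TokenBucket and its parameters (rate, capacity, clock) are hypothetical and do not appear in the jwebs API.

```python
import time


class TokenBucket:
    """Generic token bucket; hypothetical, not part of the jwebs API."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum burst size
        self.tokens = float(capacity)    # the bucket starts full
        self.clock = clock               # injectable so the logic can be tested
        self.last = clock()

    def try_acquire(self, n=1):
        """Take n tokens if available; return True on success."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```

A caller would invoke try_acquire() before each request and wait or skip when it returns False. The capacity bounds the burst size, while the rate sets the long-run request frequency.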


Installation

# Basic installation (core dependencies only)
pip install jwebs

# With HTTP/2 support
pip install jwebs[http2]

# With distributed crawler (Redis)
pip install jwebs[distributed]

# All optional features
pip install jwebs[all]

Debug

The distributed crawler needs a Redis server to connect to. If Redis is not installed, install it with your package manager:

· Ubuntu/Debian: sudo apt install redis

· Termux (Android): pkg install redis

· macOS: brew install redis

Alternatively, download it from redis.io.


More Examples

HTTP/2 and Caching

from jwebs import JWebs

j = JWebs(http_version='2', use_cache=True)
title = j.GET_TITLE("https://http2.golang.org/")
print(f"Title: {title}")
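
For readers curious how a two‑layer cache (memory + SQLite) can work in principle, here is a minimal sketch using only the standard library. It is an assumption about the general pattern, not the cache jwebs ships; the class TwoLayerCache, its ttl parameter, and the table layout are all hypothetical.

```python
import sqlite3
import time


class TwoLayerCache:
    """Memory dict in front of a SQLite table; hypothetical, not the jwebs cache."""

    def __init__(self, path=":memory:", ttl=300.0):
        self.ttl = ttl
        self.memory = {}  # key -> (expires_at, value)
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache "
            "(key TEXT PRIMARY KEY, expires_at REAL, value TEXT)"
        )

    def set(self, key, value):
        expires_at = time.time() + self.ttl
        self.memory[key] = (expires_at, value)
        with self.db:  # commits the transaction on success
            self.db.execute(
                "INSERT OR REPLACE INTO cache VALUES (?, ?, ?)",
                (key, expires_at, value),
            )

    def get(self, key):
        now = time.time()
        hit = self.memory.get(key)
        if hit and hit[0] > now:
            return hit[1]  # fast path: memory layer
        row = self.db.execute(
            "SELECT expires_at, value FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row and row[0] > now:
            self.memory[key] = (row[0], row[1])  # promote to the memory layer
            return row[1]
        return None
```

The memory layer answers repeated lookups within one process, and the SQLite layer lets entries survive a restart when path points to a file instead of ":memory:".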

Extracting Emails and Links

from jwebs import JWebs

j = JWebs()
emails = j.EXTRACT_EMAILS("https://example.com")
links = j.GET_LINKS("https://example.com", internal=True)
print(f"Emails: {emails}\nInternal Links: {len(links)}")
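
To show what this kind of extraction involves, the sketch below pulls emails and same‑host links out of an HTML string with the standard library. It is not how jwebs implements EXTRACT_EMAILS or GET_LINKS; the helper names (extract_emails, internal_links, LinkCollector) and the simplified email pattern are assumptions made for illustration.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Deliberately simple pattern; real-world address syntax is broader.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class LinkCollector(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))


def extract_emails(html):
    """Return the unique email-like strings in the text, sorted."""
    return sorted(set(EMAIL_RE.findall(html)))


def internal_links(html, base_url):
    """Return links whose host matches the host of base_url."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    host = urlparse(base_url).netloc
    return [u for u in parser.links if urlparse(u).netloc == host]
```

Both helpers take HTML that has already been fetched, so they can be applied to resp.text from any HTTP client.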

Distributed Crawling with Redis

from jwebs import JWebs

j = JWebs()
crawler = j.create_distributed_crawler(redis_url="redis://localhost:6379/0")
crawler.add_seed("https://example.com", depth=0)
crawler.crawl_worker(max_pages=10, max_depth=2, strict_page_limit=True)

results = crawler.get_all_results()
for url, info in results.items():
    print(f"{url}: {info.get('title', 'no title')}")
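
The page and depth limits above can be understood through a single‑process sketch of a breadth‑first crawl frontier. This is an illustration under assumptions, not jwebs's crawler: the function crawl and its fetch_links callback are hypothetical, and the in‑memory queue and set stand in for state that a distributed crawler would presumably keep in Redis so that workers can share it.

```python
from collections import deque


def crawl(seed, fetch_links, max_pages=10, max_depth=2):
    """Breadth-first crawl; fetch_links(url) returns the URLs found on a page."""
    frontier = deque([(seed, 0)])  # pending (url, depth) pairs
    seen = {seed}                  # URLs already queued, to avoid repeats
    visited = []
    while frontier and len(visited) < max_pages:
        url, depth = frontier.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # do not expand pages at the depth limit
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append((link, depth + 1))
    return visited
```

Passing the link fetcher as a callback keeps the traversal logic separate from networking, which also makes it easy to try out against a small in‑memory link graph.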

Security Audit

from jwebs import JWebs

j = JWebs()
report = j.SECURITY_AUDIT("https://example.com")
print(f"SSL valid: {report.ssl_valid}")
print(f"Security grade: {report.grade}")
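
How jwebs computes its security grade is not described here. As a rough idea of what a security‑headers check can look at, the sketch below lists commonly recommended response headers that are missing. The function name missing_security_headers and the chosen header list are assumptions for illustration only.

```python
# Commonly recommended response headers; this selection is an assumption.
RECOMMENDED_HEADERS = (
    "Strict-Transport-Security",
    "Content-Security-Policy",
    "X-Content-Type-Options",
    "X-Frame-Options",
    "Referrer-Policy",
)


def missing_security_headers(headers):
    """Return the recommended headers absent from a response (case-insensitive)."""
    present = {name.lower() for name in headers}
    return [h for h in RECOMMENDED_HEADERS if h.lower() not in present]
```

The comparison is case‑insensitive because HTTP header names are case‑insensitive, so "x-frame-options" and "X-Frame-Options" count as the same header.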

Content Differencing

from jwebs import JWebs

j = JWebs()
snap1 = j.TAKE_SNAPSHOT("version1", "Hello world")
snap2 = j.TAKE_SNAPSHOT("version2", "Hello jwebs")
diff = j.COMPARE_SNAPSHOTS(snap1, snap2)
print(f"Similarity: {j.SIMILARITY('Hello world', 'Hello jwebs')}")
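
For comparison, a similarity score and a line diff can be computed with Python's standard difflib module. Whether jwebs uses difflib internally is not stated, so treat this as a standalone sketch; the helper names similarity and line_diff are hypothetical.

```python
import difflib


def similarity(a, b):
    """Return a ratio in [0, 1]; 1.0 means the strings are identical."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def line_diff(a, b):
    """Return a unified diff of two texts as a list of lines."""
    return list(
        difflib.unified_diff(a.splitlines(), b.splitlines(), "old", "new", lineterm="")
    )
```

Calling line_diff("Hello world", "Hello jwebs") yields removed lines prefixed with "-" and added lines prefixed with "+", in the usual unified diff layout.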

Uptime Monitor

from jwebs import JWebs
import time

j = JWebs()
j.MONITOR_URL("https://example.com", expected_status=200)
j.START_MONITORING()
time.sleep(5)
j.STOP_MONITORING()
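
The start/stop pattern above can be pictured as a background thread that runs a check at a fixed interval. The sketch below is a generic version built on the standard library, not jwebs's monitor; the names check_once and Monitor, and the interval parameter, are assumptions.

```python
import threading
import urllib.error
import urllib.request


def check_once(url, expected_status=200, timeout=5.0):
    """Return True if the URL answers with the expected status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == expected_status
    except urllib.error.HTTPError as err:
        return err.code == expected_status  # 4xx/5xx arrive as exceptions
    except OSError:
        return False  # DNS failure, refused connection, timeout


class Monitor:
    """Runs a zero-argument check in a background thread until stopped."""

    def __init__(self, check, interval=1.0):
        self.check = check
        self.interval = interval
        self.results = []
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop.is_set():
            self.results.append(self.check())
            self._stop.wait(self.interval)  # sleep, but wake early on stop()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
```

A monitor for a single URL could be built as Monitor(lambda: check_once("https://example.com")). Waiting on the event instead of calling time.sleep lets stop() return promptly even with a long interval.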

Issues and Contributions

You can report bugs via GitHub Issues or submit fixes via pull requests.


Links

· GitHub repository: https://github.com/JCode-JCode/jwebs

· PyPI page: https://pypi.org/project/jwebs/


License

This project is licensed under the Apache License 2.0 – see the LICENSE file for details.


Designed and built with love by J Code
