Search the web, scrape sites, and generate reports — all from your terminal.

📦 Plethora

Search the web, scrape sites, and generate reports — from Python.

Install with pip and use anywhere: scripts, notebooks, Google Colab, or the command line.

🚀 Installation

pip install plethora

Works on Linux, macOS, Windows, Termux, and Google Colab.

📖 Quick Start

Python Library

from plethora import web_search, scrape_page, scrape_subpages, run

# Search the web
results = web_search("python tutorials", num_results=10)

# Scrape a single page
page = scrape_page("https://example.com")
print(page["title"], page["headings"], page["lists"], page["tables"])

# Full pipeline — search, scrape, and generate reports
paths = run("AI news 2026", level="high", num_results=5, out_format="all")

Google Colab

!pip install plethora

from plethora import run

# Generate a markdown report right in your notebook
paths = run("machine learning trends", level="medium", out_format="md")

Command Line

# Basic usage
plethora "your search query" --level medium

# All formats at once
plethora "AI research" --level high --format all

# Parallel scrape with 8 threads, bypassing the URL cache
plethora "web dev trends" --level medium --workers 8 --no-cache

# Quiet mode for piping
plethora "data science" --level low --quiet --format json
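
The `--workers` flag above controls how many pages are fetched in parallel. As a rough illustration of that idea (not Plethora's actual implementation), the sketch below fans a list of URLs out over a thread pool from the standard library. `fetch_all` and `fake_fetch` are hypothetical names used only for this example; a real fetcher would make an HTTP request instead.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, workers=4):
    """Apply `fetch` to every URL using up to `workers` threads.

    Results come back in the same order as `urls`. A failed fetch is
    recorded as None so that one bad page does not abort the batch.
    """
    def safe_fetch(url):
        try:
            return fetch(url)
        except Exception:
            return None

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(safe_fetch, urls))


def fake_fetch(url):
    """Stand-in for a real HTTP request (hypothetical, for illustration)."""
    if "bad" in url:
        raise ValueError("simulated failure")
    return {"url": url, "length": len(url)}


pages = fetch_all(
    ["https://example.com/a", "https://example.com/bad", "https://example.com/b"],
    fake_fetch,
    workers=8,
)
# Print the URL of each successful page, or None where the fetch failed.
print([p["url"] if p else None for p in pages])
```

Because `pool.map` preserves input order, results can be matched back to the search results they came from without extra bookkeeping.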

CLI Options

plethora <query> [options]

  -l, --level LEVEL      low | medium | high                   (default: medium)
  -n, --results N        Number of search results               (default: 5)
  -s, --subpages N       Max sub-pages per site (high only)     (default: 2)
  -o, --output DIR       Output directory                       (default: reports/)
  -f, --format FMT       txt | md | html | json | pdf | all   (default: txt)
  -w, --workers N        Concurrent scraping threads            (default: 4)
  -q, --quiet            Suppress progress output
  --no-cache             Bypass URL cache
  --cache-ttl SECS       Cache TTL in seconds                   (default: 3600)
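
To make the defaults above concrete, here is a minimal sketch of how an option table like this could be expressed with `argparse`. It mirrors the documented flags and defaults only; it is an assumption about structure, not the parser Plethora actually ships, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options and defaults."""
    p = argparse.ArgumentParser(prog="plethora")
    p.add_argument("query", help="search query")
    p.add_argument("-l", "--level", choices=["low", "medium", "high"],
                   default="medium")
    p.add_argument("-n", "--results", type=int, default=5)
    p.add_argument("-s", "--subpages", type=int, default=2)
    p.add_argument("-o", "--output", default="reports/")
    p.add_argument("-f", "--format",
                   choices=["txt", "md", "html", "json", "pdf", "all"],
                   default="txt")
    p.add_argument("-w", "--workers", type=int, default=4)
    p.add_argument("-q", "--quiet", action="store_true")
    p.add_argument("--no-cache", action="store_true")
    p.add_argument("--cache-ttl", type=int, default=3600)
    return p


# Equivalent to: plethora "AI research" --level high --format all
args = build_parser().parse_args(
    ["AI research", "--level", "high", "--format", "all"]
)
print(args.level, args.format, args.workers, args.no_cache, args.cache_ttl)
```

Options that are not passed fall back to the defaults listed in the table, so this call still uses 4 workers and a 3600-second cache TTL.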

📋 Scrape Levels

┌──────────┬──────────────────────────────────────────────────────┐
│  Level   │  What You Get                                       │
├──────────┼──────────────────────────────────────────────────────┤
│  🟢 LOW  │  Search results list — titles, URLs, snippets       │
│          │  ⚡ Instant — doesn't visit any pages                │
├──────────┼──────────────────────────────────────────────────────┤
│  🟡 MED  │  Visits each result page — pulls headings, meta,    │
│          │  lists, and a content preview (500 chars)            │
├──────────┼──────────────────────────────────────────────────────┤
│  🔴 HIGH │  Deep scrape — full page content + follows links    │
│          │  to sub-pages. Tables, images, 2000 char content    │
└──────────┴──────────────────────────────────────────────────────┘

📝 Output Formats

| Format | Extension | Description |
|--------|-----------|-------------|
| txt | `.txt` | Clean plain text, great for terminal reading |
| md | `.md` | Markdown, suited to pasting into notes or docs |
| html | `.html` | Self-contained HTML with dark theme |
| json | `.json` | Raw structured data to feed into your own scripts |
| pdf | `.pdf` | Portable PDF with watermark |

Use --format all or out_format="all" to generate everything at once.

✨ Features

- Concurrent scraping: pages are fetched in parallel with a configurable number of threads
- Smart caching: already-fetched URLs are cached locally (1-hour default TTL)
- robots.txt respect: checks before scraping and skips disallowed URLs
- Auto-retries: failed requests are retried 3 times with exponential backoff
- Per-domain rate limiting: avoids hammering the same site
- Rich extraction: headings (h1–h6), paragraphs, lists, tables, image metadata
- Progress bars: live Rich progress while scraping (install with pip install "plethora[rich]")
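
The auto-retry feature can be pictured with a short sketch. This is a generic exponential-backoff loop, not Plethora's own code: `retry_with_backoff`, `base_delay`, and the injectable `sleep` parameter are assumptions made for illustration, and whether "3x" means three retries or three total attempts is not stated in this README (the sketch uses three retries).

```python
import time


def retry_with_backoff(func, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call `func`, retrying up to `retries` times after a failure.

    The wait doubles after each failed attempt: base_delay, 2x, 4x, ...
    `sleep` is injectable so the logic can be exercised without waiting.
    """
    attempt = 0
    while True:
        try:
            return func()
        except Exception:
            if attempt >= retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * (2 ** attempt))
            attempt += 1


# Demo: a function that fails twice, then succeeds.
calls = {"count": 0}
delays = []


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network error")
    return "ok"


# Record the delays instead of actually sleeping.
result = retry_with_backoff(flaky, retries=3, base_delay=0.5, sleep=delays.append)
print(result, delays)
```

Passing `delays.append` as `sleep` records the waits rather than performing them, which keeps the demo instant while still showing the doubling pattern.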

📦 Dependencies

Required:

- requests: HTTP client
- beautifulsoup4: HTML parsing
- fpdf2: PDF generation

Optional:

- rich: progress bars (pip install "plethora[rich]")

⚠️ Disclaimer

This tool is for personal research and educational purposes only. It respects robots.txt, includes per-domain rate limiting, and plays nice with servers. Please don't abuse it. Use responsibly.
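
For readers wondering what respecting robots.txt and per-domain rate limiting can look like in practice, here is a small standard-library sketch. It is an illustration under stated assumptions, not Plethora's implementation: the robots.txt rules are supplied inline so nothing touches the network, and `DomainRateLimiter`, `min_interval`, and the `my-bot` user agent are hypothetical names.

```python
import time
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# robots.txt rules supplied inline so the example runs without network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())


class DomainRateLimiter:
    """Enforce a minimum interval between requests to the same domain."""

    def __init__(self, min_interval=1.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self.last_seen = {}  # domain -> scheduled time of the latest request

    def wait_time(self, url):
        """Return how long to wait before fetching `url`, and record the visit."""
        domain = urlparse(url).netloc
        now = self.clock()
        last = self.last_seen.get(domain)
        if last is None:
            delay = 0.0
        else:
            delay = max(0.0, self.min_interval - (now - last))
        # Assume the caller sleeps for `delay` and then sends the request.
        self.last_seen[domain] = now + delay
        return delay


allowed = robots.can_fetch("my-bot", "https://example.com/public/page")
blocked = robots.can_fetch("my-bot", "https://example.com/private/page")
print(allowed, blocked)
```

A scraper built this way would skip any URL for which `can_fetch` returns False and sleep for `wait_time(url)` seconds before each request it does make.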

💰 Support This Project

If you find this useful, consider supporting me — it keeps me building stuff like this.

Built by @soumyadipkarforma · MIT License

🌿 Other Branches

| Branch | What's There |
|--------|--------------|
| `main` | 🐚 Terminal scripts & CLI tool: clone and start scraping |
| `website` | 🌐 React web app: try it live |

This branch (pypi-package) has the pip-installable Python package.

Release history

2.0.0 (this version): Mar 1, 2026
1.0.0: Mar 1, 2026

Download files

Source Distribution

plethora-2.0.0.tar.gz (18.0 kB)

Uploaded Mar 1, 2026 Source

Built Distribution

plethora-2.0.0-py3-none-any.whl (16.4 kB)

Uploaded Mar 1, 2026 Python 3

File details

Details for the file plethora-2.0.0.tar.gz.

File metadata

Download URL: plethora-2.0.0.tar.gz
Upload date: Mar 1, 2026
Size: 18.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for plethora-2.0.0.tar.gz
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | `36e5ae845484fcf44d8a7927c37cef9b1d811b7a1143c5bbd30a13f00b79398c` |
| MD5 | `674b9b2b2ec10a759d7f2551262f1ff9` |
| BLAKE2b-256 | `f6de5fd15c4b016d585f7fee97afcf88af8673281593af27b83f7d4a04133776` |

File details

Details for the file plethora-2.0.0-py3-none-any.whl.

File metadata

Download URL: plethora-2.0.0-py3-none-any.whl
Upload date: Mar 1, 2026
Size: 16.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for plethora-2.0.0-py3-none-any.whl
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | `3fb4e0edf123ee85acc69d22554536aadbdb848a154275b10ee5a7a63e0784cb` |
| MD5 | `d9cab005d754f4fc90194b04edf46072` |
| BLAKE2b-256 | `1a5f2155d6e094c93e329f5f85ff340db2ec54200195d268926852f0a5382ab5` |
