
Search the web, scrape sites, and generate reports, all from your terminal.

Project description

📦 Plethora

Search the web, scrape sites, and generate reports from Python.

Install with pip and use anywhere: scripts, notebooks, Google Colab, or the command line.

Python 3.10+ · License: MIT

Instagram X YouTube Email


🚀 Installation

pip install plethora

Works on Linux, macOS, Windows, Termux, and Google Colab.


📖 Quick Start

Python Library

from plethora import web_search, scrape_page, scrape_subpages, run

# Search the web
results = web_search("python tutorials", num_results=10)

# Scrape a single page
page = scrape_page("https://example.com")
print(page["title"], page["headings"], page["lists"], page["tables"])

# Full pipeline: search, scrape, and generate reports
paths = run("AI news 2026", level="high", num_results=5, out_format="all")
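
The keys printed above (`title`, `headings`, `lists`, `tables`) can be condensed into a one-line summary. The following is a minimal sketch, assuming that `headings`, `lists`, and `tables` are sequences. The helper name `summarize_page` is hypothetical and not part of plethora, and the sample dict stands in for a real `scrape_page` result.

```python
def summarize_page(page: dict) -> str:
    """Return a one-line summary of a scraped-page dict.

    Assumes the keys shown in the Quick Start and that `headings`,
    `lists`, and `tables` are sequences; missing keys count as empty.
    """
    title = page.get("title") or "(untitled)"
    counts = {key: len(page.get(key) or []) for key in ("headings", "lists", "tables")}
    return f"{title}: " + ", ".join(f"{n} {key}" for key, n in counts.items())


# Stand-in data; a real call would be scrape_page("https://example.com").
sample = {"title": "Example Domain", "headings": ["Example Domain"], "lists": [], "tables": []}
print(summarize_page(sample))
```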

Google Colab

!pip install plethora

from plethora import run

# Generate a markdown report right in your notebook
paths = run("machine learning trends", level="medium", out_format="md")

Command Line

# Basic usage
plethora "your search query" --level medium

# All formats at once
plethora "AI research" --level high --format all

# Parallel scrape with 8 threads, bypassing the URL cache
plethora "web dev trends" --level medium --workers 8 --no-cache

# Quiet mode for piping
plethora "data science" --level low --quiet --format json

CLI Options

plethora <query> [options]

  -l, --level LEVEL      low | medium | high                   (default: medium)
  -n, --results N        Number of search results               (default: 5)
  -s, --subpages N       Max sub-pages per site (high only)     (default: 2)
  -o, --output DIR       Output directory                       (default: reports/)
  -f, --format FMT       txt | md | html | json | pdf | all   (default: txt)
  -w, --workers N        Concurrent scraping threads            (default: 4)
  -q, --quiet            Suppress progress output
  --no-cache             Bypass URL cache
  --cache-ttl SECS       Cache TTL in seconds                   (default: 3600)
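
For readers who want to see how an option table like this maps onto code, here is a sketch that declares the same flags and defaults with the standard library's `argparse`. It is an illustration only; the function name `build_parser` is hypothetical, and plethora's real parser may be built differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the options from the table above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="plethora")
    parser.add_argument("query", help="Search query")
    parser.add_argument("-l", "--level", choices=["low", "medium", "high"], default="medium")
    parser.add_argument("-n", "--results", type=int, default=5, metavar="N")
    parser.add_argument("-s", "--subpages", type=int, default=2, metavar="N")
    parser.add_argument("-o", "--output", default="reports/", metavar="DIR")
    parser.add_argument(
        "-f", "--format",
        choices=["txt", "md", "html", "json", "pdf", "all"],
        default="txt", metavar="FMT",
    )
    parser.add_argument("-w", "--workers", type=int, default=4, metavar="N")
    parser.add_argument("-q", "--quiet", action="store_true")
    parser.add_argument("--no-cache", action="store_true")
    parser.add_argument("--cache-ttl", type=int, default=3600, metavar="SECS")
    return parser


# Parse the same arguments as the "parallel scrape" example.
args = build_parser().parse_args(["web dev trends", "--workers", "8", "--no-cache"])
print(args.level, args.workers, args.no_cache, args.cache_ttl)
```

Options that are not passed fall back to the defaults listed in the table, so `args.level` is `medium` and `args.cache_ttl` is `3600` here.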

📋 Scrape Levels

┌──────────┬─────────────────────────────────────────────────────┐
│  Level   │  What You Get                                       │
├──────────┼─────────────────────────────────────────────────────┤
│  🟢 LOW  │  Search results list - titles, URLs, snippets       │
│          │  ⚡ Instant - doesn't visit any pages                │
├──────────┼─────────────────────────────────────────────────────┤
│  🟡 MED  │  Visits each result page - pulls headings, meta,    │
│          │  lists, and a content preview (500 chars)           │
├──────────┼─────────────────────────────────────────────────────┤
│  🔴 HIGH │  Deep scrape - full page content + follows links    │
│          │  to sub-pages. Tables, images, 2000 char content    │
└──────────┴─────────────────────────────────────────────────────┘
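
One way to read the table is as three settings per level: whether result pages are visited, whether sub-pages are followed, and how many characters of content are kept. The sketch below encodes that reading. The names `LevelSpec`, `LEVELS`, and `preview` are hypothetical, and only the 500 and 2000 character limits come from the table; how plethora stores these settings internally is not documented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelSpec:
    visit_pages: bool      # fetch each result page?
    follow_subpages: bool  # follow links to sub-pages?
    content_chars: int     # max characters of page content kept


# Values taken from the table above; the structure itself is an assumption.
LEVELS = {
    "low": LevelSpec(visit_pages=False, follow_subpages=False, content_chars=0),
    "medium": LevelSpec(visit_pages=True, follow_subpages=False, content_chars=500),
    "high": LevelSpec(visit_pages=True, follow_subpages=True, content_chars=2000),
}


def preview(text: str, level: str) -> str:
    """Trim page text to the character budget of the given level."""
    return text[: LEVELS[level].content_chars]


print(len(preview("x" * 5000, "medium")), len(preview("x" * 5000, "high")))
```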

📁 Output Formats

Format  Extension  Description
txt     .txt       Clean plain text, great for terminal reading
md      .md        Markdown, perfect for pasting into notes or docs
html    .html      Self-contained HTML with dark theme
json    .json      Raw structured data to feed into your own scripts
pdf     .pdf       Portable PDF with watermark

Use --format all or out_format="all" to generate everything at once.
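
To make the effect of `all` concrete, the sketch below expands a format choice into the list of files a single run would be expected to produce. The helper `planned_paths` and the `report` file stem are hypothetical; the actual file names plethora writes are not stated in this document.

```python
from pathlib import Path

# Format name -> file extension, as listed in the table above.
EXTENSIONS = {"txt": ".txt", "md": ".md", "html": ".html", "json": ".json", "pdf": ".pdf"}


def planned_paths(out_format: str, out_dir: str = "reports", stem: str = "report") -> list[Path]:
    """List the files one run would be expected to write for a format choice."""
    if out_format == "all":
        formats = list(EXTENSIONS)  # every known format, in table order
    elif out_format in EXTENSIONS:
        formats = [out_format]
    else:
        raise ValueError(f"unknown format: {out_format!r}")
    return [Path(out_dir) / f"{stem}{EXTENSIONS[fmt]}" for fmt in formats]


print([p.name for p in planned_paths("all")])
```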


✨ Features

  • Concurrent scraping: pages are fetched in parallel with a configurable number of threads
  • Smart caching: already-fetched URLs are cached locally (1-hour default TTL)
  • robots.txt respect: checks robots.txt before scraping and skips disallowed URLs
  • Auto-retries: failed requests are retried 3x with exponential backoff
  • Per-domain rate limiting: won't hammer the same site
  • Rich extraction: headings (h1–h6), paragraphs, lists, tables, image metadata
  • Progress bars: live Rich progress while scraping (install with pip install plethora[rich])
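
The auto-retry behaviour in the list above follows a common pattern that can be sketched with the standard library alone. This is not plethora's implementation: the function name `retry_with_backoff`, the 1-second base delay, and the reading of "3x" as three retries after the first attempt are all assumptions made for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func; on failure retry up to `retries` times, doubling the wait each time."""
    for attempt in range(retries + 1):
        try:
            return func()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)  # waits of 1s, 2s, 4s, ...
    raise AssertionError("unreachable")


# Demo: a call that fails twice, then succeeds. Delays are recorded, not slept.
calls = {"n": 0}
delays: list[float] = []


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = retry_with_backoff(flaky, sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the demo instant and makes the delay schedule easy to inspect; real use would leave it at the default `time.sleep`.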

📦 Dependencies

Required:

  • requests: HTTP client
  • beautifulsoup4: HTML parsing
  • fpdf2: PDF generation

Optional:

  • rich: progress bars (pip install plethora[rich])

⚠️ Disclaimer

This tool is for personal research and educational purposes only. It respects robots.txt, includes per-domain rate limiting, and plays nice with servers. Please don't abuse it. Use responsibly.


💰 Support This Project

If you find this useful, consider supporting me; it keeps me building stuff like this.

Sponsor on GitHub Buy Me a Coffee Patreon


Built by @soumyadipkarforma · MIT License



🌿 Other Branches

Branch   What's There
main     🐚 Terminal scripts & CLI tool: clone and start scraping
website  🌐 React web app: try it live

This branch (pypi-package) has the pip-installable Python package.



Download files


Source Distribution

plethora-2.0.0.tar.gz (18.0 kB)

Uploaded Source

Built Distribution


plethora-2.0.0-py3-none-any.whl (16.4 kB)

Uploaded Python 3

File details

Details for the file plethora-2.0.0.tar.gz.

File metadata

  • Download URL: plethora-2.0.0.tar.gz
  • Upload date:
  • Size: 18.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for plethora-2.0.0.tar.gz
Algorithm Hash digest
SHA256 36e5ae845484fcf44d8a7927c37cef9b1d811b7a1143c5bbd30a13f00b79398c
MD5 674b9b2b2ec10a759d7f2551262f1ff9
BLAKE2b-256 f6de5fd15c4b016d585f7fee97afcf88af8673281593af27b83f7d4a04133776


File details

Details for the file plethora-2.0.0-py3-none-any.whl.

File metadata

  • Download URL: plethora-2.0.0-py3-none-any.whl
  • Upload date:
  • Size: 16.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for plethora-2.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3fb4e0edf123ee85acc69d22554536aadbdb848a154275b10ee5a7a63e0784cb
MD5 d9cab005d754f4fc90194b04edf46072
BLAKE2b-256 1a5f2155d6e094c93e329f5f85ff340db2ec54200195d268926852f0a5382ab5

