Search the web, scrape sites, and generate reports — all from your terminal.

🔍 Plethora

Search the web. Scrape the sites. Generate reports. All from your terminal.

I built this because I got tired of manually Googling stuff and copy-pasting content. Now I just run a one-liner and get a clean report — low, medium, or high detail — in plain text, Markdown, HTML, JSON, or PDF. No browser needed. No fluff.

💡 Why I Made This

I wanted a fast way to research topics from the terminal — search for something, pull down the actual content from each result, and save it all in one place. So I wrote this: a set of scripts that does exactly that.

The idea is simple: pick a detail level, run the script, get your report.

🐚 The Scripts — The Fastest Way to Use This

These are the main way to use Plethora. There are no flags to remember; once setup is done, just run them:

# Quick list of search results — titles, URLs, snippets
./scrape-low "best static site generators"

# Scrape the actual pages — headings, meta, content previews
./scrape-med "python web frameworks 2026"

# Full deep scrape — page content + sub-pages + everything
./scrape-high "machine learning research papers" 8 3

That's it. Each script takes a search query and, optionally, the number of results you want. scrape-high also accepts a maximum sub-page count as its third argument.

./scrape-low  "query" [num_results]
./scrape-med  "query" [num_results]
./scrape-high "query" [num_results] [max_subpages]

After the scrape finishes, it shows you where the report was saved and asks if you want to view it right there in the terminal with less. Say y and read it, or n and go grab it from the reports/ folder later.
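
The wrapper scripts themselves aren't shown here, so the following is only a sketch of how their positional arguments could map onto the scrape.py flags listed under All Options. The function name `build_command` is hypothetical, and the assumption that the wrappers forward to scrape.py this way is not confirmed by the source.

```python
import sys


def build_command(level, query, num_results=None, max_subpages=None):
    """Translate wrapper-style positional arguments into scrape.py flags.

    Assumption: the shell shortcuts forward to scrape.py roughly like this.
    """
    cmd = [sys.executable, "scrape.py", query, "--level", level]
    if num_results is not None:
        cmd += ["--results", str(num_results)]
    # Only the high level follows links to sub-pages.
    if max_subpages is not None and level == "high":
        cmd += ["--subpages", str(max_subpages)]
    return cmd


if __name__ == "__main__":
    # Mirrors: ./scrape-high "machine learning research papers" 8 3
    print(build_command("high", "machine learning research papers", 8, 3)[1:])
```

The resulting list could be handed to `subprocess.run` as-is, which avoids shell quoting problems with multi-word queries.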

📋 What Each Level Gets You

┌──────────┬──────────────────────────────────────────────────────┐
│  Level   │  What You Get                                       │
├──────────┼──────────────────────────────────────────────────────┤
│  🟢 LOW  │  Search results list — titles, URLs, snippets       │
│          │  ⚡ Instant — doesn't visit any pages                │
├──────────┼──────────────────────────────────────────────────────┤
│  🟡 MED  │  Visits each result page — pulls headings, meta,    │
│          │  lists, and a content preview (500 chars)            │
├──────────┼──────────────────────────────────────────────────────┤
│  🔴 HIGH │  Deep scrape — full page content + follows links    │
│          │  to sub-pages. Tables, images, 2000 char content    │
└──────────┴──────────────────────────────────────────────────────┘
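
To make the character limits in the table concrete, here is a small sketch of per-level truncation. The names `CONTENT_LIMITS` and `content_preview` are made up for illustration, and the whitespace collapsing and trailing ellipsis are assumptions rather than Plethora's actual behavior.

```python
# Limits taken from the table above; "low" never visits a page,
# so it has no page content at all.
CONTENT_LIMITS = {"low": 0, "medium": 500, "high": 2000}


def content_preview(text, level):
    """Collapse whitespace and cut the text to the limit for this level."""
    limit = CONTENT_LIMITS[level]
    if limit == 0:
        return ""
    collapsed = " ".join(text.split())
    if len(collapsed) <= limit:
        return collapsed
    # Mark truncated text with an ellipsis (an assumption of this sketch).
    return collapsed[:limit].rstrip() + "…"
```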

🚀 Setup

Install from PyPI (Recommended)

pip install plethora

That's it. Plethora runs on Linux, macOS, Windows, Termux, and Google Colab.

After installing, use the CLI:

plethora "your search query" --level medium

Or use it as a Python library:

from plethora import web_search, run

# Search only
results = web_search("python tutorials", num_results=5)

# Full pipeline: returns a list of report file paths
report_paths = run("AI news 2026", level="high", out_format="json")

Google Colab

!pip install plethora

from plethora import run
paths = run("machine learning trends", level="medium", out_format="md")

One-Command Setup (from source)

I've included setup scripts for every major platform. Just run the one for your system and everything gets installed — Python, pip, dependencies, permissions. Zero hassle.

| Platform | Command |
| --- | --- |
| Termux (Android) | `bash termux-setup` |
| Linux (Debian/Fedora/Arch/openSUSE) | `bash linux-setup` |
| macOS | `bash mac-setup` |
| Windows | Double-click `windows-setup.bat` or run it from CMD |

Each script handles the full chain: system packages → Python → pip dependencies → script permissions. After running it, you're ready to go.

Manual Setup

If you'd rather do it yourself:

- Python 3.10+
- requests and beautifulsoup4 (required)
- rich (optional; gives you nice progress bars)
- fpdf2 (required for PDF output)

pip install requests beautifulsoup4 rich fpdf2

Make the scripts executable:

chmod +x scrape-low scrape-med scrape-high

You're good to go.

⚙️ Advanced: The Python CLI

If you need more control, use scrape.py directly with flags:

# Basic usage
python scrape.py "your search query" --level medium

# Generate all formats at once (txt + md + html + json + pdf)
python scrape.py "AI research" --level high --format all

# Parallel scrape with 8 threads, skip cache
python scrape.py "web dev trends" --level medium --workers 8 --no-cache

# Quiet mode for piping
python scrape.py "data science" --level low --quiet --format json

All Options

python scrape.py <query> [options]

  -l, --level LEVEL      low | medium | high                   (default: medium)
  -n, --results N        Number of search results               (default: 5)
  -s, --subpages N       Max sub-pages per site (high only)     (default: 2)
  -o, --output DIR       Output directory                       (default: reports/)
  -f, --format FMT       txt | md | html | json | pdf | all   (default: txt)
  -w, --workers N        Concurrent scraping threads            (default: 4)
  -q, --quiet            Suppress progress output
  --no-cache             Bypass URL cache
  --cache-ttl SECS       Cache TTL in seconds                   (default: 3600)
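
For readers who want to see how an option table like this maps to code, here is a sketch using the standard library's argparse. It only mirrors the flags and defaults listed above; the real scrape.py may be organised differently, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented scrape.py options."""
    p = argparse.ArgumentParser(prog="scrape.py")
    p.add_argument("query", help="Search query")
    p.add_argument("-l", "--level", choices=["low", "medium", "high"],
                   default="medium")
    p.add_argument("-n", "--results", type=int, default=5, metavar="N")
    p.add_argument("-s", "--subpages", type=int, default=2, metavar="N",
                   help="Max sub-pages per site (high only)")
    p.add_argument("-o", "--output", default="reports/", metavar="DIR")
    p.add_argument("-f", "--format", default="txt",
                   choices=["txt", "md", "html", "json", "pdf", "all"])
    p.add_argument("-w", "--workers", type=int, default=4, metavar="N")
    p.add_argument("-q", "--quiet", action="store_true")
    p.add_argument("--no-cache", action="store_true")
    p.add_argument("--cache-ttl", type=int, default=3600, metavar="SECS")
    return p


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["web dev trends", "--level", "medium", "--workers", "8", "--no-cache"]
    )
    print(args.level, args.workers, args.no_cache, args.cache_ttl)
```

argparse turns dashes into underscores, so `--no-cache` and `--cache-ttl` end up as `args.no_cache` and `args.cache_ttl`.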

📝 Output Formats

| Format | Extension | Description |
| --- | --- | --- |
| txt | `.txt` | Clean plain text, great for terminal reading |
| md | `.md` | Markdown, perfect for pasting into notes or docs |
| html | `.html` | Self-contained HTML with dark theme; open in any browser |
| json | `.json` | Raw structured data; feed it into your own scripts |
| pdf | `.pdf` | Portable PDF with watermark; share or print anywhere |

All formats include the Plethora watermark. Use --format all to get everything.
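
As an illustration of what `--format all` implies, the sketch below expands a format name into the list of files to write. The helper names and the file-naming scheme (slugified query plus level) are assumptions made for this example; the actual report filenames are not documented here.

```python
import re
from pathlib import Path

ALL_FORMATS = ["txt", "md", "html", "json", "pdf"]


def expand_formats(fmt):
    """Return the concrete formats behind a --format value."""
    if fmt == "all":
        return list(ALL_FORMATS)
    if fmt not in ALL_FORMATS:
        raise ValueError(f"unknown format: {fmt}")
    return [fmt]


def report_paths(query, level, fmt, out_dir="reports"):
    """Build one output path per requested format (hypothetical naming)."""
    slug = re.sub(r"[^a-z0-9]+", "-", query.lower()).strip("-")
    return [Path(out_dir) / f"{slug}-{level}.{ext}" for ext in expand_formats(fmt)]
```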

✨ What's Under the Hood

- Concurrent scraping: pages are fetched in parallel with a configurable number of threads
- Smart caching: already-fetched URLs are cached locally (1hr default TTL)
- robots.txt respect: checks before scraping, skips disallowed URLs
- Auto-retries: failed requests retry 3x with exponential backoff
- Per-domain rate limiting: won't hammer the same site
- Rich extraction: headings (h1–h6), paragraphs, lists, tables, image metadata
- Progress bars: live Rich progress when scraping (disable with --quiet)
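
Several of these features are standard techniques, so here is a stdlib-only sketch of four of them: a TTL cache, retries with exponential backoff, a robots.txt check, and parallel fetching. This is not Plethora's code. Every name below is hypothetical, the cache is in-memory rather than on disk, the 1-second base delay is a guess, and per-domain rate limiting is left out to keep it short. The `fetch` argument stands in for whatever function actually downloads a URL.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.robotparser import RobotFileParser


class TTLCache:
    """In-memory URL cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl=3600, clock=time.monotonic):
        self.ttl, self.clock, self._store = ttl, clock, {}

    def get(self, url):
        entry = self._store.get(url)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at >= self.ttl:
            del self._store[url]  # entry has expired
            return None
        return value

    def put(self, url, value):
        self._store[url] = (self.clock(), value)


def fetch_with_retries(fetch, url, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url); on failure retry up to `retries` times.

    The delay doubles after each failed attempt: 1s, 2s, 4s by default.
    """
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except Exception:
            if attempt == retries:
                raise  # out of retries, surface the last error
            sleep(base_delay * 2 ** attempt)


def allowed_by_robots(robots_txt, url, user_agent="*"):
    """Check a URL against the text of an already-downloaded robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def scrape_all(fetch, urls, workers=4):
    """Fetch URLs in parallel; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))
```

Passing `clock` and `sleep` in as parameters keeps the sketch testable without real waiting.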

📂 Project Structure

plethora/
├── scrape-low          # ⭐ Shell shortcut → low detail report
├── scrape-med          # ⭐ Shell shortcut → medium detail report
├── scrape-high         # ⭐ Shell shortcut → high detail report
├── scrape.py           # Full CLI with all options
├── scraper.py          # Core engine — search, scrape, concurrency, caching
├── formatter.py        # Report generators — txt, md, html, json, pdf
├── common              # Shared shell helper (argument parsing)
├── termux-setup        # 📱 One-command Termux setup
├── linux-setup         # 🐧 One-command Linux setup
├── mac-setup           # 🍎 One-command macOS setup
├── windows-setup.bat   # 🪟 One-command Windows setup
├── .cache/             # URL cache (auto-created)
└── reports/            # All generated reports go here

📖 Example Output

🟢 Low Report — search results at a glance

============================================================
 LOW-DETAIL REPORT
 Query: python web scraping
 Results: 5
============================================================

  1. Python Web Scraping Tutorial - GeeksforGeeks
     https://www.geeksforgeeks.org/python/python-web-scraping-tutorial/
     Web scraping is the process of extracting data from websites…

  2. Beautiful Soup: Build a Web Scraper With Python
     https://realpython.com/beautiful-soup-web-scraper-python/
     Learn how to use Beautiful Soup and Requests to scrape…

🟡 Medium Report — page content & structure

────────────────────────────────────────────────────────────
  [1] Python Web Scraping Tutorial - GeeksforGeeks
  URL: https://www.geeksforgeeks.org/python/python-web-scraping-tutorial/
  Meta: Comprehensive guide to web scraping with Python…
    • Python Web Scraping Tutorial
      • Requests Module
      • Parsing HTML with BeautifulSoup
      • Selenium

  ── Content Preview ──
  Web scraping is the process of extracting data from websites
  automatically. Python is widely used for web scraping because…

🔴 High Report — deep scrape with sub-pages

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  [1] Python Web Scraping Tutorial - GeeksforGeeks
  URL: https://www.geeksforgeeks.org/python/python-web-scraping-tutorial/

  ── Headings ──
    • Python Web Scraping Tutorial
      • Requests Module
      • Parsing HTML with BeautifulSoup
      • Selenium

  ── Content ──
  [Full extracted text up to 2000 characters…]

  🖼 Tutorial diagram — https://media.geeksforgeeks.org/…

  ── Sub-pages (2) ──
    ┌ Sub-page 1: Requests Tutorial
    │ URL: https://www.geeksforgeeks.org/python-requests-tutorial/
    │ [Sub-page content up to 800 characters…]
    └────────────────────────────────────────

🔧 Using as a Python Library

from plethora import web_search, scrape_page, scrape_subpages, run

# Search only
results = web_search("your query", num_results=10)

# Scrape a single URL
page = scrape_page("https://example.com")
print(page["title"], page["headings"], page["lists"], page["tables"])

# Full pipeline — returns list of report file paths
paths = run("AI news 2026", level="high", num_results=5, out_format="all")
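
To give a feel for the kind of dictionary `scrape_page` returns, here is a stdlib-only sketch that pulls a title and headings out of an HTML string. It is an illustration, not the library's implementation: Plethora itself depends on beautifulsoup4, and the exact shape of its `headings` value is not documented here, so the `(level, text)` tuples below are an assumption.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class PageOutline(HTMLParser):
    """Collect the <title> text and all h1-h6 headings of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []     # list of (level, text) tuples
        self._current = None   # tag whose text is being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" or tag in HEADING_TAGS:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return  # e.g. the closing </em> inside a heading
        text = " ".join("".join(self._buffer).split())
        if tag == "title":
            self.title = text
        else:
            self.headings.append((int(tag[1]), text))
        self._current = None


def outline(html):
    """Return a small dict with the page title and its headings."""
    parser = PageOutline()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "headings": parser.headings}
```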

📦 Publishing to PyPI

Automatic (GitHub Actions)

A workflow is included that auto-publishes to PyPI when you create a GitHub release.

1. Get an API token from pypi.org/manage/account
2. Add it as a repo secret named PYPI_API_TOKEN in Settings → Secrets → Actions
3. Create a new release on GitHub (e.g., tag v1.0.0)
4. The workflow builds and uploads automatically

Manual (Termux / any terminal)

pip install build twine
python -m build
twine upload dist/*

When prompted, enter __token__ as the username and paste your PyPI API token as the password.

⚠️ Disclaimer

This tool is for personal research and educational purposes only. It respects robots.txt, includes per-domain rate limiting, and plays nice with servers. Please don't abuse it. Use responsibly.

💰 Support This Project

If you find this useful, consider supporting me — it keeps me building stuff like this.

Built by @soumyadipkarforma · MIT License

🌿 Other Branches

| Branch | What's There |
| --- | --- |
| `website` | 🌐 React web app: use Plethora from your browser |
| `pypi-package` | 📦 Pip-installable Python library: `pip install plethora` for use in your own scripts |

This branch (main) has the terminal scripts and CLI tool — clone it and start scraping.

Release history

2.0.0 (Mar 1, 2026)
1.0.0 (Mar 1, 2026, this version)
