Search the web, scrape sites, and generate reports, all from your terminal.

Project description

🔍 Plethora

Search the web. Scrape the sites. Generate reports. All from your terminal.

I built this because I got tired of manually Googling stuff and copy-pasting content. Now I just run a one-liner and get a clean report (low, medium, or high detail) in plain text, Markdown, HTML, JSON, or PDF. No browser needed. No fluff.

Instagram X YouTube Email


💡 Why I Made This

I wanted a fast way to research topics from the terminal: search for something, pull down the actual content from each result, and save it all in one place. So I wrote a set of scripts that does exactly that.

The idea is simple: pick a detail level, run the script, get your report.


🐚 The Scripts: The Fastest Way to Use This

These are the main event. No flags to remember and no extra configuration; just run them:

# Quick list of search results: titles, URLs, snippets
./scrape-low "best static site generators"

# Scrape the actual pages: headings, meta, content previews
./scrape-med "python web frameworks 2026"

# Full deep scrape: page content + sub-pages + everything (8 results, up to 3 sub-pages each)
./scrape-high "machine learning research papers" 8 3

That's it. Each script takes a search query and, optionally, the number of results you want. scrape-high also takes the maximum number of sub-pages per site as a third argument.

./scrape-low  "query" [num_results]
./scrape-med  "query" [num_results]
./scrape-high "query" [num_results] [max_subpages]
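The project structure lists these shortcuts as shell wrappers, with argument parsing in a shared common helper. Assuming they simply forward to scrape.py (an assumption; the wrapper source isn't shown here), the positional arguments map onto the documented CLI flags roughly as follows. build_command and LEVELS are hypothetical names used only for illustration.

```python
import shlex

# Hypothetical mapping from shortcut name to the documented --level values.
LEVELS = {"scrape-low": "low", "scrape-med": "medium", "scrape-high": "high"}


def build_command(script, query, num_results=None, max_subpages=None):
    """Translate shortcut-style positional arguments into scrape.py flags.

    Assumption: the shortcuts forward to scrape.py. This illustrates the
    argument convention; it is not the contents of the real `common` helper.
    """
    level = LEVELS[script]
    cmd = ["python", "scrape.py", query, "--level", level]
    if num_results is not None:
        cmd += ["--results", str(num_results)]
    if max_subpages is not None:
        # Sub-pages are only followed at the high level.
        if level != "high":
            raise ValueError("max_subpages only applies to scrape-high")
        cmd += ["--subpages", str(max_subpages)]
    return cmd


# prints: python scrape.py 'machine learning research papers' --level high --results 8 --subpages 3
print(shlex.join(build_command("scrape-high", "machine learning research papers", 8, 3)))
```

Leaving an optional argument out simply omits the flag, so the CLI defaults (5 results, 2 sub-pages) apply.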

After the scrape finishes, it shows you where the report was saved and asks if you want to view it right there in the terminal with less. Say y and read it, or n and go grab it from the reports/ folder later.
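If you skip the viewer and come back later, a small helper can find the newest report without you remembering its file name. This is a sketch, not part of Plethora; latest_report is a hypothetical name, and it assumes reports are written directly into reports/ (the default --output directory).

```python
from pathlib import Path


def latest_report(reports_dir="reports"):
    """Return the most recently modified file in reports_dir, or None if there is none."""
    folder = Path(reports_dir)
    if not folder.is_dir():
        return None
    files = [p for p in folder.iterdir() if p.is_file()]
    # max(..., default=None) covers an existing but empty folder.
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    newest = latest_report()
    print(newest if newest else "No reports yet.")
```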


📋 What Each Level Gets You

┌──────────┬───────────────────────────────────────────────────────┐
│  Level   │  What You Get                                         │
├──────────┼───────────────────────────────────────────────────────┤
│  🟢 LOW  │  Search results list: titles, URLs, snippets          │
│          │  ⚡ Instant: doesn't visit any pages                  │
├──────────┼───────────────────────────────────────────────────────┤
│  🟡 MED  │  Visits each result page: pulls headings, meta,       │
│          │  lists, and a content preview (500 chars)             │
├──────────┼───────────────────────────────────────────────────────┤
│  🔴 HIGH │  Deep scrape: full page content + follows links       │
│          │  to sub-pages. Tables, images, 2,000-char content     │
└──────────┴───────────────────────────────────────────────────────┘
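The limits in this table can be written down as data. The sketch below records them in a hypothetical LevelSpec table and shows one way to cut content to a preview length. The field names and the ellipsis-truncation rule are assumptions; the 800-character sub-page figure comes from the example high report further down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelSpec:
    visits_pages: bool       # False for low: only the search results are used
    content_chars: int       # 0 means no page content is extracted
    follows_subpages: bool   # True only for high


# Numbers taken from the table above; field names are hypothetical.
LEVELS = {
    "low": LevelSpec(visits_pages=False, content_chars=0, follows_subpages=False),
    "medium": LevelSpec(visits_pages=True, content_chars=500, follows_subpages=False),
    "high": LevelSpec(visits_pages=True, content_chars=2000, follows_subpages=True),
}

SUBPAGE_CHARS = 800  # sub-page preview size shown in the example high report


def preview(text, limit):
    """Cut text to at most `limit` characters, marking any cut with an ellipsis."""
    if limit <= 0:
        return ""
    if len(text) <= limit:
        return text
    # Reserve one character for the ellipsis so the result never exceeds `limit`.
    return text[: limit - 1].rstrip() + "…"
```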

🚀 Setup

Install from PyPI (Recommended)

pip install plethora

That's it. It works on Linux, macOS, Windows, Termux, and Google Colab.

After installing, use the CLI:

plethora "your search query" --level medium

Or use it as a Python library:

from plethora import web_search, scrape_page, run

results = web_search("python tutorials", num_results=5)
report_paths = run("AI news 2026", level="high", out_format="json")

Google Colab

!pip install plethora

from plethora import run
paths = run("machine learning trends", level="medium", out_format="md")

One-Command Setup (from source)

I've included setup scripts for every major platform. Just run the one for your system and everything gets installed: Python, pip, dependencies, permissions. Zero hassle.

Platform                              Command
Termux (Android)                      bash termux-setup
Linux (Debian/Fedora/Arch/openSUSE)   bash linux-setup
macOS                                 bash mac-setup
Windows                               Double-click windows-setup.bat, or run it from CMD

Each script handles the full chain: system packages → Python → pip dependencies → script permissions. After running it, you're ready to go.

Manual Setup

If you'd rather do it yourself:

  • Python 3.10+
  • requests + beautifulsoup4 (required)
  • rich (optional: gives you nice progress bars)
  • fpdf2 (required for PDF output)
pip install requests beautifulsoup4 rich fpdf2

Make the scripts executable:

chmod +x scrape-low scrape-med scrape-high

You're good to go.
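To confirm a manual install worked, you can ask Python whether each module is importable. This is a sketch, not a script that ships with Plethora. Two import names differ from their pip names (beautifulsoup4 installs bs4, fpdf2 installs fpdf), and treating fpdf2 as optional because it is only needed for PDF output follows the list above.

```python
import importlib.util

# pip package name -> (import name, required for every run?)
DEPENDENCIES = {
    "requests": ("requests", True),
    "beautifulsoup4": ("bs4", True),
    "rich": ("rich", False),   # progress bars only
    "fpdf2": ("fpdf", False),  # PDF output only
}


def check_dependencies():
    """Map each pip package name to True if its module can be found."""
    return {pkg: importlib.util.find_spec(mod) is not None
            for pkg, (mod, _required) in DEPENDENCIES.items()}


def missing_required(status):
    """List the required packages that status marks as unavailable."""
    return [pkg for pkg, (_mod, required) in DEPENDENCIES.items()
            if required and not status[pkg]]


if __name__ == "__main__":
    status = check_dependencies()
    for pkg, ok in status.items():
        print(f"{pkg:15} {'ok' if ok else 'missing'}")
    missing = missing_required(status)
    if missing:
        print("Install with: pip install " + " ".join(missing))
```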


⚙️ Advanced: The Python CLI

If you need more control, use scrape.py directly with flags:

# Basic usage
python scrape.py "your search query" --level medium

# Generate all formats at once (txt + md + html + json + pdf)
python scrape.py "AI research" --level high --format all

# Parallel scrape with 8 threads, skip cache
python scrape.py "web dev trends" --level medium --workers 8 --no-cache

# Quiet mode for piping
python scrape.py "data science" --level low --quiet --format json

All Options

python scrape.py <query> [options]

  -l, --level LEVEL      low | medium | high                    (default: medium)
  -n, --results N        Number of search results               (default: 5)
  -s, --subpages N       Max sub-pages per site (high only)     (default: 2)
  -o, --output DIR       Output directory                       (default: reports/)
  -f, --format FMT       txt | md | html | json | pdf | all     (default: txt)
  -w, --workers N        Concurrent scraping threads            (default: 4)
  -q, --quiet            Suppress progress output
  --no-cache             Bypass URL cache
  --cache-ttl SECS       Cache TTL in seconds                   (default: 3600)
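To see how these options fit together, here is a hypothetical argparse parser that mirrors the table: same flags, choices, and defaults. It illustrates the documented interface; it is not the actual source of scrape.py.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented scrape.py options."""
    p = argparse.ArgumentParser(prog="scrape.py")
    p.add_argument("query")
    p.add_argument("-l", "--level", choices=["low", "medium", "high"], default="medium")
    p.add_argument("-n", "--results", type=int, default=5)
    p.add_argument("-s", "--subpages", type=int, default=2)  # used by high only
    p.add_argument("-o", "--output", default="reports/")
    p.add_argument("-f", "--format", default="txt",
                   choices=["txt", "md", "html", "json", "pdf", "all"])
    p.add_argument("-w", "--workers", type=int, default=4)
    p.add_argument("-q", "--quiet", action="store_true")
    p.add_argument("--no-cache", action="store_true")        # stored as args.no_cache
    p.add_argument("--cache-ttl", type=int, default=3600)   # stored as args.cache_ttl
    return p


args = build_parser().parse_args(["web dev trends", "-l", "medium", "-w", "8", "--no-cache"])
print(args.level, args.workers, args.no_cache, args.cache_ttl)  # medium 8 True 3600
```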

📁 Output Formats

Format   Extension   Description
txt      .txt        Clean plain text, great for terminal reading
md       .md         Markdown, perfect for pasting into notes or docs
html     .html       Self-contained HTML with a dark theme; open it in any browser
json     .json       Raw structured data; feed it into your own scripts
pdf      .pdf        Portable PDF with watermark; share or print anywhere

All formats include the Plethora watermark. Use --format all to get everything.
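A sketch of how a --format value could expand into output files, assuming one file per format. The <output>/<name><extension> naming used by report_paths is a guess for illustration; the actual report file names aren't documented here.

```python
from pathlib import PurePosixPath

# Format name -> file extension, from the table above.
FORMATS = {"txt": ".txt", "md": ".md", "html": ".html", "json": ".json", "pdf": ".pdf"}


def expand_formats(fmt):
    """Turn a --format value into the list of file extensions it produces."""
    if fmt == "all":
        return list(FORMATS.values())
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    return [FORMATS[fmt]]


def report_paths(stem, fmt, out_dir="reports"):
    """Hypothetical naming scheme: one <out_dir>/<stem><ext> file per format."""
    return [str(PurePosixPath(out_dir) / f"{stem}{ext}") for ext in expand_formats(fmt)]
```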


✨ What's Under the Hood

  • Concurrent scraping: pages are fetched in parallel with configurable threads
  • Smart caching: already-fetched URLs are cached locally (1-hour default TTL)
  • robots.txt respect: checks before scraping, skips disallowed URLs
  • Auto-retries: failed requests retry 3x with exponential backoff
  • Per-domain rate limiting: won't hammer the same site
  • Rich extraction: headings (h1–h6), paragraphs, lists, tables, image metadata
  • Progress bars: live Rich progress when scraping (disable with --quiet)
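Two of these behaviours, TTL caching and retries with exponential backoff, can be sketched in a few lines. This is not Plethora's implementation: the real cache lives on disk in .cache/, while this one is in memory, and reading "retry 3x" as three retries after the first attempt is an assumption. The clock and sleep functions are injectable so the logic can be tested without waiting.

```python
import time


class TTLCache:
    """In-memory cache whose entries expire `ttl` seconds after being stored."""

    def __init__(self, ttl=3600, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock      # injectable so expiry can be tested
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self.clock() - stored_at > self.ttl:
            del self._store[key]  # stale: drop it and report a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self.clock())


def fetch_with_retries(fetch, url, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url); on failure retry up to `retries` times, doubling the wait each time."""
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except Exception:
            if attempt == retries:
                raise               # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s with the defaults
```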

📂 Project Structure

plethora/
├── scrape-low          # ⭐ Shell shortcut → low detail report
├── scrape-med          # ⭐ Shell shortcut → medium detail report
├── scrape-high         # ⭐ Shell shortcut → high detail report
├── scrape.py           # Full CLI with all options
├── scraper.py          # Core engine: search, scrape, concurrency, caching
├── formatter.py        # Report generators: txt, md, html, json, pdf
├── common              # Shared shell helper (argument parsing)
├── termux-setup        # 📱 One-command Termux setup
├── linux-setup         # 🐧 One-command Linux setup
├── mac-setup           # 🍎 One-command macOS setup
├── windows-setup.bat   # 🪟 One-command Windows setup
├── .cache/             # URL cache (auto-created)
└── reports/            # All generated reports go here

📖 Example Output

🟢 Low Report: search results at a glance
============================================================
 LOW-DETAIL REPORT
 Query: python web scraping
 Results: 5
============================================================

  1. Python Web Scraping Tutorial - GeeksforGeeks
     https://www.geeksforgeeks.org/python/python-web-scraping-tutorial/
     Web scraping is the process of extracting data from websites…

  2. Beautiful Soup: Build a Web Scraper With Python
     https://realpython.com/beautiful-soup-web-scraper-python/
     Learn how to use Beautiful Soup and Requests to scrape…

🟡 Medium Report: page content & structure
────────────────────────────────────────────────────────────
  [1] Python Web Scraping Tutorial - GeeksforGeeks
  URL: https://www.geeksforgeeks.org/python/python-web-scraping-tutorial/
  Meta: Comprehensive guide to web scraping with Python…
    • Python Web Scraping Tutorial
      • Requests Module
      • Parsing HTML with BeautifulSoup
      • Selenium

  ── Content Preview ──
  Web scraping is the process of extracting data from websites
  automatically. Python is widely used for web scraping because…

🔴 High Report: deep scrape with sub-pages
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  [1] Python Web Scraping Tutorial - GeeksforGeeks
  URL: https://www.geeksforgeeks.org/python/python-web-scraping-tutorial/

  ── Headings ──
    • Python Web Scraping Tutorial
      • Requests Module
      • Parsing HTML with BeautifulSoup
      • Selenium

  ── Content ──
  [Full extracted text up to 2000 characters…]

  🖼 Tutorial diagram - https://media.geeksforgeeks.org/…

  ── Sub-pages (2) ──
    ┌ Sub-page 1: Requests Tutorial
    │ URL: https://www.geeksforgeeks.org/python-requests-tutorial/
    │ [Sub-page content up to 800 characters…]
    └────────────────────────────────────────
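The low-detail layout is simple enough to reproduce from a list of search results. This sketch assumes each result is a dict with title, url, and snippet keys (an assumption; the exact shape web_search returns isn't documented here) and mimics the spacing of the example above. render_low_report is a hypothetical name.

```python
def render_low_report(query, results, width=60):
    """Render search results in the layout of the low-detail example.

    Assumes each result is a dict with "title", "url" and "snippet" keys.
    """
    rule = "=" * width
    lines = [rule, " LOW-DETAIL REPORT", f" Query: {query}", f" Results: {len(results)}", rule, ""]
    for i, result in enumerate(results, start=1):
        lines.append(f"  {i}. {result['title']}")   # numbered title, two-space indent
        lines.append(f"     {result['url']}")        # URL and snippet indented five spaces
        lines.append(f"     {result['snippet']}")
        lines.append("")                             # blank line between entries
    return "\n".join(lines)


print(render_low_report("python web scraping", [
    {"title": "Beautiful Soup: Build a Web Scraper With Python",
     "url": "https://realpython.com/beautiful-soup-web-scraper-python/",
     "snippet": "Learn how to use Beautiful Soup and Requests to scrape…"},
]))
```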

🔧 Using as a Python Library

from plethora import web_search, scrape_page, scrape_subpages, run

# Search only
results = web_search("your query", num_results=10)

# Scrape a single URL
page = scrape_page("https://example.com")
print(page["title"], page["headings"], page["lists"], page["tables"])

# Full pipeline: returns a list of report file paths
paths = run("AI news 2026", level="high", num_results=5, out_format="all")
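Since run() returns report file paths, a small helper can pick out the JSON report and load it for further processing. This assumes the returned paths are strings or Path objects and that out_format="all" includes a .json file, per the formats table. The JSON schema isn't documented here, so the commented usage only inspects the top level. load_json_report is a hypothetical name.

```python
import json
from pathlib import Path


def load_json_report(paths):
    """Pick the first .json file out of a list of report paths and parse it."""
    for p in map(Path, paths):
        if p.suffix == ".json":
            return json.loads(p.read_text(encoding="utf-8"))
    raise FileNotFoundError("no .json report in the given paths")


# Usage (needs network access and the plethora package):
# from plethora import run
# paths = run("AI news 2026", level="high", num_results=5, out_format="all")
# data = load_json_report(paths)
# print(list(data) if isinstance(data, dict) else type(data))
```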

📦 Publishing to PyPI

Automatic (GitHub Actions)

A workflow is included that auto-publishes to PyPI when you create a GitHub release.

  1. Get an API token from pypi.org/manage/account
  2. Add it as a repository secret named PYPI_API_TOKEN under Settings → Secrets and variables → Actions
  3. Create a new release on GitHub (e.g., tag v1.0.0)
  4. The workflow builds and uploads automatically

Manual (Termux / any terminal)

pip install build twine
python -m build
twine upload dist/*

When prompted, enter __token__ as the username and paste your PyPI API token as the password.


⚠️ Disclaimer

This tool is for personal research and educational purposes only. It respects robots.txt, includes per-domain rate limiting, and plays nice with servers. Please don't abuse it. Use responsibly.


💰 Support This Project

If you find this useful, consider supporting me; it keeps me building stuff like this.

Sponsor on GitHub · Buy Me a Coffee · Patreon


Built by @soumyadipkarforma · MIT License


🌿 Other Branches

Branch         What's There
website        🌐 React web app: use Plethora from your browser. A live demo is available.
pypi-package   📦 Pip-installable Python library: pip install plethora for use in your own scripts

This branch (main) has the terminal scripts and CLI tool โ€” clone it and start scraping.


Download files

Source Distribution

plethora-1.0.0.tar.gz (22.6 kB)

Uploaded Source

Built Distribution

plethora-1.0.0-py3-none-any.whl (18.8 kB)

Uploaded Python 3

File details

Details for the file plethora-1.0.0.tar.gz.

File metadata

  • Download URL: plethora-1.0.0.tar.gz
  • Size: 22.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for plethora-1.0.0.tar.gz
Algorithm Hash digest
SHA256 fd64a8081025ce6bcd04b2edacd34c1d5219498ab40aa4895525f0b67a5d0168
MD5 4330399743a355806ca244083b1cfe9e
BLAKE2b-256 2ef862f1405f473737c8b292a3c2f00c35898bcae160f97bb1e1abd5b85b39ad

File details

Details for the file plethora-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: plethora-1.0.0-py3-none-any.whl
  • Size: 18.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for plethora-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 671ea53457e8e549be0e9f0bf131d9bab8f7326c77104232c9b7b69d62fdbe8d
MD5 046c571ffcc1308081c007bf5acacf91
BLAKE2b-256 f6f363bb37aa4d38f21352fd856558e7d098d2ccf9e166efe6e59ac0af6c51d7
