Search the web, scrape sites, and generate reports, all from your terminal.
Plethora
Search the web. Scrape the sites. Generate reports. All from your terminal.
I built this because I got tired of manually Googling stuff and copy-pasting content. Now I just run a one-liner and get a clean report (low, medium, or high detail) in plain text, Markdown, HTML, JSON, or PDF. No browser needed. No fluff.
💡 Why I Made This
I wanted a fast way to research topics from the terminal: search for something, pull down the actual content from each result, and save it all in one place. So I wrote this: a set of scripts that does exactly that.
The idea is simple: pick a detail level, run the script, get your report.
The Scripts: The Fastest Way to Use This
These scripts are the heart of the project. No flags to remember, no configuration; just run them:
```bash
# Quick list of search results: titles, URLs, snippets
./scrape-low "best static site generators"

# Scrape the actual pages: headings, meta, content previews
./scrape-med "python web frameworks 2026"

# Full deep scrape: page content + sub-pages + everything
./scrape-high "machine learning research papers" 8 3
```
That's it. Each script takes a search query and, optionally, the number of results you want.
scrape-high also accepts the maximum number of sub-pages per site as a third argument.
```text
./scrape-low  "query" [num_results]
./scrape-med  "query" [num_results]
./scrape-high "query" [num_results] [max_subpages]
```
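Assuming the shortcuts forward to scrape.py (the project structure lists them as shell shortcuts, with argument parsing in the shared common helper), their positional arguments would map onto the documented flags roughly as follows. This is a Python illustration, not the shell scripts themselves; `build_command` and `WRAPPER_LEVELS` are hypothetical names.

```python
from __future__ import annotations

import shlex

# Hypothetical mapping from each shortcut to the --level it would pass.
WRAPPER_LEVELS = {"scrape-low": "low", "scrape-med": "medium", "scrape-high": "high"}


def build_command(wrapper: str, query: str, num_results: int | None = None,
                  max_subpages: int | None = None) -> list[str]:
    """Translate shortcut-style positional arguments into scrape.py flags.

    Assumption: the shortcuts forward to scrape.py using the --level,
    --results and --subpages options listed under "All Options".
    """
    level = WRAPPER_LEVELS[wrapper]
    cmd = ["python", "scrape.py", query, "--level", level]
    if num_results is not None:
        cmd += ["--results", str(num_results)]
    if max_subpages is not None:
        # Only scrape-high takes a sub-page count.
        if level != "high":
            raise ValueError("max_subpages only applies to scrape-high")
        cmd += ["--subpages", str(max_subpages)]
    return cmd


# Equivalent of: ./scrape-high "machine learning research papers" 8 3
print(shlex.join(build_command("scrape-high", "machine learning research papers", 8, 3)))
# python scrape.py 'machine learning research papers' --level high --results 8 --subpages 3
```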
After the scrape finishes, the script shows where the report was saved and asks
whether you want to view it right there in the terminal with less. Answer y to read it,
or n to grab it from the reports/ folder later.
What Each Level Gets You
| Level | What You Get |
|---|---|
| 🟢 LOW | Search results list: titles, URLs, snippets. ⚡ Instant, because it doesn't visit any pages |
| 🟡 MED | Visits each result page and pulls headings, meta, lists, and a content preview (500 chars) |
| 🔴 HIGH | Deep scrape: full page content, plus it follows links to sub-pages. Tables, images, 2000-char content |
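The character limits in the table behave like per-level settings. Below is a minimal sketch of that idea, assuming a dict-driven configuration; `LEVEL_LIMITS` and `preview` are hypothetical names, not part of Plethora's API. The 800-character sub-page limit comes from the high-level example output.

```python
from __future__ import annotations

# Hypothetical per-level limits, using the numbers from the level table.
# None means that level does not extract this kind of content at all.
LEVEL_LIMITS = {
    "low": {"content_chars": None, "subpage_chars": None},
    "medium": {"content_chars": 500, "subpage_chars": None},
    "high": {"content_chars": 2000, "subpage_chars": 800},
}


def preview(text: str, level: str, key: str = "content_chars") -> str | None:
    """Trim extracted page text to the limit configured for a level.

    Returns None when the level does not extract that kind of content.
    """
    limit = LEVEL_LIMITS[level][key]
    if limit is None:
        return None
    # Collapse runs of whitespace so the preview reads as one block.
    text = " ".join(text.split())
    if len(text) <= limit:
        return text
    # Cut at the limit and mark the truncation with an ellipsis.
    return text[:limit].rstrip() + "…"
```

With this shape, adding a new detail level is just another dict entry rather than new branching logic.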
Setup
Install from PyPI (Recommended)
```bash
pip install plethora
```
That's it. It works on Linux, macOS, Windows, Termux, and Google Colab.
After installing, use the CLI:
```bash
plethora "your search query" --level medium
```
Or use it as a Python library:
```python
from plethora import web_search, run

# Search only
results = web_search("python tutorials", num_results=5)

# Full pipeline: returns a list of report file paths
report_paths = run("AI news 2026", level="high", out_format="json")
```
Google Colab
```python
# In a Colab cell
!pip install plethora

from plethora import run
paths = run("machine learning trends", level="medium", out_format="md")
```
One-Command Setup (from source)
I've included setup scripts for every major platform. Just run the one for your system and everything gets installed: Python, pip, dependencies, and script permissions. Zero hassle.
| Platform | Command |
|---|---|
| Termux (Android) | bash termux-setup |
| Linux (Debian/Fedora/Arch/openSUSE) | bash linux-setup |
| macOS | bash mac-setup |
| Windows | Double-click windows-setup.bat or run it from CMD |
Each script handles the full chain: system packages → Python → pip dependencies → script permissions. After running it, you're ready to go.
Manual Setup
If you'd rather do it yourself:
- Python 3.10+
- requests + beautifulsoup4 (required)
- rich (optional; gives you nice progress bars)
- fpdf2 (required for PDF output)
```bash
pip install requests beautifulsoup4 rich fpdf2
```
Make the scripts executable:
```bash
chmod +x scrape-low scrape-med scrape-high
```
You're good to go.
⚙️ Advanced: The Python CLI
If you need more control, use scrape.py directly with flags:
```bash
# Basic usage
python scrape.py "your search query" --level medium

# Generate all formats at once (txt + md + html + json + pdf)
python scrape.py "AI research" --level high --format all

# Parallel scrape with 8 threads, skip cache
python scrape.py "web dev trends" --level medium --workers 8 --no-cache

# Quiet mode for piping
python scrape.py "data science" --level low --quiet --format json
```
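The --workers flag controls how many pages are fetched at once. The sketch below shows the general pattern with `concurrent.futures.ThreadPoolExecutor`, plus the retry-with-exponential-backoff behaviour described under "What's Under the Hood" (3 retries). It is not Plethora's scraper.py: `fetch_with_retries`, `scrape_all`, and the 0.5-second base delay are assumptions, and the fetch function is injected so the example runs without network access.

```python
from __future__ import annotations

import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def fetch_with_retries(fetch: Callable[[str], str], url: str, retries: int = 3,
                       base_delay: float = 0.5,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Call fetch(url); on failure wait base_delay * 2**attempt and try again.

    Gives up and re-raises after `retries` retries (retries + 1 attempts in total).
    """
    attempt = 0
    while True:
        try:
            return fetch(url)
        except Exception:
            if attempt >= retries:
                raise
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s with the defaults
            attempt += 1


def scrape_all(fetch: Callable[[str], str], urls: list[str], workers: int = 4,
               sleep: Callable[[float], None] = time.sleep) -> dict[str, str | None]:
    """Fetch all URLs on a pool of `workers` threads; failed URLs map to None."""

    def task(url: str) -> str | None:
        try:
            return fetch_with_retries(fetch, url, sleep=sleep)
        except Exception:
            return None

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so they line up with urls.
        return dict(zip(urls, pool.map(task, urls)))
```

In real use, `fetch` would wrap an HTTP call such as `requests.get(url, timeout=10).text`; injecting it keeps the backoff schedule testable.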
All Options
```text
python scrape.py <query> [options]

  -l, --level LEVEL     low | medium | high                    (default: medium)
  -n, --results N       Number of search results               (default: 5)
  -s, --subpages N      Max sub-pages per site (high only)     (default: 2)
  -o, --output DIR      Output directory                       (default: reports/)
  -f, --format FMT      txt | md | html | json | pdf | all     (default: txt)
  -w, --workers N       Concurrent scraping threads            (default: 4)
  -q, --quiet           Suppress progress output
      --no-cache        Bypass URL cache
      --cache-ttl SECS  Cache TTL in seconds                   (default: 3600)
```
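If you want to wrap Plethora in your own tooling, the option surface above can be reconstructed with argparse. This is a hedged sketch that mirrors only the flags and defaults listed; it is not the real scrape.py, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild the documented scrape.py options (a sketch, not the real source)."""
    p = argparse.ArgumentParser(prog="scrape.py", description="Search, scrape, and report.")
    p.add_argument("query", help="Search query")
    p.add_argument("-l", "--level", choices=["low", "medium", "high"], default="medium")
    p.add_argument("-n", "--results", type=int, default=5, help="Number of search results")
    p.add_argument("-s", "--subpages", type=int, default=2,
                   help="Max sub-pages per site (high only)")
    p.add_argument("-o", "--output", default="reports/", help="Output directory")
    p.add_argument("-f", "--format", choices=["txt", "md", "html", "json", "pdf", "all"],
                   default="txt")
    p.add_argument("-w", "--workers", type=int, default=4, help="Concurrent scraping threads")
    p.add_argument("-q", "--quiet", action="store_true", help="Suppress progress output")
    p.add_argument("--no-cache", action="store_true", help="Bypass URL cache")
    p.add_argument("--cache-ttl", type=int, default=3600, help="Cache TTL in seconds")
    return p
```

Parsing the "Parallel scrape" example gives `workers=8` and `no_cache=True`, with every other option at its documented default.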
Output Formats
| Format | Extension | Description |
|---|---|---|
| txt | .txt | Clean plain text, great for terminal reading |
| md | .md | Markdown, perfect for pasting into notes or docs |
| html | .html | Self-contained HTML with a dark theme; open in any browser |
| json | .json | Raw structured data; feed it into your own scripts |
| pdf | .pdf | Portable PDF with watermark; share or print anywhere |
All formats include the Plethora watermark. Use --format all to get everything.
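`--format all` is shorthand for all five formats. Here is a small sketch of how a format choice could expand into output file paths using the extensions from the table. The naming scheme (a slug of the query plus the level) and the `report_paths` name are assumptions for illustration; Plethora's actual report file names are not documented here.

```python
from __future__ import annotations

import re
from pathlib import Path

# Extensions from the Output Formats table.
EXTENSIONS = {"txt": ".txt", "md": ".md", "html": ".html", "json": ".json", "pdf": ".pdf"}


def report_paths(query: str, level: str, fmt: str, out_dir: str = "reports") -> list[Path]:
    """Return one output path per requested format; "all" expands to every format."""
    formats = list(EXTENSIONS) if fmt == "all" else [fmt]
    # Hypothetical naming: lowercase slug of the query, then the detail level.
    slug = re.sub(r"[^a-z0-9]+", "-", query.lower()).strip("-") or "report"
    return [Path(out_dir) / f"{slug}-{level}{EXTENSIONS[f]}" for f in formats]
```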
✨ What's Under the Hood
- Concurrent scraping: pages are fetched in parallel with a configurable number of threads (--workers)
- Smart caching: already-fetched URLs are cached locally (default TTL: 1 hour)
- robots.txt respect: checks robots.txt before scraping and skips disallowed URLs
- Auto-retries: failed requests are retried 3 times with exponential backoff
- Per-domain rate limiting: won't hammer the same site
- Rich extraction: headings (h1–h6), paragraphs, lists, tables, image metadata
- Progress bars: live Rich progress while scraping (disable with --quiet)
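Two of these behaviours are easy to sketch with the standard library: a robots.txt check via `urllib.robotparser` and a per-domain minimum delay between requests. Both are illustrations under assumptions (the one-second interval, and the `allowed_by_robots` and `DomainRateLimiter` names); they are not lifted from scraper.py.

```python
from __future__ import annotations

import threading
import time
from typing import Callable
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Check a URL against an already-downloaded robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class DomainRateLimiter:
    """Enforce a minimum interval between requests to the same domain."""

    def __init__(self, min_interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._next_slot: dict[str, float] = {}
        self._lock = threading.Lock()

    def wait(self, url: str) -> None:
        """Block until this URL's domain may be requested again."""
        domain = urlparse(url).netloc
        with self._lock:
            now = self.clock()
            start = max(now, self._next_slot.get(domain, now))
            # Reserve the slot before sleeping so other threads queue behind it.
            self._next_slot[domain] = start + self.min_interval
        delay = start - now
        if delay > 0:
            self.sleep(delay)
```

Reserving the next slot inside the lock, and sleeping outside it, lets worker threads hitting different domains proceed without waiting on each other.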
Project Structure
```text
plethora/
├── scrape-low          # ⭐ Shell shortcut: low detail report
├── scrape-med          # ⭐ Shell shortcut: medium detail report
├── scrape-high         # ⭐ Shell shortcut: high detail report
├── scrape.py           # Full CLI with all options
├── scraper.py          # Core engine: search, scrape, concurrency, caching
├── formatter.py        # Report generators: txt, md, html, json, pdf
├── common              # Shared shell helper (argument parsing)
├── termux-setup        # 📱 One-command Termux setup
├── linux-setup         # 🐧 One-command Linux setup
├── mac-setup           # One-command macOS setup
├── windows-setup.bat   # 🪟 One-command Windows setup
├── .cache/             # URL cache (auto-created)
└── reports/            # All generated reports go here
```
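The .cache/ directory holds fetched pages so repeated runs can skip the network. One way such a cache could work, assuming one JSON file per URL named by the URL's SHA-256 hash and expiry based on a stored timestamp, is sketched below. The file layout and the `UrlCache` name are hypothetical; only the 3600-second default TTL comes from the options list.

```python
from __future__ import annotations

import hashlib
import json
import time
from pathlib import Path
from typing import Callable


class UrlCache:
    """File-backed cache with a time-to-live, keyed by URL."""

    def __init__(self, directory: str | Path = ".cache", ttl: float = 3600,
                 clock: Callable[[], float] = time.time) -> None:
        self.dir = Path(directory)
        self.dir.mkdir(parents=True, exist_ok=True)  # auto-create the cache directory
        self.ttl = ttl
        self.clock = clock

    def _path(self, url: str) -> Path:
        # Hash the URL so any URL becomes a safe, fixed-length file name.
        return self.dir / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".json")

    def get(self, url: str) -> str | None:
        """Return the cached body, or None if missing, unreadable, or older than the TTL."""
        try:
            entry = json.loads(self._path(url).read_text(encoding="utf-8"))
        except (FileNotFoundError, json.JSONDecodeError):
            return None
        if self.clock() - entry["saved_at"] > self.ttl:
            return None
        return entry["body"]

    def set(self, url: str, body: str) -> None:
        """Store a fetched body together with the time it was saved."""
        entry = {"url": url, "saved_at": self.clock(), "body": body}
        self._path(url).write_text(json.dumps(entry), encoding="utf-8")
```

Under this design, `--no-cache` would simply skip `get()`, and `--cache-ttl` would set `ttl`.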
Example Output
🟢 Low Report: search results at a glance
```text
============================================================
LOW-DETAIL REPORT
Query: python web scraping
Results: 5
============================================================
1. Python Web Scraping Tutorial - GeeksforGeeks
https://www.geeksforgeeks.org/python/python-web-scraping-tutorial/
Web scraping is the process of extracting data from websites…
2. Beautiful Soup: Build a Web Scraper With Python
https://realpython.com/beautiful-soup-web-scraper-python/
Learn how to use Beautiful Soup and Requests to scrape…
```
🟡 Medium Report: page content & structure
```text
────────────────────────────────────────────────────────────
[1] Python Web Scraping Tutorial - GeeksforGeeks
URL: https://www.geeksforgeeks.org/python/python-web-scraping-tutorial/
Meta: Comprehensive guide to web scraping with Python…
• Python Web Scraping Tutorial
• Requests Module
• Parsing HTML with BeautifulSoup
• Selenium
── Content Preview ──
Web scraping is the process of extracting data from websites
automatically. Python is widely used for web scraping because…
```
🔴 High Report: deep scrape with sub-pages
```text
────────────────────────────────────────────────────────────
[1] Python Web Scraping Tutorial - GeeksforGeeks
URL: https://www.geeksforgeeks.org/python/python-web-scraping-tutorial/
── Headings ──
• Python Web Scraping Tutorial
• Requests Module
• Parsing HTML with BeautifulSoup
• Selenium
── Content ──
[Full extracted text up to 2000 characters…]
🖼 Tutorial diagram → https://media.geeksforgeeks.org/…
── Sub-pages (2) ──
│ Sub-page 1: Requests Tutorial
│ URL: https://www.geeksforgeeks.org/python-requests-tutorial/
│ [Sub-page content up to 800 characters…]
└────────────────────────────────────────
```
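At the high level, each result page is followed to a few sub-pages (--subpages, default 2). A plausible selection rule is to collect links, resolve them against the page URL, keep only same-domain HTTP(S) links, drop fragments and duplicates, and stop at the limit. The sketch uses the standard library's `html.parser` instead of BeautifulSoup; the same-domain rule and the `pick_subpages` name are assumptions, not Plethora's documented behaviour.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collect href values from <a> tags in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def pick_subpages(page_url: str, html: str, max_subpages: int = 2) -> list[str]:
    """Return up to max_subpages distinct same-domain links found on the page."""
    collector = _LinkCollector()
    collector.feed(html)
    base_host = urlparse(page_url).netloc
    seen = {urldefrag(page_url).url}  # never pick the page itself
    picked: list[str] = []
    for href in collector.hrefs:
        if len(picked) >= max_subpages:
            break
        # Resolve relative links and strip #fragments.
        url = urldefrag(urljoin(page_url, href)).url
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or parts.netloc != base_host:
            continue
        if url in seen:
            continue
        seen.add(url)
        picked.append(url)
    return picked
```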
🔧 Using as a Python Library
```python
from plethora import web_search, scrape_page, scrape_subpages, run

# Search only
results = web_search("your query", num_results=10)

# Scrape a single URL
page = scrape_page("https://example.com")
print(page["title"], page["headings"], page["lists"], page["tables"])

# Full pipeline: returns a list of report file paths
paths = run("AI news 2026", level="high", num_results=5, out_format="all")
```
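`scrape_page` returns a dict with keys such as `title` and `headings`. Plethora builds it with BeautifulSoup. As a dependency-free illustration only, here is how the title and h1–h6 headings could be pulled out with the standard library's `html.parser`. The `extract_outline` name and the shape of each heading entry are hypothetical and may not match what `scrape_page` actually returns.

```python
from __future__ import annotations

from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class _OutlineParser(HTMLParser):
    """Record the <title> text and the text of every h1-h6 element."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.headings: list[dict[str, str]] = []
        self._current: str | None = None  # tag whose text is being collected
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" or tag in HEADING_TAGS:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return  # e.g. </em> inside a heading: keep collecting
        text = " ".join("".join(self._buffer).split())
        if tag == "title":
            self.title = text
        elif text:  # skip empty headings
            self.headings.append({"level": tag, "text": text})
        self._current = None


def extract_outline(html: str) -> dict:
    """Return {"title": ..., "headings": [{"level": "h1", "text": ...}, ...]}."""
    parser = _OutlineParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "headings": parser.headings}
```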
📦 Publishing to PyPI
Automatic (GitHub Actions)
A workflow is included that auto-publishes to PyPI when you create a GitHub release.
- Get an API token from pypi.org/manage/account
- Add it as a repo secret named PYPI_API_TOKEN in Settings → Secrets → Actions
- Create a new release on GitHub (e.g., tag v1.0.0)
- The workflow builds and uploads automatically
Manual (Termux / any terminal)
```bash
pip install build twine
python -m build
twine upload dist/*
```
When prompted, enter __token__ as the username and paste your PyPI API token as the password.
⚠️ Disclaimer
This tool is for personal research and educational purposes only.
It respects robots.txt, includes per-domain rate limiting, and plays nice
with servers. Please don't abuse it. Use responsibly.
💰 Support This Project
If you find this useful, consider supporting me; it keeps me building stuff like this.
Built by @soumyadipkarforma · MIT License
🌿 Other Branches
| Branch | What's There |
|---|---|
| website | React web app: use Plethora from your browser. Live demo → |
| pypi-package | 📦 Pip-installable Python library: pip install plethora for use in your own scripts |

This branch (main) has the terminal scripts and CLI tool: clone it and start scraping.
Download files
File details
Details for the file plethora-1.0.0.tar.gz.
File metadata
- Download URL: plethora-1.0.0.tar.gz
- Upload date:
- Size: 22.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fd64a8081025ce6bcd04b2edacd34c1d5219498ab40aa4895525f0b67a5d0168 |
| MD5 | 4330399743a355806ca244083b1cfe9e |
| BLAKE2b-256 | 2ef862f1405f473737c8b292a3c2f00c35898bcae160f97bb1e1abd5b85b39ad |
File details
Details for the file plethora-1.0.0-py3-none-any.whl.
File metadata
- Download URL: plethora-1.0.0-py3-none-any.whl
- Upload date:
- Size: 18.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 671ea53457e8e549be0e9f0bf131d9bab8f7326c77104232c9b7b69d62fdbe8d |
| MD5 | 046c571ffcc1308081c007bf5acacf91 |
| BLAKE2b-256 | f6f363bb37aa4d38f21352fd856558e7d098d2ccf9e166efe6e59ac0af6c51d7 |