Search the web, scrape sites, and generate reports, all from your terminal.
Plethora
Search the web, scrape sites, and generate reports from Python.
Install with pip and use anywhere: scripts, notebooks, Google Colab, or the command line.
Installation
pip install plethora
Works on Linux, macOS, Windows, Termux, and Google Colab.
Quick Start
Python Library
from plethora import web_search, scrape_page, scrape_subpages, run
# Search the web
results = web_search("python tutorials", num_results=10)
# Scrape a single page
page = scrape_page("https://example.com")
print(page["title"], page["headings"], page["lists"], page["tables"])
# Full pipeline: search, scrape, and generate reports
paths = run("AI news 2026", level="high", num_results=5, out_format="all")
Google Colab
!pip install plethora
from plethora import run
# Generate a markdown report right in your notebook
paths = run("machine learning trends", level="medium", out_format="md")
Command Line
# Basic usage
plethora "your search query" --level medium
# All formats at once
plethora "AI research" --level high --format all
# Parallel scrape with 8 threads, bypassing the URL cache
plethora "web dev trends" --level medium --workers 8 --no-cache
# Quiet mode for piping
plethora "data science" --level low --quiet --format json
CLI Options
plethora <query> [options]
-l, --level LEVEL low | medium | high (default: medium)
-n, --results N Number of search results (default: 5)
-s, --subpages N Max sub-pages per site (high only) (default: 2)
-o, --output DIR Output directory (default: reports/)
-f, --format FMT txt | md | html | json | pdf | all (default: txt)
-w, --workers N Concurrent scraping threads (default: 4)
-q, --quiet Suppress progress output
--no-cache Bypass URL cache
--cache-ttl SECS Cache TTL in seconds (default: 3600)
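For readers wiring plethora into their own tooling, the sketch below restates the documented flags and defaults with Python's standard argparse module. It is a hypothetical reconstruction (build_parser is a made-up name), not the package's actual parser, but it shows how the options combine and what each one defaults to.

```python
import argparse


# Hypothetical reconstruction of the documented `plethora` options.
# Flag names and defaults are copied from the option list above;
# the parser itself is illustrative, not plethora's real code.
def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="plethora")
    p.add_argument("query", help="search query")
    p.add_argument("-l", "--level", choices=["low", "medium", "high"], default="medium")
    p.add_argument("-n", "--results", type=int, default=5, help="number of search results")
    p.add_argument("-s", "--subpages", type=int, default=2, help="max sub-pages per site (high only)")
    p.add_argument("-o", "--output", default="reports/", help="output directory")
    p.add_argument("-f", "--format", choices=["txt", "md", "html", "json", "pdf", "all"], default="txt")
    p.add_argument("-w", "--workers", type=int, default=4, help="concurrent scraping threads")
    p.add_argument("-q", "--quiet", action="store_true", help="suppress progress output")
    p.add_argument("--no-cache", action="store_true", help="bypass URL cache")
    p.add_argument("--cache-ttl", type=int, default=3600, help="cache TTL in seconds")
    return p


# Same flags as the "Parallel scrape" example in Quick Start.
args = build_parser().parse_args(
    ["web dev trends", "--level", "medium", "--workers", "8", "--no-cache"]
)
print(args.query, args.level, args.workers, args.no_cache, args.format)
# prints: web dev trends medium 8 True txt
```

Note that argparse turns --no-cache and --cache-ttl into the attributes no_cache and cache_ttl.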
Scrape Levels
| Level | What You Get |
|---|---|
| LOW | Search results list: titles, URLs, and snippets. Instant, because it doesn't visit any pages. |
| MEDIUM | Visits each result page and pulls headings, meta tags, lists, and a content preview (500 chars). |
| HIGH | Deep scrape: page content (2,000 chars), tables, and images, plus it follows links to sub-pages. |
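One way to picture the three levels is as a settings table. The sketch below assumes a hypothetical LevelSettings record and content_preview helper; only the 500- and 2,000-character limits, the "no page visits" rule for LOW, and sub-page following for HIGH come from the table above.

```python
from dataclasses import dataclass


# Hypothetical mapping of the three scrape levels to concrete settings.
# The numbers come from the level table; the field names are assumptions.
@dataclass(frozen=True)
class LevelSettings:
    visit_pages: bool      # fetch each result page at all?
    preview_chars: int     # max characters of page text to keep
    follow_subpages: bool  # follow links to sub-pages (capped by --subpages)?


LEVELS = {
    "low": LevelSettings(visit_pages=False, preview_chars=0, follow_subpages=False),
    "medium": LevelSettings(visit_pages=True, preview_chars=500, follow_subpages=False),
    "high": LevelSettings(visit_pages=True, preview_chars=2000, follow_subpages=True),
}


def content_preview(text: str, level: str) -> str:
    """Collapse whitespace and truncate page text to the level's budget."""
    settings = LEVELS[level]
    if not settings.visit_pages:
        return ""  # LOW never downloads pages, so there is no text to keep
    cleaned = " ".join(text.split())
    return cleaned[: settings.preview_chars]


sample = "word " * 1000  # 5,000 characters of fake page text
for level in ("low", "medium", "high"):
    print(level, len(content_preview(sample, level)))
# prints:
# low 0
# medium 500
# high 2000
```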
Output Formats
| Format | Extension | Description |
|---|---|---|
| txt | .txt | Clean plain text, great for terminal reading |
| md | .md | Markdown, perfect for pasting into notes or docs |
| html | .html | Self-contained HTML with a dark theme |
| json | .json | Raw structured data to feed into your own scripts |
| pdf | .pdf | Portable PDF with watermark |
Use --format all or out_format="all" to generate everything at once.
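To illustrate how --format all might fan out into one file per format, here is a minimal sketch using only the standard library. The record keys (title, url, snippet), render, and write_reports are assumptions for illustration, not plethora's API; PDF is omitted because it needs a third-party library (plethora depends on fpdf2 for it).

```python
import html
import json
import tempfile
from pathlib import Path

# Hypothetical multi-format writer; format names and extensions follow
# the table above, everything else is illustrative.
EXTENSIONS = {"txt": ".txt", "md": ".md", "html": ".html", "json": ".json"}


def render(records: list, fmt: str) -> str:
    if fmt == "txt":
        return "\n\n".join(f"{r['title']}\n{r['url']}\n{r['snippet']}" for r in records)
    if fmt == "md":
        return "\n".join(f"- [{r['title']}]({r['url']}): {r['snippet']}" for r in records)
    if fmt == "html":
        items = "".join(
            f'<li><a href="{html.escape(r["url"])}">{html.escape(r["title"])}</a> '
            f'{html.escape(r["snippet"])}</li>'
            for r in records
        )
        return f"<!DOCTYPE html>\n<html><body><ul>{items}</ul></body></html>"
    if fmt == "json":
        return json.dumps(records, indent=2)
    raise ValueError(f"unsupported format: {fmt}")


def write_reports(records: list, out_dir: str, stem: str, fmt: str = "all") -> list:
    """Write one file per requested format; "all" expands to every format."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    formats = list(EXTENSIONS) if fmt == "all" else [fmt]
    paths = []
    for f in formats:
        text = render(records, f)  # raises ValueError for unknown formats
        path = out / f"{stem}{EXTENSIONS[f]}"
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths


records = [{"title": "Intro to Python", "url": "https://example.com/py", "snippet": "Learn the basics."}]
with tempfile.TemporaryDirectory() as tmp:
    for p in write_reports(records, tmp, "python_tutorials"):
        print(p.name)
# prints:
# python_tutorials.txt
# python_tutorials.md
# python_tutorials.html
# python_tutorials.json
```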
Features
- Concurrent scraping: pages are fetched in parallel with a configurable number of threads (--workers, default 4)
- Smart caching: already-fetched URLs are cached locally (default TTL: 1 hour, configurable with --cache-ttl)
- robots.txt respect: checks robots.txt before scraping and skips disallowed URLs
- Auto-retries: failed requests are retried 3 times with exponential backoff
- Per-domain rate limiting: won't hammer the same site
- Rich extraction: headings (h1 through h6), paragraphs, lists, tables, and image metadata
- Progress bars: live Rich progress bars while scraping (install with pip install plethora[rich])
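Two of the features above, TTL caching and retries with exponential backoff, can be sketched in a few lines. This is an illustrative in-memory version with hypothetical names (TTLCache, fetch_with_retry), not plethora's implementation, which caches locally and may differ. The clock and sleep functions are injectable so the behavior can be checked without waiting or touching the network, and "retry 3x" is read here as three retries after the first attempt, which is an assumption.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory URL cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl: float = 3600.0, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl      # seconds an entry stays fresh (cf. --cache-ttl)
        self.clock = clock  # injectable so expiry can be tested without waiting
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> Optional[str]:
        entry = self._store.get(url)
        if entry is None:
            return None
        stored_at, body = entry
        if self.clock() - stored_at > self.ttl:
            del self._store[url]  # expired: evict and report a miss
            return None
        return body

    def set(self, url: str, body: str) -> None:
        self._store[url] = (self.clock(), body)


def fetch_with_retry(
    fetch: Callable[[str], str],
    url: str,
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch(url); after each failure wait base_delay * 2**attempt and retry."""
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s, ...


attempts = []


def flaky_fetch(url: str) -> str:
    # Fails twice, then succeeds: a stand-in for a real HTTP request.
    attempts.append(url)
    if len(attempts) < 3:
        raise ConnectionError("temporary failure")
    return f"<html>{url}</html>"


delays = []
body = fetch_with_retry(flaky_fetch, "https://example.com", sleep=delays.append)
print(body, delays)  # <html>https://example.com</html> [1.0, 2.0]

now = [0.0]
cache = TTLCache(ttl=3600, clock=lambda: now[0])
cache.set("https://example.com", body)
now[0] = 3599.0
print(cache.get("https://example.com") is not None)  # True (still fresh)
now[0] = 3601.0
print(cache.get("https://example.com"))  # None (expired)
```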
Dependencies
Required:
- requests: HTTP client
- beautifulsoup4: HTML parsing
- fpdf2: PDF generation
Optional:
- rich: progress bars (pip install plethora[rich])
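Because rich is optional, code that uses it typically checks whether it is installed and falls back to plain iteration otherwise. The sketch below shows that pattern; progress and HAS_RICH are hypothetical names, not plethora's code, and rich.progress.track is the only rich API it uses.

```python
import importlib.util
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

# True only if the optional `rich` package can be imported.
HAS_RICH = importlib.util.find_spec("rich") is not None


def progress(items: Iterable[T], description: str = "Scraping", quiet: bool = False) -> Iterator[T]:
    """Yield items, showing a rich progress bar when available and not quiet."""
    if HAS_RICH and not quiet:
        from rich.progress import track  # imported lazily, only when installed
        yield from track(items, description=description)
    else:
        yield from items  # plain fallback: no output at all


for url in progress(["https://example.com/a", "https://example.com/b"], quiet=True):
    print(url)
```

Importing rich lazily inside the function keeps the module importable on systems where the extra was never installed.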
Disclaimer
This tool is for personal research and educational purposes only.
It respects robots.txt, includes per-domain rate limiting, and plays nice
with servers. Please don't abuse it. Use responsibly.
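These safeguards matter most when pages are fetched concurrently. The following sketch combines a robots.txt check (via the standard urllib.robotparser module) with a thread-safe per-domain rate limiter and a thread pool. DomainRateLimiter, polite_fetch, the fixed-interval policy, and the example-bot user agent are assumptions; plethora's actual behavior may differ. For an offline demo it parses robots.txt rules from a list of lines rather than downloading them, and it applies one rule set to every URL, so it assumes they all share a domain.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class DomainRateLimiter:
    """Keep at least `min_interval` seconds between requests to one domain."""

    def __init__(self, min_interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._next_slot: Dict[str, float] = {}
        self._lock = threading.Lock()

    def wait(self, url: str) -> float:
        domain = urlsplit(url).netloc
        with self._lock:  # reserve a time slot atomically across threads
            now = self.clock()
            slot = max(now, self._next_slot.get(domain, now))
            self._next_slot[domain] = slot + self.min_interval
        delay = slot - now
        if delay > 0:
            self.sleep(delay)  # sleep outside the lock so other domains proceed
        return delay


def polite_fetch(url: str, fetch: Callable[[str], str], robots: RobotFileParser,
                 limiter: DomainRateLimiter, user_agent: str = "example-bot") -> Optional[str]:
    if not robots.can_fetch(user_agent, url):
        return None  # disallowed by robots.txt: skip instead of fetching
    limiter.wait(url)
    return fetch(url)


def fake_fetch(url: str) -> str:
    return f"<html>{url}</html>"  # stand-in for a real HTTP request


robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])  # offline: no read()

urls = ["https://example.com/a", "https://example.com/private/x", "https://example.com/b"]
limiter = DomainRateLimiter(min_interval=0.05)
with ThreadPoolExecutor(max_workers=4) as pool:
    pages = list(pool.map(lambda u: polite_fetch(u, fake_fetch, robots, limiter), urls))
print(pages)
# ['<html>https://example.com/a</html>', None, '<html>https://example.com/b</html>']
```

Reserving the next slot under the lock and sleeping outside it lets requests to different domains run in parallel while requests to the same domain stay spaced out.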
Support This Project
If you find this useful, consider supporting me; it keeps me building stuff like this.
Built by @soumyadipkarforma · MIT License
Other Branches
| Branch | What's There |
|---|---|
| main | Terminal scripts and CLI tool: clone and start scraping |
| website | React web app: try it live |
This branch (pypi-package) has the pip-installable Python package.
Download files
File details
Details for the file plethora-2.0.0.tar.gz.
File metadata
- Download URL: plethora-2.0.0.tar.gz
- Upload date:
- Size: 18.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 36e5ae845484fcf44d8a7927c37cef9b1d811b7a1143c5bbd30a13f00b79398c |
| MD5 | 674b9b2b2ec10a759d7f2551262f1ff9 |
| BLAKE2b-256 | f6de5fd15c4b016d585f7fee97afcf88af8673281593af27b83f7d4a04133776 |
File details
Details for the file plethora-2.0.0-py3-none-any.whl.
File metadata
- Download URL: plethora-2.0.0-py3-none-any.whl
- Upload date:
- Size: 16.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3fb4e0edf123ee85acc69d22554536aadbdb848a154275b10ee5a7a63e0784cb |
| MD5 | d9cab005d754f4fc90194b04edf46072 |
| BLAKE2b-256 | 1a5f2155d6e094c93e329f5f85ff340db2ec54200195d268926852f0a5382ab5 |