Smart Python web scraping library with automatic static and dynamic website detection
Project description
IntelliScrape
IntelliScrape is a smart Python web scraping library that automatically detects whether a website is static or dynamic and extracts its text content using the most suitable method, so you don't have to work out which kind of site you are dealing with.
Instead of manually choosing between HTTP scraping and browser automation, IntelliScrape handles everything automatically. Just provide a URL and IntelliScrape will retrieve the content.
IntelliScrape is designed for developers and data analysts who want a simple and reliable way to extract data from modern websites without complex configuration.
Installation
Install IntelliScrape: pip install intelliscrape
Install Playwright browsers (required for dynamic sites): python -m playwright install chromium
Quick Start
from intelliscrape import scrape
text = scrape("https://example.com")
print(text)
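For more than one URL, the same call can be wrapped in a small helper. The sketch below is not part of IntelliScrape: `scrape_many` is a hypothetical name, and it takes the scraping function as an argument so that it assumes nothing about the library beyond `scrape(url)` returning a string. The exception types IntelliScrape raises are not documented here, so the helper catches `Exception` broadly.

```python
from typing import Callable, Dict, Iterable, Optional


def scrape_many(
    urls: Iterable[str],
    scrape_fn: Callable[[str], str],
) -> Dict[str, Optional[str]]:
    """Scrape each URL with scrape_fn; map URLs that fail to None."""
    results: Dict[str, Optional[str]] = {}
    for url in urls:
        try:
            results[url] = scrape_fn(url)
        except Exception as exc:  # exception types are undocumented, so catch broadly
            print(f"Failed to scrape {url}: {exc}")
            results[url] = None
    return results
```

With the library installed, this would presumably be called as `scrape_many(urls, scrape)` after `from intelliscrape import scrape`.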
Why IntelliScrape?
Traditional web scraping requires developers to decide whether a website is static or dynamic and then configure the correct tools manually.
IntelliScrape simplifies this process by automatically selecting the appropriate scraping method.
With IntelliScrape:
• No need to detect static vs dynamic websites manually
• No need to configure Requests or Playwright separately
• No need to set up Selenium
• No complex scraping setup
Just call one function and get the content.
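To give a feel for what static/dynamic detection can involve, here is a minimal sketch of one possible heuristic. It is an illustration only, not IntelliScrape's actual detection logic: the function name `looks_dynamic`, the helper class, and the 200-character threshold are all assumptions. The idea is that a page whose raw HTML ships scripts but almost no visible text is probably rendered in the browser.

```python
from html.parser import HTMLParser


class _TextCounter(HTMLParser):
    """Counts visible text characters and <script> tags in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.visible_chars = 0
        self.script_tags = 0
        self._skip_depth = 0  # > 0 while inside <script>, <style>, or <noscript>

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_tags += 1
        if tag in ("script", "style", "noscript"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style", "noscript") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.visible_chars += len(data.strip())


def looks_dynamic(html: str, min_visible_chars: int = 200) -> bool:
    """Guess whether a page needs a browser to render its content.

    Hypothetical heuristic: scripts present, but very little visible text.
    """
    counter = _TextCounter()
    counter.feed(html)
    counter.close()
    return counter.script_tags > 0 and counter.visible_chars < min_visible_chars
```

A heuristic like this will misjudge some pages (for example, a short static page that happens to include an analytics script), which is one reason to leave the decision to the library.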
Features
✔ Automatic static/dynamic detection
✔ Requests-based scraping
✔ Playwright-based rendering
✔ Clean text extraction
✔ Modular architecture
✔ Simple API
✔ Works on modern JavaScript websites
Tested On:
Static: • Wikipedia • Python.org
Dynamic: • Medium • YouTube
How It Works
scrape(url)
↓
Downloader
↓
Static/Dynamic Detection
↓
Parser
↓
Extractor
↓
Cleaner
↓
Return Text
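The flow above can be sketched in a few lines of standard-library Python. This is a simplified illustration under stated assumptions, not the library's implementation: `run_pipeline`, `extract_text`, and `clean_text` are hypothetical names, and the two download steps and the detection step are passed in as functions so the sketch stays independent of Requests and Playwright.

```python
import re
from html.parser import HTMLParser
from typing import Callable


class _TextExtractor(HTMLParser):
    """Parser/extractor stage: collects text outside <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def extract_text(html: str) -> str:
    """Parse the HTML and join its visible text chunks."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def clean_text(text: str) -> str:
    """Cleaner stage: collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def run_pipeline(
    url: str,
    fetch_static: Callable[[str], str],
    fetch_rendered: Callable[[str], str],
    needs_browser: Callable[[str], bool],
) -> str:
    html = fetch_static(url)        # Downloader: plain HTTP first
    if needs_browser(html):         # Static/dynamic detection
        html = fetch_rendered(url)  # fall back to a browser-rendered page
    return clean_text(extract_text(html))  # Parser, Extractor, Cleaner
```

Injecting the fetchers also makes the flow easy to exercise with canned HTML, without any network access.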
Example Output
from intelliscrape import scrape
text = scrape("https://www.youtube.com/results?search_query=python")
print(text[:500])
HOURS of Python Projects From Beginner to Advanced Python Projects for Beginners Master Problem-Solving! Python Project for Data Analysis- Exploratory Data Analysis Data Analyst Project Learn Python With This ONE Project! Build Python Projects Step-by-Step Python Projects for Beginners to Advanced (Hindi) Mini Project in Python Python for Beginners #project1 python YouTube Skip navigation Search with your voice Subscriptions Unwatched Recently uploaded Search filters lessons Python Language Full
Limitations
IntelliScrape works best on content-based websites. Highly protected platforms and login-required pages may require custom scraping logic. CAPTCHA solving is not yet supported; the feature is in development.
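Because protected pages may return a challenge or login screen instead of content, it can help to sanity-check the returned text before using it. The check below is a rough sketch and not an IntelliScrape feature: `looks_blocked`, the marker phrases, and the 50-character minimum are all assumptions chosen for illustration.

```python
# Hypothetical phrases that often appear on challenge or login pages.
BLOCK_MARKERS = (
    "captcha",
    "verify you are human",
    "sign in to continue",
    "log in to continue",
    "access denied",
)


def looks_blocked(text: str, min_length: int = 50) -> bool:
    """Return True if scraped text is nearly empty or contains a block marker."""
    lowered = text.lower()
    if len(lowered.strip()) < min_length:
        return True
    return any(marker in lowered for marker in BLOCK_MARKERS)
```

This is deliberately crude: a legitimate article that mentions CAPTCHAs would also be flagged, so treat a positive result as a prompt to inspect the page rather than as proof of blocking.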
Project Structure
intelliscrape/
    core.py
    downloader.py
    browser.py
    parser.py
    extractor.py
    cleaner.py
    utils.py
    exceptions.py
Examples
Example scripts are available in: examples/
Requirements
Python 3.9+
Playwright (required for dynamic sites)
Install browsers:
playwright install
License
MIT
Download files
File details
Details for the file intelliscrape-1.0.0.tar.gz.
File metadata
- Download URL: intelliscrape-1.0.0.tar.gz
- Upload date:
- Size: 11.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6cb57531db443821a8df5d99bf0472b084005a888f3a935b8ea4d2c3ad393275 |
| MD5 | bde34494bc1260a10c1277eb42f2b289 |
| BLAKE2b-256 | 312ae7d7abe66fd5272003431dc5bb15c7a7a7c71a8d0b29b049fc7bfdebd878 |
File details
Details for the file intelliscrape-1.0.0-py3-none-any.whl.
File metadata
- Download URL: intelliscrape-1.0.0-py3-none-any.whl
- Upload date:
- Size: 11.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4fa8353e9db03c8871df75c1c253e201c8686e638cbafcda85f5686d4df1d0cc |
| MD5 | 3bb58158ee78a398aa59c2f0c43d4acb |
| BLAKE2b-256 | 3c5bc6bc5c14e044f748b7c8d0f0db30b087147de538f5cc272258577178b46e |