Smart Python web scraping library with automatic static and dynamic website detection

IntelliScrape

IntelliScrape is a smart Python web scraping library that automatically detects whether a website is static or dynamic and extracts its text content using the best available method, so you don't have to work that out yourself.

Instead of manually choosing between HTTP scraping and browser automation, IntelliScrape handles everything automatically. Just provide a URL and IntelliScrape will retrieve the content.

IntelliScrape is designed for developers and data analysts who want a simple and reliable way to extract data from modern websites without complex configuration.

Installation

Install IntelliScrape:

pip install intelliscrape

Install Playwright browsers (required for dynamic sites):

python -m playwright install chromium

Quick Start

from intelliscrape import scrape

text = scrape("https://example.com")

print(text)
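When scraping more than one page, it can help to keep one failing URL from stopping the whole run. The helper below is a minimal sketch, not part of IntelliScrape: `scrape_many` is a hypothetical name, and it assumes only that `scrape(url)` returns a string and signals failure by raising an exception. The scrape function is passed in as an argument so the helper can be tried with any callable.

```python
from typing import Callable, Dict, Iterable, Optional


def scrape_many(
    urls: Iterable[str],
    scrape_fn: Callable[[str], str],
) -> Dict[str, Optional[str]]:
    """Scrape each URL in turn; record None for a URL that fails."""
    results: Dict[str, Optional[str]] = {}
    for url in urls:
        try:
            results[url] = scrape_fn(url)
        except Exception as exc:  # assumption: failures surface as exceptions
            print(f"failed: {url} ({exc})")
            results[url] = None
    return results
```

With IntelliScrape installed, this would be called as scrape_many(["https://example.com", "https://www.python.org"], scrape), using the scrape function imported in the Quick Start above.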

Why IntelliScrape?

Traditional web scraping requires developers to decide whether a website is static or dynamic and then configure the correct tools manually.

IntelliScrape simplifies this process by automatically selecting the appropriate scraping method.

With IntelliScrape:

• No need to detect static vs dynamic websites manually
• No need to configure Requests or Playwright separately
• No need to set up Selenium
• No complex scraping setup

Just call one function and get the content.
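To make the static vs dynamic decision concrete, here is one common heuristic, sketched with the standard library only. It is an illustration under stated assumptions, not IntelliScrape's actual detection logic, which this page does not document. The names `looks_dynamic` and `min_text_chars` are hypothetical, and the threshold of 200 characters is arbitrary. The idea is that a page whose raw HTML contains scripts but almost no visible text is probably rendered by JavaScript and needs a browser.

```python
from html.parser import HTMLParser


class _TextCounter(HTMLParser):
    """Counts visible text characters and <script> tags in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.text_chars = 0
        self.script_tags = 0
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_tags += 1
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only text outside <script>/<style> counts as visible content.
        if self._skip_depth == 0:
            self.text_chars += len(data.strip())


def looks_dynamic(html: str, min_text_chars: int = 200) -> bool:
    """Guess whether a page needs JavaScript rendering (heuristic only)."""
    counter = _TextCounter()
    counter.feed(html)
    counter.close()
    return counter.script_tags > 0 and counter.text_chars < min_text_chars
```

A heuristic like this can misjudge short static pages that include a script, so a real implementation would likely combine several signals.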

Features

✔ Automatic static/dynamic detection
✔ Requests-based scraping
✔ Playwright-based rendering
✔ Clean text extraction
✔ Modular architecture
✔ Simple API
✔ Works on modern JavaScript websites

Tested On:

Static: • Wikipedia • Python.org

Dynamic: • Medium • YouTube

How It Works

scrape(url)
↓
Downloader
↓
Static/Dynamic Detection
↓
Parser
↓
Extractor
↓
Cleaner
↓
Return Text
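The stages above can be sketched as a small chain of functions. This is an illustrative outline under assumptions, not the library's source: `scrape_sketch`, `extract_text`, `clean`, and `min_chars` are hypothetical names, and the two downloaders are passed in as plain callables (for example, one built on Requests and one on Playwright) so the flow can be followed without network access. Detection is reduced here to a single check on how much text the static HTML yields.

```python
import re
from html.parser import HTMLParser
from typing import Callable, List


class _TextExtractor(HTMLParser):
    """Parser and extractor stage: collects text outside <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def extract_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def clean(text: str) -> str:
    """Cleaner stage: collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def scrape_sketch(
    url: str,
    fetch_static: Callable[[str], str],    # e.g. a plain HTTP download
    fetch_rendered: Callable[[str], str],  # e.g. a headless-browser render
    min_chars: int = 200,                  # arbitrary detection threshold
) -> str:
    text = clean(extract_text(fetch_static(url)))
    # Detection stage (simplified): too little text suggests a JavaScript page.
    if len(text) < min_chars:
        text = clean(extract_text(fetch_rendered(url)))
    return text
```

Passing the downloaders in as arguments keeps each stage independently testable, which is in the spirit of the modular layout listed under Project Structure.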

Example Output

from intelliscrape import scrape

text = scrape("https://www.youtube.com/results?search_query=python")

print(text[:500])

HOURS of Python Projects From Beginner to Advanced Python Projects for Beginners Master Problem-Solving! Python Project for Data Analysis- Exploratory Data Analysis Data Analyst Project Learn Python With This ONE Project! Build Python Projects Step-by-Step Python Projects for Beginners to Advanced (Hindi) Mini Project in Python Python for Beginners #project1 python YouTube Skip navigation Search with your voice Subscriptions Unwatched Recently uploaded Search filters lessons Python Language Full

Limitations

IntelliScrape works best on content-based websites. Highly protected platforms and login-required pages may require custom scraping logic. CAPTCHAs are not solved automatically; a CAPTCHA-solving feature is in development.
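Because protected or login-gated pages may return little useful text, it can be worth checking a result before using it. The check below is a rough sketch, not an IntelliScrape feature: `looks_blocked`, the hint phrases, and both thresholds are assumptions chosen for illustration, and the phrase list will miss many real block pages.

```python
# Hypothetical phrases that often appear on block or login pages.
BLOCK_HINTS = (
    "captcha",
    "verify you are human",
    "sign in to continue",
    "log in to continue",
    "access denied",
)


def looks_blocked(text: str, min_chars: int = 100, max_hint_chars: int = 2000) -> bool:
    """Heuristically flag scraped text that is probably not real content."""
    stripped = text.strip()
    if len(stripped) < min_chars:
        return True  # almost nothing came back
    # Only trust the phrase match on short pages, so that a long article
    # which merely mentions CAPTCHAs is not flagged.
    if len(stripped) < max_hint_chars:
        lowered = stripped.lower()
        return any(hint in lowered for hint in BLOCK_HINTS)
    return False
```

A caller could use this to decide when to fall back to custom scraping logic for a particular site.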

Project Structure

intelliscrape/
    core.py
    downloader.py
    browser.py
    parser.py
    extractor.py
    cleaner.py
    utils.py
    exceptions.py

Examples

Example scripts are available in: examples/

Requirements

Python 3.9+

Playwright is required for dynamic sites.

Install browsers:

playwright install

License

MIT

