AI web scraping workflow.

Project description

Scraipe

License: MIT

Scraping and analysis framework. Under development.

Features

  • Versatile Scraping: Leverage custom scrapers that handle Telegram messages, news articles, and links that are routed through multiple ingress rules.
  • LLM Analysis: Process text using OpenAI models with built-in Pydantic validation.
  • Workflow Management: Combine scraping and analysis in a single fault-tolerant workflow, ideal for Jupyter notebooks.
  • High Performance: Asynchronous IO-bound tasks are integrated seamlessly into the synchronous API.
  • Modular: Extend the framework with new scrapers or analyzers as your data sources evolve.
  • Customizable Ingress: Easily define rules to dynamically route different links to their appropriate scrapers.
  • Detailed Logging: Monitor scraping and analysis operations through detailed error reporting for easier debugging and greater transparency.
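
As a rough illustration of the customizable ingress idea, the sketch below matches each URL against an ordered list of regex rules and hands it to the scraper of the first rule that matches, falling back to a default scraper. Everything in it (IngressRule, route, and the three stand-in scrapers) is hypothetical and is not Scraipe's actual API; see the documentation for the real ingress classes.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical scraper type: takes a URL and returns scraped text.
Scraper = Callable[[str], str]


@dataclass
class IngressRule:
    """Hypothetical rule: route URLs matching `pattern` to `scraper`."""
    pattern: str
    scraper: Scraper

    def matches(self, url: str) -> bool:
        return re.search(self.pattern, url) is not None


def route(url: str, rules: list[IngressRule], default: Optional[Scraper] = None) -> Scraper:
    """Return the scraper of the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(url):
            return rule.scraper
    if default is None:
        raise ValueError(f"No ingress rule matches {url!r}")
    return default


# Stand-in scrapers for illustration only; real ones would fetch and parse content.
def telegram_scraper(url: str) -> str:
    return f"telegram:{url}"


def news_scraper(url: str) -> str:
    return f"news:{url}"


def text_scraper(url: str) -> str:
    return f"text:{url}"


# Rules are checked in order, so more specific patterns should come first.
rules = [
    IngressRule(r"^https?://t\.me/", telegram_scraper),
    IngressRule(r"news", news_scraper),
]

for url in ["https://t.me/somechannel/123", "https://news.example.com/a", "https://example.com"]:
    print(route(url, rules, default=text_scraper)(url))
```

Because the first matching rule wins, rule order is part of the configuration; an explicit default keeps unmatched links from failing silently.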

Check out the demo.

Help

See the documentation for details.

Installation

Ensure you are using Python>=3.10. Install Scraipe and all built-in scrapers/analyzers:

pip install "scraipe[extended]"

Alternatively, install the core library with:

pip install scraipe

Example

 # Import components from scraipe
 from scraipe.defaults import TextScraper
 from scraipe.defaults import TextStatsAnalyzer
 from scraipe import Workflow

 # Initialize the scraper and analyzer
 scraper = TextScraper()
 analyzer = TextStatsAnalyzer()

 # Create the workflow instance
 workflow = Workflow(scraper, analyzer)

 # List urls to scrape
 urls = [
     "https://example.com",
     "https://rickandmortyapi.com/api/character/1",
     "https://ckaestne.github.io/seai/"
 ]

 # Run the workflow
 workflow.scrape(urls)
 workflow.analyze()

 # Print the results
 results = workflow.export()
 print(results)
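
The feature list says asynchronous IO-bound work runs behind a synchronous API, which is why workflow.scrape(urls) above is called without await. Below is a minimal sketch of that general pattern, not Scraipe's implementation. It assumes a hypothetical fetch coroutine that simulates network latency instead of making real requests. It gathers all fetches concurrently, records per-URL failures instead of aborting the batch (one way to get fault tolerance), and falls back to a worker thread when an event loop is already running, as it is inside Jupyter, because asyncio.run cannot be called from a running loop.

```python
import asyncio
import concurrent.futures
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")


async def fetch(url: str) -> str:
    """Hypothetical IO-bound fetch; a real scraper would make an HTTP request here."""
    await asyncio.sleep(0.01)  # simulate network latency
    if "bad" in url:
        raise ConnectionError(f"could not reach {url}")
    return f"content of {url}"


async def _scrape_concurrently(urls: list[str]) -> list[Any]:
    # return_exceptions=True returns failures as values instead of raising,
    # so one bad URL does not abort the whole batch.
    return await asyncio.gather(*(fetch(u) for u in urls), return_exceptions=True)


def _run_sync(coro: Coroutine[Any, Any, T]) -> T:
    """Run a coroutine to completion from synchronous code."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return asyncio.run(coro)  # no loop running, e.g. a plain script
    # A loop is already running (e.g. Jupyter), so asyncio.run would fail here;
    # run the coroutine on a fresh event loop in a worker thread instead.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()


def scrape_all(urls: list[str]) -> dict[str, dict[str, Any]]:
    """Synchronous entry point: callers never write async/await themselves."""
    results = _run_sync(_scrape_concurrently(urls))
    report: dict[str, dict[str, Any]] = {}
    for url, result in zip(urls, results):
        if isinstance(result, Exception):
            report[url] = {"ok": False, "error": repr(result)}
        else:
            report[url] = {"ok": True, "content": result}
    return report


report = scrape_all(["https://example.com", "https://bad.example.com"])
for url, entry in report.items():
    print(url, entry)
```

Keeping failures as data rather than exceptions lets a notebook user inspect which links failed and why after a single call.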

Contributing

Contributions are welcome. Please open an issue or submit a pull request for improvements.

Run poetry install --with dev,docs --extras extended to install all dependencies for the project.

Maintainer

This project is maintained by nibs.

Download files

Source Distribution

scraipe-0.1.72.tar.gz (32.5 kB)

Built Distribution

scraipe-0.1.72-py3-none-any.whl (43.5 kB)

File details

Details for the file scraipe-0.1.72.tar.gz.

File metadata

  • Download URL: scraipe-0.1.72.tar.gz
  • Upload date:
  • Size: 32.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.2 CPython/3.12.9 Linux/5.15.167.4-microsoft-standard-WSL2

File hashes

Hashes for scraipe-0.1.72.tar.gz
Algorithm    Hash digest
SHA256       78d5cd376db701d9afc769c98cd94dda4c2fb77f9804c8186b1f25082d546e38
MD5          c6fefb7bd4b4f35dae47dcb5b6c7d5fb
BLAKE2b-256  77a128bd454ef8763145c50676ad6ad92efb5381d07b328f46167c61210d8dc5

File details

Details for the file scraipe-0.1.72-py3-none-any.whl.

File metadata

  • Download URL: scraipe-0.1.72-py3-none-any.whl
  • Upload date:
  • Size: 43.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.2 CPython/3.12.9 Linux/5.15.167.4-microsoft-standard-WSL2

File hashes

Hashes for scraipe-0.1.72-py3-none-any.whl
Algorithm    Hash digest
SHA256       f08697ee27eaeff22ee6d3da1624d92212ff2e32c47f87c31689ce20d6be33e3
MD5          418f518712ddae4d3ef8b1c43d9b1b84
BLAKE2b-256  74fc637369408a7fc13bb88c767a1d5e3e6f695b20dbac469797f4d4eb1d736c
