AI web scraping workflow.

Project description

Scraipe

Scraipe is a high-performance, asynchronous scraping and analysis framework that leverages Large Language Models (LLMs) to extract structured information from scraped content.

Installation

Ensure you have Python 3.10+ installed. Install Scraipe with all built-in scrapers/analyzers:

pip install scraipe[extended]
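Note: in shells such as zsh, square brackets are treated as glob patterns, so quote the extras specifier:

pip install "scraipe[extended]"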

Alternatively, install the core library and develop your own scrapers/analyzers with:

pip install scraipe

Features

  • Versatile Scraping: Leverage custom scrapers that handle Telegram messages, news articles, and links that require multiple ingress rules.
  • LLM Analysis: Process text using OpenAI models with built-in Pydantic validation (see the validation sketch after this list).
  • Workflow Management: Combine scraping and analysis in a single fault-tolerant workflow, ideal for Jupyter notebooks.
  • High Performance: IO-bound scraping and analysis tasks run asynchronously behind a synchronous API.
  • Modular: Extend the framework with new scrapers or analyzers as your data sources evolve.
  • Customizable Ingress: Easily define and update rules to route different types of links to their appropriate scrapers.
  • Detailed Logging: Monitor scraping and analysis operations through comprehensive logging for improved debugging and transparency.
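
To illustrate the kind of validation involved, here is a minimal sketch using plain Pydantic (v2) to check the JSON output requested in the usage example below. The CelebrityReport model and the raw string are hypothetical names for this example, not part of Scraipe's API:

    from pydantic import BaseModel, ValidationError

    # Hypothetical schema matching the instruction in the usage example below;
    # CelebrityReport is illustrative, not a Scraipe class.
    class CelebrityReport(BaseModel):
        celebrities: list[str]

    raw = '{"celebrities": ["Ada Lovelace", "Alan Turing"]}'
    try:
        report = CelebrityReport.model_validate_json(raw)
        print(report.celebrities)
    except ValidationError as e:
        # Malformed LLM output is caught here instead of propagating downstream.
        print(f"LLM output failed validation: {e}")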

Usage Example

  1. Setup:

    • Import the required modules:
    from scraipe import Workflow
    from scraipe.extended import NewsScraper, OpenAiAnalyzer
    
  2. Configure Scraper and Analyzer:

    # Configure the scraper
    scraper = NewsScraper()
    
    # Define an instruction for the analyzer
    instruction = '''
    Extract a list of celebrities mentioned in the article text.
    Return a JSON dictionary with the schema: {"celebrities": ["celebrity1", "celebrity2", ...]}
    '''   
    analyzer = OpenAiAnalyzer("YOUR_OPENAI_API_KEY", instruction)
    
  3. Use the Workflow (a combined sketch follows these steps):

    workflow = Workflow(scraper, analyzer)
    
    # Provide a list of URLs to scrape
    news_links = ["https://example.com/article1", "https://example.com/article2"]
    workflow.scrape(news_links)
    
    # Analyze the scraped content
    workflow.analyze()
    
    # Export results to a pandas DataFrame and save as CSV
    export_df = workflow.export()
    export_df.to_csv('celebrities.csv', index=False)
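
Putting these steps together, here is a minimal end-to-end sketch. Reading the API key from the OPENAI_API_KEY environment variable is an assumption of this sketch, not a Scraipe requirement; everything else uses only the calls shown above:

    import os
    from scraipe import Workflow
    from scraipe.extended import NewsScraper, OpenAiAnalyzer

    instruction = '''
    Extract a list of celebrities mentioned in the article text.
    Return a JSON dictionary with the schema: {"celebrities": ["celebrity1", "celebrity2", ...]}
    '''

    # Sourcing the key from the environment is an assumption of this sketch.
    analyzer = OpenAiAnalyzer(os.environ["OPENAI_API_KEY"], instruction)
    workflow = Workflow(NewsScraper(), analyzer)

    workflow.scrape(["https://example.com/article1", "https://example.com/article2"])
    workflow.analyze()
    workflow.export().to_csv('celebrities.csv', index=False)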
    

Contributing

Contributions are welcome. Please open an issue or submit a pull request for improvements.

License

This project is licensed under the MIT License.

Maintainer

This project is maintained by Nibs.

Download files

Download the file for your platform.

Source Distribution

scraipe-0.1.26.tar.gz (14.9 kB)


Built Distribution


scraipe-0.1.26-py3-none-any.whl (20.2 kB)


File details

Details for the file scraipe-0.1.26.tar.gz.

File metadata

  • Download URL: scraipe-0.1.26.tar.gz
  • Upload date:
  • Size: 14.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.2 CPython/3.10.16 Linux/5.15.167.4-microsoft-standard-WSL2

File hashes

Hashes for scraipe-0.1.26.tar.gz

  • SHA256: 55df849f1b41a390adc2c43b682c98d5d50099a8421ec931796df3d71f4430d1
  • MD5: 7078973ae640846fac7442871b56fdad
  • BLAKE2b-256: 363e7c56df7b06e836ed77116e7fdceb13ac059fd0e445ec34b525ee2eb726ba

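To verify a downloaded file against these hashes, here is a minimal sketch using Python's standard hashlib; the file path assumes the sdist sits in the current directory:

    import hashlib

    # Path to the downloaded sdist; adjust to your download location.
    path = "scraipe-0.1.26.tar.gz"
    expected = "55df849f1b41a390adc2c43b682c98d5d50099a8421ec931796df3d71f4430d1"

    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()

    assert digest == expected, f"hash mismatch: {digest}"
    print("SHA256 verified")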

File details

Details for the file scraipe-0.1.26-py3-none-any.whl.

File metadata

  • Download URL: scraipe-0.1.26-py3-none-any.whl
  • Upload date:
  • Size: 20.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.2 CPython/3.10.16 Linux/5.15.167.4-microsoft-standard-WSL2

File hashes

Hashes for scraipe-0.1.26-py3-none-any.whl

  • SHA256: e19e156303dbc0848d75252639109b781eb7601fa40155b4045694da7a169f69
  • MD5: 15280465437f5f1f2ef5012e09539abf
  • BLAKE2b-256: c56c6751b2037551a02f9d9c3d8c53d7cceaba7dec5b52a07025faf791fbbc49

