AI web scraping workflow.
Scraipe
Scraping and analysis framework. Under development.
Features
- Versatile Scraping: Leverage custom scrapers that handle Telegram messages, news articles, and links that require multiple ingress rules.
- LLM Analysis: Process text using OpenAI models with built-in Pydantic validation.
- Workflow Management: Combine scraping and analysis in a single fault-tolerant workflow, well suited to Jupyter notebooks.
- High Performance: IO-bound tasks run asynchronously behind the synchronous API.
- Modular: Extend the framework with new scrapers or analyzers as your data sources evolve.
- Customizable Ingress: Easily define rules to dynamically route different links to their appropriate scrapers.
- Detailed Logging: Monitor scraping and analysis operations through detailed error reporting, which makes failures easier to debug and trace.
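The routing, async, and fault-tolerance ideas in the list above can be illustrated with a minimal standard-library sketch. It does not use Scraipe's actual API: the names RULES, route, scrape, and the three stand-in scrapers are hypothetical, and the URL patterns are assumptions chosen for the example. It shows ordered ingress rules picking a scraper per link, async scrapers running concurrently behind a synchronous call, and per-link errors being recorded rather than raised.

```python
import asyncio
import re
from typing import Awaitable, Callable, Optional

# Hypothetical scraper type: an async function from a link to extracted text.
Scraper = Callable[[str], Awaitable[str]]


async def telegram_scraper(link: str) -> str:
    await asyncio.sleep(0)  # stand-in for a real network call
    return f"telegram:{link}"


async def news_scraper(link: str) -> str:
    await asyncio.sleep(0)  # stand-in for a real network call
    return f"news:{link}"


async def failing_scraper(link: str) -> str:
    # Simulates a scraper that cannot reach its target.
    raise RuntimeError("unreachable host")


# Ingress rules (assumed patterns) are checked in order; the first match wins.
RULES = [
    (re.compile(r"^https?://t\.me/"), telegram_scraper),
    (re.compile(r"^https?://broken\.example/"), failing_scraper),
    (re.compile(r"^https?://"), news_scraper),
]


def route(link: str) -> Optional[Scraper]:
    """Return the scraper of the first rule matching the link, or None."""
    for pattern, scraper in RULES:
        if pattern.search(link):
            return scraper
    return None


async def _scrape_one(link: str) -> dict:
    """Scrape one link, recording any failure instead of raising it."""
    scraper = route(link)
    if scraper is None:
        return {"link": link, "content": None, "error": "no matching rule"}
    try:
        return {"link": link, "content": await scraper(link), "error": None}
    except Exception as exc:
        return {"link": link, "content": None, "error": str(exc)}


async def _scrape_all(links: list) -> list:
    # gather runs the coroutines concurrently and preserves input order.
    return await asyncio.gather(*(_scrape_one(link) for link in links))


def scrape(links: list) -> list:
    """Synchronous entry point that hides the async fan-out."""
    return asyncio.run(_scrape_all(links))
```

Calling scrape(["https://t.me/s/channel/1", "https://example.com"]) returns one record per link, in input order. Note that asyncio.run cannot be called from a thread that already has a running event loop, which is the case inside Jupyter, so a library targeting notebooks needs a different strategy than this sketch uses.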
Help
See the documentation for details.
Installation
Scraipe requires Python 3.10 or later. To install Scraipe with all built-in scrapers and analyzers:
pip install scraipe[extended]
Alternatively, install the core library with:
pip install scraipe
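To confirm the environment after installing, a small check along these lines may help. It is a sketch using only the standard library; the helper names installed_version and check_environment are made up for this example and are not part of Scraipe.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_environment(package: str = "scraipe") -> Tuple[bool, Optional[str]]:
    """Report whether Python is 3.10 or later and which package version is installed."""
    python_ok = sys.version_info >= (3, 10)
    return python_ok, installed_version(package)


if __name__ == "__main__":
    python_ok, found = check_environment()
    print("Python 3.10 or later:", python_ok)
    print("scraipe version:", found or "not installed")
```

The check reads package metadata rather than importing the package, so it works even if an optional dependency from the extended extra is missing.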
Example
# Import components from scraipe
from scraipe.defaults import TextScraper
from scraipe.defaults import TextStatsAnalyzer
from scraipe import Workflow
# Initialize the scraper and analyzer
scraper = TextScraper()
analyzer = TextStatsAnalyzer()
# Create the workflow instance
workflow = Workflow(scraper, analyzer)
# List urls to scrape
urls = [
    "https://example.com",
    "https://rickandmortyapi.com/api/character/1",
    "https://ckaestne.github.io/seai/",
]
# Run the workflow
workflow.scrape(urls)
workflow.analyze()
# Print the results
results = workflow.export()
print(results)
Contributing
Contributions are welcome. Please open an issue or submit a pull request for improvements.
Run poetry install --with dev,docs --extras extended to install all dependencies for the project.
Maintainer
This project is maintained by nibs.
File details
Details for the file scraipe-0.1.61.tar.gz.
File metadata
- Download URL: scraipe-0.1.61.tar.gz
- Upload date:
- Size: 28.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.2 CPython/3.12.9 Linux/5.15.167.4-microsoft-standard-WSL2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d9c8e69f4e00bb444197405b6b5ea6ee3962b0c9212041d822fe8d9e00edb6bf |
| MD5 | fcc988cd748101c4fcb05555f48e79cd |
| BLAKE2b-256 | b37d7b0009a6bc096228ac0596e7b0a8c80baac21a8d45feaef4a766dd18e9c0 |
File details
Details for the file scraipe-0.1.61-py3-none-any.whl.
File metadata
- Download URL: scraipe-0.1.61-py3-none-any.whl
- Upload date:
- Size: 38.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.2 CPython/3.12.9 Linux/5.15.167.4-microsoft-standard-WSL2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 51830b7dce4c9c47ac938218f0174ef8c56c7678e5b2f697b411a7687160e5e2 |
| MD5 | d32ca4acd1db5c11afe150f6a9a563be |
| BLAKE2b-256 | c636eb3cb303456113e9bf92798fae2cab9135723bf696521a825b576bd1e4f9 |