Lightweight library for scraping websites with LLMs
Parsera
Lightweight library for scraping websites with LLMs. You can see how it works on the Parsera website.
Why Parsera?
Because it's simple and lightweight. It keeps token usage to a minimum, which speeds up scraping and reduces costs.
Installation
```shell
pip install parsera
playwright install
```
The second command downloads the browsers that Playwright uses to load pages.
Basic usage
To use OpenAI models, set the OPENAI_API_KEY environment variable.
You can do this from Python with:
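```python
import os

# Replace the placeholder with your own key; avoid committing real keys to source control.
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY_HERE"
```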
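Hardcoding a key in a script makes it easy to leak. If you set OPENAI_API_KEY in your shell instead, a small guard can fail early with a readable message when the variable is missing. This is a minimal sketch; the require_env helper is hypothetical and not part of Parsera.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error.

    Hypothetical helper, not part of Parsera: it fails early with a readable
    message instead of letting the first API call fail later.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; export it in your shell or set os.environ[{name!r}]"
        )
    return value


if __name__ == "__main__":
    try:
        require_env("OPENAI_API_KEY")
        print("OPENAI_API_KEY is set")
    except RuntimeError as err:
        print(err)
```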
Next, run a basic version, which uses gpt-4o-mini by default:
```python
from parsera import Parsera

url = "https://news.ycombinator.com/"

# Map each field you want to extract to a short description of it.
elements = {
    "Title": "News title",
    "Points": "Number of points",
    "Comments": "Number of comments",
}

scraper = Parsera()  # uses gpt-4o-mini by default
result = scraper.run(url=url, elements=elements)
```
The result variable will contain JSON with a list of records, one per news item:
```json
[
    {
        "Title": "Hacking the largest airline and hotel rewards platform (2023)",
        "Points": "104",
        "Comments": "24"
    },
    ...
]
```
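In the sample output every value, including Points and Comments, is a string. If you want to sort or aggregate, a small post-processing step can convert numeric fields to integers. This is a minimal sketch, assuming result is a list of dicts shaped like the sample (if you get a JSON string instead, parse it with json.loads first); normalize_records and to_int are hypothetical helpers, not part of Parsera.

```python
from typing import Optional


def to_int(value: object) -> Optional[int]:
    """Parse strings such as "104" or "1,024" into ints; return None if not numeric."""
    if isinstance(value, int):
        return value
    if not isinstance(value, str):
        return None
    digits = value.strip().replace(",", "")
    return int(digits) if digits.isdigit() else None


def normalize_records(records: list[dict], numeric_fields: tuple[str, ...]) -> list[dict]:
    """Return copies of `records` with the given fields converted to int (or None)."""
    normalized = []
    for record in records:
        row = dict(record)  # copy so the original result is left untouched
        for field in numeric_fields:
            row[field] = to_int(row.get(field))
        normalized.append(row)
    return normalized


records = [
    {"Title": "Hacking the largest airline and hotel rewards platform (2023)",
     "Points": "104", "Comments": "24"},
]
rows = normalize_records(records, ("Points", "Comments"))
# Sort by points, treating values that could not be parsed as 0.
rows.sort(key=lambda r: r["Points"] or 0, reverse=True)
print(rows[0]["Points"])  # 104
```

Returning None for non-numeric values (for example a comment count the page shows as text) keeps the row instead of dropping it, so you can decide later how to treat it.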
Run with a local model
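Install Ollama and pull the model you want to use (for example, ollama pull llama3), then install the LangChain integration for Ollama:
```shell
pip install langchain-ollama
```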
```python
from langchain_ollama import ChatOllama
from parsera import Parsera

# Local model served by Ollama; temperature=0 makes the output as deterministic as possible.
llm = ChatOllama(
    model="llama3",
    temperature=0,
    # other params...
)

url = "https://news.ycombinator.com/"
elements = {
    "Title": "News title",
    "Points": "Number of points",
    "Comments": "Number of comments",
}

scraper = Parsera(model=llm)
result = scraper.run(url=url, elements=elements)
```
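Whichever model you use, the records are produced by an LLM, so it can be worth checking that each one contains exactly the fields you requested before relying on it. This is a minimal sketch, assuming result is a list of dicts as in the sample output; check_records is a hypothetical helper, not part of Parsera, and it only checks keys and empty values.

```python
def check_records(records: list, elements: dict[str, str]) -> list[str]:
    """Return human-readable problems found in `records`; an empty list means all passed.

    Hypothetical helper, not part of Parsera: it checks that each record has
    exactly the keys requested in `elements` and that no value is empty.
    """
    expected = set(elements)
    problems = []
    for i, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append(f"record {i}: expected a dict, got {type(record).__name__}")
            continue
        keys = set(record)
        missing = expected - keys
        extra = keys - expected
        if missing:
            problems.append(f"record {i}: missing {sorted(missing)}")
        if extra:
            problems.append(f"record {i}: unexpected {sorted(extra)}")
        empty = [key for key in expected & keys if record[key] in ("", None)]
        if empty:
            problems.append(f"record {i}: empty {sorted(empty)}")
    return problems


elements = {
    "Title": "News title",
    "Points": "Number of points",
    "Comments": "Number of comments",
}
sample = [
    {"Title": "Hacking the largest airline and hotel rewards platform (2023)",
     "Points": "104", "Comments": "24"},
    {"Title": "", "Points": "12"},  # hypothetical malformed record
]
for problem in check_records(sample, elements):
    print(problem)
```

Returning a list of messages rather than raising on the first problem lets you log everything that went wrong in one run and decide whether to retry or discard the affected records.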
Download files
File details
Details for the file parsera-0.1.0.tar.gz.
File metadata
- Download URL: parsera-0.1.0.tar.gz
- Upload date:
- Size: 10.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.2 CPython/3.12.2 Darwin/23.4.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ee8f4077a3176de5101fb0f8c0ef6101d0c85ca02a9e78c7ae96f2f585ad4aaa |
| MD5 | d998579ef4495bb79890ca5f95185fa0 |
| BLAKE2b-256 | 7ccc03d80d961345565663ed498f63b8fcf62a7f041b67aca0bead20e993e305 |
File details
Details for the file parsera-0.1.0-py3-none-any.whl.
File metadata
- Download URL: parsera-0.1.0-py3-none-any.whl
- Upload date:
- Size: 11.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.2 CPython/3.12.2 Darwin/23.4.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6bb007170027bd768829d40f94971ab8610a77b516f265ab8829e1ed344c0281 |
| MD5 | 71d5264c23d22e95c0a24dd8ae680d7f |
| BLAKE2b-256 | e69e0d21e0b532f8b0df682f624867900d420586b6bba5353ec580fb6309280a |