
Lightweight library for scraping websites with LLMs

Project description

Parsera

Lightweight library for scraping websites with LLMs. You can see how it works on the Parsera website.

Why Parsera?

Because it's simple and lightweight: it keeps token usage to a minimum, which boosts speed and reduces costs.

Installation

pip install parsera
playwright install

Basic usage

If you want to use OpenAI, remember to set the OPENAI_API_KEY environment variable. You can do this from Python with:

import os

os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY_HERE"
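Hard-coding the key in source files makes it easy to leak. A minimal alternative, assuming the key is already exported in your shell, is to read it from the environment and fail early if it is missing. The helper name `require_openai_key` is hypothetical and not part of Parsera.

```python
import os


def require_openai_key() -> str:
    """Return OPENAI_API_KEY from the environment or raise a clear error.

    Hypothetical helper, not part of Parsera: it only checks that the
    variable exists before any scraping starts.
    """
    key = os.environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it in your shell before running Parsera."
        )
    return key
```

Calling it once at startup surfaces a missing key immediately instead of partway through a scrape.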

Next, you can run a basic version that uses gpt-4o-mini:

from parsera import Parsera

url = "https://news.ycombinator.com/"
elements = {
    "Title": "News title",
    "Points": "Number of points",
    "Comments": "Number of comments",
}

scraper = Parsera()
result = scraper.run(url=url, elements=elements)

The result variable will contain JSON with a list of records:

[
   {
      "Title":"Hacking the largest airline and hotel rewards platform (2023)",
      "Points":"104",
      "Comments":"24"
   },
    ...
]
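In the sample output every value is a string, including "Points" and "Comments". A minimal post-processing sketch, assuming `result` is a list of dicts shaped like the sample above, checks that each record has the keys requested in `elements` and converts the numeric fields to integers. The function name `normalize_records` and the choice of which fields count as numeric are illustrative assumptions.

```python
from typing import Any


def normalize_records(
    records: list[dict[str, Any]],
    elements: dict[str, str],
    int_fields: tuple[str, ...] = ("Points", "Comments"),
) -> list[dict[str, Any]]:
    """Hypothetical helper: validate keys and cast numeric strings to int."""
    normalized = []
    for i, record in enumerate(records):
        # Every key requested in `elements` should appear in each record
        missing = set(elements) - set(record)
        if missing:
            raise ValueError(f"record {i} is missing fields: {sorted(missing)}")
        row = dict(record)  # copy so the original result is left untouched
        for field in int_fields:
            # "1,024" becomes 1024; anything non-numeric becomes None
            text = str(row.get(field, "")).replace(",", "").strip()
            row[field] = int(text) if text.isdigit() else None
        normalized.append(row)
    return normalized
```

Mapping unparseable values to None rather than raising keeps one odd record from discarding the whole page.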

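Run with a local model

Install Ollama and make sure the llama3 model is available locally (ollama pull llama3), then install the LangChain integration for Ollama:

pip install langchain-ollama

Then pass a ChatOllama model to Parsera:

from langchain_ollama import ChatOllama
from parsera import Parsera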

llm = ChatOllama(
    model="llama3",
    temperature=0,
    # other params...
)

url = "https://news.ycombinator.com/"
elements = {
    "Title": "News title",
    "Points": "Number of points",
    "Comments": "Number of comments",
}
scraper = Parsera(model=llm)
result = scraper.run(url=url, elements=elements)
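Because the same `elements` mapping can be reused, one scraper can be pointed at several pages. The sketch below relies only on the `run(url=..., elements=...)` call shown above; the helper `scrape_many` and the `Scraper` protocol are hypothetical, and they accept any object with that method, so either the OpenAI-backed or the Ollama-backed `Parsera` instance should fit.

```python
from typing import Any, Protocol


class Scraper(Protocol):
    """Anything with the run(url=..., elements=...) call used in the examples above."""

    def run(self, url: str, elements: dict[str, str]) -> Any: ...


def scrape_many(
    scraper: Scraper, urls: list[str], elements: dict[str, str]
) -> dict[str, Any]:
    """Hypothetical helper: run the same extraction on each URL in turn.

    Failures are recorded per URL instead of aborting the whole batch.
    """
    results: dict[str, Any] = {}
    for url in urls:
        try:
            results[url] = scraper.run(url=url, elements=elements)
        except Exception as exc:  # e.g. network or model errors for a single page
            results[url] = {"error": str(exc)}
    return results
```

For example, pass the `Parsera(model=llm)` instance created above together with a list of page URLs; pages are processed sequentially, which also keeps load on a local model predictable.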



Download files


Source Distribution

parsera-0.1.0.tar.gz (10.4 kB)

Uploaded Source

Built Distribution


parsera-0.1.0-py3-none-any.whl (11.7 kB)

Uploaded Python 3

File details

Details for the file parsera-0.1.0.tar.gz.

File metadata

  • Download URL: parsera-0.1.0.tar.gz
  • Upload date:
  • Size: 10.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.12.2 Darwin/23.4.0

File hashes

Hashes for parsera-0.1.0.tar.gz
Algorithm Hash digest
SHA256 ee8f4077a3176de5101fb0f8c0ef6101d0c85ca02a9e78c7ae96f2f585ad4aaa
MD5 d998579ef4495bb79890ca5f95185fa0
BLAKE2b-256 7ccc03d80d961345565663ed498f63b8fcf62a7f041b67aca0bead20e993e305


File details

Details for the file parsera-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: parsera-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 11.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.12.2 Darwin/23.4.0

File hashes

Hashes for parsera-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 6bb007170027bd768829d40f94971ab8610a77b516f265ab8829e1ed344c0281
MD5 71d5264c23d22e95c0a24dd8ae680d7f
BLAKE2b-256 e69e0d21e0b532f8b0df682f624867900d420586b6bba5353ec580fb6309280a
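To check a downloaded distribution against the SHA256 digests listed above, you can hash the file locally with Python's standard hashlib module. This is a minimal sketch; the function names `sha256_of` and `verify_sha256` are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks until read() returns b"" at end of file
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Path, expected: str) -> bool:
    """Compare a file's digest with a published one (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()
```

For example, verify_sha256(Path("parsera-0.1.0-py3-none-any.whl"), "6bb007170027bd768829d40f94971ab8610a77b516f265ab8829e1ed344c0281") should return True for an intact wheel. pip can also enforce digests itself through its hash-checking mode (--require-hashes).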

