Parsera

Lightweight library for scraping websites with LLMs. You can test it on the Parsera website.

Why Parsera?

Because it's simple and lightweight. Its minimal token use boosts speed and reduces expenses.

Installation

pip install parsera
playwright install

Basic usage

If you want to use OpenAI, remember to set the OPENAI_API_KEY environment variable. You can do this from Python with:

import os

os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY_HERE"
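Before starting a scrape, it can help to confirm that the key is actually present. The following is a minimal sketch, not part of Parsera: require_api_key is a hypothetical helper name, and the error message wording is an assumption.

```python
import os


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the value of the environment variable `name`, or raise a clear error.

    Hypothetical helper for illustration; Parsera does not ship this function.
    """
    key = os.environ.get(name, "").strip()
    if not key:
        # Fail early with a readable message instead of a later authentication error.
        raise RuntimeError(
            f"{name} is not set; export it or assign os.environ[{name!r}] first."
        )
    return key
```

Calling require_api_key() once at the top of a script surfaces a missing key before any page is loaded.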

Next, you can run a basic version that uses gpt-4o-mini:

from parsera import Parsera

url = "https://news.ycombinator.com/"
elements = {
    "Title": "News title",
    "Points": "Number of points",
    "Comments": "Number of comments",
}

scrapper = Parsera()
result = scrapper.run(url=url, elements=elements)

The result variable will contain JSON with a list of records:

[
   {
      "Title": "Hacking the largest airline and hotel rewards platform (2023)",
      "Points": "104",
      "Comments": "24"
   },
   ...
]
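In the sample above, the numeric fields come back as strings. The sketch below shows one way to convert them before further processing. It assumes the result is a list of dictionaries shaped like the sample record; normalize_records is a hypothetical helper, not a Parsera function, and real model output may differ in shape.

```python
import json


def normalize_records(records):
    """Convert digit-only "Points" and "Comments" strings to integers.

    Hypothetical helper; assumes each record is a dict like the sample above.
    """
    cleaned = []
    for record in records:
        row = dict(record)  # copy so the original record is not modified
        for field in ("Points", "Comments"):
            value = str(row.get(field, "")).strip()
            # Leave anything that is not purely digits untouched.
            if value.isdigit():
                row[field] = int(value)
        cleaned.append(row)
    return cleaned


# Sample record copied from the example output above.
sample = [
    {
        "Title": "Hacking the largest airline and hotel rewards platform (2023)",
        "Points": "104",
        "Comments": "24",
    }
]
print(json.dumps(normalize_records(sample), indent=2))
```

Values that are missing or not purely numeric are left unchanged, so the helper does not raise on them.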

Run with a local model

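Install Ollama, then install the LangChain integration package:

pip install langchain-ollama

Then pass a ChatOllama model to Parsera:

from langchain_ollama import ChatOllama
from parsera import Parsera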

llm = ChatOllama(
    model="llama3",
    temperature=0,
    # other params...
)

url = "https://news.ycombinator.com/"
elements = {
    "Title": "News title",
    "Points": "Number of points",
    "Comments": "Number of comments",
}
scrapper = Parsera(model=llm)
result = scrapper.run(url=url, elements=elements)
