Lightweight library for scraping websites with LLMs

📦 Parsera

Lightweight Python library for scraping websites with LLMs. You can test it on the Parsera website.

Why Parsera?

Because it's simple and lightweight: it keeps token use to a minimum, which improves speed and reduces cost.

Installation

pip install parsera
playwright install

Documentation

Check out the documentation to learn more about other features, such as running custom models and Playwright scripts.

Basic usage

If you want to use OpenAI, remember to set the OPENAI_API_KEY environment variable. You can do this from Python with:

import os

os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY_HERE"

Next, you can run a basic version that uses gpt-4o-mini:

from parsera import Parsera

url = "https://news.ycombinator.com/"
elements = {
    "Title": "News title",
    "Points": "Number of points",
    "Comments": "Number of comments",
}

scraper = Parsera()
result = scraper.run(url=url, elements=elements)

The result variable will contain a list of records in JSON form:

[
   {
      "Title":"Hacking the largest airline and hotel rewards platform (2023)",
      "Points":"104",
      "Comments":"24"
   },
    ...
]
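In the sample output above, the numeric fields come back as strings. The sketch below shows one way to post-process such records. It is not part of Parsera: `to_int` and `normalize_records` are hypothetical helper names, and the code assumes records shaped like the sample shown above.

```python
import re


def to_int(value):
    """Return the first integer found in value, or None if there is none."""
    # Drop thousands separators so "1,234" is read as 1234.
    match = re.search(r"-?\d+", str(value).replace(",", ""))
    return int(match.group()) if match else None


def normalize_records(records, numeric_fields=("Points", "Comments")):
    """Copy each record, converting the named fields to integers where possible."""
    cleaned = []
    for record in records:
        row = dict(record)  # copy so the original records stay untouched
        for field in numeric_fields:
            if field in row:
                row[field] = to_int(row[field])
        cleaned.append(row)
    return cleaned


# Sample record copied from the output shown above.
sample = [
    {
        "Title": "Hacking the largest airline and hotel rewards platform (2023)",
        "Points": "104",
        "Comments": "24",
    }
]
print(normalize_records(sample))
```

Fields that contain no digits become None rather than raising, which may be convenient when a listing has no points or comments yet.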

An async method, arun, is also available:

result = await scraper.arun(url=url, elements=elements)
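Because arun is a coroutine, several pages can in principle be scraped concurrently. The following is a minimal sketch, assuming only the `arun(url=..., elements=...)` call shown above; `scrape_many` is a hypothetical helper, and whether a single Parsera instance can safely serve concurrent calls is an assumption this page does not confirm.

```python
import asyncio


async def scrape_many(scraper, urls, elements):
    """Call scraper.arun for every URL concurrently and map each URL to its result."""
    tasks = [scraper.arun(url=url, elements=elements) for url in urls]
    # gather returns results in the same order as the tasks were passed in.
    results = await asyncio.gather(*tasks)
    return dict(zip(urls, results))


# Usage sketch (requires parsera to be installed and OPENAI_API_KEY to be set):
#   from parsera import Parsera
#   results = asyncio.run(scrape_many(Parsera(), [url], elements))
```

Outside of a notebook, `asyncio.run` is the usual entry point; inside Jupyter, `await scrape_many(...)` can be used directly in a cell.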

Running with Jupyter Notebook:

Either place this code at the beginning of your notebook:

import nest_asyncio
nest_asyncio.apply()

Or, instead of calling the run method, use the async arun method.

Running with CLI

Before you run Parsera as a command-line tool, don't forget to put your OPENAI_API_KEY in your environment variables or in a .env file.

Usage

You can specify the elements to parse either as a JSON string or in a FILE. Optionally, you can provide a FILE to write the output to.

python -m parsera.main URL {--scheme '{"title":"h1"}' | --file FILENAME} [--output FILENAME]
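Quoting a JSON string by hand in a shell is error-prone, so it can help to build the command from Python. This is a rough sketch based only on the usage line above: `build_parsera_command` is a hypothetical helper, and the assumption that the `--file` input holds the same JSON object as `--scheme` is not confirmed here.

```python
import json
import sys
from pathlib import Path


def build_parsera_command(url, elements, scheme_file=None, output_file=None):
    """Return an argv list for `python -m parsera.main`, following the usage line above."""
    command = [sys.executable, "-m", "parsera.main", url]
    if scheme_file is not None:
        # Assumption: the --file input contains the same JSON object as --scheme.
        Path(scheme_file).write_text(json.dumps(elements, indent=2), encoding="utf-8")
        command += ["--file", str(scheme_file)]
    else:
        command += ["--scheme", json.dumps(elements)]
    if output_file is not None:
        command += ["--output", str(output_file)]
    return command


command = build_parsera_command(
    "https://news.ycombinator.com/",
    {"Title": "News title", "Points": "Number of points"},
    output_file="result.json",
)
print(command)
# To actually run it: subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting entirely, since each element reaches the program unchanged.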

Running in Docker

If you run into issues with your local environment, you can run Parsera with Docker; see the documentation.
