A Python library for extracting structured data from web pages using AI.

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

SpiderAI

A Python library for extracting structured data from web pages using AI. It uses Google's Gemini AI to analyze the page content and return the data in the format defined by a schema you specify.

Features

AI-powered content analysis using Google's Gemini AI
Flexible schema definition for structured data extraction
Automatic handling of web page fetching and parsing
Supports both single objects and arrays of objects
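The README doesn't describe how SpiderAI fetches and parses pages or how the schema reaches Gemini. The sketch below is only a guess at what such a pipeline could look like, using the standard library alone. It is not SpiderAI's code: `fetch_html`, `visible_text`, and `build_prompt` are hypothetical names, and the prompt wording is invented for illustration.

```python
from __future__ import annotations

import json
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Hypothetical: download a page and decode it with the server's charset."""
    request = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class _TextCollector(HTMLParser):
    """Collects text nodes, skipping the contents of script/style/noscript."""

    _SKIP = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def visible_text(html: str) -> str:
    """Hypothetical: reduce HTML to its visible text, chunks joined by spaces."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def build_prompt(page_text: str, schema: dict | list) -> str:
    """Hypothetical: embed the schema as JSON in an instruction for the model."""
    shape = "a JSON array of objects" if isinstance(schema, list) else "a single JSON object"
    return (
        "Extract data from the web page text below.\n"
        f"Respond with {shape} matching this schema (field name -> type):\n"
        f"{json.dumps(schema, indent=2)}\n"
        "Use null for any field you cannot find. Respond with JSON only.\n\n"
        f"Page text:\n{page_text}"
    )
```

Put together, `build_prompt(visible_text(fetch_html(url)), schema)` would produce the text sent to the model, and a JSON reply could then be decoded with `json.loads`. Again, this illustrates the general approach, not SpiderAI's internals.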

Installation

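pip install spiderai

The Quick Start below also uses python-dotenv to load your API key from a .env file; install it as well if it isn't already present:

pip install python-dotenv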

Quick Start

First, get a Gemini API key from Google AI Studio.
Next, create a .env file in your project root and add your API key:

GEMINI_API_KEY=your_api_key_here

Then use the library in your code:

import os

from dotenv import load_dotenv
from spiderai import WebDataExtractor

# Load the API key from the .env file into the environment
load_dotenv()
gemini_api_key = os.getenv("GEMINI_API_KEY")
if not gemini_api_key:
    raise RuntimeError("GEMINI_API_KEY is not set; check your .env file")

# Create the extractor
extractor = WebDataExtractor(api_key=gemini_api_key)

# URL of the page to extract data from (replace with a real page)
url = "https://example.com/product"

# Define your schema: field name -> expected type
schema = {
    "product_name": "string",
    "price": "float",
    "description": "string"
}

# Extract the data
result = extractor.extract(url, schema)

# Use the extracted data
print("Product Name:", result["product_name"])
print("Price:", result["price"])
print("Description:", result["description"])
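The Quick Start relies on python-dotenv's `load_dotenv()` to copy the values from `.env` into the environment. If you'd rather not add that dependency, a minimal stand-in for simple `KEY=value` files could look like this. `load_env_file` is a hypothetical helper, not part of SpiderAI or python-dotenv, and it skips features python-dotenv supports, such as variable expansion and multi-line values.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Hypothetical minimal .env loader for simple KEY=value lines.

    Blank lines and lines starting with '#' are ignored. Variables already
    set in the environment are left untouched, like load_dotenv()'s default.
    Returns the key/value pairs found in the file.
    """
    found: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return found
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        # Drop one pair of matching surrounding quotes, e.g. KEY="abc" -> abc
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        found[key] = value
        os.environ.setdefault(key, value)
    return found
```

Calling `load_env_file()` in place of `load_dotenv()` would then make `os.getenv("GEMINI_API_KEY")` work as in the Quick Start, provided the key is on its own `GEMINI_API_KEY=...` line.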

Schema Definition

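The schema is either a dictionary describing one object or, to extract several items, a list containing one such dictionary (as in the second example below). In the dictionary:

Keys are the field names you want to extract
Values are the expected data types: "string", "float", "integer", "boolean", "number", or None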

Example schemas:

# Product schema
schema = {
    "name": "string",
    "price": "float",
    "rating": "float",
    "review_count": "integer"
}

# Array of objects
schema = [
    {
        "name": "string",
        "price": "float"
    }
]
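Because the data comes from a language model, it can be worth checking a result against your schema before using it; the README doesn't say whether SpiderAI does this itself. Below is a hypothetical `check_result` helper. It assumes a dictionary schema yields a dict and a list schema yields a list of dicts, which the Quick Start implies but doesn't state. The mapping from type names to Python types is also an assumption: None is treated as "any value" (the README lists it without explaining it), and "float" accepts whole numbers because a JSON `4` decodes to a Python int.

```python
from __future__ import annotations

from typing import Any

# Assumed mapping from the README's type names to accepted Python types.
_TYPES: dict[str, tuple[type, ...]] = {
    "string": (str,),
    "integer": (int,),
    "float": (int, float),  # JSON 4 decodes to int, so accept whole numbers too
    "number": (int, float),
    "boolean": (bool,),
}


def _value_ok(value: Any, type_name: str | None) -> bool:
    """Check one value against one schema type name."""
    if type_name is None:
        return True  # assumption: None means "no type constraint"
    if type_name not in _TYPES:
        raise ValueError(f"unknown type name in schema: {type_name!r}")
    # bool is a subclass of int in Python, so reject it explicitly for numbers.
    if isinstance(value, bool) and type_name != "boolean":
        return False
    return isinstance(value, _TYPES[type_name])


def check_result(result: Any, schema: dict | list) -> list[str]:
    """Return human-readable problems; an empty list means the result matches."""
    if isinstance(schema, list):
        # Array schema: a list wrapping one object schema.
        if not isinstance(result, list):
            return [f"expected a list, got {type(result).__name__}"]
        item_problems: list[str] = []
        for index, item in enumerate(result):
            item_problems += [f"[{index}] {p}" for p in check_result(item, schema[0])]
        return item_problems
    if not isinstance(result, dict):
        return [f"expected an object, got {type(result).__name__}"]
    problems: list[str] = []
    for field, type_name in schema.items():
        if field not in result:
            problems.append(f"missing field {field!r}")
        elif not _value_ok(result[field], type_name):
            actual = type(result[field]).__name__
            problems.append(f"{field!r}: expected {type_name}, got {actual}")
    return problems
```

For example, `check_result(result, schema)` right after `extractor.extract(url, schema)` returns an empty list when every field is present with the expected type, and a list of messages such as `"'price': expected float, got str"` otherwise.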

Requirements

Python 3.10 or higher
Google Gemini AI API key
An internet connection, for fetching pages and calling the Gemini API

License

This project is licensed under the MIT License.

Contact

Feel free to contribute to the project by opening issues or suggesting improvements. For any questions, you can reach me at abhinavcv007@gmail.com.

Release history

0.1.0 (this version): Nov 29, 2024
0.0.3: Nov 28, 2024
0.0.2: Nov 28, 2024
0.0.1: Nov 28, 2024

Download files

Source Distribution

spiderai-0.1.0.tar.gz (5.4 kB), uploaded Nov 29, 2024

Built Distribution

spiderai-0.1.0-py3-none-any.whl (5.4 kB, Python 3), uploaded Nov 29, 2024

File details

Details for the file spiderai-0.1.0.tar.gz.

File metadata

Download URL: spiderai-0.1.0.tar.gz
Upload date: Nov 29, 2024
Size: 5.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.10.15

File hashes

Hashes for spiderai-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`879352762f40c89bfbde486c62aaa4a54ec1afac636c9f73df766ca2f0d72dea`
MD5	`b19927861cb97718c27aa4064039ba45`
BLAKE2b-256	`0abdadc92320ffb9a43dab9fdc6ec67706ce33902719f99516f52424114f48b2`

File details

Details for the file spiderai-0.1.0-py3-none-any.whl.

File metadata

Download URL: spiderai-0.1.0-py3-none-any.whl
Upload date: Nov 29, 2024
Size: 5.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.10.15

File hashes

Hashes for spiderai-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2daeeb80005e11081e84ff651b7f3e8f22f3db583518fb90431da367a8346310`
MD5	`3023f75533f0c8c5f1038c6390982890`
BLAKE2b-256	`cac9aa44c64f2d374f8abad7fdf518f7e57a17d917f3c41480b5ac8c39a23fda`
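To check a downloaded file against the SHA256 digests listed above, hash it locally and compare. This sketch uses only `hashlib`; `sha256_of`, `verify`, and `EXPECTED_SHA256` are illustrative names, not part of SpiderAI.

```python
import hashlib

# SHA256 digests as listed on this page for the 0.1.0 release files.
EXPECTED_SHA256 = {
    "spiderai-0.1.0.tar.gz": "879352762f40c89bfbde486c62aaa4a54ec1afac636c9f73df766ca2f0d72dea",
    "spiderai-0.1.0-py3-none-any.whl": "2daeeb80005e11081e84ff651b7f3e8f22f3db583518fb90431da367a8346310",
}


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_sha256: str) -> bool:
    """Compare a file's SHA256 with an expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, after `pip download spiderai==0.1.0 --no-deps`, call `verify("spiderai-0.1.0-py3-none-any.whl", EXPECTED_SHA256["spiderai-0.1.0-py3-none-any.whl"])`. pip can also enforce this at install time: put `spiderai==0.1.0 --hash=sha256:<digest>` in a requirements file and run `pip install --require-hashes -r requirements.txt`; in that mode every requirement, including dependencies, needs a pinned version and hash.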
