Splore Python SDK

The Splore Python SDK simplifies the process of interacting with the Splore document processing platform. Use it to upload files, process documents, and retrieve extracted data with minimal setup.


🚀 Features

  • Agent Management: Create, update, retrieve, and delete agents.
  • File Upload: Upload documents for processing.
  • Extractions: Extract structured data from documents.
  • Search: Perform web searches and retrieve search history.
  • AWS S3 Integration: Process files directly from S3.
  • Task Monitoring: Track the progress of extraction jobs.
  • Error Handling: Provides meaningful errors and retry mechanisms.
  • Python 3.7+ Compatibility: Tested with Python 3.7.17 and later; usable on Python 3.7 and above.

📥 Installation

Install the SDK via pip:

pip install splore-sdk

For optional example dependencies:

pip install splore-sdk[examples]

🏁 Getting Started

Prerequisites

  1. API Key and Base ID: Obtain these from the Splore console.
  2. Python 3.7+: Ensure Python is installed.
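Once you have the API key and base ID, one way to keep them out of source code is to read them from environment variables. The sketch below is a minimal illustration; the variable names `SPLORE_API_KEY` and `SPLORE_BASE_ID` and the helper `load_splore_credentials` are assumptions made for this example, not names defined by the SDK.

```python
import os


def load_splore_credentials(env=None):
    """Return (api_key, base_id) read from environment variables.

    The variable names are hypothetical; pick whatever suits your deployment.
    """
    env = os.environ if env is None else env
    required = ("SPLORE_API_KEY", "SPLORE_BASE_ID")
    # Collect every missing or empty variable so the error names all of them
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["SPLORE_API_KEY"], env["SPLORE_BASE_ID"]


# Usage (requires the SDK and real credentials):
# api_key, base_id = load_splore_credentials()
# sdk = SploreSDK(api_key=api_key, base_id=base_id)
```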

Quick Start Example

from splore_sdk import SploreSDK

# Initialize SDK
sdk = SploreSDK(api_key="YOUR_API_KEY", base_id="YOUR_BASE_ID")

# Initialize Agent for extraction
extraction_agent = sdk.init_agent(agent_id="YOUR_AGENT_ID")

# Basic extraction flow
extracted_response = extraction_agent.extract(file_path="absolute_file_path")
print(extracted_response)

📦 Modules Overview

🔹 Agent Management

Manage agents for document processing.

Example Usage

from splore_sdk import SploreSDK

# Initialize SDK
sdk = SploreSDK(api_key="YOUR_API_KEY", base_id="YOUR_BASE_ID")

# Create an agent
agent_payload = {"name": "Test Agent", "config": {"key": "value"}}
create_response = sdk.create_agent(agent_payload)
print("Create Agent Response:", create_response)

# Get agent details
agent_id = create_response.get("id")
get_response = sdk.agents.get_agents(agentId=agent_id)
print("Get Agent Response:", get_response)

# Get all agents
all_agents = sdk.agents.get_agents()
print("All Agents:", all_agents)

# Update the agent
update_payload = {"name": "Updated Agent Name"}
update_response = sdk.agents.update_agent(agent_payload=update_payload)
print("Update Agent Response:", update_response)

# Delete the agent
delete_response = sdk.agents.delete_agents(agentId=agent_id)
print("Delete Agent Response:", delete_response)

🔹 Extractions

Handle document processing and extraction.

Example Usage

from splore_sdk import SploreSDK
from time import sleep

# Initialize SDK
sdk = SploreSDK(api_key="YOUR_API_KEY", base_id="YOUR_BASE_ID")

# Get all agents
agents = sdk.agents.get_agents()
agent_id = agents[0]["id"]  # Adjust as needed

# Initialize agent
extraction_agent = sdk.init_agent(agent_id=agent_id)

# Upload file
upload_response = extraction_agent.file_uploader.upload_file(file_path="path/to/file.pdf")
file_id = upload_response
print("File uploaded with ID:", file_id)

# Monitor indexing until the uploaded file is ready for extraction
while True:
    extraction_resp = extraction_agent.service.processing_status(file_id=file_id)
    file_processing_status = extraction_resp.get("fileProcessingStatus")
    file_indexed = file_processing_status == "INDEXED"
    if file_indexed:
        break
    extraction_agent.logger.info("File indexing not completed, waiting...")
    sleep(10)

# Start extraction
extraction_agent.extractions.start(file_id=file_id)

# Monitor extraction status
while True:
    status = extraction_agent.extractions.processing_status(file_id=file_id)
    if status.get("fileProcessingStatus") == "COMPLETED":
        break
    sleep(10)  # Wait before checking again

# Retrieve extracted data
extracted_data = extraction_agent.extractions.extracted_response(file_id=file_id)
print("Extracted Data:", extracted_data)

🔹 Search

Beta Feature - The search API is currently in beta and its signature may change in future releases.

Perform web searches and manage search history.

Example Usage

from splore_sdk import SploreSDK

# Initialize SDK
sdk = SploreSDK(api_key="YOUR_API_KEY", base_id="YOUR_BASE_ID")

# Initialize agent
agent_id = "YOUR_AGENT_ID"
search_agent = sdk.init_agent(agent_id=agent_id)

# Perform a search
search_results = search_agent.search.search(query="artificial intelligence", count=5, engine="google")
print("Search Results:", search_results)

# Get search history
history = search_agent.search.get_history(page=0, size=10)
print("Search History:", history)

🔹 File Upload

Upload files to Splore for processing.

Example Usage

from splore_sdk import SploreSDK

# Initialize SDK
sdk = SploreSDK(api_key="YOUR_API_KEY", base_id="YOUR_BASE_ID")

# Upload file with metadata
metadata = {
    "file_name": "document.pdf",
    "custom_extraction": "false",
    "is_data_file": "true"
}

with open("path/to/file.pdf", "rb") as file:
    response = sdk.file_uploader.upload_file(file_stream=file, metadata=metadata)
    print("Upload Response:", response)
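The metadata dictionary above can also be built from the file path. This is a small sketch; `build_upload_metadata` is a hypothetical helper, not part of the SDK, and it assumes the service expects the string values "true" and "false" exactly as the example above passes them.

```python
import os


def build_upload_metadata(file_path, custom_extraction=False, is_data_file=True):
    """Build a metadata dict with the same keys as the upload example.

    Booleans are rendered as the lowercase strings "true"/"false",
    mirroring the values used in the example above.
    """
    return {
        "file_name": os.path.basename(file_path),
        "custom_extraction": str(custom_extraction).lower(),
        "is_data_file": str(is_data_file).lower(),
    }


# Usage (requires the SDK):
# metadata = build_upload_metadata("path/to/document.pdf")
# with open("path/to/document.pdf", "rb") as file:
#     sdk.file_uploader.upload_file(file_stream=file, metadata=metadata)
```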

🔹 AWS Integration

Download files from AWS S3 for extraction.

Example Usage

from splore_sdk import SploreSDK
from examples.aws import download_from_s3

# Initialize SDK
sdk = SploreSDK(api_key="YOUR_API_KEY", base_id="YOUR_BASE_ID")

# Initialize extraction agent
extraction_agent = sdk.init_agent(agent_id="YOUR_AGENT_ID")

# Create a temporary file destination
file_ref = sdk.file_uploader.create_temp_file_destination(file_extension=".pdf")
s3_uri = "s3://abc/def/abc.pdf"

# Download file from S3
download_from_s3(s3_uri, file_ref)

# Start extraction
response = extraction_agent.extract(file_path=file_ref)
print("Extraction Response:", response)
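The `download_from_s3` helper comes from the optional examples package and its implementation is not shown here. If you write your own downloader, the first step is usually splitting the `s3://` URI into a bucket and a key. The sketch below shows only that step using the standard library; `split_s3_uri` is a hypothetical name introduced for illustration.

```python
from urllib.parse import urlparse


def split_s3_uri(s3_uri):
    """Split 's3://bucket/path/to/key' into (bucket, key)."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("Expected a URI like s3://bucket/key, got: %r" % (s3_uri,))
    # The path keeps its leading slash; object keys do not start with one
    key = parsed.path.lstrip("/")
    if not key:
        raise ValueError("S3 URI has no object key: %r" % (s3_uri,))
    return parsed.netloc, key


bucket, key = split_s3_uri("s3://abc/def/abc.pdf")
print(bucket, key)
```

The resulting bucket and key can then be passed to whichever S3 client you use to write the object to the temporary file destination.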

🔹 Utility Functions

Helper functions to simplify common tasks.

Markdown to HTML Conversion

Convert markdown content to HTML using the md_to_html utility function.

from splore_sdk.utils import md_to_html

# Basic usage
html = md_to_html("# Hello World")
print(html)  # Output: <h1>Hello World</h1>

# Convert markdown with multiple features
markdown_text = """
# Title

This is a **bold** text with *italic* formatting.

1. Ordered list item
2. Another item

> Blockquote example
"""

html = md_to_html(markdown_text)
print(html)

Advanced Usage with MarkdownConverter

For more control over the conversion process, use the MarkdownConverter class.

from splore_sdk.utils import MarkdownConverter

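# Create a converter instance
converter = MarkdownConverter()

# Markdown input to convert (any Markdown string works here)
markdown_text = "# Title\n\nThis is **bold** text."

# Convert with specific extensions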
html = converter.convert(
    markdown_text,
    extensions=['extra', 'codehilite', 'toc'],
    extension_configs={
        'codehilite': {
            'linenums': True,
            'css_class': 'highlight'
        }
    },
    safe_mode=True
)

# Save to file
with open("output.html", "w") as f:
    f.write(html)

Formatting Extracted Responses

Use markdown to format extracted data into well-structured documents:

from splore_sdk import SploreSDK
from splore_sdk.utils import md_to_html

# Initialize SDK with the API key and base ID from the Splore console
sdk = SploreSDK(api_key="YOUR_API_KEY", base_id="YOUR_BASE_ID")

# Initialize Agent for extraction
extraction_agent = sdk.init_agent(agent_id="YOUR_AGENT_ID")

# Get extracted response
extracted_data = extraction_agent.extract(file_path="absolute_file_path")

# Convert each extracted response from Markdown to HTML
html_texts = [md_to_html(item["response"]) for item in extracted_data]
print("Extracted Data (HTML):", html_texts)

⚙️ Advanced Usage

🔸 Polling Interval Configuration

Customize the polling interval for extraction status checks.

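from time import sleep

# Assumes `extraction_agent` and `file_id` from the Extractions example above
while True:
    status = extraction_agent.extractions.processing_status(file_id=file_id)
    if status.get("fileProcessingStatus") == "COMPLETED":
        break
    sleep(5)  # Set custom polling interval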
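A bare `while True` loop never gives up if a job stalls. The sketch below wraps the same polling pattern in a helper with a configurable interval and an overall timeout. `wait_for_status` is a hypothetical helper, not an SDK function; it only assumes that the status call returns a dict with a `fileProcessingStatus` key, as in the examples above.

```python
import time


def wait_for_status(fetch_status, target="COMPLETED", interval=5.0,
                    timeout=600.0, sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it reports `target` or `timeout` seconds pass.

    fetch_status is any zero-argument callable returning a dict that
    contains a "fileProcessingStatus" key.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status.get("fileProcessingStatus") == target:
            return status
        if clock() >= deadline:
            raise TimeoutError("Status did not reach %r within %s seconds" % (target, timeout))
        sleep(interval)


# Usage (requires the SDK, an initialized agent, and a file_id):
# status = wait_for_status(
#     lambda: extraction_agent.extractions.processing_status(file_id=file_id),
#     interval=5,
# )
```

Passing `sleep` and `clock` as parameters keeps the helper easy to test without real waiting.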

🔸 Error Handling

Handle errors gracefully for better debugging.

# Assumes `extraction_agent` from the Extractions example above
try:
    extraction_agent.file_uploader.upload_file(file_path="path/to/file.pdf")
except Exception as e:
    print("Error uploading file:", str(e))
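For transient failures such as network errors, a caller-side retry with exponential backoff can complement the `try`/`except` above. The SDK's built-in retry behaviour is not described in this document, so the following is an independent sketch; `call_with_retries` is a hypothetical helper and it retries on any exception, which you may want to narrow.

```python
import time


def call_with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func() up to `attempts` times, doubling the delay after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except Exception as exc:  # Narrow this to the errors you expect
            last_error = exc
            # Sleep only if another attempt will follow
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_error


# Usage (requires the SDK and an initialized agent):
# file_id = call_with_retries(
#     lambda: extraction_agent.file_uploader.upload_file(file_path="path/to/file.pdf")
# )
```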

🔸 Python 3.7 Compatibility

The SDK now supports Python 3.7 and above.
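If your application needs to fail early on an unsupported interpreter, a version guard like the one below can run before the SDK is imported. This is a generic standard-library sketch; `python_is_supported` and `MIN_PYTHON` are names chosen for this example.

```python
import sys

# Minimum version stated in this README
MIN_PYTHON = (3, 7)


def python_is_supported(version_info=None):
    """Return True if the given (or running) interpreter is at least MIN_PYTHON."""
    version_info = sys.version_info if version_info is None else version_info
    return tuple(version_info[:2]) >= MIN_PYTHON


if not python_is_supported():
    raise SystemExit("This application requires Python 3.7 or newer")
```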


❓ FAQ

1️⃣ How do I get an API Key?

Sign up on the Splore console and navigate to the API section to generate a key.

2️⃣ Can I use this SDK asynchronously?

Asynchronous support will be added in a future release.
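Until then, one common workaround in asyncio applications is to run blocking calls in a thread pool so they do not stall the event loop. The sketch below uses only the standard library and works on Python 3.7; `run_blocking` is a hypothetical helper, and whether SDK objects are safe to share across threads is an assumption you should verify for your use case.

```python
import asyncio
import functools


async def run_blocking(func, *args, **kwargs):
    """Run a blocking callable in the default thread pool and await its result."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, functools.partial(func, *args, **kwargs))


# Usage (requires the SDK and an initialized agent):
# async def main():
#     result = await run_blocking(extraction_agent.extract, file_path="absolute_file_path")
#     print(result)
#
# asyncio.run(main())
```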

3️⃣ Which file formats are supported?

Currently, only PDF files are supported.
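Because only PDFs are accepted, it can be useful to check a file locally before uploading it. The sketch below tests the extension and the `%PDF-` header that PDF files start with. `looks_like_pdf` is a hypothetical helper for illustration; it is a quick sanity check, not a full validation, and the service may apply its own rules.

```python
def looks_like_pdf(file_path):
    """Return True if the file has a .pdf extension and starts with the PDF header."""
    if not file_path.lower().endswith(".pdf"):
        return False
    with open(file_path, "rb") as handle:
        # PDF files begin with the marker "%PDF-" followed by a version number
        return handle.read(5) == b"%PDF-"


# Usage (requires the SDK and an initialized agent):
# if looks_like_pdf("path/to/file.pdf"):
#     extraction_agent.extract(file_path="path/to/file.pdf")
```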

4️⃣ How do I handle search functionality?

The SDK provides a dedicated search capability that allows you to perform web searches and manage search history. Use the search.search() method to perform searches and search.get_history() to retrieve search history.

5️⃣ How do I check the SDK version?

from splore_sdk import __version__
print("Splore SDK Version:", __version__)

🔗 Support

For any questions or issues, please:


📜 License

This SDK is licensed under the MIT License. See LICENSE for details.
