Aurelio Platform SDK
Aurelio SDK
The Aurelio Platform SDK. See the API references for details.
Installation
To install the Aurelio SDK, use pip or poetry:
pip install aurelio-sdk
poetry add aurelio-sdk
Authentication
The SDK requires an API key for authentication. Get a key from the Aurelio Platform, then set it as an environment variable:
export AURELIO_API_KEY=your_api_key_here
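Reading the variable with `os.environ[...]` raises a bare `KeyError` when it is missing. The following is a minimal sketch of a more descriptive check. The helper name `load_api_key` is hypothetical and not part of the SDK; it uses only the standard library.

```python
import os


def load_api_key(env_var: str = "AURELIO_API_KEY") -> str:
    """Return the API key from the environment, or raise a descriptive error.

    `load_api_key` is a hypothetical helper, not part of the Aurelio SDK.
    """
    # Treat an unset variable and a blank value the same way.
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            f"{env_var} is not set; export it before creating the client."
        )
    return api_key
```

The returned value can then be passed as the `api_key` argument when creating the client.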
Usage
See examples for more details.
Initializing the Client
from aurelio_sdk import AurelioClient
import os
client = AurelioClient(api_key=os.environ["AURELIO_API_KEY"])
Or use the asynchronous client:
from aurelio_sdk import AsyncAurelioClient
client = AsyncAurelioClient(api_key="your_api_key_here")
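An asynchronous client is mainly useful when several requests run concurrently. The sketch below shows that pattern with `asyncio.gather` and a semaphore that caps the number of in-flight calls. Because this README does not show the async client's method signatures, `fake_extract` is a stand-in coroutine and `extract_many` is a hypothetical helper; neither is part of the SDK.

```python
import asyncio


async def fake_extract(url: str) -> dict:
    """Stand-in for an awaitable SDK call; NOT the real Aurelio API."""
    await asyncio.sleep(0.01)  # simulate network latency
    return {"url": url, "status": "done"}


async def extract_many(urls: list[str], max_concurrent: int = 3) -> list[dict]:
    """Run the stand-in call for every URL, at most `max_concurrent` at once."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(url: str) -> dict:
        # The semaphore limits how many calls are in flight at the same time.
        async with semaphore:
            return await fake_extract(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))
```

Run it with `asyncio.run(extract_many([...]))`. To use it with the SDK, replace `fake_extract` with the corresponding awaitable call on `AsyncAurelioClient`, after checking its signature in the API references.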
Chunk
from aurelio_sdk import ChunkingOptions, ChunkResponse
# All options are optional with default values
chunking_options = ChunkingOptions(
    chunker_type="semantic", max_chunk_length=400, window_size=5
)
response: ChunkResponse = client.chunk(
    content="Your text here to be chunked", processing_options=chunking_options
)
Extracting Text from Files
PDF Files
from aurelio_sdk import ExtractResponse
# From a local file
file_path = "path/to/your/file.pdf"
with open(file_path, "rb") as file:
    response_pdf_file: ExtractResponse = client.extract_file(
        file=file, quality="low", chunk=True, wait=-1, enable_polling=True
    )
Video Files
from aurelio_sdk import ExtractResponse
# From a local file
file_path = "path/to/your/file.mp4"
with open(file_path, "rb") as file:
    response_video_file: ExtractResponse = client.extract_file(
        file=file, quality="low", chunk=True, wait=-1, enable_polling=True
    )
Extracting Text from URLs
PDF URLs
from aurelio_sdk import ExtractResponse
# From URL
url = "https://arxiv.org/pdf/2408.15291"
response_pdf_url: ExtractResponse = client.extract_url(
    url=url, quality="low", chunk=True, wait=-1, enable_polling=True
)
Video URLs
from aurelio_sdk import ExtractResponse
# From URL
url = "https://storage.googleapis.com/gtv-videos-bucket/sample/ForBiggerMeltdowns.mp4"
response_video_url: ExtractResponse = client.extract_url(
    url=url, quality="low", chunk=True, wait=-1, enable_polling=True
)
Waiting for completion and checking document status
# For large files with `high` quality, set a timeout (here, 10 seconds)
response_pdf_url: ExtractResponse = client.extract_url(
    url="https://arxiv.org/pdf/2408.15291", quality="high", chunk=True, wait=10
)
# Get the document status and response
document_response: ExtractResponse = client.get_document(
    document_id=response_pdf_file.document.id
)
print("Status:", document_response.status)
# Use the pre-built helper, which avoids long-hanging requests (recommended)
document_response = client.wait_for(
    document_id=response_pdf_file.document.id, wait=300
)
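The `wait_for` helper above is the recommended route. To illustrate the general idea behind it, here is a minimal sketch of polling until a terminal state or a deadline is reached. `poll_until_done`, the `fetch_status` callable, and the state names in `done_states` are assumptions made for illustration; they are not taken from the SDK.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], str],
    timeout: float = 300.0,
    interval: float = 2.0,
    done_states: tuple[str, ...] = ("completed", "failed"),
) -> str:
    """Call `fetch_status` repeatedly until it returns a terminal state.

    The state names are assumed for illustration; check which values
    your SDK version actually returns.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in done_states:
            return status
        # Give up if the next sleep would run past the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"Still '{status}' after {timeout} seconds")
        time.sleep(interval)
```

With the SDK, `fetch_status` could wrap `client.get_document(document_id=...).status`. This README does not say whether `status` is a plain string or an enum, so adapt the comparison accordingly.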
Embeddings
from aurelio_sdk import EmbeddingResponse
response: EmbeddingResponse = client.embedding(
    input="Your text here to be embedded",
    model="bm25",
)
# Or with a list of texts
response: EmbeddingResponse = client.embedding(
    input=["Your text here to be embedded", "Your text here to be embedded"]
)
Response Structure
The ExtractResponse object contains the following key information:
- status: The current status of the extraction task
- usage: Information about token usage, pages processed, and processing time
- message: Any relevant messages about the extraction process
- document: The extracted document information, including its ID
- chunks: The extracted text, divided into chunks if chunking was enabled
The EmbeddingResponse object contains the following key information:
- message: Any relevant messages about the embedding process
- model: The model name used for embedding
- usage: Information about token usage, pages processed, and processing time
- data: The embedded documents
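As a rough illustration of reading these fields, the sketch below collects a few of them into a plain dict for logging. `summarize_extract_response` is a hypothetical helper, and it assumes `chunks` is a list-like attribute on the response itself, which this README does not confirm. It is exercised here with a `SimpleNamespace` stand-in rather than a real `ExtractResponse`.

```python
from types import SimpleNamespace


def summarize_extract_response(response) -> dict:
    """Collect selected fields of an ExtractResponse-like object into a dict.

    Only the attribute names come from the README; where `chunks` lives and
    its list-like shape are assumptions for illustration.
    """
    chunks = getattr(response, "chunks", None) or []
    document = getattr(response, "document", None)
    return {
        "status": getattr(response, "status", None),
        "message": getattr(response, "message", None),
        "document_id": getattr(document, "id", None),
        "num_chunks": len(chunks),
    }


# Stand-in object used only to demonstrate the helper.
fake_response = SimpleNamespace(
    status="completed",
    message=None,
    document=SimpleNamespace(id="doc_1"),
    chunks=["first chunk", "second chunk"],
)
summary = summarize_extract_response(fake_response)
```

Missing attributes fall back to `None` (or a chunk count of 0), so the helper does not raise on partially populated responses.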
Best Practices
- Use appropriate wait times based on your use case and file sizes.
- Use the asynchronous client for better performance.
- For large files or when processing might take longer, enable polling for long-hanging requests.
- Always handle potential exceptions and check the status of the response.
- Adjust the quality parameter based on your needs. "low" is faster but less accurate, while "high" is slower but more accurate.
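One way to handle transient failures is to retry a call a few times with exponential backoff. The sketch below is generic: `call_with_retries` is a hypothetical helper, and since this README does not list the exception types the SDK raises, it retries on `Exception` by default. Narrow `retry_on` once you know which errors are worth retrying.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
) -> T:
    """Call `func`, retrying with exponential backoff on the given exceptions."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each time: base_delay, 2x, 4x, ...
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

A call such as `client.extract_url(...)` can be wrapped in a `lambda` and passed as `func`. Check the `status` of the returned response afterwards, since a call that returns without raising has not necessarily finished processing.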