Tensorlake SDK for Document Ingestion API and Serverless Applications
Get high-quality data from documents fast, and deploy scalable serverless data processor APIs.
Tensorlake is the platform for agentic applications. Build and deploy high-throughput, durable agentic applications and workflows in minutes, leveraging our best-in-class Document Ingestion API and compute platform for applications.
Features
- Document Ingestion: Parse documents (PDFs, DOCX, spreadsheets, presentations, images, and raw text) to markdown, or extract structured data with schemas. This is powered by Tensorlake's state-of-the-art layout detection and table recognition models. Review our benchmarks here.
- Agentic Applications: Deploy agentic applications and AI workflows using durable functions, with sandboxed, managed compute infrastructure that scales your agents with usage.
Document Ingestion Quickstart
Installation
Install the SDK and get an API key.
pip install tensorlake
Sign up at cloud.tensorlake.ai and get your API key.
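The examples below pass the key inline as api_key="your-api-key". A safer pattern is to read it from an environment variable. This minimal sketch assumes the same TENSORLAKE_API_KEY variable that the deploy section exports; get_tensorlake_api_key is a hypothetical helper, not part of the SDK.

```python
import os


def get_tensorlake_api_key(var_name: str = "TENSORLAKE_API_KEY") -> str:
    """Read the Tensorlake API key from an environment variable.

    Hypothetical helper: the variable name matches the one exported in the
    deploy section; the SDK may provide its own way of loading the key.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key
```

With the variable exported, the client can then be created as doc_ai = DocumentAI(api_key=get_tensorlake_api_key()).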
Parse Documents
from tensorlake.documentai import DocumentAI, ParseStatus
doc_ai = DocumentAI(api_key="your-api-key")
# Upload and parse document
file_id = doc_ai.upload("/path/to/document.pdf")
# Get parse ID
parse_id = doc_ai.parse(file_id)
# Wait for completion and get results
result = doc_ai.result(parse_id)
if result.status == ParseStatus.SUCCESSFUL:
    for chunk in result.chunks:
        print(chunk.content)  # Clean markdown output
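If you want explicit control over the polling interval and an upper bound on how long to wait, a generic loop can wrap whatever call returns the current parse state. This is a sketch: wait_until is a hypothetical helper, and using it with the SDK assumes the fetched object exposes a status you can compare against ParseStatus values.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(
    fetch: Callable[[], T],
    is_done: Callable[[T], bool],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
) -> T:
    """Call fetch() until is_done(result) is true or the timeout expires.

    Generic polling sketch, not a documented Tensorlake helper.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch()
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("operation did not finish before the timeout")
        time.sleep(interval_s)  # back off before asking again
```

In practice is_done should also return true for failure states, so that a failed parse is reported immediately instead of waiting out the full timeout.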
Customize Parsing
Configure chunking, table output, figure summarization, and more. See all options.
from tensorlake.documentai import DocumentAI, ParsingOptions, EnrichmentOptions, ChunkingStrategy, TableOutputMode
doc_ai = DocumentAI(api_key="your-api-key")
file_id = doc_ai.upload("/path/to/document.pdf")
result = doc_ai.parse_and_wait(
    file_id,
    parsing_options=ParsingOptions(
        chunking_strategy=ChunkingStrategy.SECTION,
        table_output_mode=TableOutputMode.HTML,
        signature_detection=True,
    ),
    enrichment_options=EnrichmentOptions(
        figure_summarization=True,
        table_summarization=True,
    ),
)
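To give a feel for what section-level chunking means, the sketch below splits markdown into chunks that each start at a heading. It is illustrative only: it shows the general idea, not how Tensorlake's ChunkingStrategy.SECTION is implemented, and split_by_headings is a hypothetical name.

```python
import re

# A markdown ATX heading: one to six '#' characters followed by whitespace.
HEADING = re.compile(r"^#{1,6}\s")


def split_by_headings(markdown: str) -> list[str]:
    """Split markdown into chunks that each begin at a heading line.

    Any text before the first heading becomes its own chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    for line in markdown.splitlines():
        if HEADING.match(line) and current:
            # A new heading closes the chunk collected so far.
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]
```

Chunking by section keeps a heading together with the text it introduces, which tends to give retrieval systems more self-contained passages than fixed-size splits.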
Structured Extraction
Extract specific data fields using Pydantic models or JSON schemas. See docs.
from tensorlake.documentai import DocumentAI, StructuredExtractionOptions
from pydantic import BaseModel, Field
class InvoiceData(BaseModel):
    invoice_number: str = Field(description="Invoice number")
    total_amount: float = Field(description="Total amount due")
    due_date: str = Field(description="Payment due date")
    vendor_name: str = Field(description="Vendor company name")

doc_ai = DocumentAI(api_key="your-api-key")
result = doc_ai.parse_and_wait(
    "https://example.com/invoice.pdf",  # Or use file_id from upload()
    structured_extraction_options=[
        StructuredExtractionOptions(
            schema_name="Invoice Data",
            json_schema=InvoiceData,
        )
    ],
)
print(result.structured_data)
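Since a plain JSON schema can be used instead of a Pydantic model, here is a hand-written schema equivalent to InvoiceData, plus a small stdlib check that an extracted record has the expected fields and types. Both are sketches: the schema is an assumption about what a minimal JSON Schema for this model looks like (Pydantic's generated one adds extra titles), and getting a record out of result.structured_data as a dict depends on that object's shape, which is not shown here.

```python
from typing import Any

# Plain JSON Schema mirroring the InvoiceData model (hand-written sketch).
INVOICE_SCHEMA: dict[str, Any] = {
    "title": "InvoiceData",
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "Invoice number"},
        "total_amount": {"type": "number", "description": "Total amount due"},
        "due_date": {"type": "string", "description": "Payment due date"},
        "vendor_name": {"type": "string", "description": "Vendor company name"},
    },
    "required": ["invoice_number", "total_amount", "due_date", "vendor_name"],
}

# Python types that json.loads produces for each JSON Schema type used above.
_JSON_TYPES: dict[str, tuple[type, ...]] = {
    "string": (str,),
    "number": (int, float),
}


def missing_or_mistyped(record: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Return required field names that are absent or have the wrong type.

    Covers only the "string" and "number" types used here; use a full
    validator (for example the jsonschema package) for anything more.
    """
    problems = []
    for name in schema["required"]:
        expected = _JSON_TYPES[schema["properties"][name]["type"]]
        value = record.get(name)
        # bool is a subclass of int in Python, but JSON true/false is not a number.
        if not isinstance(value, expected) or isinstance(value, bool):
            problems.append(name)
    return problems
```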
Learn More
Build Durable Agentic Applications in Python
Deploy agentic applications on a distributed runtime with automatic scaling and durable execution: if an application crashes, it automatically resumes from where it left off. You can build with any Python framework, and agents are exposed as HTTP APIs, like web applications.
- No Queues: We manage state and orchestration.
- Zero Infra: Write Python and deploy to Tensorlake.
- Progress Updates: Applications can run for any amount of time and stream updates to users.
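To make "resumes from where it left off" concrete, the sketch below shows the general checkpointing idea behind durable execution: each step's result is recorded, and on a re-run completed steps are read back instead of executed again. This is a conceptual illustration, not Tensorlake's implementation; the checkpointed decorator and its in-memory store are hypothetical, and on the platform the @function() decorator is what provides checkpointing.

```python
import functools
import json
from typing import Any, Callable


def checkpointed(store: dict[str, Any]) -> Callable:
    """Decorator that caches each call's result in store, keyed by the call.

    A real durable runtime would persist the store outside the process so
    it survives crashes; a dict is used here only to show the control flow.
    """
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any) -> Any:
            # Key on the function name plus its JSON-encoded arguments.
            key = f"{fn.__name__}:{json.dumps(args)}"
            if key not in store:
                store[key] = fn(*args)  # run the step only the first time
            return store[key]
        return wrapper
    return decorator
```

If a workflow calls step(1), crashes, and is re-run with the same store, step(1) is served from the store and only the steps after the crash actually execute.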
Quickstart
Decorate your entrypoint with @application() and functions with @function() for checkpointing and sandboxed execution. Each function runs in its own isolated sandbox.
Example: City guide using OpenAI Agents with web search and code execution:
from agents import Agent, Runner
from agents.tool import WebSearchTool, function_tool
from tensorlake.applications import application, function, Image

# Define the image with necessary dependencies
FUNCTION_CONTAINER_IMAGE = Image(base_image="python:3.11-slim", name="city_guide_image").run(
    "pip install openai openai-agents"
)


@function_tool
@function(
    description="Gets the weather for a city using an OpenAI Agent with web search",
    secrets=["OPENAI_API_KEY"],
    image=FUNCTION_CONTAINER_IMAGE,
)
def get_weather_tool(city: str) -> str:
    """Uses an OpenAI Agent with WebSearchTool to find current weather."""
    agent = Agent(
        name="Weather Reporter",
        instructions="Use web search to find current weather in Fahrenheit for the city.",
        tools=[WebSearchTool()],  # Agent can search the web
    )
    result = Runner.run_sync(agent, f"City: {city}")
    return result.final_output.strip()


@application(tags={"type": "example", "use_case": "city_guide"})
@function(
    description="Creates a guide with temperature conversion using function_tool",
    secrets=["OPENAI_API_KEY"],
    image=FUNCTION_CONTAINER_IMAGE,
)
def city_guide_app(city: str) -> str:
    """Uses an OpenAI Agent with function_tool to run Python code for conversion."""

    @function_tool
    def convert_to_celsius_tool(python_code: str) -> float:
        """Converts Fahrenheit to Celsius - runs as Python code via Agent."""
        # eval() runs model-generated code; this is acceptable here only
        # because each function runs in its own isolated sandbox.
        return float(eval(python_code))

    agent = Agent(
        name="Guide Creator",
        instructions=(
            "Using the appropriate tools, get the weather for the purposes of the guide. "
            "If the city uses Celsius, call convert_to_celsius_tool to convert the temperature, "
            "passing in the code needed to convert the temperature to Celsius. "
            "Create a friendly guide that references the temperature of the city in Celsius "
            "if the city typically uses Celsius, otherwise reference the temperature in Fahrenheit. "
            "Only reference Celsius or Fahrenheit, not both."
        ),
        tools=[get_weather_tool, convert_to_celsius_tool],  # Agent can call these tools
    )
    result = Runner.run_sync(agent, f"City: {city}")
    return result.final_output.strip()
Note: This is a simplified version. See the complete example at examples/readme_example/city_guide.py for the full implementation including activity suggestions and agent orchestration.
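In the example, the agent writes the Fahrenheit-to-Celsius code itself and convert_to_celsius_tool evaluates it. For reference, the conversion it should produce is C = (F − 32) × 5/9. If the conversion is all you need, a fixed function like this hypothetical one computes it directly without evaluating generated code:

```python
def fahrenheit_to_celsius(fahrenheit: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius: C = (F - 32) * 5 / 9."""
    return (fahrenheit - 32.0) * 5.0 / 9.0
```

The example keeps the eval-based tool to demonstrate code execution inside a sandboxed function; a fixed function trades that flexibility for predictability.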
Deploy to Tensorlake Cloud
- Set your API keys:
export TENSORLAKE_API_KEY="your-api-key"
tensorlake secrets set OPENAI_API_KEY "your-openai-key"
- Deploy:
tensorlake deploy examples/readme_example/city_guide.py
Call via HTTP
# Invoke the application
curl https://api.tensorlake.ai/applications/city_guide_app \
-H "Authorization: Bearer $TENSORLAKE_API_KEY" \
--json '"San Francisco"'
# Returns: {"request_id": "beae8736ece31ef9"}
# Get the result
curl https://api.tensorlake.ai/applications/city_guide_app/requests/{request_id}/output \
-H "Authorization: Bearer $TENSORLAKE_API_KEY"
# Stream results with SSE
curl https://api.tensorlake.ai/applications/city_guide_app \
-H "Authorization: Bearer $TENSORLAKE_API_KEY" \
-H "Accept: text/event-stream" \
--json '"San Francisco"'
# Send files
curl https://api.tensorlake.ai/applications/my_pdf_processor \
-H "Authorization: Bearer $TENSORLAKE_API_KEY" \
-H "Content-Type: application/pdf" \
--data-binary @document.pdf
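The same calls can be made from Python with only the standard library. The sketch below builds the requests shown in the curl examples (the URLs and headers come from those examples) and parses a server-sent-event stream using the basic SSE framing. The helper names are hypothetical, and the format of the payload inside each event is not assumed.

```python
import json
import urllib.request

API_BASE = "https://api.tensorlake.ai/applications"


def invoke_request(app: str, payload: object, api_key: str, stream: bool = False) -> urllib.request.Request:
    """Build a POST that invokes an application with a JSON body, like curl --json."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        # Ask for an SSE stream instead of a single JSON response.
        "Accept": "text/event-stream" if stream else "application/json",
    }
    return urllib.request.Request(
        f"{API_BASE}/{app}",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def output_request(app: str, request_id: str, api_key: str) -> urllib.request.Request:
    """Build a GET for the output of a previously submitted request."""
    return urllib.request.Request(
        f"{API_BASE}/{app}/requests/{request_id}/output",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def parse_sse(text: str) -> list[str]:
    """Collect the data payload of each server-sent event in text.

    Basic SSE framing: "data:" lines accumulate, a blank line ends an event.
    Other fields (event:, id:, retry:) and comments are ignored here.
    """
    events: list[str] = []
    data_lines: list[str] = []
    for line in text.splitlines():
        if line == "":
            if data_lines:
                events.append("\n".join(data_lines))
                data_lines = []
        elif line.startswith("data:"):
            data_lines.append(line[5:].removeprefix(" "))
    if data_lines:
        events.append("\n".join(data_lines))
    return events
```

To send a request, pass it to urllib.request.urlopen() and read the response body; per the curl example, the invoke call returns JSON containing a request_id that you can pass to output_request.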
Learn More
Download files
File details
Details for the file tensorlake-0.3.9.tar.gz.
File metadata
- Download URL: tensorlake-0.3.9.tar.gz
- Upload date:
- Size: 2.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 02cb877f25cd7c58f6b147d27e4ca91c64f4532bade4c041faa99ec5b81da17b |
| MD5 | 5b1b82af1e45627a377457f9632b7bcd |
| BLAKE2b-256 | d28a385a509690743df65a829795de6e0f967159b6096dfe6a6abe17459c4485 |
Provenance
The following attestation bundles were made for tensorlake-0.3.9.tar.gz:
Publisher: publish_pypi.yaml on tensorlakeai/tensorlake
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tensorlake-0.3.9.tar.gz
- Subject digest: 02cb877f25cd7c58f6b147d27e4ca91c64f4532bade4c041faa99ec5b81da17b
- Sigstore transparency entry: 930417854
- Sigstore integration time:
- Permalink: tensorlakeai/tensorlake@31988c910fedc54e6d3636bc745691431a2b3f38
- Branch / Tag: refs/heads/main
- Owner: https://github.com/tensorlakeai
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish_pypi.yaml@31988c910fedc54e6d3636bc745691431a2b3f38
- Trigger Event: workflow_dispatch
File details
Details for the file tensorlake-0.3.9-py3-none-any.whl.
File metadata
- Download URL: tensorlake-0.3.9-py3-none-any.whl
- Upload date:
- Size: 2.4 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 564c0271959dd1e225932ea67e64373e831cf270f720c3375c6be2b04bf325b2 |
| MD5 | 209f4fa202b9ba46ccc5ba32c412cc33 |
| BLAKE2b-256 | 6821742f781eec7f4caf73d9f7118817c5a324b8a944cce6f421d90b03aaa4ad |
Provenance
The following attestation bundles were made for tensorlake-0.3.9-py3-none-any.whl:
Publisher: publish_pypi.yaml on tensorlakeai/tensorlake
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tensorlake-0.3.9-py3-none-any.whl
- Subject digest: 564c0271959dd1e225932ea67e64373e831cf270f720c3375c6be2b04bf325b2
- Sigstore transparency entry: 930417860
- Sigstore integration time:
- Permalink: tensorlakeai/tensorlake@31988c910fedc54e6d3636bc745691431a2b3f38
- Branch / Tag: refs/heads/main
- Owner: https://github.com/tensorlakeai
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish_pypi.yaml@31988c910fedc54e6d3636bc745691431a2b3f38
- Trigger Event: workflow_dispatch