TABStack AI Python SDK
Python SDK for TABStack AI - Extract, Generate, and Automate web content using AI.
Features
- 🔍 Extract: Convert web content to markdown or structured JSON
- ✨ Generate: Transform and enhance web data with AI
- 🤖 Automate: Execute complex web automation tasks using natural language
- ⚡ Async/Await: Modern async Python API for efficient concurrent operations
- 🔄 Connection Pooling: Configurable HTTP connection pooling for optimal performance
- 📘 Fully Typed: Complete type hints for better IDE support and type safety
- 🔒 JSON Schema: Use standard JSON Schema for structured data extraction
- 🛡️ Error Handling: Comprehensive custom exceptions for all API errors
Installation
Using uv (recommended)
uv pip install tabstack
Or add to your project:
uv add tabstack
Using pip
pip install tabstack
Using poetry
poetry add tabstack
Using pipenv
pipenv install tabstack
From Source
git clone https://github.com/Mozilla-Ocho/tabstack-python.git
cd tabstack-python
pip install -e ".[dev]"
Quick Start
import asyncio
import os
from tabstack import TABStack

async def main():
    # Initialize the client with connection pooling
    async with TABStack(
        api_key=os.getenv('TABSTACK_API_KEY'),
        max_connections=100,
        max_keepalive_connections=20
    ) as tabs:
        # Extract markdown from a URL
        result = await tabs.extract.markdown(
            url="https://news.ycombinator.com",
            metadata=True
        )
        print(result.content)
        print(result.metadata.title)

        # Extract structured JSON data
        schema = {
            "type": "object",
            "properties": {
                "stories": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "title": {"type": "string"},
                            "points": {"type": "number"},
                            "author": {"type": "string"}
                        }
                    }
                }
            }
        }
        data = await tabs.extract.json(
            url="https://news.ycombinator.com",
            schema=schema
        )

        # Generate transformed content with AI
        summary_schema = {
            "type": "object",
            "properties": {
                "summaries": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "title": {"type": "string"},
                            "category": {"type": "string"},
                            "summary": {"type": "string"}
                        }
                    }
                }
            }
        }
        # First extract the markdown
        markdown_result = await tabs.extract.markdown(url="https://news.ycombinator.com")
        # Then transform it with AI
        summaries = await tabs.generate.json(
            markdown=markdown_result.content,
            schema=summary_schema,
            instructions="For each story, categorize it and write a one-sentence summary"
        )

        # Automate web tasks (streaming)
        async for event in tabs.automate.execute(
            task="Find the top 3 trending repositories and extract their details",
            url="https://github.com/trending"
        ):
            if event.type == "task:completed":
                print(f"Result: {event.data.final_answer}")
            elif event.type == "agent:extracted":
                print(f"Extracted: {event.data.extracted_data}")

# Run the async function
asyncio.run(main())
API Reference
All methods are async and must be awaited. The client can be used as an async context manager, which closes its connections automatically on exit.
Client Initialization
from tabstack import TABStack
async with TABStack(
    api_key="your-api-key",
    base_url="https://api.tabstack.ai/",  # optional
    max_connections=100,  # optional
    max_keepalive_connections=20,  # optional
    keepalive_expiry=30.0,  # optional, in seconds
    timeout=60.0  # optional, in seconds
) as tabs:
    # Your code here
    pass
Parameters:
- api_key (str, required): Your TABStack API key
- base_url (str, optional): API base URL. Default: https://api.tabstack.ai/
- max_connections (int, optional): Maximum concurrent connections. Default: 100
- max_keepalive_connections (int, optional): Maximum idle connections to keep alive. Default: 20
- keepalive_expiry (float, optional): Seconds to keep idle connections alive. Default: 30.0
- timeout (float, optional): Request timeout in seconds. Default: 60.0
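Because every method is a coroutine, several requests can share one client and run concurrently. The sketch below shows one way to cap how many run at once, which may be useful for staying under max_connections. The helper `gather_bounded` and the stand-in `fake_extract` are hypothetical names used for illustration; they are not part of the SDK.

```python
import asyncio


async def gather_bounded(coros, limit=5):
    """Run coroutines concurrently, at most `limit` at a time; results keep input order."""
    semaphore = asyncio.Semaphore(limit)

    async def run(coro):
        async with semaphore:
            return await coro

    return await asyncio.gather(*(run(c) for c in coros))


async def fake_extract(url):
    # Hypothetical stand-in for `tabs.extract.markdown(url=url)`.
    await asyncio.sleep(0.01)
    return f"# content of {url}"


urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
results = asyncio.run(gather_bounded([fake_extract(u) for u in urls], limit=2))
```

In real code the list would hold calls such as `tabs.extract.markdown(url=u)`, created inside the `async with` block so the client is still open when they run.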
Extract Operator
The Extract operator converts web content into structured formats without AI transformation.
extract.markdown(url, metadata=False, nocache=False)
Convert URL content to Markdown format.
Parameters:
- url (str): URL to convert
- metadata (bool): If True, return metadata as separate field. If False, embed as YAML frontmatter. Default: False
- nocache (bool): Bypass cache and force fresh retrieval. Default: False
Returns: MarkdownResponse with url, content, and optional metadata fields
Example:
result = await tabs.extract.markdown(
    url="https://example.com",
    metadata=True
)
print(result.content)
print(result.metadata.title)
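When metadata=False, the metadata is embedded in the content as YAML frontmatter. The following is a minimal sketch of separating that frontmatter from the body using only the standard library. It assumes the conventional layout (a `---` line, the YAML, then a closing `---` line); the exact format the API returns is not specified here, and `split_frontmatter` is a hypothetical helper, not an SDK function.

```python
def split_frontmatter(content: str) -> tuple[str, str]:
    """Return (frontmatter, body). Frontmatter is "" when none is present."""
    lines = content.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return "", content
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "".join(lines[1:i]), "".join(lines[i + 1:])
    # No closing delimiter: treat the whole text as body.
    return "", content


# Assumed sample in the conventional frontmatter layout.
sample = "---\ntitle: Example\n---\n# Heading\n\nBody text.\n"
frontmatter, body = split_frontmatter(sample)
```

The frontmatter string can then be handed to a YAML parser of your choice.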
extract.schema(url, instructions, nocache=False)
Generate a JSON Schema by analyzing the structure of a webpage.
Parameters:
- url (str): URL to analyze
- instructions (str): Instructions for what data to extract (max 1000 characters)
- nocache (bool): Bypass cache. Default: False
Returns: SchemaResponse with generated schema dict
Example:
result = await tabs.extract.schema(
    url="https://example.com/products",
    instructions="Extract product listings with name, price, and availability"
)
# Use the schema for extraction
data = await tabs.extract.json(url="https://example.com/products", schema=result.schema)
extract.json(url, schema, nocache=False)
Extract structured JSON data from a URL using a schema.
Parameters:
- url (str): URL to extract from
- schema (dict): JSON Schema defining the structure
- nocache (bool): Bypass cache. Default: False
Returns: JsonResponse with extracted data
Example:
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"}
    }
}
result = await tabs.extract.json(url="https://example.com", schema=schema)
print(result.data)
Generate Operator
The Generate operator uses AI to transform and enhance web content.
generate.json(markdown, instructions, schema)
Transform markdown content into structured JSON using AI.
Parameters:
- markdown (str): Markdown content to transform
- instructions (str): AI instructions for transformation
- schema (dict): JSON Schema for output structure
Returns: JsonResponse with generated data
Example:
# First extract markdown
md = await tabs.extract.markdown(url="https://news.ycombinator.com")
# Then transform with AI
schema = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "topics": {"type": "array", "items": {"type": "string"}}
    }
}
result = await tabs.generate.json(
    markdown=md.content,
    instructions="Summarize the content and extract main topics",
    schema=schema
)
Automate Operator
The Automate operator executes complex web automation tasks using natural language.
automate.execute(task, url=None, schema=None)
Execute an AI-powered browser automation task. Returns an async iterator over the Server-Sent Events emitted while the task runs.
Parameters:
- task (str): Natural language description of the task
- url (str, optional): Starting URL for the task
- schema (dict, optional): JSON Schema for structured data extraction
Yields: AutomateEvent objects with type and data fields
Event Types:
- start: Automation started
- agent:navigating: Agent is navigating to a URL
- agent:thinking: Agent is analyzing the page
- agent:action: Agent performed an action (click, scroll, etc.)
- agent:extracted: Agent extracted structured data
- task:completed: Task finished successfully
Example:
schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "stars": {"type": "number"}
        }
    }
}
async for event in tabs.automate.execute(
    task="Find trending repositories and extract their names and star counts",
    url="https://github.com/trending",
    schema=schema
):
    if event.type == "agent:extracted":
        print(f"Extracted: {event.data.extracted_data}")
    elif event.type == "task:completed":
        print(f"Final answer: {event.data.final_answer}")
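As the number of handled event types grows, an if/elif chain can be replaced with a lookup table keyed by event type. The sketch below illustrates that pattern with stand-ins: `FakeEvent` and `fake_stream` are hypothetical substitutes for AutomateEvent and `tabs.automate.execute(...)`, and the shape of `data` here is made up for the example.

```python
import asyncio
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeEvent:
    # Hypothetical stand-in for the SDK's AutomateEvent (type + data).
    type: str
    data: Any = None


async def fake_stream():
    # Hypothetical stand-in for `tabs.automate.execute(...)`.
    for event_type in ("start", "agent:navigating", "agent:extracted", "task:completed"):
        yield FakeEvent(type=event_type, data={"step": event_type})


async def dispatch(stream, handlers):
    """Route each event to the handler registered for its type; count unhandled ones."""
    unhandled = 0
    async for event in stream:
        handler = handlers.get(event.type)
        if handler is None:
            unhandled += 1
        else:
            handler(event)
    return unhandled


seen = []
handlers = {
    "agent:extracted": lambda e: seen.append(("extracted", e.data)),
    "task:completed": lambda e: seen.append(("completed", e.data)),
}
unhandled = asyncio.run(dispatch(fake_stream(), handlers))
```

Event types without a registered handler are simply counted, so new event types added by the service would not break the loop.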
Working with JSON Schemas
TABStack uses standard JSON Schema for defining data structures. Here are common patterns:
Basic Object
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"}
    }
}
Array of Objects
schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "id": {"type": "number"},
            "name": {"type": "string"}
        }
    }
}
Nested Objects
schema = {
    "type": "object",
    "properties": {
        "product": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "details": {
                    "type": "object",
                    "properties": {
                        "weight": {"type": "number"},
                        "dimensions": {"type": "string"}
                    }
                }
            }
        }
    }
}
Array of Primitives
schema = {
    "type": "object",
    "properties": {
        "tags": {
            "type": "array",
            "items": {"type": "string"}
        }
    }
}
For more information on JSON Schema, see json-schema.org.
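Since schemas are plain dicts, the patterns above can be composed with small helper functions instead of being written out by hand. This is only a convenience sketch; `object_schema` and `array_of` are hypothetical helpers and are not provided by the SDK.

```python
def object_schema(fields: dict) -> dict:
    """Build an object schema from {name: type-name or nested schema dict}."""
    properties = {
        name: spec if isinstance(spec, dict) else {"type": spec}
        for name, spec in fields.items()
    }
    return {"type": "object", "properties": properties}


def array_of(items) -> dict:
    """Wrap a type name or schema dict as an array schema."""
    return {"type": "array", "items": items if isinstance(items, dict) else {"type": items}}


# Combines the "Basic Object" and "Array of Primitives" patterns.
schema = object_schema({
    "title": "string",
    "price": "number",
    "tags": array_of("string"),
})
```

The resulting dict has the same shape as the hand-written schemas and can be passed wherever a schema parameter is expected.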
Error Handling
The SDK provides specific exception classes for different error scenarios:
| Exception | Status Code | Description | Retryable |
|---|---|---|---|
| BadRequestError | 400 | Invalid request parameters | No |
| UnauthorizedError | 401 | Invalid or missing API key | No |
| InvalidURLError | 422 | URL is invalid or inaccessible | No |
| ServerError | 500 | Internal server error | Yes (with backoff) |
| ServiceUnavailableError | 503 | Service temporarily unavailable | Yes (after delay) |
| APIError | Other | Generic API error | Depends on status |
Example Error Handling
import asyncio
from tabstack import TABStack
from tabstack.exceptions import (
    BadRequestError,
    UnauthorizedError,
    InvalidURLError,
    ServerError,
    ServiceUnavailableError,
)

async def main():
    async with TABStack(api_key="your-api-key") as tabs:
        try:
            result = await tabs.extract.markdown(url="https://example.com")
        except UnauthorizedError:
            print("Error: Invalid API key")
        except InvalidURLError as e:
            print(f"Error: URL is invalid or inaccessible - {e.message}")
        except BadRequestError as e:
            print(f"Error: Bad request - {e.message}")
        except ServerError as e:
            print(f"Server error (retryable): {e.message}")
            # Implement retry logic with exponential backoff
        except ServiceUnavailableError as e:
            print(f"Service unavailable (retryable): {e.message}")
            # Wait and retry

asyncio.run(main())
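The retry comments in the example above can be filled in along the following lines. This is a minimal sketch of exponential backoff, not retry logic shipped with the SDK: `with_backoff` is a hypothetical helper, `RetryableError` stands in for ServerError and ServiceUnavailableError, and the delays are kept tiny so the example runs quickly.

```python
import asyncio


class RetryableError(Exception):
    """Hypothetical stand-in for ServerError / ServiceUnavailableError."""


async def with_backoff(make_call, retry_on=(RetryableError,), attempts=4, base_delay=0.01):
    """Await make_call(); on a retryable error wait base_delay * 2**n, then retry."""
    for attempt in range(attempts):
        try:
            return await make_call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error.
            await asyncio.sleep(base_delay * 2 ** attempt)


calls = {"count": 0}


async def flaky():
    # Fails twice, then succeeds, to exercise the retry path.
    calls["count"] += 1
    if calls["count"] < 3:
        raise RetryableError("temporary failure")
    return "ok"


result = asyncio.run(with_backoff(flaky))
```

With the real client, `make_call` could be something like `lambda: tabs.extract.markdown(url=...)` and `retry_on` the two retryable exception classes from the table, with a base delay closer to a second.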
Development & Testing
Setup Development Environment
# Clone the repository
git clone https://github.com/Mozilla-Ocho/tabstack-python.git
cd tabstack-python
# Install with development dependencies
pip install -e ".[dev]"
Running Tests
# Run all tests
pytest
# Run with coverage
pytest --cov=tabstack --cov-report=html
# Run specific test file
pytest tests/test_extract.py
# Run with verbose output
pytest -v
Code Quality
# Format code with ruff
ruff format .
# Lint code
ruff check .
# Type checking
mypy tabstack/
Test Structure
tests/
├── conftest.py # Shared pytest fixtures
├── test_client.py # TABStack client tests
├── test_extract.py # Extract operator tests
├── test_generate.py # Generate operator tests
├── test_automate.py # Automate operator tests
├── test_http_client.py # HTTP client tests
├── test_types.py # Response type tests
├── test_exceptions.py # Exception tests
├── test_utils.py # Utility function tests
└── test_integration.py # End-to-end integration tests
All tests use mocked HTTP responses - no real API calls are made during testing.
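The same idea carries over to application code that depends on the client: the client can be replaced with a fake so that no network call is made. The sketch below uses `unittest.mock.AsyncMock` from the standard library. It is an illustration of the approach, not a copy of the SDK's own test suite, and `page_title` and `fake_client` are hypothetical names.

```python
import asyncio
from types import SimpleNamespace
from unittest.mock import AsyncMock


async def page_title(client, url):
    """Application code under test: return the page title via extract.markdown."""
    result = await client.extract.markdown(url=url, metadata=True)
    return result.metadata.title


# Hypothetical fake client: only the attribute path used above is mocked.
fake_client = SimpleNamespace(
    extract=SimpleNamespace(
        markdown=AsyncMock(
            return_value=SimpleNamespace(
                content="# Example", metadata=SimpleNamespace(title="Example Domain")
            )
        )
    )
)

title = asyncio.run(page_title(fake_client, "https://example.com"))
# Verify the call was awaited exactly once with the expected arguments.
fake_client.extract.markdown.assert_awaited_once_with(url="https://example.com", metadata=True)
```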
Contributing
Contributions are welcome! Here's a quick checklist:
- Fork the repository and create a feature branch
- Write tests for new functionality
- Ensure all tests pass (pytest)
- Format code with ruff (ruff format .)
- Ensure linting passes (ruff check .)
- Update documentation as needed
- Submit a pull request with a clear description
Requirements
- Python 3.10+ (tested on 3.10, 3.11, 3.12, 3.13, 3.14)
- httpx >= 0.27.0
License
Apache License 2.0 - see LICENSE for details.
Links
- Homepage: https://tabstack.ai
- Documentation: https://docs.tabstack.ai
- PyPI: https://pypi.org/project/tabstack/
- Repository: https://github.com/Mozilla-Ocho/tabstack-python
- Issues: https://github.com/Mozilla-Ocho/tabstack-python/issues
Support
- Email: support@tabstack.ai
- Discord: Join our community
- Documentation: docs.tabstack.ai