Python SDK for TABStack AI - Extract, Generate, and Automate web content

TABStack AI Python SDK

Python SDK for TABStack AI - Extract, Generate, and Automate web content using AI.

Features

🔍 Extract: Convert web content to markdown or structured JSON
✨ Generate: Transform and enhance web data with AI
🤖 Automate: Execute complex web automation tasks using natural language
⚡ Async/Await: Modern async Python API for efficient concurrent operations
🔄 Connection Pooling: Configurable HTTP connection pooling for optimal performance
📘 Fully Typed: Complete type hints for better IDE support and type safety
🔒 JSON Schema: Use standard JSON Schema for structured data extraction
🛡️ Error Handling: Comprehensive custom exceptions for all API errors

Installation

Using uv (recommended)

uv pip install tabstack

Or add to your project:

uv add tabstack

Using pip

pip install tabstack

Using poetry

poetry add tabstack

Using pipenv

pipenv install tabstack

From Source

git clone https://github.com/tabstack/tabs-python.git
cd tabs-python
pip install -e ".[dev]"

Quick Start

import asyncio
import os
from tabstack import TABStack

async def main():
    # Initialize the client with connection pooling
    async with TABStack(
        api_key=os.getenv('TABSTACK_API_KEY'),
        max_connections=100,
        max_keepalive_connections=20
    ) as tabs:
        # Extract markdown from a URL
        result = await tabs.extract.markdown(
            url="https://news.ycombinator.com",
            metadata=True
        )
        print(result.content)
        print(result.metadata.title)

        # Extract structured JSON data
        schema = {
            "type": "object",
            "properties": {
                "stories": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "title": {"type": "string"},
                            "points": {"type": "number"},
                            "author": {"type": "string"}
                        }
                    }
                }
            }
        }

        data = await tabs.extract.json(
            url="https://news.ycombinator.com",
            schema=schema
        )

        # Generate transformed content with AI
        summary_schema = {
            "type": "object",
            "properties": {
                "summaries": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "title": {"type": "string"},
                            "category": {"type": "string"},
                            "summary": {"type": "string"}
                        }
                    }
                }
            }
        }

        # First extract the markdown
        markdown_result = await tabs.extract.markdown(url="https://news.ycombinator.com")

        # Then transform it with AI
        summaries = await tabs.generate.json(
            markdown=markdown_result.content,
            schema=summary_schema,
            instructions="For each story, categorize it and write a one-sentence summary"
        )

        # Automate web tasks (streaming)
        async for event in tabs.automate.execute(
            task="Find the top 3 trending repositories and extract their details",
            url="https://github.com/trending"
        ):
            if event.type == "task:completed":
                print(f"Result: {event.data.final_answer}")
            elif event.type == "agent:extracted":
                print(f"Extracted: {event.data.extracted_data}")

# Run the async function
asyncio.run(main())
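Because the API is async, several pages can be fetched concurrently rather than one after another. The following is a minimal sketch of that pattern, not SDK code: `fake_extract_markdown` is a hypothetical stand-in for `tabs.extract.markdown` so the example runs without an API key or network access, and the limit of 5 in-flight requests is an arbitrary choice.

```python
import asyncio


async def fake_extract_markdown(url: str) -> str:
    """Hypothetical stand-in for `tabs.extract.markdown`; no network access."""
    await asyncio.sleep(0.01)  # simulate I/O latency
    return f"# Content of {url}"


async def extract_many(urls: list[str], limit: int = 5) -> list[str]:
    """Fetch all URLs concurrently, with at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url: str) -> str:
        async with semaphore:
            return await fake_extract_markdown(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))


urls = [f"https://example.com/page/{i}" for i in range(10)]
pages = asyncio.run(extract_many(urls))
print(len(pages))
```

In real use, the stand-in would presumably be replaced by a call on a shared client inside a single `async with TABStack(...)` block, so that all requests reuse one connection pool.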

API Reference

All methods are async and must be awaited. The client is an async context manager, so connections are cleaned up automatically when the `async with` block exits.

Client Initialization

from tabstack import TABStack

async with TABStack(
    api_key="your-api-key",
    base_url="https://api.tabstack.ai/",  # optional
    max_connections=100,  # optional
    max_keepalive_connections=20,  # optional
    keepalive_expiry=30.0,  # optional, in seconds
    timeout=60.0  # optional, in seconds
) as tabs:
    # Your code here
    pass

Parameters:

api_key (str, required): Your TABStack API key
base_url (str, optional): API base URL. Default: https://api.tabstack.ai/
max_connections (int, optional): Maximum concurrent connections. Default: 100
max_keepalive_connections (int, optional): Maximum idle connections to keep alive. Default: 20
keepalive_expiry (float, optional): Seconds to keep idle connections alive. Default: 30.0
timeout (float, optional): Request timeout in seconds. Default: 60.0
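Since `api_key` is required and `os.getenv` returns `None` when a variable is missing, it can help to fail fast before the client is created. The helper below is a small sketch of that idea; `require_env` is a hypothetical name and not part of the SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if unset/empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

It could then be used as `TABStack(api_key=require_env("TABSTACK_API_KEY"))`, so a missing key surfaces as a clear local error rather than a failed request.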

Extract Operator

The Extract operator converts web content into structured formats without AI transformation.

`extract.markdown(url, metadata=False, nocache=False)`

Convert URL content to Markdown format.

Parameters:

url (str): URL to convert
metadata (bool): If True, return metadata as a separate field. If False, embed it in the content as YAML front matter. Default: False
nocache (bool): Bypass cache and force fresh retrieval. Default: False

Returns: MarkdownResponse with url, content, and optional metadata fields

Example:

result = await tabs.extract.markdown(
    url="https://example.com",
    metadata=True
)
print(result.content)
print(result.metadata.title)
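With `metadata=False`, the metadata arrives inside `content` as YAML front matter. The sketch below shows one way to separate the two using only the standard library. It assumes the front matter is delimited by lines containing only `---`, which is the common convention but is not confirmed here for this API; `split_frontmatter` is a hypothetical helper.

```python
def split_frontmatter(content: str) -> tuple[str, str]:
    """Split a Markdown string into (frontmatter, body).

    Assumes the front matter is delimited by lines containing only `---`.
    Returns an empty front matter string when no such block is present.
    """
    lines = content.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", content
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            frontmatter = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:]).lstrip("\n")
            return frontmatter, body
    return "", content  # opening delimiter without a closing one


sample = "---\ntitle: Example Domain\n---\n\n# Example Domain\n"
frontmatter, body = split_frontmatter(sample)
print(frontmatter)
print(body)
```

The front matter is returned as raw text; parsing it into a dict would need a YAML parser, which the standard library does not include.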

`extract.schema(url, instructions, nocache=False)`

Generate a JSON Schema by analyzing the structure of a webpage.

Parameters:

url (str): URL to analyze
instructions (str): Instructions for what data to extract (max 1000 characters)
nocache (bool): Bypass cache. Default: False

Returns: SchemaResponse with generated schema dict

Example:

result = await tabs.extract.schema(
    url="https://example.com/products",
    instructions="Extract product listings with name, price, and availability"
)
# Use the schema for extraction
data = await tabs.extract.json(url="https://example.com/products", schema=result.schema)
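The 1000-character limit on `instructions` can be checked locally before a request is sent. This is a sketch under the assumption that the limit counts characters the way Python's `len` does; `check_instructions` is a hypothetical helper, not an SDK function.

```python
MAX_INSTRUCTIONS_LENGTH = 1000  # limit stated in the parameter list above


def check_instructions(instructions: str) -> str:
    """Return `instructions` unchanged, or raise ValueError if too long."""
    if len(instructions) > MAX_INSTRUCTIONS_LENGTH:
        raise ValueError(
            f"instructions is {len(instructions)} characters; "
            f"the maximum is {MAX_INSTRUCTIONS_LENGTH}"
        )
    return instructions
```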

`extract.json(url, schema, nocache=False)`

Extract structured JSON data from a URL using a schema.

Parameters:

url (str): URL to extract from
schema (dict): JSON Schema defining the structure
nocache (bool): Bypass cache. Default: False

Returns: JsonResponse with extracted data

Example:

schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"}
    }
}
result = await tabs.extract.json(url="https://example.com", schema=schema)
print(result.data)

Generate Operator

The Generate operator uses AI to transform and enhance web content.

`generate.json(markdown, instructions, schema)`

Transform markdown content into structured JSON using AI.

Parameters:

markdown (str): Markdown content to transform
instructions (str): AI instructions for transformation
schema (dict): JSON Schema for output structure

Returns: JsonResponse with generated data

Example:

# First extract markdown
md = await tabs.extract.markdown(url="https://news.ycombinator.com")

# Then transform with AI
schema = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "topics": {"type": "array", "items": {"type": "string"}}
    }
}
result = await tabs.generate.json(
    markdown=md.content,
    instructions="Summarize the content and extract main topics",
    schema=schema
)

Automate Operator

The Automate operator executes complex web automation tasks using natural language.

`automate.execute(task, url=None, schema=None)`

Execute an AI-powered browser automation task. Returns an async iterator that yields events as they arrive over Server-Sent Events.

Parameters:

task (str): Natural language description of the task
url (str, optional): Starting URL for the task
schema (dict, optional): JSON Schema for structured data extraction

Yields: AutomateEvent objects with type and data fields

Event Types:

start: Automation started
agent:navigating: Agent is navigating to a URL
agent:thinking: Agent is analyzing the page
agent:action: Agent performed an action (click, scroll, etc.)
agent:extracted: Agent extracted structured data
task:completed: Task finished successfully

Example:

schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "stars": {"type": "number"}
        }
    }
}

async for event in tabs.automate.execute(
    task="Find trending repositories and extract their names and star counts",
    url="https://github.com/trending",
    schema=schema
):
    if event.type == "agent:extracted":
        print(f"Extracted: {event.data.extracted_data}")
    elif event.type == "task:completed":
        print(f"Final answer: {event.data.final_answer}")
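When a caller needs the results after the stream ends rather than printing them as they arrive, the events can be folded into a summary. The sketch below simulates the stream so it runs offline: `FakeEvent` and `fake_stream` are hypothetical stand-ins for `AutomateEvent` and `tabs.automate.execute`, the event payloads are invented for illustration, and `data` is a plain dict here rather than an object with attributes.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, AsyncIterator


@dataclass
class FakeEvent:
    """Hypothetical stand-in for AutomateEvent (data is a plain dict here)."""
    type: str
    data: dict[str, Any] = field(default_factory=dict)


async def fake_stream() -> AsyncIterator[FakeEvent]:
    """Simulated event stream; a real one would come from automate.execute."""
    yield FakeEvent("start")
    yield FakeEvent("agent:navigating", {"url": "https://github.com/trending"})
    yield FakeEvent("agent:extracted", {"extracted_data": [{"name": "demo", "stars": 1}]})
    yield FakeEvent("task:completed", {"final_answer": "done"})


async def collect(stream: AsyncIterator[FakeEvent]) -> dict[str, Any]:
    """Consume the stream, keeping extracted data and the final answer."""
    summary: dict[str, Any] = {"extracted": [], "final_answer": None, "seen": []}
    async for event in stream:
        summary["seen"].append(event.type)
        if event.type == "agent:extracted":
            summary["extracted"].append(event.data["extracted_data"])
        elif event.type == "task:completed":
            summary["final_answer"] = event.data["final_answer"]
    return summary


summary = asyncio.run(collect(fake_stream()))
print(summary["final_answer"])
```

Event types the loop does not handle explicitly are still recorded in `seen`, which can be useful for logging progress.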

Working with JSON Schemas

TABStack uses standard JSON Schema for defining data structures. Here are common patterns:

Basic Object

schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"}
    }
}

Array of Objects

schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "id": {"type": "number"},
            "name": {"type": "string"}
        }
    }
}

Nested Objects

schema = {
    "type": "object",
    "properties": {
        "product": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "details": {
                    "type": "object",
                    "properties": {
                        "weight": {"type": "number"},
                        "dimensions": {"type": "string"}
                    }
                }
            }
        }
    }
}

Array of Primitives

schema = {
    "type": "object",
    "properties": {
        "tags": {
            "type": "array",
            "items": {"type": "string"}
        }
    }
}

For more information on JSON Schema, see json-schema.org.
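To see how these patterns constrain data, the sketch below checks a Python value against the small subset of JSON Schema used in the examples above ("type", "properties", and "items"). It is an illustration only, not a full validator and not part of the SDK; `conforms` is a hypothetical helper, and a dedicated library such as `jsonschema` would be the better choice for real validation.

```python
from typing import Any


def conforms(value: Any, schema: dict) -> bool:
    """Check `value` against a small subset of JSON Schema.

    Supports only "type" (object, array, string, number, boolean),
    "properties", and "items". Properties absent from `value` are allowed,
    since none of the schemas above declare "required".
    """
    kind = schema.get("type")
    if kind == "object":
        if not isinstance(value, dict):
            return False
        props = schema.get("properties", {})
        return all(conforms(value[k], sub) for k, sub in props.items() if k in value)
    if kind == "array":
        if not isinstance(value, list):
            return False
        items = schema.get("items", {})
        return all(conforms(v, items) for v in value)
    if kind == "string":
        return isinstance(value, str)
    if kind == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if kind == "boolean":
        return isinstance(value, bool)
    return True  # unknown or missing "type": accept, as this is only a sketch


tags_schema = {
    "type": "object",
    "properties": {"tags": {"type": "array", "items": {"type": "string"}}},
}
print(conforms({"tags": ["a", "b"]}, tags_schema))
```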

Error Handling

The SDK provides specific exception classes for different error scenarios:

Exception	Status Code	Description	Retryable
`BadRequestError`	400	Invalid request parameters	No
`UnauthorizedError`	401	Invalid or missing API key	No
`InvalidURLError`	422	URL is invalid or inaccessible	No
`ServerError`	500	Internal server error	Yes (with backoff)
`ServiceUnavailableError`	503	Service temporarily unavailable	Yes (after delay)
`APIError`	Other	Generic API error	Depends on status

Example Error Handling

import asyncio
from tabstack import TABStack
from tabstack.exceptions import (
    BadRequestError,
    UnauthorizedError,
    InvalidURLError,
    ServerError,
    ServiceUnavailableError,
)

async def main():
    async with TABStack(api_key="your-api-key") as tabs:
        try:
            result = await tabs.extract.markdown(url="https://example.com")
        except UnauthorizedError:
            print("Error: Invalid API key")
        except InvalidURLError as e:
            print(f"Error: URL is invalid or inaccessible - {e.message}")
        except BadRequestError as e:
            print(f"Error: Bad request - {e.message}")
        except ServerError as e:
            print(f"Server error (retryable): {e.message}")
            # Implement retry logic with exponential backoff
        except ServiceUnavailableError as e:
            print(f"Service unavailable (retryable): {e.message}")
            # Wait and retry

asyncio.run(main())
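The retry comments in the example above can be fleshed out along the following lines. This is a minimal sketch of exponential backoff, not SDK behavior: the two exception classes are local stand-ins so the block runs on its own (real code would import them from `tabstack.exceptions`), and `with_retries`, the attempt count, and the delays are all hypothetical choices.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class ServerError(Exception):
    """Local stand-in; real code would import this from tabstack.exceptions."""


class ServiceUnavailableError(Exception):
    """Local stand-in; real code would import this from tabstack.exceptions."""


RETRYABLE = (ServerError, ServiceUnavailableError)


async def with_retries(
    call: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Await `call()`, retrying retryable errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return await call()
        except RETRYABLE:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each time: base_delay, 2*base_delay, 4*base_delay, ...
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

A call would then look like `await with_retries(lambda: tabs.extract.markdown(url="https://example.com"))`. Non-retryable errors such as `BadRequestError` are not caught and propagate immediately.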

Development & Testing

Setup Development Environment

# Clone the repository
git clone https://github.com/tabstack/tabs-python.git
cd tabs-python

# Install with development dependencies
pip install -e ".[dev]"

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=tabstack --cov-report=html

# Run specific test file
pytest tests/test_extract.py

# Run with verbose output
pytest -v

Code Quality

# Format code with ruff
ruff format .

# Lint code
ruff check .

# Type checking
mypy tabstack/

Test Structure

tests/
├── conftest.py              # Shared pytest fixtures
├── test_client.py           # TABStack client tests
├── test_extract.py          # Extract operator tests
├── test_generate.py         # Generate operator tests
├── test_automate.py         # Automate operator tests
├── test_http_client.py      # HTTP client tests
├── test_types.py            # Response type tests
├── test_exceptions.py       # Exception tests
├── test_utils.py            # Utility function tests
└── test_integration.py      # End-to-end integration tests

All tests use mocked HTTP responses - no real API calls are made during testing.
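Application code that uses the SDK can be tested offline in a similar spirit. How the SDK's own suite mocks HTTP is not shown here, so the sketch below takes a different, simpler route and mocks at the client level with `unittest.mock.AsyncMock`; `page_title` is a hypothetical function under test and the fake client is built from `SimpleNamespace` objects.

```python
import asyncio
from types import SimpleNamespace
from unittest.mock import AsyncMock


async def page_title(tabs, url: str) -> str:
    """Example code under test: fetch a page and return its metadata title."""
    result = await tabs.extract.markdown(url=url, metadata=True)
    return result.metadata.title


def test_page_title_uses_mocked_client() -> None:
    # Build a fake client whose extract.markdown is an AsyncMock.
    fake_result = SimpleNamespace(
        content="# Example", metadata=SimpleNamespace(title="Example")
    )
    tabs = SimpleNamespace(
        extract=SimpleNamespace(markdown=AsyncMock(return_value=fake_result))
    )

    title = asyncio.run(page_title(tabs, "https://example.com"))

    assert title == "Example"
    tabs.extract.markdown.assert_awaited_once_with(
        url="https://example.com", metadata=True
    )


test_page_title_uses_mocked_client()
```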

Contributing

Contributions are welcome! Here's a quick checklist:

Fork the repository and create a feature branch
Write tests for new functionality
Ensure all tests pass (pytest)
Format code with ruff (ruff format .)
Ensure linting passes (ruff check .)
Update documentation as needed
Submit a pull request with a clear description

Requirements

Python 3.10+ (tested on 3.10, 3.11, 3.12, 3.13, 3.14)
httpx >= 0.27.0

License

Apache License 2.0 - see LICENSE for details.

Support

Email: support@tabstack.ai
Discord: Join our community
Documentation: docs.tabstack.ai

Release history

Version	Release date
2.6.1	May 5, 2026
2.6.0	Apr 24, 2026
2.5.0	Apr 23, 2026
2.4.0	Apr 10, 2026
2.3.0	Mar 12, 2026
2.2.0	Feb 12, 2026
2.1.0	Jan 30, 2026
2.0.0	Jan 16, 2026
1.0.6	Dec 10, 2025
1.0.5	Dec 10, 2025
1.0.4	Nov 20, 2025
1.0.3	Nov 12, 2025
1.0.2 (this version)	Nov 12, 2025
1.0.1	Nov 12, 2025

Download files

Source Distribution

tabstack-1.0.2.tar.gz (43.7 kB)

Uploaded Nov 12, 2025 Source

Built Distribution

tabstack-1.0.2-py3-none-any.whl (36.2 kB)

Uploaded Nov 12, 2025 Python 3

File details

Details for the file tabstack-1.0.2.tar.gz.

File metadata

Download URL: tabstack-1.0.2.tar.gz
Upload date: Nov 12, 2025
Size: 43.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tabstack-1.0.2.tar.gz
Algorithm	Hash digest
SHA256	`faab855d24792a2b2a18e3532cff837de185f28844ab1ae55d23db87417efe34`
MD5	`2e36805276ff2cb06dc81d84222270f3`
BLAKE2b-256	`ddfd45070f41e0e8920301c5db95422611146a5fe81a1e68b36f19016308ec60`
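A downloaded file can be checked against the published SHA256 digest with the standard library. The following is a small sketch of that check; `sha256_of` and `matches_published` are hypothetical helper names, and the file path passed in is whatever location the archive was saved to.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with a published one, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, the call would be `matches_published(Path("tabstack-1.0.2.tar.gz"), "faab855d24792a2b2a18e3532cff837de185f28844ab1ae55d23db87417efe34")`.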

Provenance

The following attestation bundles were made for tabstack-1.0.2.tar.gz:

Publisher: publish.yml on Mozilla-Ocho/tabstack-python

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tabstack-1.0.2.tar.gz
- Subject digest: faab855d24792a2b2a18e3532cff837de185f28844ab1ae55d23db87417efe34
- Sigstore transparency entry: 698225134
- Sigstore integration time: Nov 12, 2025
Source repository:
- Permalink: Mozilla-Ocho/tabstack-python@fa5a54dd0c38fd66a1f320a326f81e87859ecd96
- Branch / Tag: refs/tags/v1.0.2
- Owner: https://github.com/Mozilla-Ocho
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@fa5a54dd0c38fd66a1f320a326f81e87859ecd96
- Trigger Event: release

File details

Details for the file tabstack-1.0.2-py3-none-any.whl.

File metadata

Download URL: tabstack-1.0.2-py3-none-any.whl
Upload date: Nov 12, 2025
Size: 36.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tabstack-1.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a94b939142190eff6f8a5961ec51680491c89f641dba382cc7a42e3df9c27e43`
MD5	`02bd532bd4592311058cf29466649376`
BLAKE2b-256	`3032c123366d58c8541986992c739ce6f8269d45675d32649d6cf67947a13183`

Provenance

The following attestation bundles were made for tabstack-1.0.2-py3-none-any.whl:

Publisher: publish.yml on Mozilla-Ocho/tabstack-python

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tabstack-1.0.2-py3-none-any.whl
- Subject digest: a94b939142190eff6f8a5961ec51680491c89f641dba382cc7a42e3df9c27e43
- Sigstore transparency entry: 698225148
- Sigstore integration time: Nov 12, 2025
Source repository:
- Permalink: Mozilla-Ocho/tabstack-python@fa5a54dd0c38fd66a1f320a326f81e87859ecd96
- Branch / Tag: refs/tags/v1.0.2
- Owner: https://github.com/Mozilla-Ocho
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@fa5a54dd0c38fd66a1f320a326f81e87859ecd96
- Trigger Event: release
