A client library for BookWyrm

bookwyrm

A Python client library for interacting with BookWyrm instances, featuring both synchronous and asynchronous clients plus a rich command-line interface.

Installation

Using uv (recommended for development)

# Clone the repository
git clone https://github.com/scidonia/bookwyrm-client.git
cd bookwyrm-client

# Install dependencies and create virtual environment
uv sync

# Install in development mode
uv pip install -e .

Using pip

# Install from PyPI
pip install bookwyrm

Getting an API Key

To use the BookWyrm client, you'll need an API key from bookwyrm.ai:

  1. Visit bookwyrm.ai
  2. Click on "Sign up for beta" to create an account
  3. Once registered, you'll receive your API key
  4. Set your API key as an environment variable or pass it directly to the client

export BOOKWYRM_API_KEY="your-api-key-here"

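On the Python side, the exported variable can be read once at startup so that a missing key fails early with a clear message. The sketch below is a minimal illustration; load_api_key is a hypothetical helper, not part of the bookwyrm package.

```python
import os


def load_api_key(env_var: str = "BOOKWYRM_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    load_api_key is a hypothetical helper, not part of the bookwyrm package.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before creating a client")
    return key
```

The returned value can then be passed as the api_key argument when constructing a client, as in the library examples in the Usage section.
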
Usage

Python Library

The BookWyrm client provides both synchronous and asynchronous interfaces for text processing, citation finding, summarization, and phrasal analysis.

Synchronous Client

from bookwyrm import (
    BookWyrmClient,
    CitationRequest,
    ClassifyRequest,
    ProcessTextRequest,
    ResponseFormat,
    SummarizeRequest,
    TextChunk,
)

# Initialize client
client = BookWyrmClient(base_url="https://api.bookwyrm.ai:443", api_key="your-key")

# Citation finding
chunks = [
    # Offsets assume the two chunks are separated by one character and that end_char is exclusive
    TextChunk(text="This is the first chunk.", start_char=0, end_char=24),
    TextChunk(text="This is the second chunk.", start_char=25, end_char=50),
]

request = CitationRequest(
    chunks=chunks,
    question="What are the chunks about?",
    max_tokens_per_chunk=1000
)

# Get citations (non-streaming)
response = client.get_citations(request)
print(f"Found {response.total_citations} citations")
for citation in response.citations:
    print(f"Quality: {citation.quality}/4")
    print(f"Text: {citation.text}")
    print(f"Reasoning: {citation.reasoning}")

# Stream citations (real-time results)
for stream_response in client.stream_citations(request):
    if hasattr(stream_response, 'citation'):
        print(f"New citation: {stream_response.citation.text}")
    elif hasattr(stream_response, 'message'):
        print(f"Progress: {stream_response.message}")

# Phrasal text processing
phrasal_request = ProcessTextRequest(
    text_url="https://www.gutenberg.org/cache/epub/32706/pg32706.txt",  # Triplanetary by E. E. Smith
    chunk_size=1000,
    response_format=ResponseFormat.WITH_OFFSETS
)

for response in client.process_text(phrasal_request):
    if hasattr(response, 'text'):
        print(f"Phrase: {response.text[:100]}...")
    elif hasattr(response, 'message'):
        print(f"Progress: {response.message}")

# File classification
classify_request = ClassifyRequest(
    url="https://www.gutenberg.org/ebooks/18857.epub3.images",
    filename="alice_wonderland.epub"  # Optional hint
)

classification_response = client.classify(classify_request)
print(f"Format: {classification_response.classification.format_type}")
print(f"Content Type: {classification_response.classification.content_type}")
print(f"MIME Type: {classification_response.classification.mime_type}")
print(f"Confidence: {classification_response.classification.confidence:.2%}")
print(f"File Size: {classification_response.file_size:,} bytes")

# Classify local content
with open("document.txt", "r") as f:
    content = f.read()

local_classify_request = ClassifyRequest(
    content=content,
    filename="document.txt"
)

local_response = client.classify(local_classify_request)
print(f"Local file classified as: {local_response.classification.content_type}")

# Classify binary content (base64-encode it yourself before building the request)
import base64

with open("image.jpg", "rb") as f:
    binary_content = f.read()

encoded_content = base64.b64encode(binary_content).decode("ascii")

binary_classify_request = ClassifyRequest(
    content=encoded_content,
    content_encoding="base64",
    filename="image.jpg"
)

binary_response = client.classify(binary_classify_request)
print(f"Binary file classified as: {binary_response.classification.content_type}")

client.close()

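In the example above, client.close() is only reached if every earlier call succeeds. One way to guarantee the cleanup is the standard library's contextlib.closing, which calls close() even when an exception is raised. This is a sketch; run_with_client is a hypothetical helper that relies on nothing except the close() method shown above.

```python
from contextlib import closing


def run_with_client(client, work):
    """Call work(client) and always call client.close() afterwards.

    run_with_client is a hypothetical helper; it only relies on the
    close() method shown in the synchronous example.
    """
    with closing(client) as active:
        return work(active)
```

For instance, run_with_client(client, lambda c: c.get_citations(request)) returns the response and closes the client whether or not the call raises.
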
Asynchronous Client

import asyncio
from bookwyrm import (
    AsyncBookWyrmClient,
    CitationRequest,
    ClassifyRequest,
    ProcessTextRequest,
    ResponseFormat,
    SummarizeRequest,
)

async def main():
    # Initialize async client
    async with AsyncBookWyrmClient(base_url="https://api.bookwyrm.ai:443", api_key="your-key") as client:
        
        # Citation finding
        request = CitationRequest(
            jsonl_url="https://example.com/chunks.jsonl",
            question="What is the main topic?",
        )
        
        response = await client.get_citations(request)
        print(f"Found {response.total_citations} citations")
        
        # Stream citations
        async for stream_response in client.stream_citations(request):
            if hasattr(stream_response, 'citation'):
                print(f"New citation: {stream_response.citation.text}")

        # Phrasal text processing
        phrasal_request = ProcessTextRequest(
            text_url="https://www.gutenberg.org/cache/epub/32706/pg32706.txt",  # Triplanetary by E. E. Smith
            chunk_size=500,
            response_format=ResponseFormat.TEXT_ONLY
        )

        async for response in client.process_text(phrasal_request):
            if hasattr(response, 'text'):
                print(f"Phrase: {response.text[:100]}...")
            elif hasattr(response, 'message'):
                print(f"Progress: {response.message}")

        # File classification
        classify_request = ClassifyRequest(
            url="https://www.gutenberg.org/ebooks/18857.epub3.images"
        )
        
        classification = await client.classify(classify_request)
        print(f"Classified as: {classification.classification.content_type}")
        print(f"Confidence: {classification.classification.confidence:.2%}")

asyncio.run(main())

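The asynchronous client is most useful when several requests can be in flight at once. The sketch below shows one possible pattern using asyncio.gather with a semaphore to cap concurrency. classify_many and max_concurrency are hypothetical names; the only client method used is the awaitable classify() from the example above.

```python
import asyncio


async def classify_many(client, requests, max_concurrency: int = 4):
    """Classify several requests concurrently and return results in input order.

    classify_many and max_concurrency are hypothetical; the only client
    method used is the awaitable classify() shown in the async example.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def classify_one(request):
        async with semaphore:  # cap the number of in-flight requests
            return await client.classify(request)

    # gather preserves the order of its arguments in the returned list
    return await asyncio.gather(*(classify_one(r) for r in requests))
```

Inside the async with block of main(), this could be called as results = await classify_many(client, [request_a, request_b]), where both requests are ClassifyRequest instances.
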
Command Line Interface

The CLI provides a rich, interactive interface for text processing operations:

Citation Finding

# Find citations in a JSONL file
bookwyrm cite "What is the main theme?" chunks.jsonl

# Save results to JSON
bookwyrm cite "What is the main theme?" chunks.jsonl --output results.json

# Use a URL as source
bookwyrm cite "What is the main theme?" --url https://example.com/chunks.jsonl

# Use --file option instead of positional argument
bookwyrm cite "What is the main theme?" --file chunks.jsonl

# Process only a subset of chunks
bookwyrm cite "What is the main theme?" chunks.jsonl --start 10 --limit 100

# Use non-streaming mode
bookwyrm cite "What is the main theme?" chunks.jsonl --no-stream

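The cite command reads its chunks from a JSONL file. As a rough sketch of how such a file might be produced, the helper below splits a text into fixed-size chunks and writes one JSON object per line. It assumes each line mirrors the TextChunk fields (text, start_char, end_char) with an exclusive end_char; that layout is an assumption, and write_chunks_jsonl is a hypothetical helper.

```python
import json
from pathlib import Path


def write_chunks_jsonl(text: str, path: str, chunk_size: int = 1000) -> int:
    """Split text into fixed-size chunks and write one JSON object per line.

    Assumption: each line mirrors the TextChunk fields (text, start_char,
    end_char), with end_char exclusive. Check the format your instance expects.
    Returns the number of chunks written.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    count = 0
    with Path(path).open("w", encoding="utf-8") as handle:
        for start in range(0, len(text), chunk_size):
            piece = text[start:start + chunk_size]
            record = {"text": piece, "start_char": start, "end_char": start + len(piece)}
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Fixed-size splitting is only the simplest option; the phrasal command described next produces phrase-aligned output.
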
Phrasal Text Processing

# Process text from a URL (Triplanetary by E. E. Smith from Project Gutenberg)
bookwyrm phrasal --url "https://www.gutenberg.org/cache/epub/32706/pg32706.txt" --chunk-size 1000 --output triplanetary_phrases.jsonl

# Process text from a file
bookwyrm phrasal --file document.txt --format with_offsets --output phrases.jsonl

# Process text directly
bookwyrm phrasal "This is some text to analyze for phrases." --format text_only

# Use different SpaCy models
bookwyrm phrasal --file document.txt --spacy-model en_core_web_lg

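The file written by --output can be post-processed line by line. The sketch below assumes the file holds one JSON object per line and that phrase records carry a "text" field, matching the response.text attribute in the library examples; both points are assumptions, and iter_phrases is a hypothetical helper.

```python
import json
from typing import Iterator


def iter_phrases(path: str) -> Iterator[dict]:
    """Yield each JSON object in a JSONL file that carries a "text" field.

    Assumption: the --output file holds one JSON object per line and phrase
    records expose "text"; other records (such as progress messages) are skipped.
    """
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            if isinstance(record, dict) and "text" in record:
                yield record
```
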
File Classification

# Classify a URL resource (EPUB from Project Gutenberg)
bookwyrm classify --url "https://www.gutenberg.org/ebooks/18857.epub3.images" --output classification.json

# Classify a local file
bookwyrm classify --file document.pdf --output results.json

# Classify text content directly ($'...' makes bash or zsh turn \n into a real newline)
bookwyrm classify $'import pandas as pd\ndf = pd.DataFrame()' --filename "script.py"

# Classify with filename hint for better accuracy
bookwyrm classify --url "https://example.com/data" --filename "data.json"

# Note: Binary files are automatically detected and base64-encoded when using --file option

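When using the Python library rather than the CLI, the same text-or-binary decision has to be made by the caller. The sketch below builds keyword arguments for ClassifyRequest from a local file, using the content, content_encoding, and filename fields shown in the library examples. Treating any file that is not valid UTF-8 as binary is an assumption, and the CLI's own detection may work differently; classify_kwargs_for is a hypothetical helper.

```python
import base64
from pathlib import Path


def classify_kwargs_for(path: str) -> dict:
    """Build keyword arguments for ClassifyRequest from a local file.

    classify_kwargs_for is hypothetical. Treating any file that is not valid
    UTF-8 as binary is an assumption; the CLI's own detection may differ.
    """
    data = Path(path).read_bytes()
    name = Path(path).name
    try:
        # Text files are sent as-is
        return {"content": data.decode("utf-8"), "filename": name}
    except UnicodeDecodeError:
        # Anything else is base64-encoded and flagged as such
        encoded = base64.b64encode(data).decode("ascii")
        return {"content": encoded, "content_encoding": "base64", "filename": name}
```

A request could then be built with ClassifyRequest(**classify_kwargs_for("image.jpg")).
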
Summarization

# Summarize a JSONL file of phrases
bookwyrm summarize phrases.jsonl --output summary.json

# Include debug information
bookwyrm summarize phrases.jsonl --debug --max-tokens 5000

Global Options

All commands support these options:

# Set API key and base URL for individual commands
bookwyrm phrasal --api-key YOUR_KEY --base-url https://api.bookwyrm.ai:443 --url "https://example.com/text.txt"

# Enable verbose output (per command)
bookwyrm cite --verbose "Question?" chunks.jsonl

# Use environment variables (recommended)
export BOOKWYRM_API_URL="https://api.bookwyrm.ai:443"
export BOOKWYRM_API_KEY="your-api-key"
bookwyrm phrasal --url "https://example.com/text.txt"

Note: The API key and base URL options belong to each command individually; they are not app-level options. To apply the same values to every command, set the environment variables instead.

Environment Variables

Set these environment variables for convenience:

export BOOKWYRM_API_KEY="your-api-key"
export BOOKWYRM_API_URL="https://api.bookwyrm.ai:443"

Development

This project supports both uv and pip for development:

# With uv
uv sync
uv run pytest
uv run bookwyrm --help

# With pip
pip install -r requirements-dev.txt
pytest
bookwyrm --help

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=bookwyrm

# Run async tests specifically
pytest -k "async"

API Reference

Models

  • TextChunk: Represents a text chunk with start/end character positions
  • CitationRequest: Request model for citation processing
  • Citation: A found citation with quality score and reasoning
  • CitationResponse: Response containing multiple citations
  • UsageInfo: Token usage and cost information
  • ClassifyRequest: Request model for file classification
  • ClassifyResponse: Response containing classification results
  • FileClassification: Detailed classification information
  • ProcessTextRequest: Request model for phrasal text processing
  • ResponseFormat: Output format for phrasal processing (WITH_OFFSETS or TEXT_ONLY)
  • SummarizeRequest: Request model for summarization

Clients

  • BookWyrmClient: Synchronous client with get_citations(), stream_citations(), process_text(), classify(), and other methods
  • AsyncBookWyrmClient: Asynchronous client with async versions of the same methods

Exceptions

  • BookWyrmClientError: Base exception class
  • BookWyrmAPIError: API-specific errors with status codes

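These exceptions can be combined with a small retry wrapper for transient failures. The sketch below is generic: the exception type to retry on is passed in (for example BookWyrmAPIError), and the status_code attribute name is an assumption, since the list above only says that API errors carry status codes. call_with_retries is a hypothetical helper.

```python
import time


def call_with_retries(func, retry_on, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call func() and retry with exponential backoff on the given exceptions.

    Assumption: the exception may expose an integer status_code attribute.
    Errors with a 4xx status are re-raised immediately because repeating the
    same request is unlikely to help; everything else is retried.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on as exc:
            status = getattr(exc, "status_code", None)  # attribute name is assumed
            client_error = isinstance(status, int) and 400 <= status < 500
            if attempt == attempts or client_error:
                raise
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

A call might look like call_with_retries(lambda: client.get_citations(request), BookWyrmAPIError).
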
License

See LICENSE file for details.

Download files

Source Distribution

bookwyrm-0.1.0.tar.gz (48.8 kB view details)

Uploaded Source

Built Distribution

bookwyrm-0.1.0-py3-none-any.whl (26.5 kB view details)

Uploaded Python 3

File details

Details for the file bookwyrm-0.1.0.tar.gz.

File metadata

  • Download URL: bookwyrm-0.1.0.tar.gz
  • Upload date:
  • Size: 48.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for bookwyrm-0.1.0.tar.gz
  • SHA256: 0015b0e882ba44d865ccb52b92e19c7214f4c603b15ecc7a887ce5f756a34835
  • MD5: 1e76c41b04feee67ea4614b600be5560
  • BLAKE2b-256: d5f99bc7659a5a4bfa04ab60d587ceac128723fb7843a33794d703b3290f8792

Provenance

The following attestation bundles were made for bookwyrm-0.1.0.tar.gz:

Publisher: publish-to-pypi.yml on scidonia/bookwyrm-client

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file bookwyrm-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: bookwyrm-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 26.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for bookwyrm-0.1.0-py3-none-any.whl
  • SHA256: 6938ad8da3b02a9088de1558486ad472c6d8013e9bd252e1529a2436539b3d80
  • MD5: 5acaf5691b315b5773343ee90df09887
  • BLAKE2b-256: 9795f11711f0afa4f1f9f10f0f40c1f167e2f087fe17581ff1f9aff901d2a17c

Provenance

The following attestation bundles were made for bookwyrm-0.1.0-py3-none-any.whl:

Publisher: publish-to-pypi.yml on scidonia/bookwyrm-client
