
Analyxa

Multi-dimensional extraction engine for AI conversations.

Analyxa takes opaque conversations between users and AI agents and decomposes them into N configurable dimensions (sentiment, intensity, topics, risk signals, intent, entities, and more), emitted as structured JSON alongside a 1,536-dimensional semantic vector.

What it does

Conversation → Analyxa → Structured JSON (N fields) + Semantic Vector (1536D)

One conversation in, structured intelligence out:

  • 10 universal fields extracted from any conversation (sentiment, topics, risk signals, intent, entities, action items...)
  • Vertical schemas add domain-specific fields: support (16), sales (16), coaching (18)
  • Semantic vectors enable similarity search across thousands of conversations
  • Pipeline ready: Redis queue → Analyxa → Qdrant vector DB

Quick Start

Installation

pip install analyxa

Python API

from analyxa import analyze

result = analyze(
    "User: I was charged twice for my subscription.\n"
    "Agent: I see the duplicate charge. Processing a refund now.\n"
    "User: Thanks, but please make sure it doesn't happen again.",
    schema="support"
)

print(result.fields["sentiment"])                # "negative"
print(result.fields["satisfaction_prediction"])  # "dissatisfied"
print(result.fields["issue_category"])           # "billing"
print(result.fields["risk_signals"])             # ["frustration", "repeat_contact"]
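
Every extracted dimension is available on the result's fields mapping, so persisting an analysis is straightforward. A minimal sketch that uses only the result.fields attribute shown above:

import json

# Serialize all extracted dimensions to JSON for storage or downstream tooling
with open("analysis.json", "w") as f:
    json.dump(result.fields, f, indent=2)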

CLI

# Analyze a conversation file
analyxa analyze conversation.txt --schema support --output result.json

# List available schemas
analyxa schemas list

# Show schema fields
analyxa schemas show support

# Batch analyze a directory
analyxa batch ./conversations/ --schema universal --output-dir ./results/

Environment Setup

Create a .env file:

ANTHROPIC_API_KEY=sk-ant-...   # Required for analysis
OPENAI_API_KEY=sk-...          # Optional, for embeddings
ANALYXA_PROVIDER=anthropic     # or "openai"
ANALYXA_SCHEMA=universal       # Default schema
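
If your own entry point does not pick up the .env file automatically, the third-party python-dotenv package (an assumption here, not a documented Analyxa dependency) can load it before you call analyze:

import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # loads variables from a nearby .env file
assert os.getenv("ANTHROPIC_API_KEY"), "ANTHROPIC_API_KEY is required for analysis"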

Schemas

Analyxa uses YAML schemas to define what to extract. Schemas are hierarchical — vertical schemas inherit all universal fields.

Schema      Fields  Description
universal   10      Base fields for any conversation
support     16      Customer support (+satisfaction, issue category, effort score...)
sales       16      Sales conversations (+buying stage, objections, budget signals...)
coaching    18      Coaching/therapeutic (+emotional valence, behavioral patterns, coping strategies...)

Universal Fields (included in all schemas)

Field                Type           Description
title                string         Descriptive session name
summary              string         3-5 sentence summary (vectorized for search)
sentiment            keyword        User sentiment: positive, negative, mixed, neutral
sentiment_intensity  keyword        low, medium, high
topics               keyword_array  Specific topics discussed
session_outcome      keyword        resolved, unresolved, escalated, abandoned
user_intent          string         What the user really needed
risk_signals         keyword_array  frustration, churn_risk, complaint, urgency...
key_entities         keyword_array  People, products, dates, amounts mentioned
action_items         string_array   Explicit commitments or next steps
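
Field types map to Python values as the examples above suggest; assuming keyword fields deserialize to strings and the *_array types to lists (not separately documented here), a typical consumer looks like:

from analyxa import analyze

transcript = open("conversation.txt").read()  # any conversation text
result = analyze(transcript)                  # the universal schema is the default

print(result.fields["session_outcome"])       # e.g. "resolved"
for topic in result.fields["topics"]:         # keyword_array -> list of strings
    print("-", topic)
for item in result.fields["action_items"]:    # string_array -> list of strings
    print("TODO:", item)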

Custom Schemas

Create your own schema by inheriting from universal:

metadata:
  name: my_vertical
  version: "1.0"
  description: "Custom schema for my use case"
  inherits: universal

fields:
  - name: custom_field
    type: keyword
    required: true
    description: "My custom dimension"
    prompt_hint: "Instructions for the LLM on how to extract this field"
    allowed_values: [option_a, option_b, option_c]
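
The discovery mechanism for custom schema files is not spelled out above (the built-ins live under src/analyxa/schemas/), but assuming a custom YAML is registered under its metadata name, usage would mirror the built-in schemas:

from analyxa import analyze

# Hypothetical: assumes my_vertical.yaml is visible to the schema loader
result = analyze("User: ...\nAgent: ...", schema="my_vertical")
print(result.fields["custom_field"])  # one of: option_a, option_b, option_c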

Production Pipeline

Redis → Analyxa → Qdrant

# Start infrastructure
cd docker && docker compose up -d

# Push conversations to Redis queue
analyxa redis push conversation.txt --schema support

# Process all pending conversations
analyxa redis process

# Search by semantic similarity
analyxa search "frustrated customer with billing issue" --limit 5

Python Pipeline

from analyxa.sources.redis_source import RedisSource
from analyxa.sinks.qdrant_sink import QdrantSink
from analyxa.batch import batch_analyze_from_redis

# Process Redis queue → Qdrant
result = batch_analyze_from_redis()
print(f"Processed: {result.successful}/{result.total}")

# Search similar conversations; query_embedding must be a 1536-D vector,
# e.g. one produced by the package's embeddings module
sink = QdrantSink()
similar = sink.search_similar(query_embedding, limit=10, filters={"sentiment": "negative"})
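
The return type of search_similar is not documented here; if it surfaces Qdrant-style scored points (an assumption, following the Qdrant client's conventions), consuming the hits would look roughly like:

# Hypothetical result handling: .score / .payload mirror Qdrant client conventions
for hit in similar:
    print(f"{hit.score:.3f}", hit.payload.get("title"), "|", hit.payload.get("sentiment"))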

Configuration

All settings via environment variables or .env file:

Variable            Default                  Description
ANTHROPIC_API_KEY   (none)                   Anthropic API key
OPENAI_API_KEY      (none)                   OpenAI API key (for embeddings)
ANALYXA_PROVIDER    anthropic                LLM provider: anthropic or openai
ANALYXA_MODEL       (provider default)       Model override
ANALYXA_SCHEMA      universal                Default schema
ANALYXA_EMBEDDINGS  true                     Enable/disable embeddings
REDIS_URL           redis://localhost:6379   Redis connection
QDRANT_URL          http://localhost:6333    Qdrant connection
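
Since everything is read from the environment, settings can also be pinned in-process. A sketch using plain environment variables (no Analyxa-specific configuration API assumed); set them before the package resolves its config:

import os

# Point the pipeline at non-default infrastructure before analyxa reads its settings
os.environ["REDIS_URL"] = "redis://redis.internal:6379"
os.environ["QDRANT_URL"] = "http://qdrant.internal:6333"
os.environ["ANALYXA_SCHEMA"] = "support"

from analyxa import analyze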

Architecture

src/analyxa/
├── analyzer.py          # Pipeline orchestrator
├── schema.py            # YAML schema loader with inheritance
├── prompt_builder.py    # Dynamic prompt generation from schemas
├── llm_client.py        # Multi-provider LLM abstraction
├── embeddings.py        # Semantic vector generation (1536D)
├── config.py            # Centralized configuration
├── cli.py               # Click CLI
├── batch.py             # Batch processing
├── sources/
│   ├── file_source.py   # Read from files
│   └── redis_source.py  # Read from Redis queue
├── sinks/
│   ├── json_sink.py     # Write to JSON files
│   ├── stdout_sink.py   # Print to terminal
│   └── qdrant_sink.py   # Store in Qdrant
└── schemas/
    ├── universal.yaml   # 10 base fields
    ├── support.yaml     # +6 support fields
    ├── sales.yaml       # +6 sales fields
    └── coaching.yaml    # +8 coaching fields

License

Apache 2.0 — see LICENSE for details.

Contributing

See CONTRIBUTING.md for guidelines.


Built by Next AI Ecosystem

