Outhad_Edge is a powerful Python library for user retention analysis. It provides a simple and intuitive interface for tracking user behavior, analyzing data, and gaining valuable insights into your users.

These details have not been verified by PyPI

Project links

Project description

Outhad_Edge

Next-Generation Behavioral Analytics for User Journey Intelligence

Get Started · Documentation · Examples · Use Cases

Transform Raw Events Into Actionable Behavioral Insights

🎯 Powerful Features That Drive Insights

🔧 Event Data Management

Automated Schema Verification: Built-in validation ensures your user_id, event, and timestamp columns are properly formatted
Visual Workflow Designer: Drag-and-drop interface for building complex data transformation pipelines
Smart Session Detection: Automatically segments user activity into meaningful interaction sessions
Flexible Event Filtering: Powerful filtering and aggregation tools for precise data manipulation

⚙️ Transformation Workflow Engine

DAG-Based Processing: Build sophisticated preprocessing chains using directed acyclic graph architecture
Comprehensive Processor Library: 14+ pre-built operators including session segmentation, event categorization, user lifecycle tracking, and journey truncation
Pipeline Persistence: Export and import workflow configurations for consistency across team projects
Collaborative Analytics: Share standardized preprocessing templates across multiple analysts

📊 Behavioral Intelligence Suite

Flow Network Analysis: Dynamic visualizations revealing user navigation patterns and transition probabilities
Sequential Behavior Tracking: Step-by-step progression analysis showing conversion at each journey stage
Retention Cohort Engine: Time-series tracking of user engagement and return behavior
ML-Driven Segmentation: Unsupervised clustering algorithms for automatic user group discovery
Conversion Path Optimization: Traditional and multi-step funnel analysis with drop-off diagnostics
Experiment Validation Tools: Statistical hypothesis testing for A/B experiments and significance analysis
Multi-Path Flow Diagrams: Sankey visualizations comparing parallel user journey streams

🎨 Visualization & Platform Integration

Native Jupyter Support: First-class integration with Jupyter Notebook and JupyterLab environments
Rich Interactive Components: Dynamic widgets enabling real-time data exploration
Multi-Format Export: Generate outputs in various formats suitable for presentations and reporting
Plotly-Powered Charts: Industry-standard interactive visualizations with professional aesthetics

The Problem We Solve

Traditional analytics tells you what users do. Outhad_Edge reveals why they do it.

Traditional Analytics	Outhad_Edge
Conversion rate: 3.2%	Identifies 5 user segments with conversion rates from 1.1% to 12.4%
Users dropped at checkout	Maps 47 unique paths to purchase, surfaces friction points
30-day retention: 18%	Cohort analysis reveals retention peaks at 7 days, suggests onboarding optimization
Funnel: 100 → 45 → 12 → 3	Transition graphs show alternative high-value paths outside your funnel

Installation & Setup

Standard Installation

pip install outhad_edge

With AI Capabilities (Natural Language Queries)

pip install outhad_edge[ai]

# Configure API access
export OPENAI_API_KEY="sk-..."
# OR
export ANTHROPIC_API_KEY="sk-ant-..."

Development Environment

git clone https://github.com/Outhad-Lab/outhad_edge.git
cd outhad_edge-tools
poetry install --with dev,docs,ai

Live Examples

Example 1: Talk to Your Data (AI-Powered)

No code. No SQL. Just ask.

from outhad_edge import Eventstream
import pandas as pd

# Your event data
events = pd.read_csv('user_events.csv')
stream = Eventstream(events)

# Initialize AI interface
nlq = stream.nlq(model="gpt-4")

# Natural language queries
nlq.ask("What's driving our conversion rate drop in the mobile segment?")
# → Answer: "Mobile users experience 3.2x higher cart abandonment.
#    Top friction: payment method selection (avg 47s vs 12s desktop)"

nlq.ask("Compare retention across user acquisition channels")
# → Auto-generates cohort analysis + visualization

nlq.ask("Find behavioral patterns that predict churn")
# → Runs clustering + statistical analysis, returns actionable segments

How It Works: RAG-powered code generation → Sandboxed execution → Self-correction → Semantic caching

Example 2: Session-Based Journey Analysis

import outhad_edge as oe

# Load clickstream data
df = pd.read_csv('web_analytics.csv')  # user_id, event, timestamp
stream = oe.Eventstream(df)

# Define user sessions (30-minute timeout)
stream = stream.split_sessions(timeout=(30, 'm'))

# Filter to core conversion events
stream = stream.filter_events([
    'homepage', 'search', 'product_view',
    'add_to_cart', 'checkout', 'purchase'
])

# Visualize user flow
stream.transition_graph()  # Interactive network diagram

Example 3: Advanced Preprocessing Pipeline

# Build reproducible preprocessing workflow
pipeline = oe.PreprocessingGraph(stream)

# Step 1: Split into sessions
pipeline.add_node(
    processor=oe.data_processors_lib.SplitSessions,
    timeout=(20, 'm'),
    session_col='session_id'
)

# Step 2: Label new vs returning users
pipeline.add_node(
    processor=oe.data_processors_lib.LabelNewUsers,
    new_users_list=['first_visit', 'signup']
)

# Step 3: Group granular events
pipeline.add_node(
    processor=oe.data_processors_lib.GroupEvents,
    event_groups={
        'engagement': ['like', 'share', 'comment'],
        'commerce': ['add_to_cart', 'purchase', 'wishlist']
    }
)

# Execute pipeline
processed = pipeline.combine()

# Share with team (save graph configuration)
pipeline.export('preprocessing_config.json')

Example 4: ML-Powered Behavioral Segmentation

# Extract behavioral features
clusters = stream.clusters()

# TF-IDF feature extraction from event sequences
features = clusters.extract_features(
    method='tfidf',
    ngram_range=(1, 3)  # Single events + 2-3 event sequences
)

# K-means clustering
clusters.fit(method='kmeans', n_clusters=5, X=features)

# Analyze segments
segments = clusters.cluster_mapping
print(segments.groupby('cluster_id').agg({
    'user_id': 'count',
    'conversion': 'mean',
    'ltv': 'mean'
}))

# Visualize
clusters.plot()  # Interactive cluster visualization

What Makes Us Different

Traditional Product Analytics

Pre-built dashboards
Fixed metrics
Funnel-centric view
Report what happened
Requires analysts for insights
Static visualizations

Outhad_Edge

AI-driven exploration
Custom behavioral analysis
Journey-centric view
Explain why it happened
Natural language interface
Interactive, programmable viz

Core Capabilities

1. AI Query Engine (NEW)

Technology Stack: LangChain · ChromaDB · OpenAI/Anthropic · Redis

Feature	Description	Benefit
Natural Language Interface	Ask questions in plain English	Non-technical users get insights instantly
RAG Architecture	Vector embeddings + semantic search	95%+ query accuracy with domain context
Self-Correction	Automatic error fixing (3 retry limit)	Handles edge cases without manual debugging
Semantic Caching	Redis-backed similarity matching	90%+ cache hit rate = 10x faster responses
Code Transparency	Shows generated Python code	Trust + learning for technical users

Architecture:

User Query → Semantic Retrieval (ChromaDB) → LLM Code Gen (GPT-4/Claude)
           → Safety Validation → Sandboxed Execution → Result + Visualization

2. Behavioral Analysis Toolkit

Tool	Purpose	Output
Transition Graph	User flow network analysis	Interactive D3.js graph with event transitions
Step Matrix	Sequential step-by-step analysis	Conversion rates between each event pair
Cohort Analysis	Time-based retention tracking	Heatmap showing retention by cohort
Funnel Analysis	Traditional conversion funnels	Stage-by-stage drop-off with statistics
Clustering	Behavioral segmentation (K-means, DBSCAN)	User segments with defining characteristics
Statistical Tests	A/B testing, Chi-square, T-tests	Significance testing for experiments
Sankey Diagrams	Multi-path flow visualization	Parallel path comparison

3. Data Preprocessing Engine

14 Built-in Processors:

# Session management
SplitSessions          # Time-based session splitting
CollapseLoops          # Remove repetitive event cycles

# User lifecycle
LabelNewUsers          # Identify user acquisition events
LabelLostUsers         # Churn event detection
LabelCroppedPaths      # Incomplete journey handling

# Event manipulation
FilterEvents           # Include/exclude specific events
GroupEvents            # Categorize events into groups
AddStartEndEvents      # Synthetic boundary events
TruncatePaths          # Limit path length

# Advanced
AddPositiveEvents      # Inject success indicators
AddNegativeEvents      # Inject failure indicators
DropPaths              # Remove specific user journeys

Visual Pipeline Builder: Drag-and-drop GUI in Jupyter for non-coders

Technical Architecture

outhad_edge/
│
├─ ai/                          # AI Query System
│  ├─ nlq_engine.py             # Main NLQ orchestrator
│  ├─ semantic_layer.py         # Business glossary + schema metadata
│  ├─ vector_store.py           # ChromaDB embeddings manager
│  ├─ llm_agent.py              # LangChain LLM integration
│  ├─ code_executor.py          # Sandboxed Python execution
│  ├─ code_validator.py         # Security validation layer
│  └─ cache_manager.py          # Redis semantic cache
│
├─ eventstream/                 # Core Data Structure
│  ├─ eventstream.py            # Main Eventstream class
│  ├─ schema.py                 # RawDataSchema validation
│  └─ helpers.py                # Utility functions
│
├─ preprocessing_graph/         # Pipeline Engine
│  ├─ preprocessing_graph.py    # DAG-based workflow
│  └─ graph_widgets.py          # Jupyter GUI components
│
├─ data_processors_lib/         # Transformation Operators
│  ├─ split_sessions.py
│  ├─ filter_events.py
│  ├─ [12 more processors...]
│  └─ base.py                   # Abstract processor class
│
├─ tooling/                     # Analysis Tools
│  ├─ transition_graph/         # Network flow viz
│  ├─ cohorts/                  # Retention analysis
│  ├─ funnel/                   # Conversion funnels
│  ├─ clusters/                 # ML segmentation
│  ├─ step_matrix/              # Sequential analysis
│  └─ stattests/                # Statistical testing
│
├─ backend/                     # Infrastructure
│  ├─ tracker.py                # Usage analytics
│  └─ server.py                 # Jupyter widget server
│
└─ datasets/                    # Sample Data
   └─ data/
      └─ simple-onlineshop.csv  # Demo e-commerce data

Data Requirements

Input Schema:

Column	Type	Required	Description
`user_id`	string/int	Yes	Unique user identifier
`event`	string	Yes	Event name (e.g., "page_view", "purchase")
`timestamp`	datetime	Yes	Event timestamp (any pandas-compatible format)
`*`	any	No	Additional custom columns

Example:

import pandas as pd

data = pd.DataFrame({
    'user_id': ['U001', 'U001', 'U002', 'U002', 'U001'],
    'event': ['login', 'view_product', 'signup', 'view_product', 'purchase'],
    'timestamp': ['2024-01-15 09:00:00', '2024-01-15 09:05:00',
                  '2024-01-15 09:02:00', '2024-01-15 09:08:00',
                  '2024-01-15 09:15:00'],
    'device': ['mobile', 'mobile', 'desktop', 'desktop', 'mobile'],  # optional
    'revenue': [0, 0, 0, 0, 49.99]  # optional
})

stream = oe.Eventstream(data)

Supported Data Sources:

Google Analytics BigQuery exports
Segment, Amplitude, Mixpanel exports
Custom event tracking (Snowplow, RudderStack)
Database event logs (PostgreSQL, MongoDB)
Web server logs (Apache, Nginx)

Industry Applications

SaaS & B2B Software

Challenge: 60% of trial users never activate a key feature

nlq.ask("Which feature combinations predict trial-to-paid conversion?")
# → "Users who complete profile setup + invite team member convert at 8.3x rate.
#    Only 12% of trials do both. Suggest onboarding flow A/B test."

Use Cases:

Product-led growth optimization
Feature adoption tracking
Onboarding funnel analysis
Expansion revenue triggers

E-Commerce & Retail

Challenge: Cart abandonment without knowing which step fails

# Analyze checkout micro-steps
checkout_stream = stream.filter_events(lambda df:
    df['event'].str.contains('checkout_')
)
checkout_stream.transition_graph(threshold=0.05)
# → Reveals 23% drop at "payment_method_selection"

Use Cases:

Cart abandonment analysis
Product recommendation optimization
Cross-sell/upsell pattern detection
Customer journey mapping

Media & Content Platforms

Challenge: Understand binge behavior vs churn patterns

# Segment by engagement patterns
clusters = stream.clusters()
features = clusters.extract_features(method='tfidf', ngram_range=(1,4))
clusters.fit(method='kmeans', n_clusters=6, X=features)

# Label clusters
for cluster_id in range(6):
    cluster_users = clusters.cluster_mapping[
        clusters.cluster_mapping['cluster_id'] == cluster_id
    ]
    print(f"Cluster {cluster_id}: {len(cluster_users)} users")
    nlq.ask(f"Describe behavior patterns of cluster {cluster_id}")

Use Cases:

Content consumption patterns
Churn prediction
Personalization strategies
Engagement scoring

Financial Services

Challenge: Identify fraud patterns in transaction sequences

# Anomaly detection using sequence analysis
suspicious = stream.filter_events(lambda df:
    df.groupby('user_id')['event'].transform('count') > 50  # High velocity
)

nlq.ask("Find unusual transaction sequences in the last 7 days")
# → Flags accounts with rare event combinations

Use Cases:

Fraud detection
Customer lifecycle analysis
Cross-product adoption
Compliance monitoring

Performance & Scale

Metric	Specification
Event Processing	10M+ events in <30s (single machine)
Memory Efficiency	Lazy loading, chunked processing
Parallelization	Multi-core support via pandas/numpy
AI Query Latency	<5s average (with caching: <500ms)
Supported Python	3.8, 3.9, 3.10, 3.11
Dependencies	pandas, networkx, scikit-learn, plotly

Development Workflow

Testing:

pytest tests/                    # Full test suite
pytest tests/eventstream/        # Component tests
tox -e py38,py39,py310,py311    # Multi-version testing

Code Quality:

black outhad_edge/ tests/ --line-length=120
mypy outhad_edge/
pre-commit run --all-files

Build Documentation:

cd docs/
make html  # Generates HTML docs

Why Teams Choose Outhad_Edge

For Data Scientists:

Built on pandas/numpy/scikit-learn (familiar stack)
Fully programmable, not a black box
Export to any format (CSV, Parquet, SQL)
Jupyter-native with interactive widgets

For Product Managers:

Natural language queries (no SQL/Python required)
Visual pipeline builder (drag-and-drop)
Share insights as interactive reports
Faster iteration vs BI tools

For Analysts:

Pre-built behavioral analytics methods
Reproducible workflows (save/load pipelines)
Statistical rigor built-in
Production-ready code

For Engineering:

Comprehensive test coverage (>85%)
Type hints throughout
Well-documented codebase
Apache 2.0 license

Comparison Matrix

Feature	Outhad_Edge	Amplitude	Mixpanel	Google Analytics
AI Natural Language Queries	✅ Built-in	❌ No	❌ No	❌ No
Custom Behavioral Analysis	✅ Unlimited	⚠️ Limited	⚠️ Limited	❌ No
Open Source	✅ Yes	❌ No	❌ No	❌ No
Self-Hosted	✅ Yes	❌ Cloud only	❌ Cloud only	❌ Cloud only
Python Integration	✅ Native	⚠️ API only	⚠️ API only	⚠️ API only
ML Segmentation	✅ scikit-learn	⚠️ Basic	⚠️ Basic	❌ No
Visual Pipeline Builder	✅ Jupyter GUI	❌ No	❌ No	❌ No
Cost (1M events/mo)	Free	~$2,000	~$1,500	Free (limited)

Sample Datasets

Quick Start with Built-in Data:

from outhad_edge.datasets import load_simple_shop

# Load e-commerce sample data
df = load_simple_shop(as_dataframe=True)
print(df.shape)  # (6,332 events, 10,283 users)

stream = oe.Eventstream(df)
stream.describe()  # Summary statistics

# Try AI queries
nlq = stream.nlq()
nlq.ask("What's the most common path to purchase?")

Public Datasets Compatible:

Kaggle: E-Commerce Clickstream 2024 (285M events)
UCI: Online Retail Dataset
Coveo: Shoppers Intent Prediction
TheLook: E-commerce Analytics (BigQuery)

Roadmap

Current Version: v3.3.0

In Development:

✅ AI-powered natural language queries (Completed)
🔄 Real-time streaming integration (Bytewax)
🔄 Session replay + heatmaps
📋 Cross-device identity resolution
📋 Predictive analytics (churn, LTV)
📋 A/B test orchestration

See: FEATURE_ROADMAP_2025.md for details

Contributing

We welcome contributions! See development setup above.

Priority Areas:

New data processors
Additional analysis tools
Performance optimizations
Documentation improvements

Community:

GitHub Issues: Bug reports & feature requests
Discussions: Q&A and ideas
Pull Requests: Code contributions

License

Apache 2.0 - Free for commercial and private use

Built for teams who move fast and break things (but want to know exactly what broke)

Get Started Now · Read the Docs · See Examples

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.1.1

Nov 1, 2025

0.1.0

Nov 1, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

outhad_edge-0.1.1.tar.gz (879.2 kB view details)

Uploaded Nov 1, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

outhad_edge-0.1.1-py3-none-any.whl (958.1 kB view details)

Uploaded Nov 1, 2025 Python 3

File details

Details for the file outhad_edge-0.1.1.tar.gz.

File metadata

Download URL: outhad_edge-0.1.1.tar.gz
Upload date: Nov 1, 2025
Size: 879.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.3 CPython/3.12.8 Darwin/25.0.0

File hashes

Hashes for outhad_edge-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`5540523e6dbb31517eb8d7b010f14881bdf38ca5db77954ef4b56243842f0821`
MD5	`42e6b17dfee947a4a690057f0ef1251b`
BLAKE2b-256	`1bdd3ee081d6f3f8603c1128c23c8c802e0c755715514eb5541b912639313caf`

See more details on using hashes here.

File details

Details for the file outhad_edge-0.1.1-py3-none-any.whl.

File metadata

Download URL: outhad_edge-0.1.1-py3-none-any.whl
Upload date: Nov 1, 2025
Size: 958.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.3 CPython/3.12.8 Darwin/25.0.0

File hashes

Hashes for outhad_edge-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6efdf9185c6373954b749d760c611502e9345e597bab22dc2ebe6403d3fbb2bf`
MD5	`18f92f586b40eb8975f085f907df5af1`
BLAKE2b-256	`60bd74494409d7d014058d2f61489f96f865d65e06a36f4fb6e1e253194e1f54`

See more details on using hashes here.

outhad_edge 0.1.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Outhad_Edge

Transform Raw Events Into Actionable Behavioral Insights

🎯 Powerful Features That Drive Insights

🔧 Event Data Management

⚙️ Transformation Workflow Engine

📊 Behavioral Intelligence Suite

🎨 Visualization & Platform Integration

The Problem We Solve

Installation & Setup

Live Examples

Example 1: Talk to Your Data (AI-Powered)

Example 2: Session-Based Journey Analysis

Example 3: Advanced Preprocessing Pipeline

Example 4: ML-Powered Behavioral Segmentation

What Makes Us Different

Core Capabilities

1. AI Query Engine (NEW)

2. Behavioral Analysis Toolkit

3. Data Preprocessing Engine

Technical Architecture

Data Requirements

Industry Applications

SaaS & B2B Software

E-Commerce & Retail

Media & Content Platforms

Financial Services

Performance & Scale

Development Workflow

Why Teams Choose Outhad_Edge

Comparison Matrix

Sample Datasets

Roadmap

Contributing

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes