Skip to main content

Outhad_Edge is a powerful Python library for user retention analysis. It provides a simple and intuitive interface for tracking user behavior, analyzing data, and gaining valuable insights into your users.

Project description

Outhad_Edge

Outhad_Edge

Next-Generation Behavioral Analytics for User Journey Intelligence

Get Started ยท Documentation ยท Examples ยท Use Cases


Transform Raw Events Into Actionable Behavioral Insights


๐ŸŽฏ Powerful Features That Drive Insights

๐Ÿ”ง Event Data Management

  • Automated Schema Verification: Built-in validation ensures your user_id, event, and timestamp columns are properly formatted
  • Visual Workflow Designer: Drag-and-drop interface for building complex data transformation pipelines
  • Smart Session Detection: Automatically segments user activity into meaningful interaction sessions
  • Flexible Event Filtering: Powerful filtering and aggregation tools for precise data manipulation

โš™๏ธ Transformation Workflow Engine

  • DAG-Based Processing: Build sophisticated preprocessing chains using directed acyclic graph architecture
  • Comprehensive Processor Library: 14+ pre-built operators including session segmentation, event categorization, user lifecycle tracking, and journey truncation
  • Pipeline Persistence: Export and import workflow configurations for consistency across team projects
  • Collaborative Analytics: Share standardized preprocessing templates across multiple analysts

๐Ÿ“Š Behavioral Intelligence Suite

  • Flow Network Analysis: Dynamic visualizations revealing user navigation patterns and transition probabilities
  • Sequential Behavior Tracking: Step-by-step progression analysis showing conversion at each journey stage
  • Retention Cohort Engine: Time-series tracking of user engagement and return behavior
  • ML-Driven Segmentation: Unsupervised clustering algorithms for automatic user group discovery
  • Conversion Path Optimization: Traditional and multi-step funnel analysis with drop-off diagnostics
  • Experiment Validation Tools: Statistical hypothesis testing for A/B experiments and significance analysis
  • Multi-Path Flow Diagrams: Sankey visualizations comparing parallel user journey streams

๐ŸŽจ Visualization & Platform Integration

  • Native Jupyter Support: First-class integration with Jupyter Notebook and JupyterLab environments
  • Rich Interactive Components: Dynamic widgets enabling real-time data exploration
  • Multi-Format Export: Generate outputs in various formats suitable for presentations and reporting
  • Plotly-Powered Charts: Industry-standard interactive visualizations with professional aesthetics

The Problem We Solve

Traditional analytics tells you what users do. Outhad_Edge reveals why they do it.

Traditional Analytics Outhad_Edge
Conversion rate: 3.2% Identifies 5 user segments with conversion rates from 1.1% to 12.4%
Users dropped at checkout Maps 47 unique paths to purchase, surfaces friction points
30-day retention: 18% Cohort analysis reveals retention peaks at 7 days, suggests onboarding optimization
Funnel: 100 โ†’ 45 โ†’ 12 โ†’ 3 Transition graphs show alternative high-value paths outside your funnel

Installation & Setup

Standard Installation

pip install outhad_edge

With AI Capabilities (Natural Language Queries)

pip install outhad_edge[ai]

# Configure API access
export OPENAI_API_KEY="sk-..."
# OR
export ANTHROPIC_API_KEY="sk-ant-..."

Development Environment

git clone https://github.com/Outhad-Lab/outhad_edge.git
cd outhad_edge-tools
poetry install --with dev,docs,ai

Live Examples

Example 1: Talk to Your Data (AI-Powered)

No code. No SQL. Just ask.

from outhad_edge import Eventstream
import pandas as pd

# Your event data
events = pd.read_csv('user_events.csv')
stream = Eventstream(events)

# Initialize AI interface
nlq = stream.nlq(model="gpt-4")

# Natural language queries
nlq.ask("What's driving our conversion rate drop in the mobile segment?")
# โ†’ Answer: "Mobile users experience 3.2x higher cart abandonment.
#    Top friction: payment method selection (avg 47s vs 12s desktop)"

nlq.ask("Compare retention across user acquisition channels")
# โ†’ Auto-generates cohort analysis + visualization

nlq.ask("Find behavioral patterns that predict churn")
# โ†’ Runs clustering + statistical analysis, returns actionable segments

How It Works: RAG-powered code generation โ†’ Sandboxed execution โ†’ Self-correction โ†’ Semantic caching


Example 2: Session-Based Journey Analysis

import outhad_edge as oe

# Load clickstream data
df = pd.read_csv('web_analytics.csv')  # user_id, event, timestamp
stream = oe.Eventstream(df)

# Define user sessions (30-minute timeout)
stream = stream.split_sessions(timeout=(30, 'm'))

# Filter to core conversion events
stream = stream.filter_events([
    'homepage', 'search', 'product_view',
    'add_to_cart', 'checkout', 'purchase'
])

# Visualize user flow
stream.transition_graph()  # Interactive network diagram

Example 3: Advanced Preprocessing Pipeline

# Build reproducible preprocessing workflow
pipeline = oe.PreprocessingGraph(stream)

# Step 1: Split into sessions
pipeline.add_node(
    processor=oe.data_processors_lib.SplitSessions,
    timeout=(20, 'm'),
    session_col='session_id'
)

# Step 2: Label new vs returning users
pipeline.add_node(
    processor=oe.data_processors_lib.LabelNewUsers,
    new_users_list=['first_visit', 'signup']
)

# Step 3: Group granular events
pipeline.add_node(
    processor=oe.data_processors_lib.GroupEvents,
    event_groups={
        'engagement': ['like', 'share', 'comment'],
        'commerce': ['add_to_cart', 'purchase', 'wishlist']
    }
)

# Execute pipeline
processed = pipeline.combine()

# Share with team (save graph configuration)
pipeline.export('preprocessing_config.json')

Example 4: ML-Powered Behavioral Segmentation

# Extract behavioral features
clusters = stream.clusters()

# TF-IDF feature extraction from event sequences
features = clusters.extract_features(
    method='tfidf',
    ngram_range=(1, 3)  # Single events + 2-3 event sequences
)

# K-means clustering
clusters.fit(method='kmeans', n_clusters=5, X=features)

# Analyze segments
segments = clusters.cluster_mapping
print(segments.groupby('cluster_id').agg({
    'user_id': 'count',
    'conversion': 'mean',
    'ltv': 'mean'
}))

# Visualize
clusters.plot()  # Interactive cluster visualization

What Makes Us Different

Traditional Product Analytics

  • Pre-built dashboards
  • Fixed metrics
  • Funnel-centric view
  • Report what happened
  • Requires analysts for insights
  • Static visualizations

Outhad_Edge

  • AI-driven exploration
  • Custom behavioral analysis
  • Journey-centric view
  • Explain why it happened
  • Natural language interface
  • Interactive, programmable viz

Core Capabilities

1. AI Query Engine (NEW)

Technology Stack: LangChain ยท ChromaDB ยท OpenAI/Anthropic ยท Redis

Feature Description Benefit
Natural Language Interface Ask questions in plain English Non-technical users get insights instantly
RAG Architecture Vector embeddings + semantic search 95%+ query accuracy with domain context
Self-Correction Automatic error fixing (3 retry limit) Handles edge cases without manual debugging
Semantic Caching Redis-backed similarity matching 90%+ cache hit rate = 10x faster responses
Code Transparency Shows generated Python code Trust + learning for technical users

Architecture:

User Query โ†’ Semantic Retrieval (ChromaDB) โ†’ LLM Code Gen (GPT-4/Claude)
           โ†’ Safety Validation โ†’ Sandboxed Execution โ†’ Result + Visualization

2. Behavioral Analysis Toolkit

Tool Purpose Output
Transition Graph User flow network analysis Interactive D3.js graph with event transitions
Step Matrix Sequential step-by-step analysis Conversion rates between each event pair
Cohort Analysis Time-based retention tracking Heatmap showing retention by cohort
Funnel Analysis Traditional conversion funnels Stage-by-stage drop-off with statistics
Clustering Behavioral segmentation (K-means, DBSCAN) User segments with defining characteristics
Statistical Tests A/B testing, Chi-square, T-tests Significance testing for experiments
Sankey Diagrams Multi-path flow visualization Parallel path comparison

3. Data Preprocessing Engine

14 Built-in Processors:

# Session management
SplitSessions          # Time-based session splitting
CollapseLoops          # Remove repetitive event cycles

# User lifecycle
LabelNewUsers          # Identify user acquisition events
LabelLostUsers         # Churn event detection
LabelCroppedPaths      # Incomplete journey handling

# Event manipulation
FilterEvents           # Include/exclude specific events
GroupEvents            # Categorize events into groups
AddStartEndEvents      # Synthetic boundary events
TruncatePaths          # Limit path length

# Advanced
AddPositiveEvents      # Inject success indicators
AddNegativeEvents      # Inject failure indicators
DropPaths              # Remove specific user journeys

Visual Pipeline Builder: Drag-and-drop GUI in Jupyter for non-coders


Technical Architecture

outhad_edge/
โ”‚
โ”œโ”€ ai/                          # AI Query System
โ”‚  โ”œโ”€ nlq_engine.py             # Main NLQ orchestrator
โ”‚  โ”œโ”€ semantic_layer.py         # Business glossary + schema metadata
โ”‚  โ”œโ”€ vector_store.py           # ChromaDB embeddings manager
โ”‚  โ”œโ”€ llm_agent.py              # LangChain LLM integration
โ”‚  โ”œโ”€ code_executor.py          # Sandboxed Python execution
โ”‚  โ”œโ”€ code_validator.py         # Security validation layer
โ”‚  โ””โ”€ cache_manager.py          # Redis semantic cache
โ”‚
โ”œโ”€ eventstream/                 # Core Data Structure
โ”‚  โ”œโ”€ eventstream.py            # Main Eventstream class
โ”‚  โ”œโ”€ schema.py                 # RawDataSchema validation
โ”‚  โ””โ”€ helpers.py                # Utility functions
โ”‚
โ”œโ”€ preprocessing_graph/         # Pipeline Engine
โ”‚  โ”œโ”€ preprocessing_graph.py    # DAG-based workflow
โ”‚  โ””โ”€ graph_widgets.py          # Jupyter GUI components
โ”‚
โ”œโ”€ data_processors_lib/         # Transformation Operators
โ”‚  โ”œโ”€ split_sessions.py
โ”‚  โ”œโ”€ filter_events.py
โ”‚  โ”œโ”€ [12 more processors...]
โ”‚  โ””โ”€ base.py                   # Abstract processor class
โ”‚
โ”œโ”€ tooling/                     # Analysis Tools
โ”‚  โ”œโ”€ transition_graph/         # Network flow viz
โ”‚  โ”œโ”€ cohorts/                  # Retention analysis
โ”‚  โ”œโ”€ funnel/                   # Conversion funnels
โ”‚  โ”œโ”€ clusters/                 # ML segmentation
โ”‚  โ”œโ”€ step_matrix/              # Sequential analysis
โ”‚  โ””โ”€ stattests/                # Statistical testing
โ”‚
โ”œโ”€ backend/                     # Infrastructure
โ”‚  โ”œโ”€ tracker.py                # Usage analytics
โ”‚  โ””โ”€ server.py                 # Jupyter widget server
โ”‚
โ””โ”€ datasets/                    # Sample Data
   โ””โ”€ data/
      โ””โ”€ simple-onlineshop.csv  # Demo e-commerce data

Data Requirements

Input Schema:

Column Type Required Description
user_id string/int Yes Unique user identifier
event string Yes Event name (e.g., "page_view", "purchase")
timestamp datetime Yes Event timestamp (any pandas-compatible format)
* any No Additional custom columns

Example:

import pandas as pd

data = pd.DataFrame({
    'user_id': ['U001', 'U001', 'U002', 'U002', 'U001'],
    'event': ['login', 'view_product', 'signup', 'view_product', 'purchase'],
    'timestamp': ['2024-01-15 09:00:00', '2024-01-15 09:05:00',
                  '2024-01-15 09:02:00', '2024-01-15 09:08:00',
                  '2024-01-15 09:15:00'],
    'device': ['mobile', 'mobile', 'desktop', 'desktop', 'mobile'],  # optional
    'revenue': [0, 0, 0, 0, 49.99]  # optional
})

stream = oe.Eventstream(data)

Supported Data Sources:

  • Google Analytics BigQuery exports
  • Segment, Amplitude, Mixpanel exports
  • Custom event tracking (Snowplow, RudderStack)
  • Database event logs (PostgreSQL, MongoDB)
  • Web server logs (Apache, Nginx)

Industry Applications

SaaS & B2B Software

Challenge: 60% of trial users never activate a key feature

nlq.ask("Which feature combinations predict trial-to-paid conversion?")
# โ†’ "Users who complete profile setup + invite team member convert at 8.3x rate.
#    Only 12% of trials do both. Suggest onboarding flow A/B test."

Use Cases:

  • Product-led growth optimization
  • Feature adoption tracking
  • Onboarding funnel analysis
  • Expansion revenue triggers

E-Commerce & Retail

Challenge: Cart abandonment without knowing which step fails

# Analyze checkout micro-steps
checkout_stream = stream.filter_events(lambda df:
    df['event'].str.contains('checkout_')
)
checkout_stream.transition_graph(threshold=0.05)
# โ†’ Reveals 23% drop at "payment_method_selection"

Use Cases:

  • Cart abandonment analysis
  • Product recommendation optimization
  • Cross-sell/upsell pattern detection
  • Customer journey mapping

Media & Content Platforms

Challenge: Understand binge behavior vs churn patterns

# Segment by engagement patterns
clusters = stream.clusters()
features = clusters.extract_features(method='tfidf', ngram_range=(1,4))
clusters.fit(method='kmeans', n_clusters=6, X=features)

# Label clusters
for cluster_id in range(6):
    cluster_users = clusters.cluster_mapping[
        clusters.cluster_mapping['cluster_id'] == cluster_id
    ]
    print(f"Cluster {cluster_id}: {len(cluster_users)} users")
    nlq.ask(f"Describe behavior patterns of cluster {cluster_id}")

Use Cases:

  • Content consumption patterns
  • Churn prediction
  • Personalization strategies
  • Engagement scoring

Financial Services

Challenge: Identify fraud patterns in transaction sequences

# Anomaly detection using sequence analysis
suspicious = stream.filter_events(lambda df:
    df.groupby('user_id')['event'].transform('count') > 50  # High velocity
)

nlq.ask("Find unusual transaction sequences in the last 7 days")
# โ†’ Flags accounts with rare event combinations

Use Cases:

  • Fraud detection
  • Customer lifecycle analysis
  • Cross-product adoption
  • Compliance monitoring

Performance & Scale

Metric Specification
Event Processing 10M+ events in <30s (single machine)
Memory Efficiency Lazy loading, chunked processing
Parallelization Multi-core support via pandas/numpy
AI Query Latency <5s average (with caching: <500ms)
Supported Python 3.8, 3.9, 3.10, 3.11
Dependencies pandas, networkx, scikit-learn, plotly

Development Workflow

Testing:

pytest tests/                    # Full test suite
pytest tests/eventstream/        # Component tests
tox -e py38,py39,py310,py311    # Multi-version testing

Code Quality:

black outhad_edge/ tests/ --line-length=120
mypy outhad_edge/
pre-commit run --all-files

Build Documentation:

cd docs/
make html  # Generates HTML docs

Why Teams Choose Outhad_Edge

For Data Scientists:

  • Built on pandas/numpy/scikit-learn (familiar stack)
  • Fully programmable, not a black box
  • Export to any format (CSV, Parquet, SQL)
  • Jupyter-native with interactive widgets

For Product Managers:

  • Natural language queries (no SQL/Python required)
  • Visual pipeline builder (drag-and-drop)
  • Share insights as interactive reports
  • Faster iteration vs BI tools

For Analysts:

  • Pre-built behavioral analytics methods
  • Reproducible workflows (save/load pipelines)
  • Statistical rigor built-in
  • Production-ready code

For Engineering:

  • Comprehensive test coverage (>85%)
  • Type hints throughout
  • Well-documented codebase
  • Apache 2.0 license

Comparison Matrix

Feature Outhad_Edge Amplitude Mixpanel Google Analytics
AI Natural Language Queries โœ… Built-in โŒ No โŒ No โŒ No
Custom Behavioral Analysis โœ… Unlimited โš ๏ธ Limited โš ๏ธ Limited โŒ No
Open Source โœ… Yes โŒ No โŒ No โŒ No
Self-Hosted โœ… Yes โŒ Cloud only โŒ Cloud only โŒ Cloud only
Python Integration โœ… Native โš ๏ธ API only โš ๏ธ API only โš ๏ธ API only
ML Segmentation โœ… scikit-learn โš ๏ธ Basic โš ๏ธ Basic โŒ No
Visual Pipeline Builder โœ… Jupyter GUI โŒ No โŒ No โŒ No
Cost (1M events/mo) Free ~$2,000 ~$1,500 Free (limited)

Sample Datasets

Quick Start with Built-in Data:

from outhad_edge.datasets import load_simple_shop

# Load e-commerce sample data
df = load_simple_shop(as_dataframe=True)
print(df.shape)  # (6,332 events, 10,283 users)

stream = oe.Eventstream(df)
stream.describe()  # Summary statistics

# Try AI queries
nlq = stream.nlq()
nlq.ask("What's the most common path to purchase?")

Public Datasets Compatible:

  • Kaggle: E-Commerce Clickstream 2024 (285M events)
  • UCI: Online Retail Dataset
  • Coveo: Shoppers Intent Prediction
  • TheLook: E-commerce Analytics (BigQuery)

Roadmap

Current Version: v3.3.0

In Development:

  • โœ… AI-powered natural language queries (Completed)
  • ๐Ÿ”„ Real-time streaming integration (Bytewax)
  • ๐Ÿ”„ Session replay + heatmaps
  • ๐Ÿ“‹ Cross-device identity resolution
  • ๐Ÿ“‹ Predictive analytics (churn, LTV)
  • ๐Ÿ“‹ A/B test orchestration

See: FEATURE_ROADMAP_2025.md for details


Contributing

We welcome contributions! See development setup above.

Priority Areas:

  • New data processors
  • Additional analysis tools
  • Performance optimizations
  • Documentation improvements

Community:

  • GitHub Issues: Bug reports & feature requests
  • Discussions: Q&A and ideas
  • Pull Requests: Code contributions

License

Apache 2.0 - Free for commercial and private use


Built for teams who move fast and break things (but want to know exactly what broke)

Get Started Now ยท Read the Docs ยท See Examples

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

outhad_edge-0.1.1.tar.gz (879.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

outhad_edge-0.1.1-py3-none-any.whl (958.1 kB view details)

Uploaded Python 3

File details

Details for the file outhad_edge-0.1.1.tar.gz.

File metadata

  • Download URL: outhad_edge-0.1.1.tar.gz
  • Upload date:
  • Size: 879.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.12.8 Darwin/25.0.0

File hashes

Hashes for outhad_edge-0.1.1.tar.gz
Algorithm Hash digest
SHA256 5540523e6dbb31517eb8d7b010f14881bdf38ca5db77954ef4b56243842f0821
MD5 42e6b17dfee947a4a690057f0ef1251b
BLAKE2b-256 1bdd3ee081d6f3f8603c1128c23c8c802e0c755715514eb5541b912639313caf

See more details on using hashes here.

File details

Details for the file outhad_edge-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: outhad_edge-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 958.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.12.8 Darwin/25.0.0

File hashes

Hashes for outhad_edge-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 6efdf9185c6373954b749d760c611502e9345e597bab22dc2ebe6403d3fbb2bf
MD5 18f92f586b40eb8975f085f907df5af1
BLAKE2b-256 60bd74494409d7d014058d2f61489f96f865d65e06a36f4fb6e1e253194e1f54

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page