Outhad_Edge is a powerful Python library for user retention analysis. It provides a simple and intuitive interface for tracking user behavior, analyzing data, and gaining valuable insights into your users.
Outhad_Edge
Next-Generation Behavioral Analytics for User Journey Intelligence
Get Started · Documentation · Examples · Use Cases
Transform Raw Events Into Actionable Behavioral Insights
Powerful Features That Drive Insights
Event Data Management
- Automated Schema Verification: Built-in validation ensures your user_id, event, and timestamp columns are properly formatted
- Visual Workflow Designer: Drag-and-drop interface for building complex data transformation pipelines
- Smart Session Detection: Automatically segments user activity into meaningful interaction sessions
- Flexible Event Filtering: Powerful filtering and aggregation tools for precise data manipulation
Transformation Workflow Engine
- DAG-Based Processing: Build sophisticated preprocessing chains using directed acyclic graph architecture
- Comprehensive Processor Library: 14+ pre-built operators including session segmentation, event categorization, user lifecycle tracking, and journey truncation
- Pipeline Persistence: Export and import workflow configurations for consistency across team projects
- Collaborative Analytics: Share standardized preprocessing templates across multiple analysts
Behavioral Intelligence Suite
- Flow Network Analysis: Dynamic visualizations revealing user navigation patterns and transition probabilities
- Sequential Behavior Tracking: Step-by-step progression analysis showing conversion at each journey stage
- Retention Cohort Engine: Time-series tracking of user engagement and return behavior
- ML-Driven Segmentation: Unsupervised clustering algorithms for automatic user group discovery
- Conversion Path Optimization: Traditional and multi-step funnel analysis with drop-off diagnostics
- Experiment Validation Tools: Statistical hypothesis testing for A/B experiments and significance analysis
- Multi-Path Flow Diagrams: Sankey visualizations comparing parallel user journey streams
Visualization & Platform Integration
- Native Jupyter Support: First-class integration with Jupyter Notebook and JupyterLab environments
- Rich Interactive Components: Dynamic widgets enabling real-time data exploration
- Multi-Format Export: Generate outputs in various formats suitable for presentations and reporting
- Plotly-Powered Charts: Industry-standard interactive visualizations with professional aesthetics
The Problem We Solve
Traditional analytics tells you what users do. Outhad_Edge reveals why they do it.
| Traditional Analytics | Outhad_Edge |
|---|---|
| Conversion rate: 3.2% | Identifies 5 user segments with conversion rates from 1.1% to 12.4% |
| Users dropped at checkout | Maps 47 unique paths to purchase, surfaces friction points |
| 30-day retention: 18% | Cohort analysis reveals retention peaks at 7 days, suggests onboarding optimization |
| Funnel: 100 → 45 → 12 → 3 | Transition graphs show alternative high-value paths outside your funnel |
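The "alternative paths" idea in the last row comes down to next-event transition probabilities. A minimal pandas sketch on toy data (an illustration of the concept, not the library's internals; events are assumed already sorted per user):

```python
import pandas as pd

events = pd.DataFrame({
    'user_id': ['A', 'A', 'A', 'B', 'B', 'B'],
    'event': ['homepage', 'search', 'purchase',
              'homepage', 'product_view', 'purchase'],
})

# Pair each event with the next event in the same user's journey
events['next_event'] = events.groupby('user_id')['event'].shift(-1)

# Count transitions, then normalize per source event to get probabilities
transitions = (events.dropna(subset=['next_event'])
               .groupby(['event', 'next_event']).size()
               .rename('count').reset_index())
transitions['prob'] = transitions['count'] / transitions.groupby(
    'event')['count'].transform('sum')
print(transitions)
```

Here `homepage` splits 50/50 between `search` and `product_view`, which is exactly the kind of off-funnel branch a fixed funnel definition hides.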
Installation & Setup
Standard Installation
pip install outhad_edge
With AI Capabilities (Natural Language Queries)
pip install outhad_edge[ai]
# Configure API access
export OPENAI_API_KEY="sk-..."
# OR
export ANTHROPIC_API_KEY="sk-ant-..."
Development Environment
git clone https://github.com/Outhad-Lab/outhad_edge.git
cd outhad_edge
poetry install --with dev,docs,ai
Live Examples
Example 1: Talk to Your Data (AI-Powered)
No code. No SQL. Just ask.
from outhad_edge import Eventstream
import pandas as pd
# Your event data
events = pd.read_csv('user_events.csv')
stream = Eventstream(events)
# Initialize AI interface
nlq = stream.nlq(model="gpt-4")
# Natural language queries
nlq.ask("What's driving our conversion rate drop in the mobile segment?")
# → Answer: "Mobile users experience 3.2x higher cart abandonment.
#    Top friction: payment method selection (avg 47s vs 12s desktop)"
nlq.ask("Compare retention across user acquisition channels")
# → Auto-generates cohort analysis + visualization
nlq.ask("Find behavioral patterns that predict churn")
# → Runs clustering + statistical analysis, returns actionable segments
How It Works: RAG-powered code generation → Sandboxed execution → Self-correction → Semantic caching
Example 2: Session-Based Journey Analysis
import pandas as pd
import outhad_edge as oe

# Load clickstream data
df = pd.read_csv('web_analytics.csv')  # columns: user_id, event, timestamp
stream = oe.Eventstream(df)
# Define user sessions (30-minute timeout)
stream = stream.split_sessions(timeout=(30, 'm'))
# Filter to core conversion events
stream = stream.filter_events([
'homepage', 'search', 'product_view',
'add_to_cart', 'checkout', 'purchase'
])
# Visualize user flow
stream.transition_graph() # Interactive network diagram
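For intuition, the time-based splitting that `split_sessions` performs can be sketched in plain pandas. This is a toy re-implementation of the idea (gap above a timeout starts a new session), not the library's actual code:

```python
import pandas as pd

# Toy clickstream: one user with a >30-minute gap, one single-session user
df = pd.DataFrame({
    'user_id': ['U1', 'U1', 'U1', 'U2'],
    'event': ['homepage', 'search', 'product_view', 'homepage'],
    'timestamp': pd.to_datetime([
        '2024-01-15 09:00', '2024-01-15 09:10',
        '2024-01-15 10:00',  # 50 min after previous event -> new session
        '2024-01-15 09:05',
    ]),
})

df = df.sort_values(['user_id', 'timestamp'])
gap = df.groupby('user_id')['timestamp'].diff()

# A session starts at a user's first event or after a gap over the timeout
new_session = gap.isna() | (gap > pd.Timedelta(minutes=30))
df['session_id'] = new_session.groupby(df['user_id']).cumsum()

print(df[['user_id', 'event', 'session_id']])
```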
Example 3: Advanced Preprocessing Pipeline
# Build reproducible preprocessing workflow
pipeline = oe.PreprocessingGraph(stream)
# Step 1: Split into sessions
pipeline.add_node(
processor=oe.data_processors_lib.SplitSessions,
timeout=(20, 'm'),
session_col='session_id'
)
# Step 2: Label new vs returning users
pipeline.add_node(
processor=oe.data_processors_lib.LabelNewUsers,
new_users_list=['first_visit', 'signup']
)
# Step 3: Group granular events
pipeline.add_node(
processor=oe.data_processors_lib.GroupEvents,
event_groups={
'engagement': ['like', 'share', 'comment'],
'commerce': ['add_to_cart', 'purchase', 'wishlist']
}
)
# Execute pipeline
processed = pipeline.combine()
# Share with team (save graph configuration)
pipeline.export('preprocessing_config.json')
Example 4: ML-Powered Behavioral Segmentation
# Extract behavioral features
clusters = stream.clusters()
# TF-IDF feature extraction from event sequences
features = clusters.extract_features(
method='tfidf',
ngram_range=(1, 3) # Single events + 2-3 event sequences
)
# K-means clustering
clusters.fit(method='kmeans', n_clusters=5, X=features)
# Analyze segments
segments = clusters.cluster_mapping
print(segments.groupby('cluster_id').agg({
'user_id': 'count',
'conversion': 'mean',
'ltv': 'mean'
}))
# Visualize
clusters.plot() # Interactive cluster visualization
What Makes Us Different
Core Capabilities
1. AI Query Engine (NEW)
Technology Stack: LangChain · ChromaDB · OpenAI/Anthropic · Redis
| Feature | Description | Benefit |
|---|---|---|
| Natural Language Interface | Ask questions in plain English | Non-technical users get insights instantly |
| RAG Architecture | Vector embeddings + semantic search | 95%+ query accuracy with domain context |
| Self-Correction | Automatic error fixing (3 retry limit) | Handles edge cases without manual debugging |
| Semantic Caching | Redis-backed similarity matching | 90%+ cache hit rate = 10x faster responses |
| Code Transparency | Shows generated Python code | Trust + learning for technical users |
Architecture:
User Query → Semantic Retrieval (ChromaDB) → LLM Code Gen (GPT-4/Claude)
→ Safety Validation → Sandboxed Execution → Result + Visualization
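The semantic-caching step can be illustrated with a toy in-memory cache: a new query reuses a cached answer when its embedding is close enough to a previously seen one. The real system is Redis-backed; `SemanticCache` below is a hypothetical stand-in to show the lookup logic only:

```python
import numpy as np

def cosine_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

class SemanticCache:
    """Toy semantic cache keyed on embedding similarity."""
    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer)

    def put(self, embedding, answer):
        self.entries.append((np.asarray(embedding, dtype=float), answer))

    def get(self, embedding):
        best = max(self.entries,
                   key=lambda e: cosine_sim(e[0], embedding), default=None)
        if best is not None and cosine_sim(best[0], embedding) >= self.threshold:
            return best[1]  # cache hit: skip the LLM round trip entirely
        return None         # cache miss: fall through to code generation

cache = SemanticCache(threshold=0.9)
cache.put([1.0, 0.0, 0.0], "cohort analysis result")
print(cache.get(np.array([0.98, 0.1, 0.0])))  # near-duplicate query
print(cache.get(np.array([0.0, 1.0, 0.0])))   # unrelated query
```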
2. Behavioral Analysis Toolkit
| Tool | Purpose | Output |
|---|---|---|
| Transition Graph | User flow network analysis | Interactive D3.js graph with event transitions |
| Step Matrix | Sequential step-by-step analysis | Conversion rates between each event pair |
| Cohort Analysis | Time-based retention tracking | Heatmap showing retention by cohort |
| Funnel Analysis | Traditional conversion funnels | Stage-by-stage drop-off with statistics |
| Clustering | Behavioral segmentation (K-means, DBSCAN) | User segments with defining characteristics |
| Statistical Tests | A/B testing, Chi-square, T-tests | Significance testing for experiments |
| Sankey Diagrams | Multi-path flow visualization | Parallel path comparison |
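The cohort-analysis row boils down to counting how many users from each acquisition period return in later periods. A minimal pandas sketch of weekly retention counts on toy data (independent of the library's API):

```python
import pandas as pd

events = pd.DataFrame({
    'user_id': ['A', 'A', 'A', 'B', 'B', 'C'],
    'event': ['login'] * 6,
    'timestamp': pd.to_datetime([
        '2024-01-02', '2024-01-09', '2024-01-16',  # A returns in weeks 1 and 2
        '2024-01-03', '2024-01-10',                # B returns in week 1
        '2024-01-20',                              # C: single visit
    ]),
})

# Cohort = calendar week of each user's first event;
# period = whole weeks elapsed since that first event
first = events.groupby('user_id')['timestamp'].transform('min')
events['cohort'] = first.dt.to_period('W')
events['period'] = (events['timestamp'] - first).dt.days // 7

# Unique returning users per (cohort, period) -- the retention heatmap's data
retention = (events.pivot_table(index='cohort', columns='period',
                                values='user_id', aggfunc='nunique')
             .fillna(0).astype(int))
print(retention)
```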
3. Data Preprocessing Engine
14 Built-in Processors:
# Session management
SplitSessions # Time-based session splitting
CollapseLoops # Remove repetitive event cycles
# User lifecycle
LabelNewUsers # Identify user acquisition events
LabelLostUsers # Churn event detection
LabelCroppedPaths # Incomplete journey handling
# Event manipulation
FilterEvents # Include/exclude specific events
GroupEvents # Categorize events into groups
AddStartEndEvents # Synthetic boundary events
TruncatePaths # Limit path length
# Advanced
AddPositiveEvents # Inject success indicators
AddNegativeEvents # Inject failure indicators
DropPaths # Remove specific user journeys
Visual Pipeline Builder: Drag-and-drop GUI in Jupyter for non-coders
Technical Architecture
outhad_edge/
│
├── ai/                         # AI Query System
│   ├── nlq_engine.py           # Main NLQ orchestrator
│   ├── semantic_layer.py       # Business glossary + schema metadata
│   ├── vector_store.py         # ChromaDB embeddings manager
│   ├── llm_agent.py            # LangChain LLM integration
│   ├── code_executor.py        # Sandboxed Python execution
│   ├── code_validator.py       # Security validation layer
│   └── cache_manager.py        # Redis semantic cache
│
├── eventstream/                # Core Data Structure
│   ├── eventstream.py          # Main Eventstream class
│   ├── schema.py               # RawDataSchema validation
│   └── helpers.py              # Utility functions
│
├── preprocessing_graph/        # Pipeline Engine
│   ├── preprocessing_graph.py  # DAG-based workflow
│   └── graph_widgets.py        # Jupyter GUI components
│
├── data_processors_lib/        # Transformation Operators
│   ├── split_sessions.py
│   ├── filter_events.py
│   ├── [12 more processors...]
│   └── base.py                 # Abstract processor class
│
├── tooling/                    # Analysis Tools
│   ├── transition_graph/       # Network flow viz
│   ├── cohorts/                # Retention analysis
│   ├── funnel/                 # Conversion funnels
│   ├── clusters/               # ML segmentation
│   ├── step_matrix/            # Sequential analysis
│   └── stattests/              # Statistical testing
│
├── backend/                    # Infrastructure
│   ├── tracker.py              # Usage analytics
│   └── server.py               # Jupyter widget server
│
└── datasets/                   # Sample Data
    └── data/
        └── simple-onlineshop.csv  # Demo e-commerce data
Data Requirements
Input Schema:
| Column | Type | Required | Description |
|---|---|---|---|
| user_id | string/int | Yes | Unique user identifier |
| event | string | Yes | Event name (e.g., "page_view", "purchase") |
| timestamp | datetime | Yes | Event timestamp (any pandas-compatible format) |
| * | any | No | Additional custom columns |
Example:
import pandas as pd
import outhad_edge as oe
data = pd.DataFrame({
'user_id': ['U001', 'U001', 'U002', 'U002', 'U001'],
'event': ['login', 'view_product', 'signup', 'view_product', 'purchase'],
'timestamp': ['2024-01-15 09:00:00', '2024-01-15 09:05:00',
'2024-01-15 09:02:00', '2024-01-15 09:08:00',
'2024-01-15 09:15:00'],
'device': ['mobile', 'mobile', 'desktop', 'desktop', 'mobile'], # optional
'revenue': [0, 0, 0, 0, 49.99] # optional
})
stream = oe.Eventstream(data)
Supported Data Sources:
- Google Analytics BigQuery exports
- Segment, Amplitude, Mixpanel exports
- Custom event tracking (Snowplow, RudderStack)
- Database event logs (PostgreSQL, MongoDB)
- Web server logs (Apache, Nginx)
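Exports from these sources rarely arrive with the exact `user_id`/`event`/`timestamp` column names, so a small rename-and-convert step usually bridges the gap. A sketch using hypothetical Mixpanel-style column names (`distinct_id`, `name`, unix-seconds `time`):

```python
import pandas as pd

# Hypothetical analytics export with non-standard column names
raw = pd.DataFrame({
    'distinct_id': ['u1', 'u1', 'u2'],
    'name': ['$pageview', 'purchase', '$pageview'],
    'time': [1705309200, 1705309500, 1705309300],  # unix seconds
})

# Map onto the required user_id / event / timestamp schema
events = raw.rename(columns={'distinct_id': 'user_id', 'name': 'event'})
events['timestamp'] = pd.to_datetime(events.pop('time'), unit='s')

print(events.dtypes)
```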
Industry Applications
SaaS & B2B Software
Challenge: 60% of trial users never activate a key feature
nlq.ask("Which feature combinations predict trial-to-paid conversion?")
# → "Users who complete profile setup + invite team member convert at 8.3x rate.
#    Only 12% of trials do both. Suggest onboarding flow A/B test."
Use Cases:
- Product-led growth optimization
- Feature adoption tracking
- Onboarding funnel analysis
- Expansion revenue triggers
E-Commerce & Retail
Challenge: Cart abandonment without knowing which step fails
# Analyze checkout micro-steps
checkout_stream = stream.filter_events(
    lambda df: df['event'].str.contains('checkout_')
)
checkout_stream.transition_graph(threshold=0.05)
# → Reveals 23% drop at "payment_method_selection"
Use Cases:
- Cart abandonment analysis
- Product recommendation optimization
- Cross-sell/upsell pattern detection
- Customer journey mapping
Media & Content Platforms
Challenge: Understand binge behavior vs churn patterns
# Segment by engagement patterns
clusters = stream.clusters()
features = clusters.extract_features(method='tfidf', ngram_range=(1,4))
clusters.fit(method='kmeans', n_clusters=6, X=features)
# Label clusters
for cluster_id in range(6):
cluster_users = clusters.cluster_mapping[
clusters.cluster_mapping['cluster_id'] == cluster_id
]
print(f"Cluster {cluster_id}: {len(cluster_users)} users")
nlq.ask(f"Describe behavior patterns of cluster {cluster_id}")
Use Cases:
- Content consumption patterns
- Churn prediction
- Personalization strategies
- Engagement scoring
Financial Services
Challenge: Identify fraud patterns in transaction sequences
# Anomaly detection using sequence analysis
suspicious = stream.filter_events(lambda df:
df.groupby('user_id')['event'].transform('count') > 50 # High velocity
)
nlq.ask("Find unusual transaction sequences in the last 7 days")
# → Flags accounts with rare event combinations
Use Cases:
- Fraud detection
- Customer lifecycle analysis
- Cross-product adoption
- Compliance monitoring
Performance & Scale
| Metric | Specification |
|---|---|
| Event Processing | 10M+ events in <30s (single machine) |
| Memory Efficiency | Lazy loading, chunked processing |
| Parallelization | Multi-core support via pandas/numpy |
| AI Query Latency | <5s average (with caching: <500ms) |
| Supported Python | 3.8, 3.9, 3.10, 3.11 |
| Dependencies | pandas, networkx, scikit-learn, plotly |
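The "chunked processing" row refers to streaming a large file in pieces rather than loading it whole, aggregating per chunk and combining the partial results. A self-contained pandas sketch, with an in-memory CSV standing in for a large event log:

```python
import io
import pandas as pd

# Simulate a large event log as an in-memory CSV (50 rows)
csv = io.StringIO(
    "user_id,event,timestamp\n" +
    "\n".join(f"U{i % 3},view,2024-01-15 09:{i:02d}:00" for i in range(50))
)

# Read in chunks of 20 rows, count events per user within each chunk,
# then merge the partial counts -- memory stays bounded by the chunk size
per_chunk = [
    chunk.groupby('user_id').size()
    for chunk in pd.read_csv(csv, chunksize=20)
]
event_counts = pd.concat(per_chunk).groupby(level=0).sum()
print(event_counts)
```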
Development Workflow
Testing:
pytest tests/ # Full test suite
pytest tests/eventstream/ # Component tests
tox -e py38,py39,py310,py311 # Multi-version testing
Code Quality:
black outhad_edge/ tests/ --line-length=120
mypy outhad_edge/
pre-commit run --all-files
Build Documentation:
cd docs/
make html # Generates HTML docs
Why Teams Choose Outhad_Edge
For Data Scientists:
- Built on pandas/numpy/scikit-learn (familiar stack)
- Fully programmable, not a black box
- Export to any format (CSV, Parquet, SQL)
- Jupyter-native with interactive widgets
For Product Managers:
- Natural language queries (no SQL/Python required)
- Visual pipeline builder (drag-and-drop)
- Share insights as interactive reports
- Faster iteration vs BI tools
For Analysts:
- Pre-built behavioral analytics methods
- Reproducible workflows (save/load pipelines)
- Statistical rigor built-in
- Production-ready code
For Engineering:
- Comprehensive test coverage (>85%)
- Type hints throughout
- Well-documented codebase
- Apache 2.0 license
Comparison Matrix
| Feature | Outhad_Edge | Amplitude | Mixpanel | Google Analytics |
|---|---|---|---|---|
| AI Natural Language Queries | ✅ Built-in | ❌ No | ❌ No | ❌ No |
| Custom Behavioral Analysis | ✅ Unlimited | ⚠️ Limited | ⚠️ Limited | ❌ No |
| Open Source | ✅ Yes | ❌ No | ❌ No | ❌ No |
| Self-Hosted | ✅ Yes | ❌ Cloud only | ❌ Cloud only | ❌ Cloud only |
| Python Integration | ✅ Native | ⚠️ API only | ⚠️ API only | ⚠️ API only |
| ML Segmentation | ✅ scikit-learn | ⚠️ Basic | ⚠️ Basic | ❌ No |
| Visual Pipeline Builder | ✅ Jupyter GUI | ❌ No | ❌ No | ❌ No |
| Cost (1M events/mo) | Free | ~$2,000 | ~$1,500 | Free (limited) |
Sample Datasets
Quick Start with Built-in Data:
import outhad_edge as oe
from outhad_edge.datasets import load_simple_shop
# Load e-commerce sample data
df = load_simple_shop(as_dataframe=True)
print(df.shape)  # (n_events, n_columns)
stream = oe.Eventstream(df)
stream.describe() # Summary statistics
# Try AI queries
nlq = stream.nlq()
nlq.ask("What's the most common path to purchase?")
Public Datasets Compatible:
- Kaggle: E-Commerce Clickstream 2024 (285M events)
- UCI: Online Retail Dataset
- Coveo: Shoppers Intent Prediction
- TheLook: E-commerce Analytics (BigQuery)
Roadmap
Current Version: v3.3.0
In Development:
- AI-powered natural language queries (Completed)
- Real-time streaming integration (Bytewax)
- Session replay + heatmaps
- Cross-device identity resolution
- Predictive analytics (churn, LTV)
- A/B test orchestration
See: FEATURE_ROADMAP_2025.md for details
Contributing
We welcome contributions! See development setup above.
Priority Areas:
- New data processors
- Additional analysis tools
- Performance optimizations
- Documentation improvements
Community:
- GitHub Issues: Bug reports & feature requests
- Discussions: Q&A and ideas
- Pull Requests: Code contributions
License
Apache 2.0 - Free for commercial and private use
Built for teams who move fast and break things (but want to know exactly what broke)