
Automatic Storytelling from Data - Turn raw data into compelling business narratives


📊 DataStory - Automatic Storytelling from Data

Python 3.8+ | License: MIT

Turn raw data into compelling business narratives automatically.

DataStory analyzes your datasets and generates full written reports with insights, trends, and recommendations. No LLMs are needed: the analysis is statistical and the text is template-based, all in pure Python.

🚀 The Problem

  • Dashboards don't explain insights - They show graphs, not stories
  • People want narratives - Business stakeholders need context, not just charts
  • Manual analysis takes time - Writing reports is tedious and repetitive
  • Insights get lost - Important patterns buried in spreadsheets

💡 The Solution

DataStory automatically:

  • ✅ Analyzes your data for trends, patterns, and anomalies
  • ✅ Generates natural language business narratives
  • ✅ Identifies risks and opportunities
  • ✅ Provides actionable recommendations
  • ✅ Exports to text, markdown, HTML, or PDF

All with a single line of code!

📦 Installation

pip install datastory-ai

For full features (charts, Excel, PDF):

pip install "datastory-ai[full]"

🎯 Quick Start

One-Line Magic

from datastory import narrate

report = narrate("sales.csv")
print(report)

Output:

📊 EXECUTIVE SUMMARY
==================================================
Analyzed 1,247 records across 8 dimensions.

🟡 3 high-priority insights identified.

Key Highlights:
1. Sales increased by 12.3% from $450,000 to $505,000.
2. Customer churn rose in April by 8.5%, requiring attention.
3. West Africa region dominates sales, accounting for 45.2% of revenue.

📈 KEY FINDINGS
==================================================

**Performance Trends:**
• Sales Shows Strong Growth: Sales increased by 12.3% from $450,000 to $505,000.
• Revenue per Customer Rising: Average order value grew by 15.7%.

**Notable Anomalies:**
• Unusual Values Detected in Order Quantity: Found 23 outliers (1.8% of data).

**Relationships Discovered:**
• Strong Positive Link: Marketing Spend and Revenue move together (correlation: 0.85).

🔍 DETAILED ANALYSIS
==================================================

**High-Priority Insights:**

🟡 Customer Churn Rising
   Customer churn increased by 8.5% in April. This represents a significant concern.

🟡 Low Stock Risk: Product X
   Minimum inventory is 12 units, significantly below average of 150. Consider restocking.

💡 RECOMMENDATIONS
==================================================
1. Investigate the decline in customer retention and implement recovery strategies
2. Capitalize on the growth in revenue per customer to maximize returns
3. Replenish product_x inventory to avoid stockouts
4. Review outliers in order quantity to identify root causes
5. Leverage identified relationships between metrics for predictive insights

==================================================
Report generated on December 03, 2025 at 1:20 PM
Powered by DataStory - Automatic Storytelling from Data

🔥 Key Features

1. Pure Python Intelligence

  • No LLMs or AI APIs required
  • Works offline
  • Fast and deterministic
  • Zero-cost analysis

2. Comprehensive Analysis

  • Statistical summaries
  • Trend detection
  • Anomaly identification
  • Correlation discovery
  • Time series patterns
  • Risk assessment
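To make two of the checks above concrete, here is a minimal sketch of trend detection as a percent change and anomaly identification with the 1.5 × IQR rule. It uses only the standard library. The function names and thresholds are illustrative assumptions, not DataStory's internals.

```python
from statistics import quantiles


def percent_change(first: float, last: float) -> float:
    """Relative change from `first` to `last`, in percent."""
    if first == 0:
        raise ValueError("percent change is undefined when the starting value is 0")
    return (last - first) / abs(first) * 100.0


def iqr_outliers(values, k: float = 1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up sample data, for illustration only
monthly_sales = [400.0, 420.0, 410.0, 450.0, 500.0]
order_quantities = [10, 12, 11, 13, 12, 11, 10, 95]

change = percent_change(monthly_sales[0], monthly_sales[-1])
outliers = iqr_outliers(order_quantities)

print(f"Sales changed by {change:.1f}%")  # Sales changed by 25.0%
print("Outliers:", outliers)              # Outliers: [95]
```

A first-to-last comparison is the simplest possible trend measure. A real analyzer would likely also fit a slope or compare rolling windows.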

3. Natural Language Output

  • Business-friendly narratives
  • Context-aware descriptions
  • Action-oriented recommendations
  • Multiple detail levels

4. Flexible Export

from datastory import DataStory

story = DataStory()
story.load("data.csv")

# Export to different formats
story.export("report.txt", format="text")
story.export("report.md", format="markdown")
# Charts and PDF output rely on the optional dependencies (matplotlib, reportlab)
story.export("report.html", format="html", include_charts=True)
story.export("report.pdf", format="pdf")

5. Multiple Data Sources

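from datastory import DataStory

story = DataStory()

# CSV, Excel, JSON, Parquet
story.load("sales.csv")
story.load("data.xlsx")
story.load("records.json")
story.load("dataset.parquet")

# URLs
story.load("https://example.com/data.csv")

# Pandas DataFrames
import sqlite3
import pandas as pd

conn = sqlite3.connect("sales.db")  # placeholder; any connection pandas accepts works
df = pd.read_sql("SELECT * FROM sales", conn)
story.load(df)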
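One plausible way to support several formats behind a single load call is to dispatch on the file extension. The sketch below maps extensions to pandas reader names. The `READERS` table, `pick_reader`, and `load_frame` are hypothetical helpers for illustration, not part of DataStory's API.

```python
from pathlib import Path
from urllib.parse import urlparse

# Extension -> name of the pandas reader function (all exist in pandas)
READERS = {
    ".csv": "read_csv",
    ".xlsx": "read_excel",
    ".xls": "read_excel",
    ".json": "read_json",
    ".parquet": "read_parquet",
}


def pick_reader(source: str) -> str:
    """Choose a pandas reader name from a local path or URL."""
    # For URLs, look only at the path component so query strings are ignored
    path = urlparse(source).path if "://" in source else source
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}")
    return READERS[suffix]


def load_frame(source: str):
    """Load `source` into a DataFrame with the matching pandas reader."""
    import pandas as pd  # imported lazily so pick_reader works without pandas

    return getattr(pd, pick_reader(source))(source)


print(pick_reader("sales.csv"))                     # read_csv
print(pick_reader("https://example.com/data.csv"))  # read_csv
print(pick_reader("dataset.parquet"))               # read_parquet
```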

📖 Advanced Usage

Customization

from datastory import DataStory

# Configure narrative style
config = {
    "style": "business",  # business, casual, technical
    "detail_level": "detailed",  # brief, medium, detailed
    "include_recommendations": True
}

story = DataStory(config=config)
story.load("sales.csv")
narrative = story.generate_narrative()
print(narrative)

Programmatic Access

from datastory import DataStory

# Access insights directly
story = DataStory()
story.load("data.csv")

insights = story.extract_insights()
for insight in insights:
    print(f"{insight.type}: {insight.title}")
    print(f"Priority: {insight.priority}")
    print(f"Description: {insight.description}\n")

Analysis Results

from datastory import DataStory

# Get raw analysis results
story = DataStory()
story.load("data.csv")

results = story.analyze()
print(results["trends"])
print(results["anomalies"])
print(results["correlations"])
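For readers who want to sanity-check a reported correlation, such as the 0.85 in the sample report, a Pearson coefficient can be computed by hand. This is a standard-library sketch with made-up data. It does not reflect how DataStory computes or stores its results, and the strength thresholds are assumptions.

```python
from math import sqrt


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def strength(r: float) -> str:
    """Rough verbal label; the cut-offs are illustrative only."""
    magnitude = abs(r)
    if magnitude >= 0.7:
        return "strong"
    if magnitude >= 0.4:
        return "moderate"
    return "weak"


# Made-up sample data
marketing_spend = [10, 20, 30, 40, 50]
revenue = [110, 190, 320, 390, 510]

r = pearson(marketing_spend, revenue)
print(f"correlation: {r:.2f} ({strength(r)})")
```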

🎓 Use Cases

1. Business Intelligence

Generate executive summaries from sales, marketing, or financial data.

2. Data Science Reports

Automatically document exploratory data analysis (EDA) findings.

3. Automated Monitoring

Create daily/weekly reports on KPIs and metrics.
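A scheduled job for this could be as small as the sketch below, which writes one dated report file per run. `write_daily_report` is a hypothetical helper. The report generator is passed in as a parameter so the sketch runs without DataStory installed.

```python
import tempfile
from datetime import date
from pathlib import Path


def write_daily_report(generate, source, out_dir, today=None):
    """Call generate(source) and save the text as report-YYYY-MM-DD.txt."""
    today = today or date.today()
    out_path = Path(out_dir) / f"report-{today.isoformat()}.txt"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(str(generate(source)), encoding="utf-8")
    return out_path


def fake_narrate(source):
    """Stand-in generator used only for this demonstration."""
    return f"Report for {source}"


with tempfile.TemporaryDirectory() as tmp:
    path = write_daily_report(fake_narrate, "sales.csv", tmp, today=date(2025, 12, 3))
    saved_name = path.name
    saved_text = path.read_text(encoding="utf-8")
    print(saved_name)  # report-2025-12-03.txt
```

With DataStory installed, passing the documented `narrate` function in place of `fake_narrate` and running the script from cron or a task scheduler would produce one report per day.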

4. Client Reporting

Transform raw analytics into client-ready narratives.

5. Academic Research

Quickly summarize dataset characteristics and patterns.

🆚 Why DataStory?

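| Feature       | DataStory       | Traditional BI | LLM-based      |
|---------------|-----------------|----------------|----------------|
| Setup Time    | Instant         | Hours/Days     | API setup      |
| Cost          | Free            | $$$$           | $$$ per call   |
| Offline Use   | ✅ Yes          | ❌ No          | ❌ No          |
| Customizable  | ✅ Full control | ⚠️ Limited     | ❌ Black box   |
| Speed         | ⚡ Instant      | 🐌 Slow        | ⏳ API delays  |
| Privacy       | 🔒 Local        | ⚠️ Cloud       | ❌ Sent to API |
| Deterministic | ✅ Yes          | ✅ Yes         | ❌ No          |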

📊 Example Datasets

The examples/ directory includes sample datasets:

  • sales.csv - Sales performance data
  • customer_churn.csv - Customer retention data
  • inventory.csv - Stock levels and products

🛠️ Technical Details

Architecture

  • Core Analyzer: Statistical analysis using pandas/numpy
  • Insight Extractor: Pattern recognition and business logic
  • Narrative Generator: Template-based natural language generation
  • Data Loaders: Multi-format support (CSV, Excel, JSON, Parquet)
  • Report Formatters: Export to text, markdown, HTML, PDF
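Template-based generation, as named in the Narrative Generator bullet, can be pictured as filling numeric results into fixed sentence patterns. The sketch below is an assumption-laden illustration. The `Insight` fields mirror the attributes used in the Programmatic Access example, but the class, the templates, and the 10% priority threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Insight:
    type: str
    title: str
    priority: str
    description: str


# Hypothetical sentence templates keyed by trend direction
TEMPLATES = {
    "up": "{metric} increased by {change:.1f}% from {start:,.0f} to {end:,.0f}.",
    "down": "{metric} decreased by {change:.1f}% from {start:,.0f} to {end:,.0f}.",
}


def describe_trend(metric: str, start: float, end: float) -> Insight:
    """Turn a start/end pair into a templated, prioritized insight."""
    if start == 0:
        raise ValueError("cannot compute a percent change from a zero start")
    change = (end - start) / abs(start) * 100.0
    direction = "up" if change >= 0 else "down"
    text = TEMPLATES[direction].format(
        metric=metric, change=abs(change), start=start, end=end
    )
    # Assumed rule: moves of 10% or more count as high priority
    priority = "high" if abs(change) >= 10 else "medium"
    title = f"{metric} {'Rising' if direction == 'up' else 'Falling'}"
    return Insight(type="trend", title=title, priority=priority, description=text)


insight = describe_trend("Sales", 400_000, 500_000)
print(insight.description)  # Sales increased by 25.0% from 400,000 to 500,000.
```

Because the same inputs always fill the same template, output of this kind is deterministic, which matches the Deterministic row in the comparison table.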

Dependencies

  • Core: pandas, numpy
  • Optional: matplotlib (charts), openpyxl (Excel), reportlab (PDF)

Performance

  • Analyzes 100K rows in <2 seconds
  • Generates narrative in <1 second
  • Low memory footprint
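Timings like these depend on hardware and on the shape of the data, so it is worth measuring on your own files. The helper below is a generic timing sketch. `timed` is a hypothetical name, and the workload shown is a stand-in rather than a DataStory call.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in workload so the sketch runs anywhere
rows = list(range(100_000))
total, seconds = timed(sum, rows)
print(f"Summed {len(rows):,} rows in {seconds:.4f} s")

# With DataStory installed, the same helper could wrap the documented entry point:
#   report, seconds = timed(narrate, "sales.csv")
```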

🤝 Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Submit a pull request

📝 License

MIT License - see LICENSE file for details.

🙏 Acknowledgments

Built with ❤️ by Idriss Bado

Inspired by the need for better data communication in business.

📧 Contact


Star this repo if you find it useful!

🐛 Found a bug? Open an issue

💡 Have an idea? Start a discussion
