Automatic Storytelling from Data - Turn raw data into compelling business narratives
📊 DataStory - Automatic Storytelling from Data
Turn raw data into compelling business narratives automatically.
DataStory analyzes your datasets and generates full written reports with insights, trends, and recommendations. No LLMs are needed: the analysis is pure Python.
🚀 The Problem
- Dashboards don't explain insights - They show graphs, not stories
- People want narratives - Business stakeholders need context, not just charts
- Manual analysis takes time - Writing reports is tedious and repetitive
- Insights get lost - Important patterns buried in spreadsheets
💡 The Solution
DataStory automatically:
- ✅ Analyzes your data for trends, patterns, and anomalies
- ✅ Generates natural language business narratives
- ✅ Identifies risks and opportunities
- ✅ Provides actionable recommendations
- ✅ Exports to text, markdown, HTML, or PDF
All with a single line of code!
📦 Installation
pip install datastory-ai
For full features (charts, Excel, PDF):
pip install "datastory-ai[full]"
🎯 Quick Start
One-Line Magic
from datastory import narrate
report = narrate("sales.csv")
print(report)
Output:
📊 EXECUTIVE SUMMARY
==================================================
Analyzed 1,247 records across 8 dimensions.
🟡 3 high-priority insights identified.
Key Highlights:
1. Sales increased by 12.3% from $450,000 to $505,000.
2. Customer churn rose in April by 8.5%, requiring attention.
3. West Africa region dominates sales, accounting for 45.2% of revenue.
📈 KEY FINDINGS
==================================================
**Performance Trends:**
• Sales Shows Strong Growth: Sales increased by 12.3% from $450,000 to $505,000.
• Revenue per Customer Rising: Average order value grew by 15.7%.
**Notable Anomalies:**
• Unusual Values Detected in Order Quantity: Found 23 outliers (1.8% of data).
**Relationships Discovered:**
• Strong Positive Link: Marketing Spend and Revenue move together (correlation: 0.85).
🔍 DETAILED ANALYSIS
==================================================
**High-Priority Insights:**
🟡 Customer Churn Rising
Customer churn increased by 8.5% in April. This represents a significant concern.
🟡 Low Stock Risk: Product X
Minimum inventory is 12 units, significantly below average of 150. Consider restocking.
💡 RECOMMENDATIONS
==================================================
1. Investigate the decline in customer retention and implement recovery strategies
2. Capitalize on the growth in revenue per customer to maximize returns
3. Replenish product_x inventory to avoid stockouts
4. Review outliers in order quantity to identify root causes
5. Leverage identified relationships between metrics for predictive insights
==================================================
Report generated on December 03, 2025 at 1:20 PM
Powered by DataStory - Automatic Storytelling from Data
🔥 Key Features
1. Pure Python Intelligence
- No LLMs or AI APIs required
- Works offline
- Fast and deterministic
- Zero-cost analysis
2. Comprehensive Analysis
- Statistical summaries
- Trend detection
- Anomaly identification
- Correlation discovery
- Time series patterns
- Risk assessment
3. Natural Language Output
- Business-friendly narratives
- Context-aware descriptions
- Action-oriented recommendations
- Multiple detail levels
4. Flexible Export
from datastory import DataStory
story = DataStory()
story.load("data.csv")
# Export to different formats
story.export("report.txt", format="text")
story.export("report.md", format="markdown")
story.export("report.html", format="html", include_charts=True)
story.export("report.pdf", format="pdf")
5. Multiple Data Sources
# CSV, Excel, JSON, Parquet
story.load("sales.csv")
story.load("data.xlsx")
story.load("records.json")
story.load("dataset.parquet")
# URLs
story.load("https://example.com/data.csv")
# Pandas DataFrames
import pandas as pd
# conn is an already-open database connection (e.g. SQLAlchemy or sqlite3)
df = pd.read_sql("SELECT * FROM sales", conn)
story.load(df)
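One way to picture multi-format loading is a lookup from file extension to reader function. The sketch below is only an illustration of that idea, not DataStory's actual loader: `READERS` and `load_records` are hypothetical names, and it covers just CSV and JSON with the standard library. Excel and Parquet would need third-party readers such as `pandas.read_excel` and `pandas.read_parquet`.

```python
import csv
import json
from pathlib import Path


def _read_csv(path: Path) -> list[dict]:
    # csv.DictReader yields one dict per row, keyed by the header line.
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def _read_json(path: Path) -> list[dict]:
    # Assumes the file holds a JSON array of objects.
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


# Hypothetical registry: file suffix -> reader function.
READERS = {".csv": _read_csv, ".json": _read_json}


def load_records(source: str) -> list[dict]:
    """Pick a reader from the file extension and return the rows."""
    path = Path(source)
    reader = READERS.get(path.suffix.lower())
    if reader is None:
        raise ValueError(f"Unsupported file type: {path.suffix!r}")
    return reader(path)
```

Supporting a new format then means adding one entry to the registry rather than changing `load_records`.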
📖 Advanced Usage
Customization
from datastory import DataStory
# Configure narrative style
config = {
    "style": "business",            # business, casual, technical
    "detail_level": "detailed",     # brief, medium, detailed
    "include_recommendations": True,
}
story = DataStory(config=config)
story.load("sales.csv")
narrative = story.generate_narrative()
print(narrative)
Programmatic Access
# Access insights directly
from datastory import DataStory
story = DataStory()
story.load("data.csv")
insights = story.extract_insights()
for insight in insights:
    print(f"{insight.type}: {insight.title}")
    print(f"Priority: {insight.priority}")
    print(f"Description: {insight.description}\n")
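Once insights are available as objects, they can be ordered before display. The following is a sketch under assumptions: the `Insight` dataclass is a stand-in that mirrors the four attributes used above, and the priority labels `"high"`, `"medium"`, and `"low"` are guesses, since the values DataStory actually uses are not documented here.

```python
from dataclasses import dataclass


@dataclass
class Insight:
    # Stand-in for the library's insight objects (assumed shape).
    type: str
    title: str
    priority: str
    description: str


# Assumed priority labels; unknown labels sort last.
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def rank_insights(insights: list[Insight]) -> list[Insight]:
    """Return insights sorted from most to least urgent (stable sort)."""
    return sorted(
        insights,
        key=lambda item: PRIORITY_ORDER.get(item.priority, len(PRIORITY_ORDER)),
    )


sample = [
    Insight("anomaly", "Unusual order quantities", "low", "A few outliers found."),
    Insight("trend", "Customer churn rising", "high", "Churn increased in April."),
    Insight("correlation", "Spend tracks revenue", "medium", "Strong positive link."),
]

for insight in rank_insights(sample):
    print(f"[{insight.priority}] {insight.title}")
```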
Analysis Results
# Get raw analysis results
from datastory import DataStory
story = DataStory()
story.load("data.csv")
results = story.analyze()
print(results["trends"])
print(results["anomalies"])
print(results["correlations"])
🎓 Use Cases
1. Business Intelligence
Generate executive summaries from sales, marketing, or financial data.
2. Data Science Reports
Automatically document exploratory data analysis (EDA) findings.
3. Automated Monitoring
Create daily/weekly reports on KPIs and metrics.
4. Client Reporting
Transform raw analytics into client-ready narratives.
5. Academic Research
Quickly summarize dataset characteristics and patterns.
🆚 Why DataStory?
| Feature | DataStory | Traditional BI | LLM-based |
|---|---|---|---|
| Setup Time | Instant | Hours/Days | API setup |
| Cost | Free | $$$$ | $$$ per call |
| Offline Use | ✅ Yes | ❌ No | ❌ No |
| Customizable | ✅ Full control | ⚠️ Limited | ❌ Black box |
| Speed | ⚡ Instant | 🐌 Slow | ⏳ API delays |
| Privacy | 🔒 Local | ⚠️ Cloud | ❌ Sent to API |
| Deterministic | ✅ Yes | ✅ Yes | ❌ No |
📊 Example Datasets
The examples/ directory includes sample datasets:
- sales.csv - Sales performance data
- customer_churn.csv - Customer retention data
- inventory.csv - Stock levels and products
🛠️ Technical Details
Architecture
- Core Analyzer: Statistical analysis using pandas/numpy
- Insight Extractor: Pattern recognition and business logic
- Narrative Generator: Template-based natural language generation
- Data Loaders: Multi-format support (CSV, Excel, JSON, Parquet)
- Report Formatters: Export to text, markdown, HTML, PDF
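Template-based generation, as named in the list above, can be pictured as choosing a sentence template and filling it with computed values. The sketch below is illustrative only: `TEMPLATES`, `describe_trend`, and the `tolerance` parameter are hypothetical names and do not come from DataStory's source.

```python
# Hypothetical sentence templates keyed by the kind of change detected.
TEMPLATES = {
    "increase": "{metric} increased by {change:.1f}% from {start:,.0f} to {end:,.0f}.",
    "decrease": "{metric} decreased by {change:.1f}% from {start:,.0f} to {end:,.0f}.",
    "flat": "{metric} held steady at {end:,.0f}.",
}


def describe_trend(metric: str, start: float, end: float, tolerance: float = 0.5) -> str:
    """Turn a start/end pair into one sentence.

    Changes smaller than `tolerance` percent are reported as flat.
    """
    change = (end - start) / start * 100.0
    if abs(change) < tolerance:
        kind = "flat"
    elif change > 0:
        kind = "increase"
    else:
        kind = "decrease"
    # str.format ignores keyword arguments a template does not use.
    return TEMPLATES[kind].format(
        metric=metric, change=abs(change), start=start, end=end
    )


print(describe_trend("Sales", 400_000, 450_000))
print(describe_trend("Churn", 200, 150))
```

Because the wording lives in plain templates, the same input always yields the same sentence, which is consistent with the deterministic behavior the project advertises.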
Dependencies
- Core: pandas, numpy
- Optional: matplotlib (charts), openpyxl (Excel), reportlab (PDF)
Performance
- Analyzes 100K rows in <2 seconds
- Generates narrative in <1 second
- Low memory footprint
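Timings like these depend on hardware and data shape, so it can be worth measuring on your own dataset. Below is a small, generic timing helper as a sketch; `timed` is a hypothetical name, and the workload shown is a placeholder sum over synthetic rows rather than a DataStory call.

```python
import time
from typing import Any, Callable


def timed(func: Callable[..., Any], *args: Any, **kwargs: Any) -> tuple[Any, float]:
    """Call func and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# Placeholder workload: 100K synthetic rows, summed in pure Python.
rows = [{"id": i, "amount": i * 1.5} for i in range(100_000)]
total, seconds = timed(lambda data: sum(r["amount"] for r in data), rows)

print(f"Processed {len(rows):,} rows in {seconds:.3f} s")
```

To time the library itself, the same helper could wrap a call such as `timed(narrate, "sales.csv")`, using the `narrate` function shown in the Quick Start.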
🤝 Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Submit a pull request
📝 License
MIT License - see LICENSE file for details.
🙏 Acknowledgments
Built with ❤️ by Idriss Bado
Inspired by the need for better data communication in business.
📧 Contact
- GitHub: @idrissbado
- PyPI: datastory
⭐ Star this repo if you find it useful!
🐛 Found a bug? Open an issue
💡 Have an idea? Start a discussion
File details
Details for the file datastory_ai-0.1.0.tar.gz.
File metadata
- Download URL: datastory_ai-0.1.0.tar.gz
- Upload date:
- Size: 31.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 77f35f3f3afefb1d7e4fd121f3534e075e9f052c5d1dc28831a9f769e069f120 |
| MD5 | 954cd7765396684e009e331137953da0 |
| BLAKE2b-256 | 7f17b2b044b3a1c046d9b62e2695335662e7941705a995687eef1db788a7a6ac |
File details
Details for the file datastory_ai-0.1.0-py3-none-any.whl.
File metadata
- Download URL: datastory_ai-0.1.0-py3-none-any.whl
- Upload date:
- Size: 26.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fed442561797891b198ce85c2719188d8afeaa8f3f9abf3def6982e012066cfa |
| MD5 | 830424b136e189fce3a4adf22d26b1af |
| BLAKE2b-256 | 2ad52e1282ce47791ddb0e3308d7963a982d8ecb6aa7a5d8b27dcd478e6ac619 |