NL2Viz Pro - Natural Language to Visualization Engine
A production-grade Python library that converts natural language queries and datasets into beautiful, insightful visualizations automatically.
Features
- Auto-detect data formats (CSV, Excel, JSON, SQLite)
- Smart intent parsing from natural language queries
- Automatic chart type selection based on data schema
- Multiple visualization libraries (Matplotlib, Seaborn, Plotly)
- Fuzzy column matching with typo correction
- Auto insights generation (trends, correlations, outliers)
- Interactive charts (Plotly support)
- CLI & Python API
- Comprehensive error handling
- Production-ready codebase
Supported Chart Types
- Bar Charts
- Line Charts
- Pie Charts
- Histograms
- Scatter Plots
- Box Plots
- Heatmaps
- Violin Plots
- Area Charts
- Bubble Charts
- Correlation Matrix
- Pairplot
Installation
Via pip
pip install nl2viz-pro
From source
git clone <repository>
cd nl2viz_pro
pip install -r requirements.txt
pip install -e .
Quick Start
Python API
from nl2viz_pro import visualize
# Quick one-liner
visualize("show sales by region", "data.csv")
# Or with more control
result = visualize(
"compare profit by category",
"data.xlsx",
library='plotly',
show=True,
save='output.html'
)
# Access results
insights = result['insights']
figure = result['figure']
data = result['data']
Load Data Once, Query Multiple Times
from nl2viz_pro import NL2VizPro
viz = NL2VizPro()
viz.load_data("data.csv")
# Query multiple times
viz.visualize("show sales by region")
viz.visualize("profit trend over time")
viz.visualize("distribution of quantities")
Command Line
# Basic usage
nl2viz "show sales by region" data.csv
# With options
nl2viz "correlation heatmap" data.xlsx --library plotly --save chart.html
# Verbose mode
nl2viz "analyze profit" data.json --library seaborn --verbose
# Don't display (just save)
nl2viz "sales distribution" data.csv --save output.png --no-show
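The CLI flags above map naturally onto Python's standard `argparse` module. The sketch below is not the package's `cli.py`. It is a minimal parser built only from the flags shown in these examples (`--library`, `--save`, `--verbose`, `--no-show`). `build_parser` is a hypothetical name, and Matplotlib is assumed as the default backend, as stated under Visualization Libraries.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the CLI flags shown above (not the real cli.py)."""
    parser = argparse.ArgumentParser(prog="nl2viz")
    parser.add_argument("query", help="natural language query, e.g. 'show sales by region'")
    parser.add_argument("data_source", help="CSV, Excel, JSON or SQLite file")
    parser.add_argument(
        "--library",
        choices=["matplotlib", "seaborn", "plotly"],
        default="matplotlib",  # assumed default, matching the docs
        help="plotting backend",
    )
    parser.add_argument("--save", metavar="PATH", help="write the chart to PATH")
    parser.add_argument("--verbose", action="store_true", help="print extra details")
    # --no-show clears a positive `show` flag, so by default the chart is displayed
    parser.add_argument("--no-show", dest="show", action="store_false",
                        help="do not display the chart (useful with --save)")
    return parser


args = build_parser().parse_args(
    ["sales distribution", "data.csv", "--save", "output.png", "--no-show"]
)
print(args.query, args.library, args.save, args.show)  # sales distribution matplotlib output.png False
```

Modelling `--no-show` as `store_false` on a `show` destination keeps the Python-side name identical to the `show=` keyword used by the API.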
API Reference
NL2VizPro Class
Main class for the visualization pipeline.
Methods
load_data(data_source)
Load data from various sources.
# From CSV
viz.load_data("data.csv")
# From Excel
viz.load_data("data.xlsx")
# From JSON
viz.load_data("data.json")
# From DataFrame
import pandas as pd
df = pd.read_csv("data.csv")
viz.load_data(df)
# Returns self for chaining
viz.load_data("data.csv").quick_stats()
visualize(query, data_source, library, **kwargs)
Convert a natural language query into a visualization.
result = viz.visualize(
query="show sales by region",
data_source="data.csv", # Optional if already loaded
library='matplotlib', # 'matplotlib', 'seaborn', 'plotly'
show=True, # Display chart
save='output.png', # Save to file
show_insights=True # Print insights
)
Returns: Dictionary with:
- figure: Visualization object
- insights: Generated insights
- intent: Parsed query intent
- data: Processed visualization data
- processed_data: Full processed dataset
get_columns()
Get list of available columns.
columns = viz.get_columns()
print(columns) # ['Date', 'Region', 'Sales', ...]
suggest_column(query_col)
Get fuzzy match suggestion for a column name.
match = viz.suggest_column("revenu") # Returns "revenue" (typo corrected)
get_data_info()
Get detailed data information.
info = viz.get_data_info()
# {
# 'shape': (100, 5),
# 'columns': [...],
# 'dtypes': {...},
# 'missing_values': {...},
# 'missing_percentage': {...}
# }
quick_stats()
Print quick statistical summary.
viz.quick_stats()
set_style(style, palette, figure_size)
Configure visualization style.
viz.set_style(
style='darkgrid',
palette='husl',
figure_size=(14, 8)
)
visualize() Convenience Function
One-liner for quick visualization.
from nl2viz_pro import visualize
result = visualize(
"show sales by region",
"data.csv",
library='matplotlib'
)
Natural Language Query Examples
Chart Type Detection
# Automatically detects line chart (time-based)
visualize("show profit over time", "data.csv")
# Detects bar chart (categorical)
visualize("compare sales by region", "data.csv")
# Detects scatter (two numeric columns)
visualize("relationship between price and quantity", "data.csv")
# Detects distribution
visualize("analyze sales distribution", "data.csv")
# Detects correlation
visualize("correlation heatmap", "data.csv")
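The detection rules themselves are not documented. The following minimal sketch assumes two things, both hypothetical and not taken from the package's `intent_engine.py`: a keyword table is checked first, and if no keyword matches, the chart is chosen from the types of the columns mentioned in the query.

```python
from typing import Dict

# Hypothetical keyword table, checked in order; the package's real rules are undocumented.
KEYWORD_CHARTS = [
    (("correlation", "heatmap"), "heatmap"),
    (("distribution", "histogram", "spread"), "histogram"),
    (("over time", "trend"), "line"),
    (("relationship", " vs ", "versus"), "scatter"),
    (("share", "proportion", "percentage"), "pie"),
]


def choose_chart(query: str, column_types: Dict[str, str]) -> str:
    """Pick a chart type from query keywords, else from the mentioned columns' types.

    column_types maps each column mentioned in the query to
    'numeric', 'categorical' or 'datetime'.
    """
    q = f" {query.lower()} "  # pad so ' vs ' also matches at the start or end
    for keywords, chart in KEYWORD_CHARTS:
        if any(k in q for k in keywords):
            return chart
    kinds = list(column_types.values())
    if "datetime" in kinds and "numeric" in kinds:
        return "line"        # time on x, measure on y
    if kinds.count("numeric") >= 2:
        return "scatter"     # two measures against each other
    if "categorical" in kinds and "numeric" in kinds:
        return "bar"         # one measure per category
    return "histogram" if "numeric" in kinds else "bar"


print(choose_chart("compare sales by region", {"Region": "categorical", "Sales": "numeric"}))  # bar
```

Checking explicit keywords before the schema lets a user override the type-based default by wording the query differently.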
Column Matching
Fuzzy matching handles typos and synonyms:
# Typo correction
visualize("show revenu by region", "data.csv") # 'revenu' → 'revenue'
# Synonym matching
visualize("show earnings by category", "data.csv") # 'earnings' → 'profit'
# Partial matching
visualize("compare sell by region", "data.csv") # 'sell' → 'sales'
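A common way to implement this kind of matching is Python's standard `difflib.get_close_matches`. This sketch is not the package's `fuzzy_match.py`. It assumes a hypothetical `SYNONYMS` table and a similarity cutoff of 0.6 (difflib's default). It covers the typo and synonym cases. Partial matches such as 'sell' → 'sales' would need an extra rule (for example stemming), which is not shown.

```python
import difflib
from typing import Optional, Sequence

# Hypothetical synonym table (lower-case term -> lower-case column name).
SYNONYMS = {"earnings": "profit", "income": "profit"}


def match_column(term: str, columns: Sequence[str], cutoff: float = 0.6) -> Optional[str]:
    """Return the column that best matches `term`, or None if nothing is close enough."""
    lowered = {c.lower(): c for c in columns}  # lower-case name -> original name
    term = term.strip().lower()
    if term in lowered:                          # exact (case-insensitive) match
        return lowered[term]
    synonym = SYNONYMS.get(term)
    if synonym is not None and synonym in lowered:  # known synonym
        return lowered[synonym]
    # Typo correction: best candidate by difflib's similarity ratio
    close = difflib.get_close_matches(term, list(lowered), n=1, cutoff=cutoff)
    return lowered[close[0]] if close else None


columns = ["Date", "Region", "revenue", "Profit", "Sales"]
print(match_column("revenu", columns))    # revenue
print(match_column("earnings", columns))  # Profit
```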
Aggregations
visualize("total sales by region", "data.csv") # sum
visualize("average profit by month", "data.csv") # mean
visualize("count of orders by category", "data.csv") # count
visualize("max price by product", "data.csv") # max
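One way to map these keywords to aggregations is a small lookup table plus a group-by over plain rows. The keyword list, the `sum` fallback for queries without a keyword, and the function names are assumptions made for illustration.

```python
import re
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical keyword -> aggregation table, checked in order (whole words only).
AGG_KEYWORDS = [
    ("average", "mean"), ("mean", "mean"),
    ("total", "sum"), ("sum", "sum"),
    ("count", "count"), ("number of", "count"),
    ("max", "max"), ("min", "min"),
]

AGG_FUNCS: Dict[str, Callable[[List[float]], float]] = {
    "sum": sum,
    "mean": lambda v: sum(v) / len(v),
    "count": len,
    "max": max,
    "min": min,
}


def detect_aggregation(query: str, default: str = "sum") -> str:
    """Return the aggregation of the first keyword found in the query."""
    q = query.lower()
    for keyword, agg in AGG_KEYWORDS:
        if re.search(rf"\b{re.escape(keyword)}\b", q):
            return agg
    return default  # assumed fallback when no keyword is present


def group_aggregate(rows: Iterable[Dict[str, Any]], by: str, value: str, agg: str) -> Dict[Any, float]:
    """Group rows by the `by` column and apply the named aggregation to `value`."""
    groups: Dict[Any, List[float]] = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row[value])
    func = AGG_FUNCS[agg]
    return {key: func(vals) for key, vals in groups.items()}


rows = [
    {"Region": "East", "Sales": 10.0},
    {"Region": "West", "Sales": 5.0},
    {"Region": "East", "Sales": 20.0},
]
agg = detect_aggregation("average sales by region")
print(agg, group_aggregate(rows, "Region", "Sales", agg))  # mean {'East': 15.0, 'West': 5.0}
```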
Data Format Support
CSV
visualize("show data", "data.csv")
# Supported options: encoding, delimiter, etc.
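The docs do not say how the delimiter is chosen when none is given. One standard-library approach is `csv.Sniffer`. The sketch below assumes the candidate delimiters are comma, semicolon, tab and pipe. `read_csv_auto` is a hypothetical helper, and it leaves all values as strings.

```python
import csv
import io
from typing import Dict, List


def read_csv_auto(text: str) -> List[Dict[str, str]]:
    """Parse CSV text, guessing the delimiter from the first few KB."""
    sample = text[:4096]
    try:
        # Restrict candidates to common delimiters so letters or digits are never picked
        dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
    except csv.Error:
        dialect = csv.excel  # fall back to standard comma-separated values
    return list(csv.DictReader(io.StringIO(text), dialect=dialect))


print(read_csv_auto("Region;Sales\nEast;10\nWest;5\n"))
```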
Excel
visualize("show data", "data.xlsx") # Reads first sheet
visualize("show data", "data.xls") # Also works
JSON
# Array of objects
visualize("show data", "data.json")
# Nested JSON (auto-flattened)
visualize("show data", "data.json")
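Flattening nested objects usually means joining the nested keys into dotted column names. Below is a minimal recursive sketch. The `flatten` helper and the '.' separator are assumptions, and lists are left untouched. With pandas, `pandas.json_normalize` does the same job and also uses '.' as its default separator.

```python
import json
from typing import Any, Dict


def flatten(record: Dict[str, Any], parent: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts into dotted keys, e.g. {'a': {'b': 1}} -> {'a.b': 1}."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))  # recurse into nested objects
        else:
            flat[name] = value  # scalars and lists are kept as-is
    return flat


records = json.loads('[{"region": "East", "sales": {"q1": 10, "q2": 12}}]')
rows = [flatten(r) for r in records]
print(rows)  # [{'region': 'East', 'sales.q1': 10, 'sales.q2': 12}]
```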
SQLite
visualize("show data", "database.db") # First table
visualize("show data", "sqlite:///database.db?table=sales") # Specific table
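The `sqlite:///path?table=name` form resembles SQLAlchemy-style URLs. The following standard-library sketch shows one way to resolve it. It assumes that everything after `sqlite:///` up to `?` is the file path, and that a bare path means "first user table", as described above. `parse_sqlite_source` and `load_sqlite` are hypothetical names.

```python
import sqlite3
from typing import Any, List, Optional, Tuple
from urllib.parse import parse_qs

PREFIX = "sqlite:///"


def parse_sqlite_source(source: str) -> Tuple[str, Optional[str]]:
    """Split 'sqlite:///file.db?table=name' (or a bare 'file.db') into (path, table)."""
    if not source.startswith(PREFIX):
        return source, None
    path, _, query = source[len(PREFIX):].partition("?")
    table = parse_qs(query).get("table", [None])[0]
    return path, table


def load_sqlite(source: str) -> Tuple[List[str], List[Tuple[Any, ...]]]:
    """Return (column names, rows) for the requested table, or the first user table."""
    path, table = parse_sqlite_source(source)
    conn = sqlite3.connect(path)  # note: may create an empty database if `path` is missing
    try:
        if table is None:
            row = conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY rowid LIMIT 1"
            ).fetchone()
            if row is None:
                raise ValueError(f"no tables found in {path}")
            table = row[0]
        # Identifiers cannot be bound as parameters, so quote the table name instead
        quoted = '"' + table.replace('"', '""') + '"'
        cursor = conn.execute(f"SELECT * FROM {quoted}")
        columns = [d[0] for d in cursor.description]
        return columns, cursor.fetchall()
    finally:
        conn.close()
```

For example, `load_sqlite("sqlite:///database.db?table=sales")` reads the `sales` table, while `load_sqlite("database.db")` falls back to the first table.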
Pandas DataFrame
import pandas as pd
df = pd.read_csv("data.csv")
visualize("show data", df)
Architecture
nl2viz_pro/
├── core/
│   ├── input_handler.py      # Data loading
│   ├── schema_analyzer.py    # Data type detection
│   ├── intent_engine.py      # NLP query parsing
│   ├── processor.py          # Data processing
│   ├── visualizer.py         # Chart creation
│   └── insight_engine.py     # Insight generation
├── utils/
│   ├── fuzzy_match.py        # Column matching
│   └── validator.py          # Input validation
├── connectors/               # Data connectors
├── api.py                    # Main API
├── cli.py                    # CLI
├── example.py                # Examples
├── setup.py                  # Installation
└── requirements.txt          # Dependencies
Processing Pipeline
- Input Handler: auto-detect and load data
- Schema Analyzer: detect column types (numeric, categorical, datetime)
- Intent Engine: parse the query and extract the visualization intent
- Data Processor: apply filters, aggregations and transformations
- Visualizer: create the chart using the selected library
- Insight Engine: generate insights (trends, correlations, outliers)
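The Schema Analyzer step can be illustrated on raw string values, such as those read from a CSV file. This sketch assumes a 95% agreement threshold and a short list of date formats, both hypothetical. The package's own detection rules are not documented.

```python
from datetime import datetime
from typing import Callable, Optional, Sequence

# Assumed date formats; real data may need more (or dateutil/pandas parsing).
DATE_FORMATS = ("%Y-%m-%d", "%Y-%m-%d %H:%M:%S", "%Y/%m/%d", "%d/%m/%Y")


def _is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def _is_date(value: str) -> bool:
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def infer_column_type(values: Sequence[Optional[str]], threshold: float = 0.95) -> str:
    """Classify raw values as 'numeric', 'datetime' or 'categorical'."""
    present = [v.strip() for v in values if v is not None and v.strip()]  # skip blanks
    if not present:
        return "categorical"

    def share(check: Callable[[str], bool]) -> float:
        return sum(check(v) for v in present) / len(present)

    if share(_is_number) >= threshold:
        return "numeric"
    if share(_is_date) >= threshold:
        return "datetime"
    return "categorical"


print(infer_column_type(["10", "20.5", "", "3"]))        # numeric
print(infer_column_type(["2024-01-01", "2024-02-01"]))  # datetime
print(infer_column_type(["East", "West", "East"]))      # categorical
```

Checking numbers before dates means a column of bare years such as "2024" is treated as numeric.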
Visualization Libraries
Switch between visualization libraries:
# Matplotlib (default)
visualize("show data", "data.csv", library='matplotlib')
# Seaborn (statistical)
visualize("show data", "data.csv", library='seaborn')
# Plotly (interactive)
visualize("show data", "data.csv", library='plotly')
Auto Insights
Automatically generates insights:
result = visualize("show data", "data.csv")
# Access insights
insights = result['insights']
# {
# 'summary': {...}, # Basic stats
# 'trends': [...], # Trend analysis
# 'correlations': [...], # Correlations
# 'outliers': [...], # Outlier detection
# 'top_bottom': {...}, # Top/bottom values
# 'distributions': {...} # Distribution analysis
# }
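The README does not say which statistics drive these insights. In one plausible sketch, correlations are measured with the Pearson coefficient, and outliers are flagged with Tukey's IQR fences (values more than 1.5 × IQR beyond the quartiles). The function names and the 1.5 factor are assumptions, and only the standard library is used.

```python
import statistics
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0  # a constant column has no defined correlation; report 0 here
    return cov / (sx * sy)


def iqr_outliers(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # needs at least 2 values
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


sales = [10, 11, 12, 13, 12, 11, 100]
print(iqr_outliers(sales))                            # [100]
print(round(pearson([1, 2, 3, 4], [2, 4, 6, 8]), 6))  # 1.0
```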
Error Handling
Comprehensive error handling:
from nl2viz_pro import NL2VizPro, visualize
try:
    visualize("show data", "missing_file.csv")
except FileNotFoundError:
    print("File not found")
try:
    visualize("show invalid_column", "data.csv")
except ValueError:
    print("Column not found - suggestions available")
# Get column suggestions
viz = NL2VizPro()
viz.load_data("data.csv")
suggest = viz.suggest_column("revenu")  # Returns closest match
Advanced Usage
Custom Processing
from nl2viz_pro import NL2VizPro
from nl2viz_pro.core import DataProcessor, Visualizer
viz = NL2VizPro()
viz.load_data("data.csv")
# Custom intent (written by hand instead of parsed from a query)
intent = {
    'chart_type': 'bar',
    'x_column': 'Region',
    'y_column': 'Sales',
    'aggregation': 'sum',
    'filters': [{'column': 'Sales', 'operator': '>', 'value': '1000'}],
}
# Process data
df_processed = DataProcessor.process(viz.df, intent)
# Create visualization
fig = Visualizer.create(df_processed, 'bar', 'Region', 'Sales')
Visualizer.show(fig)
Multiple Queries on Same Data
from nl2viz_pro import NL2VizPro
viz = NL2VizPro()
viz.load_data("large_dataset.csv")
queries = [
    "show sales by region",
    "profit trend",
    "distribution of quantities",
    "correlation matrix",
]
# enumerate gives each chart a unique index, even if a query is repeated
for i, query in enumerate(queries):
    result = viz.visualize(query, show=False, save=f"chart_{i}.png")
Batch Processing
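from pathlib import Path
from nl2viz_pro import NL2VizPro
viz = NL2VizPro()
Path("output").mkdir(exist_ok=True)  # make sure the output folder exists
for csv_file in Path("data/").glob("*.csv"):
    viz.load_data(str(csv_file))
    result = viz.visualize(
        "show summary",
        show=False,  # save only; don't open a window for every file
        save=f"output/{csv_file.stem}.png"
    )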
Testing
Run examples:
python example.py
This shows 7 complete examples covering:
- Basic usage
- Load and query multiple times
- DataFrame input
- Different libraries
- Advanced queries
- Schema inspection
- JSON data
Requirements
- Python 3.8+
- pandas >= 1.3.0
- numpy >= 1.20.0
- matplotlib >= 3.5.0
- seaborn >= 0.11.0
- plotly >= 5.0.0
- openpyxl >= 3.0.0 (Excel support)
Performance
- Handles datasets up to 1M rows
- Smart sampling for large datasets
- Efficient fuzzy matching
- Optimized for common queries
Future Improvements
- Gradio UI for web interface
- Export charts to multiple formats
- Dashboard mode with multiple visualizations
- Custom color schemes
- Caching for repeated queries
- Multi-file analysis
- SQL query generation
- Advanced filtering UI
- Recommendation engine
- Integration with APIs
License
MIT License - See LICENSE file for details
Support
For issues, suggestions, or contributions:
- GitHub Issues: [GitHub Repo]
- Email: dev@nl2viz.com
Contributing
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
Made with ❤️ for data visualization