🎨 NL2Viz Pro: Natural Language to Visualization Engine

A production-grade Python library that converts natural language queries and datasets into beautiful, insightful visualizations automatically.

🎯 Features

✅ Auto-detect data formats (CSV, Excel, JSON, SQLite)
✅ Smart intent parsing from natural language queries
✅ Automatic chart type selection based on data schema
✅ Multiple visualization libraries (Matplotlib, Seaborn, Plotly)
✅ Fuzzy column matching with typo correction
✅ Auto insights generation (trends, correlations, outliers)
✅ Interactive charts (Plotly support)
✅ CLI & Python API
✅ Comprehensive error handling
✅ Production-ready codebase

📊 Supported Chart Types

  • Bar Charts
  • Line Charts
  • Pie Charts
  • Histograms
  • Scatter Plots
  • Box Plots
  • Heatmaps
  • Violin Plots
  • Area Charts
  • Bubble Charts
  • Correlation Matrix
  • Pairplot

📦 Installation

Via pip (when published)

pip install nl2viz-pro

From source

git clone <repository>
cd nl2viz_pro
pip install -r requirements.txt
pip install -e .

🚀 Quick Start

Python API

from nl2viz_pro import visualize

# Quick one-liner
visualize("show sales by region", "data.csv")

# Or with more control
result = visualize(
    "compare profit by category",
    "data.xlsx",
    library='plotly',
    show=True,
    save='output.html'
)

# Access results
insights = result['insights']
figure = result['figure']
data = result['data']

Load Data Once, Query Multiple Times

from nl2viz_pro import NL2VizPro

viz = NL2VizPro()
viz.load_data("data.csv")

# Query multiple times
viz.visualize("show sales by region")
viz.visualize("profit trend over time")
viz.visualize("distribution of quantities")

Command Line

# Basic usage
nl2viz "show sales by region" data.csv

# With options
nl2viz "correlation heatmap" data.xlsx --library plotly --save chart.html

# Verbose mode
nl2viz "analyze profit" data.json --library seaborn --verbose

# Don't display (just save)
nl2viz "sales distribution" data.csv --save output.png --no-show
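
The flags used in the commands above map naturally onto `argparse`. The following is a minimal sketch of how such an interface could be declared; it is not the contents of the package's `cli.py`, and the `build_parser` name and help strings are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Declare the positional arguments and flags shown in the CLI examples."""
    parser = argparse.ArgumentParser(prog="nl2viz")
    parser.add_argument("query", help="natural language request")
    parser.add_argument("data_source", help="path to the data file")
    parser.add_argument(
        "--library",
        choices=["matplotlib", "seaborn", "plotly"],
        default="matplotlib",
    )
    parser.add_argument("--save", metavar="PATH", default=None)
    parser.add_argument("--verbose", action="store_true")
    # --no-show stores False in args.show, so charts are displayed by default.
    parser.add_argument("--no-show", dest="show", action="store_false")
    return parser


args = build_parser().parse_args(
    ["sales distribution", "data.csv", "--save", "output.png", "--no-show"]
)
print(args.library, args.save, args.show)  # matplotlib output.png False
```

Storing `--no-show` as a negative flag on `show` keeps the parsed namespace aligned with the `show=True` keyword of the Python API.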

📚 API Reference

NL2VizPro Class

Main class for the visualization pipeline.

Methods

load_data(data_source)

Load data from various sources.

# From CSV
viz.load_data("data.csv")

# From Excel
viz.load_data("data.xlsx")

# From JSON
viz.load_data("data.json")

# From DataFrame
import pandas as pd
df = pd.read_csv("data.csv")
viz.load_data(df)

# Returns self for chaining
viz.load_data("data.csv").quick_stats()

visualize(query, data_source, library, **kwargs)

Convert natural language query to visualization.

result = viz.visualize(
    query="show sales by region",
    data_source="data.csv",  # Optional if already loaded
    library='matplotlib',     # 'matplotlib', 'seaborn', 'plotly'
    show=True,               # Display chart
    save='output.png',       # Save to file
    show_insights=True       # Print insights
)

Returns: a dictionary with the following keys:

  • figure: Visualization object
  • insights: Generated insights
  • intent: Parsed query intent
  • data: Processed visualization data
  • processed_data: Full processed dataset

get_columns()

Get list of available columns.

columns = viz.get_columns()
print(columns)  # ['Date', 'Region', 'Sales', ...]

suggest_column(query_col)

Get fuzzy match suggestion for a column name.

match = viz.suggest_column("revenu")  # Returns "revenue" (typo corrected)

get_data_info()

Get detailed data information.

info = viz.get_data_info()
# {
#   'shape': (100, 5),
#   'columns': [...],
#   'dtypes': {...},
#   'missing_values': {...},
#   'missing_percentage': {...}
# }

quick_stats()

Print quick statistical summary.

viz.quick_stats()

set_style(style, palette, figure_size)

Configure visualization style.

viz.set_style(
    style='darkgrid',
    palette='husl',
    figure_size=(14, 8)
)

visualize() Convenience Function

One-liner for quick visualization.

from nl2viz_pro import visualize

result = visualize(
    "show sales by region",
    "data.csv",
    library='matplotlib'
)

🧠 Natural Language Query Examples

Chart Type Detection

# Automatically detects line chart (time-based)
visualize("show profit over time", "data.csv")

# Detects bar chart (categorical)
visualize("compare sales by region", "data.csv")

# Detects scatter (two numeric columns)
visualize("relationship between price and quantity", "data.csv")

# Detects distribution
visualize("analyze sales distribution", "data.csv")

# Detects correlation
visualize("correlation heatmap", "data.csv")
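
One plausible way to get from a query to a chart type is an ordered list of keyword rules. The sketch below is only an illustration of that idea: the rule table, the `guess_chart_type` name, and the fallback to a bar chart are assumptions, and the library's intent engine may also consult the data schema.

```python
import re

# Ordered (pattern, chart type) rules; the first match wins.
# These keywords are illustrative assumptions, not the library's rule set.
CHART_RULES = [
    (r"\bcorrelation\b|\bheatmap\b", "heatmap"),
    (r"\bdistribution\b|\bhistogram\b", "histogram"),
    (r"\brelationship\b|\bversus\b|\bvs\b", "scatter"),
    (r"\bover time\b|\btrend\b", "line"),
    (r"\bby\b|\bcompare\b", "bar"),
]


def guess_chart_type(query, default="bar"):
    """Return the chart type of the first rule whose pattern occurs in the query."""
    text = query.lower()
    for pattern, chart_type in CHART_RULES:
        if re.search(pattern, text):
            return chart_type
    return default


for query in ["show profit over time", "compare sales by region"]:
    print(query, "->", guess_chart_type(query))
```

More specific rules come first so that a query such as "sales distribution by region" resolves to a histogram rather than falling through to the generic "by" rule.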

Column Matching

Fuzzy matching handles typos and synonyms:

# Typo correction
visualize("show revenu by region", "data.csv")  # 'revenu' → 'revenue'

# Synonym matching
visualize("show earnings by category", "data.csv")  # 'earnings' → 'profit'

# Partial matching
visualize("compare sell by region", "data.csv")  # 'sell' → 'sales'
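
Typo correction of this kind can be approximated with the standard library's `difflib`. This is a minimal sketch under that assumption; `closest_column` and the 0.6 cutoff are hypothetical, and the package's own `fuzzy_match.py` may use a different algorithm.

```python
import difflib


def closest_column(name, columns, cutoff=0.6):
    """Return the column most similar to `name`, ignoring case, or None."""
    # Map lower-cased names back to the original spelling.
    lookup = {column.lower(): column for column in columns}
    matches = difflib.get_close_matches(
        name.lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else None


columns = ["Date", "Region", "Revenue", "Profit"]
print(closest_column("revenu", columns))  # Revenue
```

String similarity alone cannot resolve synonyms such as 'earnings' and 'profit'; that would need an explicit synonym table in addition to a matcher like this one.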

Aggregations

visualize("total sales by region", "data.csv")      # sum
visualize("average profit by month", "data.csv")    # mean
visualize("count of orders by category", "data.csv") # count
visualize("max price by product", "data.csv")        # max
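
The mapping from query words to aggregation functions shown above can be sketched as a keyword lookup. The table and the `detect_aggregation` helper below are hypothetical; only the four query-to-aggregation pairs in the examples come from this README, and the extra synonyms are assumptions.

```python
import re

# Aggregation name -> trigger words. "total", "average", "count" and "max"
# come from the examples above; the other synonyms are assumptions.
AGGREGATION_KEYWORDS = {
    "sum": {"total", "sum"},
    "mean": {"average", "avg", "mean"},
    "count": {"count"},
    "max": {"max", "maximum"},
}


def detect_aggregation(query, default=None):
    """Return the first aggregation whose trigger word appears in the query."""
    # Compare whole words so that "summary" does not trigger "sum".
    words = set(re.findall(r"[a-z]+", query.lower()))
    for aggregation, keywords in AGGREGATION_KEYWORDS.items():
        if words & keywords:
            return aggregation
    return default


print(detect_aggregation("average profit by month"))  # mean
```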

๐Ÿ” Data Format Support

CSV

visualize("show data", "data.csv")

# Supported options: encoding, delimiter, etc.

Excel

visualize("show data", "data.xlsx")  # Reads first sheet
visualize("show data", "data.xls")   # Also works

JSON

# Array of objects
visualize("show data", "data.json")

# Nested JSON (auto-flattened)
visualize("show data", "data.json")
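
As an illustration of what flattening nested JSON means, the sketch below turns nested objects into dotted column names. The `flatten` helper and the "." separator are assumptions; the README does not say how the library names flattened columns.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dictionaries into one level with joined key names."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the key prefix along.
            flat.update(flatten(value, full_key, sep))
        else:
            # Scalars and lists are kept as they are.
            flat[full_key] = value
    return flat


row = {"order": {"id": 1, "customer": {"region": "East"}}, "sales": 250}
print(flatten(row))
# {'order.id': 1, 'order.customer.region': 'East', 'sales': 250}
```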

SQLite

visualize("show data", "database.db")           # First table
visualize("show data", "sqlite:///database.db?table=sales")  # Specific table
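
The two SQLite forms above (a bare file path, or a `sqlite:///` URL with a `table` parameter) can be handled with the standard library alone. This is a sketch under assumptions: `parse_sqlite_source` and `read_table` are hypothetical helpers, and "first table" is taken here to mean the first entry in `sqlite_master`.

```python
import sqlite3
from contextlib import closing
from urllib.parse import parse_qs, urlparse


def parse_sqlite_source(source):
    """Split a source string into (database path, table name or None)."""
    if source.startswith("sqlite:///"):
        parsed = urlparse(source)
        path = parsed.path[1:]  # drop the leading "/" of the URL path
        table = parse_qs(parsed.query).get("table", [None])[0]
        return path, table
    return source, None


def read_table(source):
    """Return (column names, rows) for the requested table, or the first one."""
    path, table = parse_sqlite_source(source)
    with closing(sqlite3.connect(path)) as conn:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY rowid"
        )]
        if not names:
            raise ValueError(f"no tables found in {path}")
        if table is None:
            table = names[0]
        elif table not in names:
            raise ValueError(f"table {table!r} not found; available: {names}")
        # The name was checked against sqlite_master before being interpolated.
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        columns = [description[0] for description in cursor.description]
        return columns, cursor.fetchall()


print(parse_sqlite_source("sqlite:///database.db?table=sales"))
# ('database.db', 'sales')
```

Checking the requested table against the list of existing tables gives a clear error message and avoids passing arbitrary text into the SQL statement.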

Pandas DataFrame

import pandas as pd

df = pd.read_csv("data.csv")
visualize("show data", df)

📊 Architecture

nl2viz_pro/
├── core/
│   ├── input_handler.py      # Data loading
│   ├── schema_analyzer.py    # Data type detection
│   ├── intent_engine.py      # NLP query parsing
│   ├── processor.py          # Data processing
│   ├── visualizer.py         # Chart creation
│   └── insight_engine.py     # Insight generation
├── utils/
│   ├── fuzzy_match.py        # Column matching
│   └── validator.py          # Input validation
├── connectors/               # Data connectors
├── api.py                    # Main API
├── cli.py                    # CLI
├── example.py                # Examples
├── setup.py                  # Installation
└── requirements.txt          # Dependencies

🔄 Processing Pipeline

  1. Input Handler → Auto-detect and load data
  2. Schema Analyzer → Detect column types (numeric, categorical, datetime)
  3. Intent Engine → Parse query, extract visualization intent
  4. Data Processor → Apply filters, aggregations, transformations
  5. Visualizer → Create chart using selected library
  6. Insight Engine → Generate insights (trends, correlations, outliers)
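
Step 2 of the pipeline, classifying columns as numeric, categorical, or datetime, can be sketched for raw string values as follows. The `infer_column_type` helper is hypothetical and only recognizes ISO-formatted dates; the package's `schema_analyzer.py` is likely to be more thorough.

```python
from datetime import datetime


def _is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def _is_iso_date(text):
    try:
        datetime.fromisoformat(text)
        return True
    except ValueError:
        return False


def infer_column_type(values):
    """Classify raw string values as 'numeric', 'datetime', or 'categorical'."""
    # Ignore missing and blank cells when deciding the type.
    present = [v.strip() for v in values if v is not None and v.strip()]
    if not present:
        return "categorical"
    if all(_is_number(v) for v in present):
        return "numeric"
    if all(_is_iso_date(v) for v in present):
        return "datetime"
    return "categorical"


print(infer_column_type(["2024-01-01", "2024-02-01", ""]))  # datetime
```

The numeric check runs first, so a column of bare years such as "2024" is treated as numeric rather than as dates.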

🎨 Visualization Libraries

Switch between visualization libraries:

# Matplotlib (default)
visualize("show data", "data.csv", library='matplotlib')

# Seaborn (statistical)
visualize("show data", "data.csv", library='seaborn')

# Plotly (interactive)
visualize("show data", "data.csv", library='plotly')

💡 Auto Insights

Automatically generates insights:

result = visualize("show data", "data.csv")

# Access insights
insights = result['insights']

# {
#   'summary': {...},           # Basic stats
#   'trends': [...],            # Trend analysis
#   'correlations': [...],      # Correlations
#   'outliers': [...],          # Outlier detection
#   'top_bottom': {...},        # Top/bottom values
#   'distributions': {...}      # Distribution analysis
# }
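
The README does not say how the 'outliers' entry is computed. As one common approach, the sketch below flags values outside 1.5 times the interquartile range; `iqr_outliers` and the factor `k` are assumptions for illustration, not the library's method.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


sales = [10, 12, 11, 13, 12, 11, 100]
print(iqr_outliers(sales))  # [100]
```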

๐Ÿ› ๏ธ Error Handling

Errors surface as standard Python exceptions:

from nl2viz_pro import NL2VizPro, visualize

try:
    visualize("show data", "missing_file.csv")
except FileNotFoundError:
    print("File not found")

try:
    visualize("show invalid_column", "data.csv")
except ValueError:
    print("Column not found - suggestions available")

# Get column suggestions
viz = NL2VizPro()
viz.load_data("data.csv")
suggest = viz.suggest_column("revenu")  # Returns closest match

📈 Advanced Usage

Custom Processing

from nl2viz_pro import NL2VizPro
from nl2viz_pro.core import DataProcessor, IntentEngine

viz = NL2VizPro()
viz.load_data("data.csv")

# Custom intent
intent = {
    'chart_type': 'bar',
    'x_column': 'Region',
    'y_column': 'Sales',
    'aggregation': 'sum',
    'filters': [{'column': 'Sales', 'operator': '>', 'value': '1000'}],
}

# Process data
df_processed = DataProcessor.process(viz.df, intent)

# Create visualization
from nl2viz_pro.core import Visualizer
fig = Visualizer.create(df_processed, 'bar', 'Region', 'Sales')
Visualizer.show(fig)

Multiple Queries on Same Data

from nl2viz_pro import NL2VizPro

viz = NL2VizPro()
viz.load_data("large_dataset.csv")

queries = [
    "show sales by region",
    "profit trend",
    "distribution of quantities",
    "correlation matrix",
]

# enumerate() gives each chart a unique index, even if a query is repeated
for index, query in enumerate(queries):
    result = viz.visualize(query, show=False, save=f"chart_{index}.png")

Batch Processing

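from pathlib import Path

from nl2viz_pro import NL2VizPro

viz = NL2VizPro()
output_dir = Path("output")
output_dir.mkdir(exist_ok=True)  # make sure the target folder exists

for csv_file in sorted(Path("data").glob("*.csv")):
    viz.load_data(str(csv_file))  # pass a plain string path
    result = viz.visualize(
        "show summary",
        save=str(output_dir / f"{csv_file.stem}.png")
    )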

🧪 Testing

Run examples:

python example.py

This runs seven complete examples covering:

  1. Basic usage
  2. Load and query multiple times
  3. DataFrame input
  4. Different libraries
  5. Advanced queries
  6. Schema inspection
  7. JSON data

📋 Requirements

  • Python 3.8+
  • pandas >= 1.3.0
  • numpy >= 1.20.0
  • matplotlib >= 3.5.0
  • seaborn >= 0.11.0
  • plotly >= 5.0.0
  • openpyxl >= 3.0.0 (Excel support)

🚀 Performance

  • Handles datasets up to 1M rows
  • Smart sampling for large datasets
  • Efficient fuzzy matching
  • Optimized for common queries
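
The sampling strategy for large datasets is not described here. A simple stand-in is seeded uniform sampling that preserves row order, sketched below; `sample_rows`, the `max_rows` threshold, and the fixed seed are all assumptions.

```python
import random


def sample_rows(rows, max_rows=10_000, seed=0):
    """Return at most max_rows rows, chosen uniformly and kept in order."""
    if len(rows) <= max_rows:
        return list(rows)
    rng = random.Random(seed)  # fixed seed keeps repeated charts identical
    keep = sorted(rng.sample(range(len(rows)), max_rows))
    return [rows[i] for i in keep]


rows = list(range(1_000_000))
subset = sample_rows(rows, max_rows=5)
print(len(subset))  # 5
```

Sorting the sampled indices keeps time-ordered data in sequence, which matters for line charts.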

🔮 Future Improvements

  • Gradio UI for web interface
  • Export charts to multiple formats
  • Dashboard mode with multiple visualizations
  • Custom color schemes
  • Caching for repeated queries
  • Multi-file analysis
  • SQL query generation
  • Advanced filtering UI
  • Recommendation engine
  • Integration with APIs

📄 License

MIT License - See LICENSE file for details

📧 Support

For issues, suggestions, or contributions:

🙌 Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

Made with โค๏ธ for data visualization
