
Python-based statistical scripting language with Jupyter notebook support


StatLang


An open-source, Python-based statistical scripting language

Write and run statistical scripts with full syntax highlighting and a Python backend.

Overview

StatLang provides an open-source environment for statistical analysis by offering:

  • Expressive scripting syntax for data manipulation and analysis
  • Python backend for execution and performance
  • Jupyter notebook support with a StatLang kernel
  • VS Code extension with syntax highlighting and execution
  • Cross-platform compatibility (Windows, macOS, Linux)
  • Open source and free to use

🌟 What Makes StatLang Special?

  • 🤖 AI Integration: Built-in PROC LANGUAGE with LLM capabilities for intelligent data analysis
  • 🧠 Complete ML Pipeline: From data exploration to model deployment using familiar, concise syntax
  • 💾 Modern SQL: PROC SQL powered by DuckDB for high-performance data querying
  • 🔧 Robust language features: Macro system, format system, and statistical procedures
  • 📊 Rich Output: Professional formatting with TITLE statements and structured results

Features

Core Interpreter

  • Scripting-based DATA step functionality with inline data support
  • Statistical procedures (MEANS, FREQ, SORT, PRINT)
  • Concise data manipulation and analysis syntax
  • Python pandas/numpy backend for performance
  • Clean, professional output with familiar formatting

Jupyter Notebook Support

  • StatLang kernel for Jupyter notebooks
  • Interactive statistical programming in a notebook environment
  • Rich output display with formatted tables
  • Dataset visualization and exploration

VS Code Extension

  • Syntax highlighting for .statlang files
  • Code snippets for common statistical analysis patterns
  • File execution directly from VS Code
  • Notebook support for interactive analysis

Supported Features

📊 Statistical Procedures

  • PROC MEANS: Descriptive statistics with CLASS variables and OUTPUT statements
  • PROC FREQ: Frequency tables and cross-tabulations with options
  • PROC SORT: Data sorting with ascending/descending order
  • PROC PRINT: Data display and formatting
  • PROC REG: Linear regression analysis with MODEL, OUTPUT, and SCORE statements
  • PROC UNIVARIATE: Detailed univariate analysis with distribution diagnostics
  • PROC CORR: Correlation analysis (Pearson, Spearman)
  • PROC FACTOR: Principal component analysis and factor analysis
  • PROC CLUSTER: Clustering methods (k-means, hierarchical)
  • PROC NPAR1WAY: Nonparametric tests (Mann-Whitney, Kruskal-Wallis)
  • PROC TTEST: T-tests (independent and paired)
  • PROC LOGIT: Logistic regression modeling
  • PROC TIMESERIES: Time series analysis and seasonal decomposition
  • PROC SURVEYSELECT: Random sampling with SRS method, SAMPRATE/N options, and OUTALL flag
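A frequency procedure of this kind usually reports, for each level, a count, a percent, and cumulative count and percent, and a cross-tabulation counts each combination of two variables. Assuming PROC FREQ follows that familiar layout, the standard-library sketch below computes both for the `department` values from the Quick Start employees data. The salary band used as the second variable is a hypothetical derived column, and the code illustrates the arithmetic only, not StatLang's implementation.

```python
from collections import Counter

# (department, salary) pairs from the Quick Start employees data set
employees = [
    ("Engineering", 75000),
    ("Marketing", 55000),
    ("Engineering", 80000),
    ("Sales", 45000),
]


def one_way(values):
    """Return (level, frequency, percent, cumulative frequency, cumulative percent)."""
    counts = Counter(values)
    total = sum(counts.values())
    rows, cumulative = [], 0
    for level in sorted(counts):
        cumulative += counts[level]
        rows.append((level, counts[level], 100 * counts[level] / total,
                     cumulative, 100 * cumulative / total))
    return rows


depts = [dept for dept, _ in employees]
for row in one_way(depts):
    print(row)

# Hypothetical second variable for the two-way table: a salary band
bands = ["High" if salary >= 60000 else "Low" for _, salary in employees]
cells = Counter(zip(depts, bands))  # combinations never seen count as 0

col_levels = sorted(set(bands))
print("department".ljust(12), *col_levels)
for dept in sorted(set(depts)):
    print(dept.ljust(12), *(cells[(dept, band)] for band in col_levels))
```

Sorting the levels alphabetically mirrors a common default ordering; a real procedure may offer other orderings through its options.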

🤖 Machine Learning Procedures

  • PROC TREE: Decision trees for classification and regression
  • PROC FOREST: Random forests for ensemble learning
  • PROC BOOST: Gradient boosting for advanced modeling

💻 Advanced Features

  • PROC SQL: SQL query processing with DuckDB backend
  • PROC LANGUAGE: Built-in LLM integration for text generation, Q&A, and data analysis
  • Macro System: Complete macro facility with %MACRO/%MEND, %LET, &name macro variable references, %PUT, %IF/%THEN/%ELSE, %DO/%END
  • Format System: Built-in date/time, numeric, and currency formats with metadata persistence
  • TITLE Statements: Professional output formatting
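One way to picture a format system with metadata persistence: the data keep their raw values, each column carries a format name as metadata, and the format is applied only when the data are displayed. The sketch below assumes SAS conventions that the README does not confirm: dates stored as day counts from 1960-01-01, and format names `DATE9.` and `DOLLAR12.2`. The `hire_date` column and all helper names are hypothetical.

```python
from datetime import date, timedelta

# Assumption: dates are stored as day counts from 1960-01-01 (the SAS convention)
EPOCH = date(1960, 1, 1)
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def date9(days):
    """Render a day count as DDMONYYYY, e.g. 01JAN2024 (locale-independent)."""
    d = EPOCH + timedelta(days=days)
    return f"{d.day:02d}{MONTHS[d.month - 1]}{d.year}"


def dollar(value, width=12, decimals=2):
    """Render a number as right-aligned currency with thousands separators."""
    return f"${value:,.{decimals}f}".rjust(width)


# Hypothetical format names mapped to formatting functions
FORMATTERS = {"DATE9.": date9, "DOLLAR12.2": dollar}

# Column -> format name: the metadata that travels with the data set
column_formats = {"hire_date": "DATE9.", "salary": "DOLLAR12.2"}


def format_row(row, formats):
    """Apply stored formats at display time; the raw values stay unchanged."""
    return {col: FORMATTERS[formats[col]](val) if col in formats else val
            for col, val in row.items()}


row = {"name": "Alice", "hire_date": 23376, "salary": 75000}
print(format_row(row, column_formats))
```

Here day 23376 renders as `01JAN2024` and the salary as `$75,000.00`, while `row` itself still holds the numbers, so sorting and arithmetic keep working on raw values.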

🔧 Core Data Processing

  • DATA Steps: Variable creation, conditional logic, DATALINES input
  • Macro variables: %LET, %PUT statements
  • Libraries: LIBNAME functionality
  • NOPRINT option: Silent execution for procedures
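In SAS-style macro languages, macro variables are resolved by text substitution before the resulting code runs. The toy resolver below illustrates that idea for the %LET and &name forms listed above: it records each %LET assignment in a symbol table and replaces later references in a single left-to-right pass. It is a simplified sketch, not StatLang's macro_parser; %MACRO definitions, %PUT, and nested && references are out of scope, and the rule that a trailing period ends a name is borrowed from SAS as an assumption.

```python
import re

# Either a %LET statement (groups 1-2) or an &name reference (group 3).
# A '.' directly after a name terminates it (SAS convention, assumed here).
MACRO_RE = re.compile(r"%let\s+(\w+)\s*=\s*([^;]*);|&(\w+)\.?", re.IGNORECASE)
REF_RE = re.compile(r"&(\w+)\.?")


def resolve_macros(code):
    """Resolve %LET assignments and &name references in one left-to-right pass."""
    symbols = {}

    def lookup(match):
        # Unknown names are left untouched instead of raising an error
        return symbols.get(match.group(1).lower(), match.group(0))

    def handle(match):
        if match.group(1):  # %LET name = value;
            value = REF_RE.sub(lookup, match.group(2)).strip()
            symbols[match.group(1).lower()] = value
            return ""  # the %LET statement itself emits no code
        return symbols.get(match.group(3).lower(), match.group(0))

    return MACRO_RE.sub(handle, code), symbols


source = """%let dept = Sales;
%let title_text = Employees in &dept.;
proc print data=work.employees;
    where department = "&dept";
    title "&title_text";
run;
"""
resolved, symbols = resolve_macros(source)
print(resolved)
```

Because `re.sub` calls `handle` in match order, a reference only sees %LET assignments that appear before it, which matches the sequential behavior of a macro processor.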

Installation

Python Package

pip install statlang

Jupyter Kernel Installation

# Install the StatLang kernel
python -m statlang.kernel install

# List available kernels
jupyter kernelspec list

VS Code Extension

  1. Install from VS Code Marketplace: "StatLang" by RyanBlakeStory
  2. Or install from source (see Development section)

🚀 Exciting New Features

🤖 PROC LANGUAGE - AI-Powered Analysis

proc language prompt="Analyze the correlation between income and spending in our dataset";
run;

Built-in LLM integration for text generation, Q&A, and intelligent data analysis using Hugging Face transformers!

🧠 Complete Machine Learning Workflow

Check out our ML Project Demo - a comprehensive regression analysis project showcasing:

  • PROC UNIVARIATE for distribution exploration
  • PROC SURVEYSELECT for train/test splitting
  • PROC REG with MODEL, OUTPUT, and SCORE statements
  • Macro system for reusable analysis workflows
  • Complete ML pipeline in pure StatLang syntax
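Assuming the three PROC REG statements map onto the usual regression steps (MODEL fits the coefficients, OUTPUT attaches predicted values and residuals to each observation, SCORE applies the fitted model to new rows), the single-predictor sketch below shows the underlying ordinary least squares arithmetic: slope $= S_{xy}/S_{xx}$ and intercept $= \bar{y} - \text{slope}\cdot\bar{x}$. The data and function names are hypothetical; this is not StatLang's implementation.

```python
from statistics import fmean


def fit_ols(x, y):
    """MODEL step: least-squares intercept and slope for one predictor."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def output_stats(x, y, intercept, slope):
    """OUTPUT step: predicted value and residual for each training observation."""
    predicted = [intercept + slope * xi for xi in x]
    residuals = [yi - pi for yi, pi in zip(y, predicted)]
    return predicted, residuals


def score(x_new, intercept, slope):
    """SCORE step: apply the fitted coefficients to new observations."""
    return [intercept + slope * xi for xi in x_new]


# Hypothetical training data
x_train = [1, 2, 3, 4, 5]
y_train = [3, 5, 6, 9, 11]

intercept, slope = fit_ols(x_train, y_train)
predicted, residuals = output_stats(x_train, y_train, intercept, slope)
sse = sum(r ** 2 for r in residuals)                       # error sum of squares
sst = sum((yi - fmean(y_train)) ** 2 for yi in y_train)    # total sum of squares
r_squared = 1 - sse / sst

print(f"intercept={intercept:.2f} slope={slope:.2f} R^2={r_squared:.4f}")
print(score([6, 7], intercept, slope))
```

For these values the fit is intercept 0.8 and slope 2.0, with SSE 0.8 and SST 40.8. A multi-predictor model would solve the normal equations instead, which is what a pandas/numpy backend would typically delegate to a linear algebra routine.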

💾 PROC SQL - Modern Data Querying

proc sql;
  select age, income, spend,
         case when income > 60000 then 'High' else 'Low' end as income_group
  from work.customers
  where age between 25 and 50
  order by income desc;
quit;

DuckDB-powered SQL processing with full dataset integration!
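The query above is plain SQL, so its logic can be checked outside StatLang. The sketch below swaps in Python's built-in sqlite3 module for DuckDB (a substitution for illustration, not how StatLang executes PROC SQL) and attaches an in-memory database named `work` so the `work.customers` reference resolves unchanged. The customer rows are invented for illustration.

```python
import sqlite3

# sqlite3 stands in for DuckDB; the query text matches the example above
con = sqlite3.connect(":memory:")
# Attach a second in-memory database named "work" so "work.customers" resolves
con.execute("ATTACH DATABASE ':memory:' AS work")
con.execute("CREATE TABLE work.customers (age INTEGER, income REAL, spend REAL)")

# Hypothetical rows, invented for illustration only
con.executemany(
    "INSERT INTO work.customers VALUES (?, ?, ?)",
    [(23, 40000, 1200), (31, 72000, 3100), (45, 58000, 2500),
     (52, 90000, 4000), (38, 65000, 2900)],
)

query = """
    select age, income, spend,
           case when income > 60000 then 'High' else 'Low' end as income_group
    from work.customers
    where age between 25 and 50
    order by income desc
"""
rows = con.execute(query).fetchall()
for row in rows:
    print(row)
con.close()
```

Only the customers aged 31, 38, and 45 pass the `between` filter; they come back ordered by income with the derived `income_group` column, and the 45-year-old is the only one labeled 'Low'.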

Quick Start

1. Interactive Python Usage

from statlang import StatLangInterpreter

# Create interpreter
interpreter = StatLangInterpreter()

# Create sample data using StatLang syntax
interpreter.run_code('''
data work.employees;
    input employee_id name $ department $ salary;
    datalines;
1 Alice Engineering 75000
2 Bob Marketing 55000
3 Carol Engineering 80000
4 David Sales 45000
;
run;
''')

# Run statistical analysis
interpreter.run_code('''
proc means data=work.employees;
    class department;
    var salary;
run;
''')
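The Quick Start does not show what the two steps print. As a cross-check, the standard-library sketch below reproduces what they compute: the DATA step's list input (whitespace-separated fields, with `$` marking character variables) and a per-department summary of `salary`, as a CLASS statement would group it. The set of statistics (N, Mean, Std Dev, Minimum, Maximum) follows the usual PROC MEANS defaults and is an assumption about StatLang's output; the helper names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

# Raw DATALINES block from the DATA step above
DATALINES = """\
1 Alice Engineering 75000
2 Bob Marketing 55000
3 Carol Engineering 80000
4 David Sales 45000
"""

# Variables from the INPUT statement; True marks a character ($) variable
INPUT_SPEC = [("employee_id", False), ("name", True),
              ("department", True), ("salary", False)]


def read_list_input(text, spec):
    """Split each line on whitespace and convert numeric fields to float."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        row = {}
        for (var, is_char), value in zip(spec, line.split()):
            row[var] = value if is_char else float(value)
        rows.append(row)
    return rows


def means_by_class(rows, class_var, analysis_var):
    """Group rows by class_var and summarize analysis_var for each level."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[class_var]].append(row[analysis_var])
    summary = {}
    for level in sorted(groups):
        values = groups[level]
        summary[level] = {
            "N": len(values),
            "Mean": mean(values),
            # Sample standard deviation is undefined for a single observation
            "Std Dev": stdev(values) if len(values) > 1 else None,
            "Minimum": min(values),
            "Maximum": max(values),
        }
    return summary


employees = read_list_input(DATALINES, INPUT_SPEC)
for dept, stats in means_by_class(employees, "department", "salary").items():
    print(dept, stats)
```

Engineering, the only department with two employees, gets a mean salary of 77,500 and a defined standard deviation; Marketing and Sales each have one observation.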

2. Jupyter Notebook Usage

  1. Install the StatLang kernel:
    python -m statlang.kernel install
    
  2. Create a new Jupyter notebook (.ipynb)
  3. Select "statlang" as the kernel
  4. Write StatLang code in cells and execute

3. VS Code Usage

  1. Install the StatLang extension from the marketplace
  2. Create a new file with .statlang extension
  3. Write your StatLang code
  4. Open the Command Palette (Ctrl+Shift+P, or Cmd+Shift+P on macOS) and run "StatLang: Run File"

4. Command Line Usage

# Run StatLang code from file
python -m statlang.cli run example.statlang

# Interactive mode
python -m statlang.cli interactive

📚 Examples & Demos

🎯 Complete ML Project

ML Project Demo - A comprehensive machine learning workflow:

  • Synthetic dataset creation with 30 observations
  • PROC UNIVARIATE for distribution analysis
  • PROC SURVEYSELECT for train/test splitting (70/30)
  • PROC REG with MODEL, OUTPUT, and SCORE statements
  • Macro-based reusable analysis functions
  • Complete regression analysis pipeline
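The 70/30 train/test split in this demo rests on simple random sampling without replacement. The sketch below is a stand-in for PROC SURVEYSELECT's SRS method under stated assumptions: a SAMPRATE-like rate or an N-like count sets the sample size (rounded to the nearest integer here, which may differ from StatLang's rule), and an OUTALL-like flag keeps every row with a 0/1 `Selected` indicator (a column name borrowed from SAS). The 30 rows carry only an id; the function name is hypothetical.

```python
import random


def surveyselect_srs(rows, samprate=None, n=None, seed=None, outall=False):
    """Simple random sampling without replacement.

    samprate / n play the role of the SAMPRATE= / N= options; the rounding
    rule and the 'Selected' column name are assumptions.
    """
    if (samprate is None) == (n is None):
        raise ValueError("specify exactly one of samprate or n")
    size = n if n is not None else round(samprate * len(rows))
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    chosen = set(rng.sample(range(len(rows)), size))
    if outall:
        # OUTALL-style output: keep every row and flag the sampled ones
        return [{**row, "Selected": int(i in chosen)} for i, row in enumerate(rows)]
    return [row for i, row in enumerate(rows) if i in chosen]


rows = [{"id": i} for i in range(1, 31)]
flagged = surveyselect_srs(rows, samprate=0.7, seed=42, outall=True)
train = [r for r in flagged if r["Selected"] == 1]
test = [r for r in flagged if r["Selected"] == 0]
print(len(train), len(test))  # 21 9
```

Keeping all rows with a flag, rather than returning only the sample, lets one pass produce both the training and the test set.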

📊 Comprehensive Walkthrough

StatLang Walkthrough - Complete feature demonstration:

  • All statistical procedures with examples
  • Macro system demonstrations
  • Format system usage
  • Advanced data manipulation techniques
  • Real-world analysis scenarios

Project Structure

StatLang/
├── stat_lang/                # Core Python package
│   ├── __init__.py
│   ├── interpreter.py        # Main statistical interpreter
│   ├── cli.py               # Command line interface
│   ├── kernel/              # Jupyter kernel implementation
│   │   ├── statlang_kernel.py   # Main kernel
│   │   └── install.py       # Kernel installation
│   ├── parser/              # Syntax parser
│   │   ├── data_step_parser.py
│   │   ├── proc_parser.py
│   │   └── macro_parser.py
│   ├── procs/               # Statistical procedure implementations
│   │   ├── proc_means.py
│   │   ├── proc_freq.py
│   │   ├── proc_sort.py
│   │   └── proc_print.py
│   └── utils/               # Utility functions
│       ├── expression_evaluator.py
│       ├── data_utils.py
│       └── libname_manager.py
├── vscode-extension/         # VS Code extension
├── examples/                # Example files and demo notebook
├── media/                   # Logo and icons
├── setup.py                 # Package setup
└── README.md

Development

Setup Development Environment

git clone https://github.com/ryan-story/StatLang.git
cd StatLang
pip install -e .

Running Tests

# Smoke test: confirm the package imports
python -c "from statlang import StatLangInterpreter; print('StatLang loaded successfully')"

Key Features Implemented

✅ Completed Features

  • Core DATA step implementation with DATALINES
  • Statistical procedures with CLASS variables and OUTPUT statements
  • Frequency analysis with cross-tabulations and options
  • Data sorting with ascending/descending order
  • Data display and formatting
  • Linear regression analysis with PROC REG
  • Random sampling with PROC SURVEYSELECT
  • Silent execution options
  • Jupyter notebook kernel
  • VS Code extension with syntax highlighting
  • Clean, professional output
  • Concise behavior and syntax

🚧 Future Enhancements

  • Additional statistical procedures (e.g., advanced regression)
  • Advanced macro functionality
  • Performance optimizations
  • Enhanced data connectivity options

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Areas for Contribution

  • Additional statistical procedures
  • Macro functionality enhancements
  • Performance optimizations
  • VS Code extension features
  • Documentation and examples

License

MIT License - see LICENSE for details.

Download files


Source Distribution

statlang-0.1.3.tar.gz (80.5 kB)

Uploaded Source

Built Distribution


statlang-0.1.3-py3-none-any.whl (103.6 kB)

Uploaded Python 3

File details

Details for the file statlang-0.1.3.tar.gz.

File metadata

  • Download URL: statlang-0.1.3.tar.gz
  • Size: 80.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for statlang-0.1.3.tar.gz
Algorithm Hash digest
SHA256 7602d44b23999959ea2709427202783632c9a15abdb9a693cee92eed270fee7b
MD5 b16e0f51099320a8420e81fa993f8246
BLAKE2b-256 5b75d305d3a2b0dc8e791f9022e39eacb3d9770e4e6f03052d5f1dbdea9862cd


File details

Details for the file statlang-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: statlang-0.1.3-py3-none-any.whl
  • Size: 103.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for statlang-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 03949186592ea94c30b081bcb14e84970ebe96c8eeaec7934a0569b67cb523a6
MD5 0b03abb6bb2c1eaf8b8fe81d77088bb7
BLAKE2b-256 3be3241e99b7e78aa988227de916d069a7e2a58d38519fb8b0e96e2fca566e7d

