SQL MATCH_RECOGNIZE for Pandas DataFrames
Project description
SQL MATCH_RECOGNIZE on Pandas
Overview
This project brings SQL's powerful MATCH_RECOGNIZE clause, used for pattern matching in sequences and event streams, directly to Pandas DataFrames. Our implementation lets users run complex sequence-detection logic in memory within Python, removing the need for external engines such as Trino, Oracle, or Flink.
It supports the SQL:2016 standard for MATCH_RECOGNIZE, including advanced features such as:
- PARTITION BY and ORDER BY
- Regex-style pattern syntax
- DEFINE conditions
- AFTER MATCH SKIP options
- Support for anchors, quantifiers, alternation, and PERMUTE patterns
Motivation
Existing platforms like Oracle, Trino, and Flink offer robust implementations of MATCH_RECOGNIZE but come with significant complexity, licensing, or deployment overhead. Python's Pandas, despite its widespread use, lacks direct support for expressive pattern queries.
This project aims to close that gap by enabling SQL-native pattern detection in Pandas without sacrificing performance or expressiveness.
Key Features
- SQL Query Parsing with ANTLR4: a fully customized SQL grammar, extended from Trino's, supports all aspects of the MATCH_RECOGNIZE clause.
- AST Construction: SQL queries are parsed and transformed into abstract syntax trees for easier validation and execution.
- Finite Automata Engine
  - Patterns are tokenized and translated to NFAs using Thompson's construction.
  - NFAs are converted to DFAs for efficient row-by-row evaluation.
  - DFA optimizations include state minimization and prioritization.
- Execution on Pandas
  - Data is partitioned and ordered per query.
  - Patterns are matched directly on DataFrames.
  - Results are formatted to resemble SQL query output.
- Safety and Expressiveness
  - Custom error listener for precise SQL diagnostics.
  - SQL-to-Python conversion uses the ast module to safely evaluate expressions.
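The last point, evaluating DEFINE conditions through Python's ast module instead of eval(), can be illustrated with a whitelist-based tree walker. This is a minimal sketch under stated assumptions, not the package's actual converter: evaluate is a hypothetical helper, rows are plain dicts, the condition has already been rewritten into Python syntax (SQL AND becomes and, = becomes ==), and PREV is the only navigation function supported. Any node outside the whitelist, such as a call to __import__ or an attribute access, raises ValueError.

```python
import ast
import operator

# Whitelisted operators; every other node type is rejected.
_CMP = {ast.Lt: operator.lt, ast.LtE: operator.le, ast.Gt: operator.gt,
        ast.GtE: operator.ge, ast.Eq: operator.eq, ast.NotEq: operator.ne}
_BIN = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def evaluate(expr, rows, i):
    """Evaluate a DEFINE-style condition (already in Python syntax) at rows[i].

    PREV(x) evaluates x on the previous row and yields None (standing in for
    SQL NULL) on the first row; any comparison involving None is false.
    """
    def ev(node, idx):
        if isinstance(node, ast.Expression):
            return ev(node.body, idx)
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            return rows[idx][node.id]          # column lookup on the current row
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN:
            left, right = ev(node.left, idx), ev(node.right, idx)
            if left is None or right is None:
                return None
            return _BIN[type(node.op)](left, right)
        if isinstance(node, ast.BoolOp):
            values = [ev(v, idx) for v in node.values]
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.Compare):
            left = ev(node.left, idx)
            for op, comparator in zip(node.ops, node.comparators):
                if type(op) not in _CMP:
                    raise ValueError(f"unsupported comparison: {type(op).__name__}")
                right = ev(comparator, idx)
                if left is None or right is None or not _CMP[type(op)](left, right):
                    return False
                left = right                   # supports chains like a < b < c
            return True
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id == "PREV" and len(node.args) == 1 and not node.keywords):
            return None if idx == 0 else ev(node.args[0], idx - 1)
        raise ValueError(f"unsupported syntax: {type(node).__name__}")

    return ev(ast.parse(expr, mode="eval"), i)


rows = [{"price": 100}, {"price": 200}, {"price": 100}]
print([evaluate("price < PREV(price)", rows, i) for i in range(len(rows))])  # [False, False, True]
```

Rejecting unknown nodes by default, rather than blocking known-dangerous ones, is what makes this approach safer than eval(). NULL handling is simplified here and does not model SQL's full three-valued logic.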
Architecture
flowchart TD
SQL[SQL Query]
Parse[ANTLR4 Parser]
AST[AST Builder]
Tokenize[Pattern Tokenizer]
NFA[NFA Generator]
DFA[DFA Optimizer]
Executor[Match Executor]
Output[Final DataFrame Output]
SQL --> Parse --> AST --> Tokenize --> NFA --> DFA --> Executor --> Output
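The NFA Generator stage can be illustrated with a minimal sketch of Thompson's construction over pattern variables. Everything here (the NFA class, longest_match, find_matches, and DEFINE conditions written as Python callables over (rows, i)) is hypothetical and not the package's internal API. The sketch simulates the NFA directly instead of converting it to a DFA, and it returns the longest match, which agrees with SQL:2016's greedy preference for this pattern but not for every pattern (for example, patterns with reluctant quantifiers).

```python
class NFA:
    """Thompson-style NFA; edges carry a pattern variable, or None for epsilon."""

    def __init__(self):
        self.edges = []  # edges[state] -> list of (label, next_state)

    def _state(self):
        self.edges.append([])
        return len(self.edges) - 1

    def _edge(self, src, label, dst):
        self.edges[src].append((label, dst))

    # Each builder returns a fragment: (start_state, accept_state).
    def var(self, name):
        s, a = self._state(), self._state()
        self._edge(s, name, a)
        return s, a

    def concat(self, first, second):
        self._edge(first[1], None, second[0])
        return first[0], second[1]

    def alt(self, left, right):
        s, a = self._state(), self._state()
        for frag in (left, right):
            self._edge(s, None, frag[0])
            self._edge(frag[1], None, a)
        return s, a

    def star(self, frag):
        s, a = self._state(), self._state()
        self._edge(s, None, frag[0])
        self._edge(s, None, a)                 # zero repetitions
        self._edge(frag[1], None, frag[0])     # repeat
        self._edge(frag[1], None, a)
        return s, a

    def plus(self, frag):
        s, a = self._state(), self._state()
        self._edge(s, None, frag[0])           # at least one repetition
        self._edge(frag[1], None, frag[0])
        self._edge(frag[1], None, a)
        return s, a

    def closure(self, states):
        """Epsilon closure of a set of states."""
        stack, seen = list(states), set(states)
        while stack:
            for label, nxt in self.edges[stack.pop()]:
                if label is None and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen


def longest_match(nfa, frag, rows, start, define):
    """End index (exclusive) of the longest non-empty match at `start`, or None.
    Variables without a DEFINE entry are always true, as in SQL."""
    current, best, i = nfa.closure({frag[0]}), None, start
    while current:
        if frag[1] in current and i > start:
            best = i
        if i == len(rows):
            break
        nxt = {dst for s in current for label, dst in nfa.edges[s]
               if label is not None and define.get(label, lambda r, j: True)(rows, i)}
        current, i = nfa.closure(nxt), i + 1
    return best


def find_matches(nfa, frag, rows, define):
    """All matches as (first_row, last_row), using AFTER MATCH SKIP PAST LAST ROW."""
    matches, i = [], 0
    while i < len(rows):
        end = longest_match(nfa, frag, rows, i, define)
        if end is None:
            i += 1
        else:
            matches.append((i, end - 1))
            i = end
    return matches


# PATTERN (START DOWN+ UP+)
nfa = NFA()
pattern = nfa.concat(nfa.concat(nfa.var("START"), nfa.plus(nfa.var("DOWN"))),
                     nfa.plus(nfa.var("UP")))
define = {
    "DOWN": lambda rows, i: i > 0 and rows[i] < rows[i - 1],
    "UP": lambda rows, i: i > 0 and rows[i] > rows[i - 1],
}
print(find_matches(nfa, pattern, [100, 200, 100, 50, 100], define))  # [(1, 4)]
```

On the cust_1 prices from the Quick Start example (100, 200, 100, 50, 100), the sketch finds one match spanning rows 1 to 4, the same rows behind the cust_1 result shown there.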
Example SQL Query
SELECT customer_id, start_price, bottom_price, final_price, start_date, final_date
FROM orders
MATCH_RECOGNIZE (
PARTITION BY customer_id
ORDER BY order_date
MEASURES
START.price AS start_price,
LAST(DOWN.price) AS bottom_price,
LAST(UP.price) AS final_price,
START.order_date AS start_date,
LAST(UP.order_date) AS final_date
ONE ROW PER MATCH
AFTER MATCH SKIP PAST LAST ROW
PATTERN (START DOWN+ UP+)
DEFINE
DOWN AS price < PREV(price),
UP AS price > PREV(price)
);
Installation
Prerequisites
- Python 3.8+
- pandas >= 1.0.0
- numpy >= 1.18.0
- antlr4-python3-runtime >= 4.9.0
Install from PyPI (Production - Recommended)
Standard Installation:
pip install pandas-match-recognize
Upgrade to Latest Version:
pip install --upgrade pandas-match-recognize
Installation with Dependencies:
pip install pandas-match-recognize[all]
Install from TestPyPI (Testing Repository)
For testing the latest development version:
pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ pandas-match-recognize
Upgrade from TestPyPI:
pip install --upgrade -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ pandas-match-recognize
Install from Source (Development)
- Clone the repository:
  git clone https://github.com/MonierAshraf/Row_match_recognize.git
  cd Row_match_recognize
- Install dependencies:
  pip install -r requirements.txt
- Install in editable mode:
  pip install -e .
Install from Local Build
# Build the package yourself
python -m build
# Install the wheel you just built (its file name includes the version)
pip install dist/pandas_match_recognize-*.whl
Verify Installation
Quick Test (One Command):
python -c "from pandas_match_recognize import match_recognize; print('Installation successful!')"
Comprehensive Test:
# Test both import methods (both should work)
# Method 1: Package-aligned import (recommended)
try:
    from pandas_match_recognize import match_recognize
    print("pandas_match_recognize import: SUCCESS")
except ImportError as e:
    print(f"pandas_match_recognize import: FAILED - {e}")
# Method 2: Backward-compatible import
try:
    from match_recognize import match_recognize
    print("match_recognize import: SUCCESS")
except ImportError as e:
    print(f"match_recognize import: FAILED - {e}")
# Check that pandas is importable alongside the package
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})
print("pandas integration: SUCCESS")
print("Checks complete.")
Installation Troubleshooting
Check Installation Source:
pip show pandas-match-recognize
Check which repositories have the package:
# Check PyPI
pip index versions pandas-match-recognize
# Check TestPyPI
pip index versions -i https://test.pypi.org/simple/ pandas-match-recognize
Force Reinstall:
pip uninstall pandas-match-recognize
pip install --no-cache-dir pandas-match-recognize
Install Specific Version:
pip install pandas-match-recognize==0.1.0
Quick Start Usage
Note: The package is installed from PyPI as pandas-match-recognize. You can import it in two ways:
- Recommended: from pandas_match_recognize import match_recognize (package-aligned)
- Alternative: from match_recognize import match_recognize (backward compatible)
Customer Order Pattern Analysis
# Import the match_recognize function (installed from pandas-match-recognize package)
from pandas_match_recognize import match_recognize # Recommended: package-aligned import
import pandas as pd
# Customer order data
data = [
('cust_1', '2020-05-11', 100),
('cust_1', '2020-05-12', 200),
('cust_2', '2020-05-13', 8),
('cust_1', '2020-05-14', 100),
('cust_2', '2020-05-15', 4),
('cust_1', '2020-05-16', 50),
('cust_1', '2020-05-17', 100),
('cust_2', '2020-05-18', 6),
]
# Create DataFrame
df = pd.DataFrame(data, columns=['customer_id', 'order_date', 'price'])
df['order_date'] = pd.to_datetime(df['order_date'])
# Find V-shaped price patterns: START -> DOWN+ -> UP+
sql = """
SELECT customer_id, start_price, bottom_price, final_price, start_date, final_date
FROM orders
MATCH_RECOGNIZE (
PARTITION BY customer_id
ORDER BY order_date
MEASURES
START.price AS start_price,
LAST(DOWN.price) AS bottom_price,
LAST(UP.price) AS final_price,
START.order_date AS start_date,
LAST(UP.order_date) AS final_date
ONE ROW PER MATCH
AFTER MATCH SKIP PAST LAST ROW
PATTERN (START DOWN+ UP+)
DEFINE
DOWN AS price < PREV(price),
UP AS price > PREV(price)
);
"""
# Execute the query
result = match_recognize(sql, df)
print(result)
Output:
customer_id start_price bottom_price final_price start_date final_date
0 cust_1 200 50 100 2020-05-12 2020-05-17
1 cust_2 8 4 6 2020-05-13 2020-05-18
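To make the query's semantics concrete, here is a hand-written pandas equivalent of this one query that reproduces the output above. It is not how the package evaluates patterns (the package goes through automata), and v_shapes is a hypothetical helper. A simple greedy scan is valid here only because DOWN and UP can never hold on the same row, so no backtracking is ever needed.

```python
import pandas as pd

data = [
    ('cust_1', '2020-05-11', 100), ('cust_1', '2020-05-12', 200),
    ('cust_2', '2020-05-13', 8),   ('cust_1', '2020-05-14', 100),
    ('cust_2', '2020-05-15', 4),   ('cust_1', '2020-05-16', 50),
    ('cust_1', '2020-05-17', 100), ('cust_2', '2020-05-18', 6),
]
df = pd.DataFrame(data, columns=['customer_id', 'order_date', 'price'])
df['order_date'] = pd.to_datetime(df['order_date'])


def v_shapes(df):
    """Hand-written equivalent of PATTERN (START DOWN+ UP+) with
    ONE ROW PER MATCH and AFTER MATCH SKIP PAST LAST ROW."""
    rows = []
    # PARTITION BY customer_id, ORDER BY order_date
    for cust, part in df.sort_values('order_date').groupby('customer_id'):
        prices = part['price'].tolist()
        dates = part['order_date'].tolist()
        i = 0
        while i < len(prices):
            j = i + 1                     # row i is START (no DEFINE, so always true)
            while j < len(prices) and prices[j] < prices[j - 1]:
                j += 1                    # DOWN+, greedy
            bottom = j - 1                # last DOWN row, or i if there was none
            while j < len(prices) and prices[j] > prices[j - 1]:
                j += 1                    # UP+, greedy
            if bottom > i and j - 1 > bottom:   # at least one DOWN and one UP row
                rows.append({'customer_id': cust,
                             'start_price': prices[i],
                             'bottom_price': prices[bottom],
                             'final_price': prices[j - 1],
                             'start_date': dates[i],
                             'final_date': dates[j - 1]})
                i = j                     # SKIP PAST LAST ROW
            else:
                i += 1
    return pd.DataFrame(rows)


print(v_shapes(df))
```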
Development Setup
For Contributors
- Fork https://github.com/MonierAshraf/Row_match_recognize on GitHub, then clone your fork:
  git clone https://github.com/YOUR_USERNAME/Row_match_recognize.git
  cd Row_match_recognize
- Create a virtual environment:
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install development dependencies:
  pip install -e .
  pip install -r test_requirements.txt  # Testing dependencies
- Run tests:
python -m pytest tests/test_anchor_patterns.py tests/test_back_reference.py tests/test_case_sensitivity.py tests/test_complete_java_reference.py tests/test_empty_cycle.py tests/test_empty_matches.py tests/test_exponential_protection.py tests/test_fixed_failing_cases.py tests/test_in_predicate.py tests/test_match_recognize.py tests/test_missing_critical_cases.py tests/test_multiple_match_recognize.py tests/test_navigation_and_conditions.py tests/test_output_layout.py tests/test_pattern_cache.py tests/test_pattern_tokenizer.py tests/test_permute_patterns.py tests/test_production_aggregates.py tests/test_scalar_functions.py tests/test_sql2016_compliance.py tests/test_subqueries.py --tb=short
Updating & Deploying Changes
Making Source Code Updates Available via pip
When you make changes to the source code and want to deploy them for pip installation:
1. Update Version Numbers
# Increment version in all files:
# - setup.py: version="0.1.1"
# - pyproject.toml: version = "0.1.1"
# - pandas_match_recognize/__init__.py: __version__ = "0.1.1"
# - match_recognize/__init__.py: __version__ = "0.1.1"
2. Build and Deploy
# Clean previous builds
rm -rf build/ dist/ *.egg-info/
# Build new version
python -m build
# Test locally (optional)
pip install dist/pandas_match_recognize-0.1.1-py3-none-any.whl --force-reinstall
# Upload to TestPyPI first (testing)
python -m twine upload --repository testpypi dist/*
# Upload to PyPI (production)
python -m twine upload dist/*
3. Users Install Updates
# Users can then get your updates:
pip install --upgrade pandas-match-recognize
# Or install specific version:
pip install pandas-match-recognize==0.1.1
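Because step 1 requires the same version string in four files, a small pre-build check can catch a forgotten file. This is a minimal sketch, assuming each file contains a literal version string on a line starting with version = or __version__ = (a setup.py that computes its version dynamically will be reported as missing). The helper names extract_version, consistent_version, and check_versions are hypothetical.

```python
import re
from pathlib import Path

# The four files listed in step 1 (paths relative to the project root).
VERSION_FILES = [
    "setup.py",
    "pyproject.toml",
    "pandas_match_recognize/__init__.py",
    "match_recognize/__init__.py",
]

# First line that starts with version = "..." or __version__ = "..."
VERSION_RE = re.compile(r"""^\s*(?:__version__|version)\s*=\s*["']([^"']+)["']""", re.MULTILINE)


def extract_version(text):
    """Return the first literal version string in `text`, or None."""
    match = VERSION_RE.search(text)
    return match.group(1) if match else None


def consistent_version(versions):
    """Return the shared version, or raise if any file disagrees or lacks one."""
    distinct = set(versions.values())
    if None in distinct or len(distinct) != 1:
        raise ValueError(f"Version mismatch: {versions}")
    return distinct.pop()


def check_versions(root="."):
    """Read every file in VERSION_FILES under `root` and compare versions."""
    versions = {name: extract_version((Path(root) / name).read_text(encoding="utf-8"))
                for name in VERSION_FILES}
    return consistent_version(versions)
```

Calling check_versions() from the project root before python -m build either returns the shared version or raises with a per-file breakdown.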
Version Strategy
- Patch (0.1.0 → 0.1.1): Bug fixes, small improvements
- Minor (0.1.0 → 0.2.0): New features (backward compatible)
- Major (0.1.0 → 1.0.0): Breaking changes
See UPDATE_DEPLOYMENT_GUIDE.md for complete step-by-step instructions.
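The version strategy can be expressed as a small function. This is a sketch assuming plain MAJOR.MINOR.PATCH strings without pre-release or build tags; bump_version is a hypothetical helper, not part of the package.

```python
def bump_version(version, part):
    """Bump a MAJOR.MINOR.PATCH string following the strategy above."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "patch":   # bug fixes, small improvements
        return f"{major}.{minor}.{patch + 1}"
    if part == "minor":   # new backward-compatible features; reset patch
        return f"{major}.{minor + 1}.0"
    if part == "major":   # breaking changes; reset minor and patch
        return f"{major + 1}.0.0"
    raise ValueError(f"unknown part: {part!r}")


print(bump_version("0.1.0", "patch"), bump_version("0.1.0", "minor"), bump_version("0.1.0", "major"))
```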
Uninstallation
Quick Uninstall (All Sources)
Standard Uninstall:
pip uninstall pandas-match-recognize
Non-Interactive Uninstall (skips the confirmation prompt):
pip uninstall pandas-match-recognize -y
Complete Cleanup (Development/Multiple Installs)
Remove All Variations:
# Uninstall all possible package names
pip uninstall pandas-match-recognize row-match-recognize match-recognize -y
Remove Development/Editable Installations:
# For installations that show "Can't uninstall 'pandas-match-recognize'. No files were found to uninstall."
# This happens with development installs from source
# Step 1: Remove local development files
rm -rf pandas_match_recognize.egg-info/
rm -rf build/
rm -rf dist/
# Step 2: Find and remove from site-packages (if needed)
python -c "
import glob
import os
import site
site_packages = site.getsitepackages()[0]
patterns = [
    os.path.join(site_packages, 'match_recognize'),
    os.path.join(site_packages, 'pandas_match_recognize'),
    os.path.join(site_packages, 'pandas_match_recognize-*.dist-info')
]
for pattern in patterns:
    for dir_path in glob.glob(pattern):
        print(f'Found: {dir_path}')
        # Uncomment next line to actually remove:
        # import shutil; shutil.rmtree(dir_path)
"
# Step 3: Manual removal (if above script found directories)
# rm -rf /path/to/site-packages/match_recognize
# rm -rf /path/to/site-packages/pandas_match_recognize
# rm -rf /path/to/site-packages/pandas_match_recognize*.dist-info
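As a more targeted alternative to scanning site-packages by name, the standard library's importlib.metadata (Python 3.8+) can list the files an installer recorded for the distribution. This is a sketch; installed_files is a hypothetical helper, and for editable installs the recorded files typically cover only metadata and path hooks, not your source tree.

```python
from importlib.metadata import PackageNotFoundError, distribution


def installed_files(dist_name="pandas-match-recognize"):
    """Absolute paths of the files recorded for an installed distribution.

    Returns [] when the distribution is not installed.
    """
    try:
        dist = distribution(dist_name)
    except PackageNotFoundError:
        return []
    # dist.files is None when no file list was recorded at install time
    return [str(dist.locate_file(f)) for f in (dist.files or [])]


if __name__ == "__main__":
    for path in installed_files():
        print(path)
```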
Clean Build Artifacts:
# Remove local build files (run in project directory)
rm -rf build/
rm -rf dist/
rm -rf *.egg-info/
rm -rf __pycache__/
rm -rf .pytest_cache/
Clear Pip Cache:
# Clear all pip cache
pip cache purge
# Remove cached wheels for this package (pip matches wheel file names,
# which use underscores, so pass the name with underscores)
pip cache remove pandas_match_recognize
Verify Uninstallation
Quick Verification:
python -c "
try:
    import pandas_match_recognize
    print('pandas_match_recognize still found')
except ImportError:
    print('pandas_match_recognize removed')
try:
    import match_recognize
    print('match_recognize still found')
except ImportError:
    print('match_recognize removed')
print('Uninstallation verification complete!')
"
Comprehensive Check:
# Check from different directory to avoid local imports
cd /tmp
python -c "
import subprocess
import sys
# Check if package is in pip list (using this interpreter's pip)
result = subprocess.run([sys.executable, '-m', 'pip', 'list'], capture_output=True, text=True)
if 'pandas-match-recognize' in result.stdout:
    print('Package still in pip list')
else:
    print('Package not in pip list')
# Test imports
try:
    from pandas_match_recognize import match_recognize
    print('pandas_match_recognize import still works')
except ImportError:
    print('pandas_match_recognize import blocked')
try:
    from match_recognize import match_recognize
    print('match_recognize import still works')
except ImportError:
    print('match_recognize import blocked')
print('Uninstallation check complete!')
"
Uninstallation Troubleshooting
If Standard Uninstall Fails:
Error: "Can't uninstall 'pandas-match-recognize'. No files were found to uninstall."
# This happens with development/editable installations
# Solution 1: Remove local development files first
rm -rf pandas_match_recognize.egg-info/ build/ dist/
# Solution 2: Find installation type
pip show pandas-match-recognize
# Check if Location points to your project directory (development install)
# Solution 3: Manual removal from site-packages
# Find installation location
python -c "import pandas_match_recognize; print(pandas_match_recognize.__file__)" 2>/dev/null || echo "Package not found"
# Remove manually (replace with actual paths)
rm -rf /path/to/site-packages/pandas_match_recognize/
rm -rf /path/to/site-packages/match_recognize/
rm -rf /path/to/site-packages/pandas_match_recognize*.dist-info/
Multiple Installation Types:
# Check for different installation types
pip list | grep pandas-match-recognize # Check if still listed
pip show pandas-match-recognize # Check location type
# Development install (Location shows project directory)
# → Remove .egg-info, build, dist directories from project
# Site-packages install (Location shows site-packages)
# → Use standard pip uninstall
# Editable install (listed by pip list -e, or has an .egg-link file)
# → Remove .egg-link files manually
Multiple Python Environments:
# Check all Python environments
conda list | grep pandas-match-recognize # If using conda
pip list --user | grep pandas-match-recognize # User installs
sudo pip list | grep pandas-match-recognize # System installs
Reset to Clean State:
# Nuclear option - reinstall pip itself
python -m pip install --upgrade --force-reinstall pip
Specific Error Solutions
Error: "Can't uninstall 'pandas-match-recognize'. No files were found to uninstall."
This is the most common issue with mixed installations (wheel + development). The following steps resolve it:
# STEP 1: Remove from site-packages (if installed there)
python -c "
import site, os, glob
site_packages = site.getsitepackages()[0]
dirs_to_remove = [
    os.path.join(site_packages, 'pandas_match_recognize'),
    os.path.join(site_packages, 'match_recognize'),
    os.path.join(site_packages, 'pandas_match_recognize-*.dist-info')
]
for pattern in dirs_to_remove:
    for path in glob.glob(pattern):
        print(f'Remove: {path}')
"
# Manually remove the directories shown above:
# rm -rf /path/to/site-packages/pandas_match_recognize
# rm -rf /path/to/site-packages/match_recognize
# rm -rf /path/to/site-packages/pandas_match_recognize-*.dist-info
# STEP 2: Remove development installation files
cd /path/to/Row_match_recognize # Go to your project directory
rm -rf pandas_match_recognize.egg-info/
rm -rf build/
rm -rf dist/
# STEP 3: Verify complete removal
pip show pandas-match-recognize # Should show "Package(s) not found"
# STEP 4: Test imports from outside project directory
cd /tmp
python -c "
try:
    from pandas_match_recognize import match_recognize
    print('pandas_match_recognize still installed')
except ImportError:
    print('pandas_match_recognize removed')
try:
    from match_recognize import match_recognize
    print('match_recognize still installed')
except ImportError:
    print('match_recognize removed')
print('Uninstall check complete!')
"
Error: Import works in project directory but not elsewhere
# This is expected behavior - local imports vs installed packages
# To test if package is truly uninstalled, always test from outside project directory
cd /tmp # or any directory outside your project
python -c "from pandas_match_recognize import match_recognize" # Should fail if uninstalled
Error: Package shows in pip list but can't uninstall
# Check installation type
pip show pandas-match-recognize
# If Location shows your project directory, it's a development install
# Remove manually
pip uninstall pandas-match-recognize --yes --break-system-packages 2>/dev/null || echo "Standard uninstall failed, using manual cleanup"
rm -rf $(python -c "import pandas_match_recognize; print(pandas_match_recognize.__file__.split('/__init__')[0])" 2>/dev/null)
Testing Installation & Functionality
Quick Functionality Test
Test Basic Import and Execution:
python -c "
from pandas_match_recognize import match_recognize
import pandas as pd
# Test data
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'value': [10, 20, 15, 5, 8],
    'time': pd.date_range('2023-01-01', periods=5)
})
# Simple test query (ONE ROW PER MATCH outputs partition columns and measures)
sql = '''
SELECT id, first_val
FROM test_table
MATCH_RECOGNIZE (
PARTITION BY id
ORDER BY time
MEASURES FIRST(A.value) AS first_val
ONE ROW PER MATCH
PATTERN (A)
DEFINE A AS value > 0
)
'''
try:
    result = match_recognize(sql, df)
    print('Basic functionality test: PASSED')
    print(f'Result shape: {result.shape}')
except Exception as e:
    print(f'Basic functionality test: FAILED - {e}')
"
Test Both Installation Methods
Test PyPI vs TestPyPI Installation:
# Test PyPI installation
echo "Testing PyPI installation..."
pip uninstall pandas-match-recognize -y 2>/dev/null
pip install pandas-match-recognize
python -c "from pandas_match_recognize import match_recognize; print('PyPI installation works')"
echo ""
echo "Testing TestPyPI installation..."
pip uninstall pandas-match-recognize -y 2>/dev/null
pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ pandas-match-recognize
python -c "from pandas_match_recognize import match_recognize; print('TestPyPI installation works')"
echo ""
echo "Both installation methods checked."
Installed Version Check
Inspect the Installed Package (pip does not record which index a package came from, so this reports the locally installed copy only):
python -c "
import json
import subprocess
def show_details():
    result = subprocess.run(['pip', 'show', 'pandas-match-recognize'], capture_output=True, text=True)
    print('Installed package details:')
    print(result.stdout or 'Package not installed')
def show_version():
    result = subprocess.run(['pip', 'list', '--format=json'], capture_output=True, text=True)
    for pkg in json.loads(result.stdout):
        if pkg['name'] == 'pandas-match-recognize':
            print(f'Installed version: {pkg[\"version\"]}')
            break
    else:
        print('Package not installed')
try:
    show_details()
    print()
    show_version()
except Exception as e:
    print(f'Could not get package info: {e}')
"
Performance Test
Test Pattern Matching Performance:
import time
import pandas as pd
from pandas_match_recognize import match_recognize
# Generate test data
print("Performance Test")
n_rows = 1000
df = pd.DataFrame({
    'customer_id': [f'cust_{i//100}' for i in range(n_rows)],
    'order_date': pd.date_range('2023-01-01', periods=n_rows, freq='60min'),
    'price': [10 + (i % 20) for i in range(n_rows)]
})
# Performance test query
sql = """
SELECT customer_id, start_price
FROM orders
MATCH_RECOGNIZE (
PARTITION BY customer_id
ORDER BY order_date
MEASURES FIRST(A.price) AS start_price
ONE ROW PER MATCH
AFTER MATCH SKIP PAST LAST ROW
PATTERN (A B+ C)
DEFINE
B AS price > PREV(price),
C AS price < PREV(price)
)
"""
start_time = time.perf_counter()
try:
    result = match_recognize(sql, df)
    elapsed = time.perf_counter() - start_time
    print("Performance test completed")
    print(f"Processed {n_rows} rows in {elapsed:.3f} seconds")
    print(f"Found {len(result)} pattern matches")
except Exception as e:
    print(f"Performance test failed: {e}")
Compatibility Test
Test Both Import Methods:
print("Import Method Compatibility Test")
# Test Method 1: Package-aligned import (recommended)
try:
    from pandas_match_recognize import match_recognize as mr1
    print("Method 1 (pandas_match_recognize): SUCCESS")
    method1_success = True
except ImportError as e:
    print(f"Method 1 (pandas_match_recognize): FAILED - {e}")
    method1_success = False
# Test Method 2: Backward-compatible import
try:
    from match_recognize import match_recognize as mr2
    print("Method 2 (match_recognize): SUCCESS")
    method2_success = True
except ImportError as e:
    print(f"Method 2 (match_recognize): FAILED - {e}")
    method2_success = False
# Test functional equivalence
if method1_success and method2_success:
    print("Testing functional equivalence...")
    import pandas as pd
    df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
    sql = "SELECT a FROM test MATCH_RECOGNIZE (ORDER BY a PATTERN (A) DEFINE A AS a > 0)"
    try:
        result1 = mr1(sql, df)
        result2 = mr2(sql, df)
        if result1.equals(result2):
            print("Both import methods produce identical results")
        else:
            print("Warning: Import methods produce different results")
    except Exception as e:
        print(f"Functional equivalence test failed: {e}")
print("Compatibility test completed")
Repository Status Check
Check Package Status on Both Repositories:
echo "Repository Status Check"
echo "=========================="
echo "Checking PyPI status..."
curl -s "https://pypi.org/pypi/pandas-match-recognize/json" | python -c "
import json, sys
try:
    data = json.load(sys.stdin)
    print(f'PyPI: pandas-match-recognize v{data[\"info\"][\"version\"]} available')
    print(f'Last updated: {data[\"releases\"][data[\"info\"][\"version\"]][0][\"upload_time\"]}')
except Exception:
    print('PyPI: Package not found or error')
"
echo ""
echo "Checking TestPyPI status..."
curl -s "https://test.pypi.org/pypi/pandas-match-recognize/json" | python -c "
import json, sys
try:
    data = json.load(sys.stdin)
    print(f'TestPyPI: pandas-match-recognize v{data[\"info\"][\"version\"]} available')
    print(f'Last updated: {data[\"releases\"][data[\"info\"][\"version\"]][0][\"upload_time\"]}')
except Exception:
    print('TestPyPI: Package not found or error')
"
echo ""
echo "Direct URLs:"
echo " PyPI: https://pypi.org/project/pandas-match-recognize/"
echo " TestPyPI: https://test.pypi.org/project/pandas-match-recognize/"
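On systems without curl (for example, a plain Windows shell), the same check can be done from Python with the standard library. This sketch reads the same JSON endpoints and the same info/version and releases fields that the script above uses; latest_release, check_index, and INDEXES are hypothetical names.

```python
import json
from urllib.request import urlopen

# Same JSON endpoints the curl commands query.
INDEXES = {
    "PyPI": "https://pypi.org/pypi/pandas-match-recognize/json",
    "TestPyPI": "https://test.pypi.org/pypi/pandas-match-recognize/json",
}


def latest_release(data):
    """Return (version, upload_time) from a PyPI JSON API payload."""
    version = data["info"]["version"]
    files = data.get("releases", {}).get(version) or []
    return version, (files[0]["upload_time"] if files else None)


def check_index(name, url, timeout=10):
    """Print the latest version published on one index."""
    try:
        with urlopen(url, timeout=timeout) as response:
            version, uploaded = latest_release(json.load(response))
        print(f"{name}: pandas-match-recognize v{version} available (last updated: {uploaded})")
    except Exception as exc:  # HTTP 404, network failure, malformed JSON
        print(f"{name}: package not found or error ({exc})")
```

Calling check_index(name, url) for each entry in INDEXES reproduces the shell script's report. Keeping the JSON parsing in its own function lets it be tested without network access.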
Troubleshooting
Common Issues
Multiple Import Options:
# RECOMMENDED - Package-aligned import
from pandas_match_recognize import match_recognize
# ALTERNATIVE - Backward compatible import
from match_recognize import match_recognize
# WRONG - Python doesn't allow hyphens in imports
from pandas-match-recognize import match_recognize  # SyntaxError!
Import Error:
# If you get ModuleNotFoundError during development
import sys
import os
sys.path.append(os.path.join(os.getcwd(), 'src'))
from executor.match_recognize import match_recognize
# Or try the direct import methods:
# from pandas_match_recognize import match_recognize # Recommended
# from match_recognize import match_recognize # Alternative
Performance Issues:
- Limit dataset size to < 1000 rows for optimal performance
- Use PARTITION BY clauses to split the data into smaller sequences and reduce processing overhead
- Avoid overly complex nested patterns with multiple quantifiers
Memory Issues:
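# Monitor memory usage for large patterns (requires: pip install psutil)
import psutil
print(f"System memory usage: {psutil.virtual_memory().percent}%")
print(f"This process: {psutil.Process().memory_info().rss / 1024**2:.1f} MiB")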
Conclusion and Future Work
Current Limitations
Despite the system's comprehensive capabilities, several limitations remain. First, complex pattern and quantifier interactions: although the system supports concatenation, alternation, grouping, and standard quantifiers (*, +, ?, {n,m}), certain combinations, particularly multiple greedy quantifiers nested within groups (e.g., (A+B*)+C?), can trigger exponential state-space growth during automata construction. This issue primarily arises with three or more levels of nesting combined with unbounded quantifiers; simpler patterns and bounded quantifiers behave efficiently. Second, limited support for aggregate functions: while a wide range of built-in aggregates (including conditional and statistical functions) is supported, the current implementation offers only limited support for user-defined aggregate functions.
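Why state counts can explode during NFA-to-DFA conversion is easiest to see on the textbook worst case for the subset construction: the language of strings over {a, b} whose (n+1)-th symbol from the end is a. Its NFA has n + 2 states, while the subset construction reaches 2^(n+1) DFA states. This is an illustration only; the engine's automata run over pattern variables whose DEFINE conditions can overlap, so its blowup is not literally this language. build_nfa and count_dfa_states are hypothetical helpers.

```python
from itertools import chain


def build_nfa(n):
    """NFA (no epsilon moves) accepting strings over {a, b} whose
    (n+1)-th symbol from the end is 'a'. States are 0 .. n + 1."""
    delta = {0: {"a": {0, 1}, "b": {0}}}          # state 0 loops and "guesses" the 'a'
    for s in range(1, n + 1):
        delta[s] = {"a": {s + 1}, "b": {s + 1}}   # count the remaining n symbols
    delta[n + 1] = {"a": set(), "b": set()}       # accepting state
    return delta


def count_dfa_states(delta, start=0, alphabet=("a", "b")):
    """Subset construction: count the DFA states reachable from {start}."""
    start_set = frozenset({start})
    seen, stack = {start_set}, [start_set]
    while stack:
        current = stack.pop()
        for symbol in alphabet:
            nxt = frozenset(chain.from_iterable(delta[q][symbol] for q in current))
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen)


for n in range(1, 6):
    nfa = build_nfa(n)
    print(f"n={n}: NFA states={len(nfa)}, DFA states={count_dfa_states(nfa)}")
```

Each extra step of n adds one NFA state but doubles the DFA, which is the same growth pattern that nested unbounded quantifiers can trigger.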
Future Work
We plan to extend the engine in the following areas:
Performance on Large Datasets: The system performs efficiently on moderate-sized datasets but may require additional optimizations for large datasets.
Memory Usage for Large Patterns: Patterns with many variables and complex quantifiers can generate large automata that increase memory consumption.
Integration with Query Optimizers: Because the pattern-matching engine currently operates independently of database query optimizers, it may miss plan-level optimization opportunities.
Conclusion
We presented a SQL-in-pandas engine that executes SQL:2016 MATCH_RECOGNIZE queries over DataFrames. It bridges the gap between the expressiveness of relational queries and the flexibility of in-memory analytics, bringing SQL pattern matching to Python data science workflows and opening the door to unified, portable pipelines that preserve both semantics and developer productivity.
With MATCH_RECOGNIZE available in pandas, data scientists and analysts can use powerful pattern-matching semantics directly within their familiar environment, without complex hand-written Python code or external SQL engine dependencies. This reduces development complexity and improves productivity for sequential data analysis across domains, including financial analysis, log processing, and time-series pattern detection.
By addressing the identified limitations and implementing the enhancements above, our goal is a more adaptable and efficient solution for complex pattern-matching scenarios across data processing environments. Future work will focus on enhanced SQL clause support, distributed processing capabilities, and advanced analytics integration. This roadmap provides a clear path for improving the performance of the current implementation.
Contributing
Pull requests and feedback are welcome! Please ensure your code is tested and documented.
License
This project is licensed under the MIT License.
Download files
Source Distribution
Built Distribution
File details
Details for the file pandas_match_recognize-0.1.7.tar.gz.
File metadata
- Download URL: pandas_match_recognize-0.1.7.tar.gz
- Upload date:
- Size: 1.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0bb02b89d8e578d16f10c2496f6f046980e0be0100f01480ba3aaa5d838f325a |
| MD5 | 248fde482e61f08bc0564d52fb676949 |
| BLAKE2b-256 | f280700ff05b0df253444d78baf776f8f9600351ea297f1401e14dd522c29087 |
File details
Details for the file pandas_match_recognize-0.1.7-py3-none-any.whl.
File metadata
- Download URL: pandas_match_recognize-0.1.7-py3-none-any.whl
- Upload date:
- Size: 560.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fae7f8a4b5c66e1af49473c110fe9fb9c93cb274210f97872e6327547276f64f |
| MD5 | 9aa1a0376553ac709503d769215e5e7d |
| BLAKE2b-256 | feffdb75a436b7d491e5ca2da4b20c59586cccffd6e83835398515d680dfd88e |
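To check a manually downloaded file against the SHA256 digests listed above, you can hash it with Python's hashlib (pip hash on the same file reports the same SHA256 digest by default). This is a sketch; the script and the names file_sha256 and EXPECTED_SHA256 are hypothetical, and the digests are copied from the 0.1.7 file details.

```python
import hashlib
import os
import sys

# SHA256 digests copied from the 0.1.7 file details.
EXPECTED_SHA256 = {
    "pandas_match_recognize-0.1.7.tar.gz":
        "0bb02b89d8e578d16f10c2496f6f046980e0be0100f01480ba3aaa5d838f325a",
    "pandas_match_recognize-0.1.7-py3-none-any.whl":
        "fae7f8a4b5c66e1af49473c110fe9fb9c93cb274210f97872e6327547276f64f",
}


def file_sha256(path, chunk_size=1 << 20):
    """Hex SHA256 of a file, read in 1 MiB chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]
    name = os.path.basename(path)
    actual = file_sha256(path)
    expected = EXPECTED_SHA256.get(name)
    if expected is None:
        print(f"No published digest listed for {name}; SHA256 = {actual}")
    else:
        print("OK: digest matches" if actual == expected else f"MISMATCH: got {actual}")
```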