MaxCompute SQL Query Runner - Execute queries on Alibaba Cloud
MaxQuery - MaxCompute SQL Query Runner
MaxQuery is a powerful, user-friendly command-line tool for executing SQL queries on Alibaba Cloud's MaxCompute (ODPS) platform. No complex configuration or scripting: just run queries from your terminal!
Table of Contents
- Features
- Installation
- Quick Start
- Configuration
- Commands Reference
- Usage Examples
- Output Formats
- Project Structure
- Troubleshooting
- Contributing
- License
- Author
Features
- Easy Setup - One-time credential configuration
- Multiple Output Formats - CSV (default) or Parquet
- Flexible Query Execution - Single or batch queries
- Custom Output Paths - Save results anywhere
- Interactive CLI - User-friendly command-line interface
- Credential Management - Secure local credential storage
- Batch Processing - Run multiple queries at once
- Cross-Platform - Works on Linux, macOS, and Windows
Installation
Option 1: From PyPI (Recommended)
pip install maxquery
Option 2: From Source (Development)
git clone https://github.com/chethanpatel/maxquery.git
cd maxquery
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -e .
Requirements
- Python 3.8 or higher
- pip (Python package manager)
Quick Start
1. First-Time Setup (Interactive)
maxquery config --setup
You'll be prompted to enter:
- Access ID - Your Alibaba Cloud MaxCompute access ID
- Access Key - Your Alibaba Cloud MaxCompute access key
- Project Name - Your MaxCompute project name
- Endpoint - API endpoint (default provided)
- Region - Your region (default: ap-southeast-5)
Your credentials are saved securely in ~/.maxquery/credentials.json
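If you ever want to check the stored file programmatically, here is a minimal sketch; note the JSON key names used below are assumptions for illustration, not maxquery's documented schema:

```python
# Sketch: inspect the saved credentials file without printing secret values.
# The JSON key names are assumptions, not maxquery's documented schema.
import json
from pathlib import Path

cred_path = Path.home() / ".maxquery" / "credentials.json"

if cred_path.exists():
    creds = json.loads(cred_path.read_text())
    for key, value in creds.items():
        print(f"{key}: {'<set>' if value else '<empty>'}")
else:
    print("No credentials found - run: maxquery config --setup")
```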
2. Run Your First Query
maxquery run queries/my_query.sql
Results are saved to outputs/ by default as CSV format.
3. Check Configuration
maxquery info
Shows your current credentials and configuration.
Configuration
Managing Credentials
Show current credentials:
maxquery config --show
Update credentials (interactive setup):
maxquery config --setup
Delete saved credentials:
maxquery config --delete
Credentials Location
Credentials are stored in: ~/.maxquery/credentials.json
Security Note: Never commit credentials to version control. The ~/.maxquery/ directory stays local to your machine.
Commands Reference
maxquery run - Execute Queries
Basic syntax:
maxquery run <sql_file> [sql_file2 ...] [OPTIONS]
Options:
| Option | Short | Default | Description |
|---|---|---|---|
| --format | - | 1 | Output format: 1=CSV, 2=Parquet |
| --output | -o | outputs | Output directory for results |
| --no-download | - | - | Run query but don't save files |
| --help | - | - | Show command help |
maxquery config - Manage Credentials
Basic syntax:
maxquery config [OPTIONS]
Options:
| Option | Description |
|---|---|
| --setup | Interactive credential setup |
| --show | Display current credentials |
| --delete | Delete saved credentials |
| --help | Show command help |
maxquery info - Show Configuration
Display current setup:
maxquery info
Shows:
- Saved credentials location
- Current project and region
- Environment variables (if set)
maxquery --version - Check Version
maxquery --version
Usage Examples
Example 1: Simple Query Execution
# Run a single query, save as CSV in outputs/
maxquery run queries/user_analysis.sql
Output:
Running 1 query(ies)
   Format: CSV
============================================================
Connected to ODPS Project: my_project

user_analysis...
   10500 records → outputs/user_analysis.csv
============================================================
Completed: 1/1 queries
Total records: 10500
Results saved to: outputs/
Example 2: Run Multiple Queries at Once
# Run all SQL files in a directory
maxquery run queries/production/*.sql
# Run specific queries
maxquery run queries/sales.sql queries/inventory.sql queries/customers.sql
Example 3: Save as Parquet Format
# Parquet format (better for large datasets)
maxquery run queries/large_dataset.sql --format 2
Output:
large_dataset...
   5000000 records → outputs/large_dataset.parquet
Example 4: Custom Output Directory
# Save results to a specific folder
maxquery run queries/monthly_report.sql -o reports/2026/
# Absolute path
maxquery run queries/analysis.sql -o /home/user/data/exports/
# Current directory
maxquery run queries/test.sql -o .
Example 5: Run Without Saving Files
# Execute query but keep results in memory only
# Useful for testing or piping to other tools
maxquery run queries/validation.sql --no-download
Example 6: Batch Processing with Different Formats
# Run local test queries as CSV
maxquery run queries/local/*.sql --format 1 -o results/local/
# Run production queries as Parquet
maxquery run queries/production/*.sql --format 2 -o results/production/
Example 7: Complex Workflow
# 1. Setup credentials (first time only)
maxquery config --setup
# 2. Check configuration
maxquery info
# 3. Run test query
maxquery run queries/test_connection.sql
# 4. Run monthly reports
maxquery run queries/reports/monthly/*.sql --format 2 -o reports/2026-01/
# 5. Run analytics queries
maxquery run queries/analytics/user_metrics.sql -o analytics/ --format 2
Output Formats
CSV Format (Default - Format 1)
maxquery run queries/data.sql --format 1
Pros:
- Human-readable
- Works in spreadsheet applications (Excel, Google Sheets)
- Good for small to medium datasets
Cons:
- Larger file size for big data
- Slower to read/write
Parquet Format (Format 2)
maxquery run queries/data.sql --format 2
Pros:
- Highly compressed (smaller file size)
- Faster read/write performance
- Better for big data processing
- Preserves data types
Cons:
- Requires specialized tools to read
- Not directly readable in Excel
Reading Parquet files in Python:
import pandas as pd
df = pd.read_parquet('outputs/data.parquet')
print(df.head())
Project Structure
maxquery/
├── maxquery/              # Main package
│   ├── __init__.py        # Package initialization
│   ├── cli.py             # Command-line interface
│   ├── core.py            # Query execution logic
│   └── credentials.py     # Credential management
├── queries/               # SQL query files
│   ├── local/             # Test/development queries
│   │   └── test_connection.sql
│   ├── production/        # Production queries
│   │   └── analytics.sql
│   └── schema/            # Schema definitions
├── outputs/               # Query results (auto-created)
├── setup.py               # Package setup configuration
├── pyproject.toml         # Project metadata
├── requirements.txt       # Python dependencies
├── README.md              # This file
└── LICENSE                # MIT License
Query Organization Best Practices
Recommended Folder Structure
queries/
├── local/                 # For testing/development
│   ├── test_connection.sql
│   └── data_validation.sql
├── production/            # For live queries
│   ├── daily/
│   │   ├── user_metrics.sql
│   │   └── sales_summary.sql
│   ├── weekly/
│   │   └── trend_analysis.sql
│   └── monthly/
│       └── business_report.sql
└── schema/                # Table definitions & documentation
    ├── users_table.sql
    ├── orders_table.sql
    └── products_table.sql
Query File Naming Conventions
- Use snake_case for file names: user_analysis.sql
- Be descriptive: daily_sales_report.sql instead of report.sql
- Group by frequency/type: daily_, weekly_, etc.
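With consistent prefixes and folders, query batches can also be selected programmatically before invoking maxquery; a sketch assuming the layout above (paths are illustrative):

```python
# Sketch: pick query files by naming convention, then hand them to maxquery.
# Directory paths follow the recommended layout above; adjust to your repo.
import glob
import subprocess

daily_queries = sorted(glob.glob("queries/production/daily/*.sql"))
print(f"Found {len(daily_queries)} daily queries")

# Equivalent to: maxquery run <files...> --format 2 -o reports/daily/
if daily_queries:
    subprocess.run(
        ["maxquery", "run", *daily_queries, "--format", "2", "-o", "reports/daily/"],
        check=True,
    )
```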
Advanced Usage
Running Queries from a Cron Job (Scheduled Execution)
Create a script run_daily_reports.sh:
#!/bin/bash
source ~/.maxquery_env/bin/activate
maxquery run /queries/production/daily/*.sql -o /data/reports/daily/
Add to crontab (runs daily at 2 AM):
0 2 * * * /home/user/scripts/run_daily_reports.sh
Processing Large Datasets
For very large results:
# Use Parquet format for better performance
maxquery run queries/huge_dataset.sql --format 2 -o big_data/
# Process with Python
import pandas as pd
import pyarrow.parquet as pq

df = pd.read_parquet('big_data/huge_dataset.parquet')

# pandas' read_parquet has no chunksize option; stream row batches
# with pyarrow for memory efficiency
pf = pq.ParquetFile('big_data/huge_dataset.parquet')
for batch in pf.iter_batches(batch_size=10000):
    process(batch.to_pandas())
Piping Output to Other Tools
# Convert results to JSON
maxquery run queries/data.sql --no-download | jq .
# Process with awk
maxquery run queries/data.sql -o - | awk '{print $1}'
Troubleshooting
Issue 1: "No credentials configured"
Error: No credentials configured
Run: maxquery config --setup
Solution:
maxquery config --setup
Issue 2: "Invalid URL" or Connection Errors
Error: Invalid URL 'hello/tenants': No scheme supplied
Causes & Solutions:
- Endpoint format is wrong → use the full URL: https://service.ap-southeast-5.maxcompute.aliyun.com/api
- Credentials are incorrect → verify with maxquery config --show
- Network issue → check your internet connection
Fix:
maxquery config --setup
# Re-enter correct credentials
Issue 3: "SQL file not found"
Error: SQL file not found
Solution:
# Verify file exists
ls queries/my_query.sql
# Use correct path
maxquery run queries/my_query.sql   # Correct
maxquery run my_query.sql           # Wrong (file not in current dir)
Issue 4: "Permission denied" when saving results
Error: Permission denied when writing to outputs/
Solution:
# Check directory permissions
ls -la outputs/
# Create directory if needed
mkdir -p outputs/
# Use writable directory
maxquery run queries/data.sql -o ~/Downloads/results/
Issue 5: Out of Memory with Large Results
Solution 1: Use no-download mode
maxquery run queries/huge_query.sql --no-download
Solution 2: Use Parquet format (more efficient)
maxquery run queries/huge_query.sql --format 2
Solution 3: Process in chunks (Python)
import pyarrow.parquet as pq

# Read in batches (pandas' read_parquet has no chunksize option)
pf = pq.ParquetFile('outputs/huge_data.parquet')
for batch in pf.iter_batches(batch_size=50000):
    process_chunk(batch.to_pandas())
Getting Help
View all commands:
maxquery --help
Get help for specific command:
maxquery run --help
maxquery config --help
maxquery info --help
Check version:
maxquery --version
Contributing
We welcome contributions! Here's how to contribute:

1. Fork the repository
   git clone https://github.com/chethanpatel/maxquery.git
   cd maxquery
2. Create a branch
   git checkout -b feature/your-feature-name
3. Make changes and test
   python -m pytest tests/
4. Commit and push
   git add .
   git commit -m "Add your feature description"
   git push origin feature/your-feature-name
5. Open a Pull Request on GitHub
Publishing to PyPI
To publish a new version:

1. Update the version in setup.py and pyproject.toml
2. Build the package:
   pip install build twine
   python -m build
3. Upload to PyPI:
   python -m twine upload dist/*
License
This project is licensed under the MIT License - see the LICENSE file for details.
Author
Chethan Patel
- Email: chethanpatel100@gmail.com
- GitHub: @chethanpatel
- LinkedIn: Chethan Patel
Acknowledgments
- Built for Alibaba Cloud MaxCompute users
- Inspired by the need for simple, efficient data query tools
- Thanks to the Python community for excellent libraries
Support
For issues, questions, or suggestions:
- Check existing issues on GitHub Issues
- Create a new issue with:
- Detailed description of the problem
- Steps to reproduce
- Error messages
- Your environment info (OS, Python version)
- Email: chethanpatel100@gmail.com
Usage Statistics
Track your query usage:
# Count result files (ls -la would also count ., .., and the total line)
ls outputs/ | wc -l
# Check latest results
ls -lt outputs/ | head -10
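For a record-level tally rather than a file count, here is a sketch that sums rows across CSV results; it assumes every file in outputs/ has exactly one header row:

```python
# Sketch: sum data rows across all CSV results in outputs/.
# Assumes each file has one header row; adjust if yours differ.
import csv
from pathlib import Path

total = 0
for path in sorted(Path("outputs").glob("*.csv")):
    with path.open(newline="") as f:
        rows = sum(1 for _ in csv.reader(f)) - 1  # drop the header row
    print(f"{path.name}: {rows} records")
    total += max(rows, 0)

print(f"Total records: {total}")
```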
Learning Resources
MaxCompute/ODPS Documentation
Python & SQL Learning
Happy Querying!
For the latest updates, visit: https://github.com/chethanpatel/maxquery
Download files
Source Distribution
Built Distribution
File details
Details for the file maxquery-1.0.0.tar.gz.
File metadata
- Download URL: maxquery-1.0.0.tar.gz
- Upload date:
- Size: 16.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d34c6c340aefeac2d94526e00749dfde69e237290f96ca6c8bac4879c7b80f86 |
| MD5 | c988d85761ffeaadd2ec1ac9793f4095 |
| BLAKE2b-256 | 12142b114d09bba1f932ed916ed6d2304e3d586794ee81d97ca87939d3692009 |
File details
Details for the file maxquery-1.0.0-py3-none-any.whl.
File metadata
- Download URL: maxquery-1.0.0-py3-none-any.whl
- Upload date:
- Size: 12.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 147b2c0ee498b1aafe508d549b163228faefb1af1eb5eff0d7f676d3b8af40dc |
| MD5 | 58f4b4e2dce67cede050d6331938dc63 |
| BLAKE2b-256 | 4fed7f95a4a39931c8cc9e3ba25543ae9c07cbdbb64a653281db8d28fe26098f |