MaxCompute SQL Query Runner - Execute queries on Alibaba Cloud


🚀 MaxQuery - MaxCompute SQL Query Runner


MaxQuery is a powerful, user-friendly command-line tool for executing SQL queries on Alibaba Cloud's MaxCompute (ODPS) platform. No complex configuration or scripting required: run queries straight from your terminal.


📋 Table of Contents

  1. Features
  2. Installation
  3. Quick Start
  4. Configuration
  5. Commands Reference
  6. Usage Examples
  7. Output Formats
  8. Project Structure
  9. Troubleshooting
  10. Contributing
  11. License
  12. Author

✨ Features

  • ✅ Easy Setup - One-time credential configuration
  • ✅ Multiple Output Formats - CSV (default) or Parquet
  • ✅ Flexible Query Execution - Single or batch queries
  • ✅ Custom Output Paths - Save results anywhere
  • ✅ Interactive CLI - User-friendly command-line interface
  • ✅ Credential Management - Secure local credential storage
  • ✅ Batch Processing - Run multiple queries at once
  • ✅ Cross-Platform - Works on Linux, macOS, and Windows

📦 Installation

Option 1: From PyPI (Recommended)

pip install maxquery

Option 2: From Source (Development)

git clone https://github.com/chethanpatel/maxquery.git
cd maxquery
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -e .

Requirements

  • Python 3.8 or higher
  • pip (Python package manager)

🎯 Quick Start

1. First-Time Setup (Interactive)

maxquery config --setup

You'll be prompted to enter:

  • Access ID - Your Alibaba Cloud MaxCompute access ID
  • Access Key - Your Alibaba Cloud MaxCompute access key
  • Project Name - Your MaxCompute project name
  • Endpoint - API endpoint (default provided)
  • Region - Your region (default: ap-southeast-5)

Your credentials are saved securely in ~/.maxquery/credentials.json
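
The saved file is plain JSON, so it is easy to inspect from Python if you need to. A minimal sketch of reading it back, assuming field names that mirror the setup prompts above (check your own `credentials.json` for the actual keys):

```python
import json
from pathlib import Path

CRED_PATH = Path.home() / ".maxquery" / "credentials.json"

def load_credentials(path=CRED_PATH):
    """Load saved credentials; key names here are illustrative."""
    with open(path) as f:
        creds = json.load(f)
    # Mask the secret before returning, so it is safe to print/log
    return dict(creds, access_key="***")
```

Masking the access key before returning keeps the secret out of logs if you print the result.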

2. Run Your First Query

maxquery run queries/my_query.sql

Results are saved to outputs/ by default, in CSV format.

3. Check Configuration

maxquery info

Shows your current credentials and configuration.


๐Ÿ” Configuration

Managing Credentials

Show current credentials:

maxquery config --show

Update credentials (interactive setup):

maxquery config --setup

Delete saved credentials:

maxquery config --delete

Credentials Location

Credentials are stored in: ~/.maxquery/credentials.json

โš ๏ธ Security Note: Never commit credentials to version control. The .maxquery/ directory is kept locally on your machine.


📚 Commands Reference

maxquery run - Execute Queries

Basic syntax:

maxquery run <sql_file> [sql_file2 ...] [OPTIONS]

Options:

Option         Short  Default  Description
--format       -      1        Output format: 1=CSV, 2=Parquet
--output       -o     outputs  Output directory for results
--no-download  -      -        Run the query without saving result files
--help         -      -        Show command help

maxquery config - Manage Credentials

Basic syntax:

maxquery config [OPTIONS]

Options:

Option    Description
--setup   Interactive credential setup
--show    Display current credentials
--delete  Delete saved credentials
--help    Show command help

maxquery info - Show Configuration

Display current setup:

maxquery info

Shows:

  • Saved credentials location
  • Current project and region
  • Environment variables (if set)

maxquery --version - Check Version

maxquery --version

💡 Usage Examples

Example 1: Simple Query Execution

# Run a single query, save as CSV in outputs/
maxquery run queries/user_analysis.sql

Output:

📊 Running 1 query(ies)
   Format: CSV
============================================================
✅ Connected to ODPS Project: my_project

📄 user_analysis...
   ✅ 10500 records → outputs/user_analysis.csv

============================================================
✅ Completed: 1/1 queries
📊 Total records: 10500
💾 Results saved to: outputs/

Example 2: Run Multiple Queries at Once

# Run all SQL files in a directory
maxquery run queries/production/*.sql

# Run specific queries
maxquery run queries/sales.sql queries/inventory.sql queries/customers.sql

Example 3: Save as Parquet Format

# Parquet format (better for large datasets)
maxquery run queries/large_dataset.sql --format 2

Output:

📄 large_dataset...
   ✅ 5000000 records → outputs/large_dataset.parquet

Example 4: Custom Output Directory

# Save results to a specific folder
maxquery run queries/monthly_report.sql -o reports/2026/

# Absolute path
maxquery run queries/analysis.sql -o /home/user/data/exports/

# Current directory
maxquery run queries/test.sql -o .

Example 5: Run Without Saving Files

# Execute query but keep results in memory only
# Useful for testing or piping to other tools
maxquery run queries/validation.sql --no-download

Example 6: Batch Processing with Different Formats

# Run local test queries as CSV
maxquery run queries/local/*.sql --format 1 -o results/local/

# Run production queries as Parquet
maxquery run queries/production/*.sql --format 2 -o results/production/

Example 7: Complex Workflow

# 1. Setup credentials (first time only)
maxquery config --setup

# 2. Check configuration
maxquery info

# 3. Run test query
maxquery run queries/test_connection.sql

# 4. Run monthly reports
maxquery run queries/reports/monthly/*.sql --format 2 -o reports/2026-01/

# 5. Run analytics queries
maxquery run queries/analytics/user_metrics.sql -o analytics/ --format 2
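
The workflow above can also be driven from a larger Python pipeline by shelling out to the CLI. A minimal sketch using only the standard library; `run_queries` is a hypothetical helper and the flags mirror the examples above:

```python
import subprocess

def run_queries(sql_files, output_dir="outputs", fmt="1", cli="maxquery"):
    """Invoke the maxquery CLI once per SQL file; return {file: exit_code}."""
    results = {}
    for sql in sql_files:
        cmd = [cli, "run", sql, "--format", fmt, "-o", output_dir]
        # capture_output keeps the tool's progress lines out of our stdout
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results[sql] = proc.returncode
    return results
```

A non-zero exit code flags a failed query, so the caller can retry or alert without parsing the tool's output.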

📊 Output Formats

CSV Format (Default - Format 1)

maxquery run queries/data.sql --format 1

Pros:

  • Human-readable
  • Works in spreadsheet applications (Excel, Google Sheets)
  • Good for small to medium datasets

Cons:

  • Larger file size for big data
  • Slower to read/write

Parquet Format (Format 2)

maxquery run queries/data.sql --format 2

Pros:

  • Highly compressed (smaller file size)
  • Faster read/write performance
  • Better for big data processing
  • Preserves data types

Cons:

  • Requires specialized tools to read
  • Not directly readable in Excel

Reading Parquet files in Python:

import pandas as pd

df = pd.read_parquet('outputs/data.parquet')
print(df.head())

๐Ÿ“ Project Structure

maxquery/
├── maxquery/                    # Main package
│   ├── __init__.py              # Package initialization
│   ├── cli.py                   # Command-line interface
│   ├── core.py                  # Query execution logic
│   └── credentials.py           # Credential management
├── queries/                     # SQL query files
│   ├── local/                   # Test/development queries
│   │   └── test_connection.sql
│   ├── production/              # Production queries
│   │   └── analytics.sql
│   └── schema/                  # Schema definitions
├── outputs/                     # Query results (auto-created)
├── setup.py                     # Package setup configuration
├── pyproject.toml               # Project metadata
├── requirements.txt             # Python dependencies
├── README.md                    # This file
└── LICENSE                      # MIT License

๐Ÿ—‚๏ธ Query Organization Best Practices

Recommended Folder Structure

queries/
├── local/                       # For testing/development
│   ├── test_connection.sql
│   └── data_validation.sql
├── production/                  # For live queries
│   ├── daily/
│   │   ├── user_metrics.sql
│   │   └── sales_summary.sql
│   ├── weekly/
│   │   └── trend_analysis.sql
│   └── monthly/
│       └── business_report.sql
└── schema/                      # Table definitions & documentation
    ├── users_table.sql
    ├── orders_table.sql
    └── products_table.sql

Query File Naming Conventions

  • Use snake_case for file names: user_analysis.sql ✅
  • Be descriptive: daily_sales_report.sql instead of report.sql ✅
  • Group by frequency/type: daily_, weekly_, etc. ✅
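
The naming convention above is easy to enforce mechanically, e.g. in a pre-commit check. A minimal sketch; the regex and helper name are illustrative:

```python
import re
from pathlib import Path

# snake_case: lowercase words/digits separated by single underscores
SNAKE_CASE = re.compile(r'^[a-z0-9]+(_[a-z0-9]+)*\.sql$')

def check_query_names(directory):
    """Return the names of .sql files that don't follow snake_case."""
    return sorted(p.name for p in Path(directory).rglob('*.sql')
                  if not SNAKE_CASE.match(p.name))
```

An empty return value means every query file in the tree follows the convention.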

🔧 Advanced Usage

Running Queries from a Cron Job (Scheduled Execution)

Create a script run_daily_reports.sh:

#!/bin/bash
source ~/.maxquery_env/bin/activate
maxquery run /queries/production/daily/*.sql -o /data/reports/daily/

Add to crontab (runs daily at 2 AM):

0 2 * * * /home/user/scripts/run_daily_reports.sh
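
On systems without cron (e.g. Windows with Task Scheduler), the same scheduled run can be wrapped in a small Python script. A sketch that dates the output directory so each day's run lands in its own folder; paths and helper names are illustrative:

```python
import datetime
import subprocess

def daily_output_dir(base="/data/reports/daily", today=None):
    """Build a dated output path like /data/reports/daily/2026-01-15."""
    today = today or datetime.date.today()
    return f"{base}/{today.isoformat()}"

def run_daily(sql_files, cli="maxquery"):
    """Run each daily SQL file, saving Parquet results into a dated folder."""
    out = daily_output_dir()
    for sql in sql_files:
        subprocess.run([cli, "run", sql, "--format", "2", "-o", out], check=False)
    return out
```

Dated folders also make it trivial to diff one day's results against the previous day's.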

Processing Large Datasets

For very large results:

# Use Parquet format for better performance
maxquery run queries/huge_dataset.sql --format 2 -o big_data/

Then process the result in Python. Note that pandas' read_parquet loads the whole file and has no chunksize option; stream record batches with pyarrow instead:

import pyarrow.parquet as pq

# Stream the file in record batches for memory efficiency
pf = pq.ParquetFile('big_data/huge_dataset.parquet')
for batch in pf.iter_batches(batch_size=10_000):
    process(batch.to_pandas())

Piping Output to Other Tools

# Convert results to JSON
maxquery run queries/data.sql --no-download | jq .

# Process with awk
maxquery run queries/data.sql -o - | awk '{print $1}'

๐Ÿ› Troubleshooting

Issue 1: "No credentials configured"

โŒ Error: No credentials configured
   Run: maxquery config --setup

Solution:

maxquery config --setup

Issue 2: "Invalid URL" or Connection Errors

โŒ Error: Invalid URL 'hello/tenants': No scheme supplied

Causes & Solutions:

  • Endpoint format is wrong → Use the full URL: https://service.ap-southeast-5.maxcompute.aliyun.com/api
  • Credentials are incorrect → Verify with maxquery config --show
  • Network issue → Check your internet connection

Fix:

maxquery config --setup
# Re-enter correct credentials

Issue 3: "SQL file not found"

โŒ Error: SQL file not found

Solution:

# Verify file exists
ls queries/my_query.sql

# Use correct path
maxquery run queries/my_query.sql  # ✅ Correct
maxquery run my_query.sql          # ❌ Wrong (file not in current directory)

Issue 4: "Permission denied" when saving results

โŒ Error: Permission denied when writing to outputs/

Solution:

# Check directory permissions
ls -la outputs/

# Create directory if needed
mkdir -p outputs/

# Use writable directory
maxquery run queries/data.sql -o ~/Downloads/results/

Issue 5: Out of Memory with Large Results

Solution 1: Use no-download mode

maxquery run queries/huge_query.sql --no-download

Solution 2: Use Parquet format (more efficient)

maxquery run queries/huge_query.sql --format 2

Solution 3: Process in chunks (Python)

import pyarrow.parquet as pq

# pandas' read_parquet has no chunksize option; stream batches with pyarrow
pf = pq.ParquetFile('outputs/huge_data.parquet')
for batch in pf.iter_batches(batch_size=50_000):
    process_chunk(batch.to_pandas())

📖 Getting Help

View all commands:

maxquery --help

Get help for specific command:

maxquery run --help
maxquery config --help
maxquery info --help

Check version:

maxquery --version

๐Ÿค Contributing

We welcome contributions! Here's how to contribute:

  1. Fork the repository

    git clone https://github.com/chethanpatel/maxquery.git
    cd maxquery
    
  2. Create a branch

    git checkout -b feature/your-feature-name
    
  3. Make changes and test

    python -m pytest tests/
    
  4. Commit and push

    git add .
    git commit -m "Add your feature description"
    git push origin feature/your-feature-name
    
  5. Open a Pull Request on GitHub


🚀 Publishing to PyPI

To publish a new version:

  1. Update version in setup.py and pyproject.toml
  2. Build package:
    pip install build twine
    python -m build
    
  3. Upload to PyPI:
    python -m twine upload dist/*
    

๐Ÿ“ License

This project is licensed under the MIT License - see the LICENSE file for details.


๐Ÿ‘จโ€๐Ÿ’ป Author

Chethan Patel


๐Ÿ™ Acknowledgments

  • Built for Alibaba Cloud MaxCompute users
  • Inspired by the need for simple, efficient data query tools
  • Thanks to the Python community for excellent libraries

📞 Support

For issues, questions, or suggestions:

  1. Check existing issues on GitHub Issues
  2. Create a new issue with:
    • Detailed description of the problem
    • Steps to reproduce
    • Error messages
    • Your environment info (OS, Python version)
  3. Email: chethanpatel100@gmail.com

📊 Usage Statistics

Track your query usage:

# Count result files produced
ls outputs/ | wc -l

# Check latest results
ls -lt outputs/ | head -10
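
The same tally can be computed portably (including on Windows) in Python; `count_results` is an illustrative helper that counts only CSV/Parquet result files:

```python
from pathlib import Path

def count_results(output_dir="outputs"):
    """Count CSV/Parquet result files in the output directory."""
    p = Path(output_dir)
    if not p.exists():
        return 0
    return sum(1 for f in p.iterdir() if f.suffix in {".csv", ".parquet"})
```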

🎓 Learning Resources

MaxCompute/ODPS Documentation

Python & SQL Learning


Happy Querying! 🎉

For the latest updates, visit: https://github.com/chethanpatel/maxquery
