Python DB API 2.0 (PEP 249) client for MongoDB
Project description
PyMongoSQL
PyMongoSQL is a Python DB API 2.0 (PEP 249) client for MongoDB. It provides a familiar SQL interface to MongoDB, allowing developers to use SQL to interact with MongoDB collections.
Objectives
PyMongoSQL implements the DB API 2.0 interfaces to provide SQL-like access to MongoDB, built on PartiQL syntax for querying semi-structured data. The project aims to:
- Bridge SQL and NoSQL: Provide SQL capabilities for MongoDB's nested document structures
- Standard SQL Operations: Support DQL (SELECT) and DML (INSERT, UPDATE, DELETE) operations with WHERE, ORDER BY, and LIMIT clauses
- Seamless Integration: Full compatibility with Python applications expecting DB API 2.0 compliance
- Easy Migration: Enable migration from traditional SQL databases to MongoDB without rewriting application code
Features
- DB API 2.0 Compliant: Full compatibility with Python Database API 2.0 specification
- PartiQL-based SQL Syntax: Built on PartiQL (SQL for semi-structured data), enabling seamless SQL querying of nested and hierarchical MongoDB documents
- Nested Structure Support: Query and filter deeply nested fields and arrays within MongoDB documents using standard SQL syntax
- MongoDB Aggregate Pipeline Support: Execute native MongoDB aggregation pipelines using SQL-like syntax with
aggregate()function - SQLAlchemy Integration: Complete ORM and Core support with dedicated MongoDB dialect
- SQL Query Support: SELECT statements with WHERE conditions, field selection, and aliases
- DML Support: Full support for INSERT, UPDATE, and DELETE operations using PartiQL syntax
- Connection String Support: MongoDB URI format for easy configuration
Requirements
- Python: 3.9, 3.10, 3.11, 3.12, 3.13+
- MongoDB: 7.0+
Dependencies
-
PyMongo (MongoDB Python Driver)
- pymongo >= 4.15.0
-
ANTLR4 (SQL Parser Runtime)
- antlr4-python3-runtime >= 4.13.0
-
JMESPath (JSON/Dict Path Query)
- jmespath >= 1.0.0
-
Tenacity (Transient Failure Retry)
- tenacity >= 9.0.0
Optional Dependencies
- SQLAlchemy (for ORM/Core support)
- sqlalchemy >= 1.4.0 (SQLAlchemy 1.4+ and 2.0+ supported)
Installation
pip install pymongosql
Or install from source:
git clone https://github.com/your-username/PyMongoSQL.git
cd PyMongoSQL
pip install -e .
Quick Start
Table of Contents:
- Basic Usage
- Using Connection String
- Context Manager Support
- Using DictCursor for Dictionary Results
- Cursor vs DictCursor
- Query with Parameters
- Supported SQL Features
- Apache Superset Integration
- Limitations & Roadmap
- Contributing
- License
Basic Usage
from pymongosql import connect
# Connect to MongoDB
connection = connect(
host="mongodb://localhost:27017",
database="database"
)
cursor = connection.cursor()
cursor.execute('SELECT name, email FROM users WHERE age > 25')
print(cursor.fetchall())
Using Connection String
from pymongosql import connect
# Connect with authentication
connection = connect(
host="mongodb://username:password@localhost:27017/database?authSource=admin"
)
cursor = connection.cursor()
cursor.execute('SELECT * FROM products WHERE category = ?', ['Electronics'])
for row in cursor:
print(row)
Context Manager Support
from pymongosql import connect
with connect(host="mongodb://localhost:27017/database") as conn:
with conn.cursor() as cursor:
cursor.execute('SELECT COUNT(*) as total FROM users')
result = cursor.fetchone()
print(f"Total users: {result[0]}")
Using DictCursor for Dictionary Results
from pymongosql import connect
from pymongosql.cursor import DictCursor
with connect(host="mongodb://localhost:27017/database") as conn:
with conn.cursor(DictCursor) as cursor:
cursor.execute('SELECT COUNT(*) as total FROM users')
result = cursor.fetchone()
print(f"Total users: {result['total']}")
Cursor vs DictCursor
PyMongoSQL provides two cursor types for different result formats:
Cursor (default) - Returns results as tuples:
cursor = connection.cursor()
cursor.execute('SELECT name, email FROM users')
row = cursor.fetchone()
print(row[0]) # Access by index
DictCursor - Returns results as dict:
from pymongosql.cursor import DictCursor
cursor = connection.cursor(DictCursor)
cursor.execute('SELECT name, email FROM users')
row = cursor.fetchone()
print(row['name']) # Access by column name
Query with Parameters
PyMongoSQL supports two styles of parameterized queries for safe value substitution:
Positional Parameters with ?
from pymongosql import connect
connection = connect(host="mongodb://localhost:27017/database")
cursor = connection.cursor()
cursor.execute(
'SELECT name, email FROM users WHERE age > ? AND status = ?',
[25, 'active']
)
Named Parameters with :name
from pymongosql import connect
connection = connect(host="mongodb://localhost:27017/database")
cursor = connection.cursor()
cursor.execute(
'SELECT name, email FROM users WHERE age > :age AND status = :status',
{'age': 25, 'status': 'active'}
)
Parameters are substituted into the MongoDB filter during execution, providing protection against injection attacks.
Retry on Transient System Errors
PyMongoSQL supports retrying transient, system-level MongoDB failures (for example connection timeout and reconnect errors) using Tenacity.
connection = connect(
host="mongodb://localhost:27017/database",
retry_enabled=True, # default: True
retry_attempts=3, # default: 3
retry_wait_min=0.1, # default: 0.1 seconds
retry_wait_max=1.0, # default: 1.0 seconds
)
These options apply to connection ping checks, query/DML command execution, and paginated getMore fetches.
Supported SQL Features
SELECT Statements
- Field selection:
SELECT name, age FROM users - Wildcards:
SELECT * FROM products - Field aliases:
SELECT name AS user_name, age AS user_age FROM users - Nested fields:
SELECT profile.name, profile.age FROM users - Array access:
SELECT items[0], items[1].name FROM orders
WHERE Clauses
- Equality:
WHERE name = 'John' - Comparisons:
WHERE age > 25,WHERE price <= 100.0 - Logical operators:
WHERE age > 18 AND status = 'active',WHERE age < 30 OR role = 'admin' - Nested field filtering:
WHERE profile.status = 'active' - Array filtering:
WHERE items[0].price > 100 - Value Functions: Apply transformations to values in WHERE clauses for filtering
Value Functions
PyMongoSQL supports value functions to transform and filter values in WHERE clauses. Built-in value functions include:
str_to_datetime() - Convert ISO 8601 or custom formatted strings to Python datetime objects
# ISO 8601 format
cursor.execute("SELECT * FROM events WHERE created_at >= str_to_datetime('2024-01-15T10:30:00Z')")
# Custom format
cursor.execute("SELECT * FROM events WHERE created_at < str_to_datetime('03/15/2024', '%m/%d/%Y')")
str_to_timestamp() - Convert ISO 8601 or custom formatted strings to BSON Timestamp objects
# ISO 8601 format
cursor.execute("SELECT * FROM logs WHERE timestamp > str_to_timestamp('2024-01-15T00:00:00Z')")
# Custom format
cursor.execute("SELECT * FROM logs WHERE timestamp < str_to_timestamp('01/15/2024', '%m/%d/%Y')")
Both functions:
- Support ISO 8601 strings with 'Z' timezone indicator
- Support custom format strings using Python strftime directives
- Return values with UTC timezone
- Can be combined with standard SQL operators (>, <, >=, <=, =, !=)
Nested Field Support
- Single-level:
profile.name,settings.theme - Multi-level:
account.profile.name,config.database.host - Array access:
items[0].name,orders[1].total - Complex queries:
WHERE customer.profile.age > 18 AND orders[0].status = 'paid'
Note: Avoid SQL reserved words (
user,data,value,count, etc.) as unquoted field names. Use alternatives names, or wrap them in double quotes if you must use them.
Sorting and Limiting
- ORDER BY:
ORDER BY name ASC, age DESC - LIMIT:
LIMIT 10 - Combined:
ORDER BY created_at DESC LIMIT 5
MongoDB Aggregate Function
PyMongoSQL supports executing native MongoDB aggregation pipelines using SQL-like syntax with the aggregate() function. This allows you to leverage MongoDB's powerful aggregation framework while maintaining SQL-style query patterns.
Syntax
The aggregate() function accepts two parameters:
- pipeline: JSON string representing the MongoDB aggregation pipeline
- options: JSON string for aggregation options (optional, use '{}' for defaults)
Qualified Aggregate (Collection-Specific)
cursor.execute(
"SELECT * FROM users.aggregate('[{\"$match\": {\"age\": {\"$gt\": 25}}}, {\"$group\": {\"_id\": \"$city\", \"count\": {\"$sum\": 1}}}]', '{}')"
)
results = cursor.fetchall()
Unqualified Aggregate (Database-Level)
cursor.execute(
"SELECT * FROM aggregate('[{\"$match\": {\"status\": \"active\"}}]', '{\"allowDiskUse\": true}')"
)
results = cursor.fetchall()
Post-Aggregation Filtering and Sorting
You can apply WHERE, ORDER BY, and LIMIT clauses after aggregation:
# Filter aggregation results
cursor.execute(
"SELECT * FROM users.aggregate('[{\"$group\": {\"_id\": \"$city\", \"total\": {\"$sum\": 1}}}]', '{}') WHERE total > 100"
)
# Sort and limit aggregation results
cursor.execute(
"SELECT * FROM products.aggregate('[{\"$match\": {\"category\": \"Electronics\"}}]', '{}') ORDER BY price DESC LIMIT 10"
)
Projection Support
# Select specific fields from aggregation results
cursor.execute(
"SELECT _id, total FROM users.aggregate('[{\"$group\": {\"_id\": \"$city\", \"total\": {\"$sum\": 1}}}]', '{}')"
)
Note: The pipeline and options must be valid JSON strings enclosed in single quotes. Post-aggregation filtering (WHERE), sorting (ORDER BY), and limiting (LIMIT) are applied in Python after the aggregation executes on MongoDB.
INSERT Statements
PyMongoSQL supports inserting documents into MongoDB collections using both PartiQL-style object literals and standard SQL INSERT VALUES syntax.
PartiQL-Style Object Literals
Single Document
cursor.execute(
"INSERT INTO Music {'title': 'Song A', 'artist': 'Alice', 'year': 2021}"
)
Multiple Documents (Bag Syntax)
cursor.execute(
"INSERT INTO Music << {'title': 'Song B', 'artist': 'Bob'}, {'title': 'Song C', 'artist': 'Charlie'} >>"
)
Parameterized INSERT
# Positional parameters using ? placeholders
cursor.execute(
"INSERT INTO Music {'title': '?', 'artist': '?', 'year': '?'}",
["Song D", "Diana", 2020]
)
Standard SQL INSERT VALUES
Single Row with Column List
cursor.execute(
"INSERT INTO Music (title, artist, year) VALUES ('Song E', 'Eve', 2022)"
)
Multiple Rows
cursor.execute(
"INSERT INTO Music (title, artist, year) VALUES ('Song F', 'Frank', 2023), ('Song G', 'Grace', 2024)"
)
Parameterized INSERT VALUES
# Positional parameters (?)
cursor.execute(
"INSERT INTO Music (title, artist, year) VALUES (?, ?, ?)",
["Song H", "Henry", 2025]
)
# Named parameters (:name)
cursor.execute(
"INSERT INTO Music (title, artist) VALUES (:title, :artist)",
{"title": "Song I", "artist": "Iris"}
)
UPDATE Statements
PyMongoSQL supports updating documents in MongoDB collections using standard SQL UPDATE syntax.
Update All Documents
cursor.execute("UPDATE Music SET available = false")
Update with WHERE Clause
cursor.execute("UPDATE Music SET price = 14.99 WHERE year < 2020")
Update Multiple Fields
cursor.execute(
"UPDATE Music SET price = 19.99, available = true WHERE artist = 'Alice'"
)
Update with Logical Operators
cursor.execute(
"UPDATE Music SET price = 9.99 WHERE year = 2020 AND stock > 5"
)
Parameterized UPDATE
# Positional parameters using ? placeholders
cursor.execute(
"UPDATE Music SET price = ?, stock = ? WHERE artist = ?",
[24.99, 50, "Bob"]
)
Update Nested Fields
cursor.execute(
"UPDATE Music SET details.publisher = 'XYZ Records' WHERE title = 'Song A'"
)
Check Updated Row Count
cursor.execute("UPDATE Music SET available = false WHERE year = 2020")
print(f"Updated {cursor.rowcount} documents")
DELETE Statements
PyMongoSQL supports deleting documents from MongoDB collections using standard SQL DELETE syntax.
Delete All Documents
cursor.execute("DELETE FROM Music")
Delete with WHERE Clause
cursor.execute("DELETE FROM Music WHERE year < 2020")
Delete with Logical Operators
cursor.execute(
"DELETE FROM Music WHERE year = 2019 AND available = false"
)
Parameterized DELETE
# Positional parameters using ? placeholders
cursor.execute(
"DELETE FROM Music WHERE artist = ? AND year < ?",
["Charlie", 2021]
)
Check Deleted Row Count
cursor.execute("DELETE FROM Music WHERE available = false")
print(f"Deleted {cursor.rowcount} documents")
Transaction Support
PyMongoSQL supports DB API 2.0 transactions for ACID-compliant database operations. Use the begin(), commit(), and rollback() methods to manage transactions:
from pymongosql import connect
connection = connect(host="mongodb://localhost:27017/database")
try:
connection.begin() # Start transaction
cursor = connection.cursor()
cursor.execute('UPDATE accounts SET balance = 100 WHERE id = ?', [1])
cursor.execute('UPDATE accounts SET balance = 200 WHERE id = ?', [2])
connection.commit() # Commit all changes
print("Transaction committed successfully")
except Exception as e:
connection.rollback() # Rollback on error
print(f"Transaction failed: {e}")
finally:
connection.close()
Note: MongoDB requires a replica set or sharded cluster for transaction support. Standalone MongoDB servers do not support ACID transactions at the server level.
Apache Superset Integration
PyMongoSQL can be used as a database driver in Apache Superset for querying and visualizing MongoDB data:
- Install PyMongoSQL: Install PyMongoSQL on the Superset app server:
pip install pymongosql
- Create Connection: Connect to your MongoDB instance using the connection URI with superset mode:
or for MongoDB Atlas:mongodb://username:password@host:port/database?mode=supersetmongodb+srv://username:password@host/database?mode=superset - Use SQL Lab: Write and execute SQL queries against MongoDB collections directly in Superset's SQL Lab
- Create Visualizations: Build charts and dashboards from your MongoDB queries using Superset's visualization tools
This allows seamless integration between MongoDB data and Superset's BI capabilities without requiring data migration to traditional SQL databases.
Important Note on Collection Names:
When using collection names containing special characters (., -, :), you must wrap them in double quotes to prevent Superset's SQL parser from incorrectly interpreting them.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
License
PyMongoSQL is distributed under the MIT license.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file pymongosql-0.4.4.tar.gz.
File metadata
- Download URL: pymongosql-0.4.4.tar.gz
- Upload date:
- Size: 185.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f640c475408737aa5ddc1951a747eeb2d8e5d5913c9eeec0c377e74d46987598
|
|
| MD5 |
cd2d201275af4f314a9d013960a7f697
|
|
| BLAKE2b-256 |
8912f9dbbdcdaa69d5e90cd1e80fde63b60f6de3361a7693f4279f5e026a5028
|
File details
Details for the file pymongosql-0.4.4-py3-none-any.whl.
File metadata
- Download URL: pymongosql-0.4.4-py3-none-any.whl
- Upload date:
- Size: 189.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
6223de55398a2e60e462a57f07011851dee39656a8c1f32bae98eb879e96e896
|
|
| MD5 |
e6186d9e4fa52896cc614a76d0ee10e7
|
|
| BLAKE2b-256 |
8b018f560612b0dbc6a295022d41dc8669d9fce2ae986088c683eede49f9c4d1
|