This project analyses SQL statements and labels possible errors or complications.
Reason this release was yanked:
renamed labels
Project description
SQL Error Taxonomy
This repository provides a SQL error taxonomy aimed at supporting uniform error classification in educational and automated settings.
It is based on the taxonomy proposed by Taipalus et al. [^taipalus_errors2018], with revisions introduced in our work to address ambiguities and limitations encountered when applying the taxonomy in automated SQL error-detection pipelines. The revised taxonomy refines error definitions, clarifies labels, and improves suitability for algorithmic classification.
Definitions are grounded in observable properties of queries, with criteria chosen to remain suitable for automated detection while preserving pedagogical interpretability. No assumptions about student intent are made at this level.
Top-level categories
SQL errors can be organized into four categories:
-
Syntax errors: the query contains invalid SQL syntax and cannot be executed. No result set is produced.
- Example:
SELECT WHERE *; - This query is invalid since it's missing which columns to select, the entire
FROMclause, and it has an invalid condition in theWHEREclause.
- Example:
-
Semantic errors: the query produces a result set which is always useless, regardless of the data present in the database, and regardless of its data demand.
- Example:
SELECT * FROM table WHERE 1=0; - Regardless of the data demand, this query will always return an empty set, since the condition is always false.
- Example:
-
Logic errors: the query produces a valid set, but it does not satisfy its data demand.
- Example:
SELECT cID FROM customer WHERE city = 'Turin'; - Data demand: Select the IDs of customers who live in Genoa
- This query is valid, but it does not satisfy the data demand, since it lists the IDs of customers who live in Turin, instead of those who live in Genoa.
- Example:
-
Complications: the query satisfies its data demand, but in an overly complex way.
- Example:
SELECT city FROM store GROUP BY city; - Data demand: List all cities in which stores are located, without repetitions.
- Even though the query returns the correct result, using
GROUP BY(instead ofSELECT DISTINCT) just to remove duplicate values is both less efficient and harder to read.
- Example:
Taxonomy
You can use the links in the tables below to see detailed information on each error.
For each query, we will be referencing the following schema, adapted from Miedema et Al. [^miedema_identifying2022]. Underlined attributes collectively form the primary key for each table.
| Table name | Attributes |
|---|---|
| customer | cID, cName, street, city, age |
| store | sID, sName, street, city |
| product | pID, pName, suffix |
| inventory | sID, pID, date, quantity, unit_price |
Updates and maintenance
Each error type is assigned a unique identifier which is will never be changed or removed, but new IDs may be added in the future.
Existing taxonomies can be updated by running again the error categorization module on the same set of queries, and reclassifying any errors that have changed category or type.
Syntax errors
Definition A syntax error occurs when a SQL query violates the syntactic or typing rules of the SQL language and cannot be executed by the DBMS.
Key properties
- The DBMS rejects the query at parse time or during static validation.
- No result set is produced.
- The error can be detected without knowledge of the data demand.
- The error is independent of the database instance contents.
Pedagogical interpretation Syntax errors typically reflect difficulties with SQL grammar, clause structure, or expression formation, and often arise in early stages of learning.
Ambiguous database object
| ID | Name |
|---|---|
| 1 | Ambiguous column |
| 2 | Ambiguous function |
Undefined database object
| ID | Name |
|---|---|
| 3 | Undefined column |
| 4 | Undefined function |
| 5 | Undefined parameter |
| 6 | Undefined object |
| 7 | Invalid schema name |
| 8 | Misspellings |
| 9 | Synonyms |
| 10 | Omitting quotes around character data |
Data type mismatch
| ID | Name |
|---|---|
| 11 | Failure to specify column name twice |
| 12 | IS where not applicable |
| 13 | Data type mismatch |
Illegal aggregate function placement
| ID | Name |
|---|---|
| 14 | Using aggregate function outside SELECT or HAVING |
| 15 | Aggregate functions cannot be nested |
Illegal or insufficient grouping
| ID | Name |
|---|---|
| 16 | Extraneous or omitted grouping column |
| 17 | Strange HAVING: HAVING without GROUP BY |
Invalid subqueries
| ID | Name |
|---|---|
| 18 | Too many columns in subquery |
| 19 | Missing quantifier |
Common syntax error
Semantic errors
Definition A semantic error occurs when a SQL query is syntactically valid and executable, but its evaluation is semantically flawed regardless of the data demand, producing a result that is always meaningless.
Key properties
- The query executes successfully.
- The result set is intrinsically invalid (e.g., always empty or logically inconsistent).
- The error can be detected without reference to the intended task.
- The behavior holds for any possible database instance.
Pedagogical interpretation These errors often signal misconceptions about logical conditions, boolean reasoning, or the meaning of operators in SQL.
Inconsistent expression
| ID | Name |
|---|---|
| 38 | Implied, tautological or inconsistent expression |
| 39 | DISTINCT in SUM or AVG |
| 40 | DISTINCT that might remove important duplicates |
| 41 | Mixing a >0 with IS NOT NULL or empty string with NULL |
Inconsistent join
| ID | Name |
|---|---|
| 42 | NULL in IN/ANY/ALL subquery |
| 43 | Join condition on unmatchable column |
Duplicate rows
| ID | Name |
|---|---|
| 44 | Many duplicates |
Redundant column output
| ID | Name |
|---|---|
| 45 | Constant column output |
| 46 | Duplicate column output |
Logic errors
Definition A logic error occurs when a SQL query is syntactically and semantically valid, but does not satisfy the given data demand.
Key properties
- The query executes successfully.
- A result set is produced.
- The result does not match the expected outcome defined by the data demand.
- Detection requires comparison with at least one correct reference query or specification.
Pedagogical interpretation Logic errors reflect misunderstandings of the problem requirements, relational reasoning, or the mapping between natural language requests and SQL constructs.
Operator error
Join error
Nesting error
| ID | Name |
|---|---|
| 60 | Improper nesting of expressions |
| 61 | Improper nesting of subqueries |
Expression error
Projection error
Clause error
Function error
| ID | Name |
|---|---|
| 91 | Incorrect function |
| 92 | DISTINCT as function parameter where not applicable |
| 93 | Missing DISTINCT from function parameter |
| 94 | Incorrect column as function parameter |
Complications
Definition A complication occurs when a SQL query satisfies the data demand, but does so in an unnecessarily complex, redundant, or non-idiomatic way.
Key properties
- The query returns a correct result set.
- One or more components are redundant, superfluous, or replaceable by simpler constructs.
- Removing or simplifying these components does not change the result.
- Detection requires knowledge of the data demand.
Pedagogical interpretation Complications often indicate partial understanding or overgeneralization of SQL constructs, and provide opportunities for feedback focused on readability, efficiency, and idiomatic query formulation.
Complication
Notes
Classification principles
- The four top-level categories are mutually exclusive as primary labels, although a single query may exhibit multiple issues.
- Classification prioritizes observable query behavior over inferred intent.
Identifier design
- Error identifiers are symbolic rather than numeric to avoid implying a fixed ordering.
- This design choice increases robustness to future revisions, additions, or reclassification, without requiring systematic renumbering.
References
[^taipalus_errors2018]: Toni Taipalus, Mikko Siponen, and Tero Vartiainen. 2018. Errors and Complications in SQL Query Formulation. ACM Trans. Comput. Educ. 18, 3, Article 15 (September 2018), 29 pages. https://doi.org/10.1145/3231712
[^miedema_identifying2022]: Daphne Miedema, Efthimia Aivaloglou, and George Fletcher. Identifying SQL misconceptions of novices: Findings from a think-aloud study. ACM Inroads 13.1 (2022): 52-65. https://dx.doi.org/10.1145/3514214
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file sql_error_taxonomy-1.1.0.tar.gz.
File metadata
- Download URL: sql_error_taxonomy-1.1.0.tar.gz
- Upload date:
- Size: 35.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
10fdb66e037d5fbc1b8676a25b3c858fef94b67df45d7c3e19722d77bc770ec5
|
|
| MD5 |
a8c829b3a20ed55bfe880c6efde3ffdf
|
|
| BLAKE2b-256 |
0115a0a4eed869ce93d49048d0498adf76a6ffc916bac6f2036cdc02d1d3d380
|
File details
Details for the file sql_error_taxonomy-1.1.0-py3-none-any.whl.
File metadata
- Download URL: sql_error_taxonomy-1.1.0-py3-none-any.whl
- Upload date:
- Size: 12.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b45292b6a947f684b4e28b1d0ff82f231dbcb39e41c3a44fdd8482e69913dce7
|
|
| MD5 |
1e1eb1e8b6becef00b317cb2139691ed
|
|
| BLAKE2b-256 |
d345e12917acfb4c5455cf66767b75d8ccb3889c8be8c44a45c76926317f87ae
|