Skip to main content

This project analyses SQL statements and labels possible errors or complications.

Project description

SQL Error Taxonomy

This repository provides a SQL error taxonomy aimed at supporting uniform error classification in educational and automated settings.

It is based on the taxonomy proposed by Taipalus et al. [^taipalus_errors2018], with revisions introduced in our work to address ambiguities and limitations encountered when applying the taxonomy in automated SQL error-detection pipelines. The revised taxonomy refines error definitions, clarifies labels, and improves suitability for algorithmic classification.

Definitions are grounded in observable properties of queries, with criteria chosen to remain suitable for automated detection while preserving pedagogical interpretability. No assumptions about student intent are made at this level.

Top-level categories

SQL errors can be organized into four categories:

  1. Syntax errors: the query contains invalid SQL syntax and cannot be executed. No result set is produced.

    • Example: SELECT WHERE *;
    • This query is invalid since it's missing which columns to select, the entire FROM clause, and it has an invalid condition in the WHERE clause.
  2. Semantic errors: the query produces a result set which is always useless, regardless of the data present in the database, and regardless of its data demand.

    • Example: SELECT * FROM table WHERE 1=0;
    • Regardless of the data demand, this query will always return an empty set, since the condition is always false.
  3. Logic errors: the query produces a valid set, but it does not satisfy its data demand.

    • Example: SELECT cID FROM customer WHERE city = 'Turin';
    • Data demand: Select the IDs of customers who live in Genoa
    • This query is valid, but it does not satisfy the data demand, since it lists the IDs of customers who live in Turin, instead of those who live in Genoa.
  4. Complications: the query satisfies its data demand, but in an overly complex way.

    • Example: SELECT city FROM store GROUP BY city;
    • Data demand: List all cities in which stores are located, without repetitions.
    • Even though the query returns the correct result, using GROUP BY (instead of SELECT DISTINCT) just to remove duplicate values is both less efficient and harder to read.

Taxonomy

You can use the links in the tables below to see detailed information on each error.

For each query, we will be referencing the following schema, adapted from Miedema et Al. [^miedema_identifying2022]. Underlined attributes collectively form the primary key for each table.

Table name Attributes
customer cID, cName, street, city, age
store sID, sName, street, city
product pID, pName, suffix
inventory sID, pID, date, quantity, unit_price

Updates and maintenance

Each error type is assigned a unique identifier which is will never be changed or removed, but new IDs may be added in the future.

Existing taxonomies can be updated by running again the error categorization module on the same set of queries, and reclassifying any errors that have changed category or type.

Syntax errors

Definition A syntax error occurs when a SQL query violates the syntactic or typing rules of the SQL language and cannot be executed by the DBMS.

Key properties

  • The DBMS rejects the query at parse time or during static validation.
  • No result set is produced.
  • The error can be detected without knowledge of the data demand.
  • The error is independent of the database instance contents.

Pedagogical interpretation Syntax errors typically reflect difficulties with SQL grammar, clause structure, or expression formation, and often arise in early stages of learning.

Ambiguous database object

ID Name Discussed in
2 Ambiguous column [^taipalus_errors2018] [^ponzini_proposal2026]
3 Ambiguous function [^taipalus_errors2018]

Undefined database object

ID Name Discussed in
4 Undefined column [^taipalus_errors2018]
5 Undefined function [^taipalus_errors2018]
6 Undefined parameter [^taipalus_errors2018]
7 Undefined object [^taipalus_errors2018]
8 Invalid schema name [^taipalus_errors2018]
9 Misspellings [^taipalus_errors2018]
10 Synonyms [^taipalus_errors2018]
11 Omitting quotes around character data [^taipalus_errors2018]

Data type mismatch

ID Name Discussed in
12 Failure to specify column name twice [^taipalus_errors2018]
35 IS where not applicable [^taipalus_errors2018] [^ponzini_proposal2026]
13 Data type mismatch [^taipalus_errors2018]

Illegal aggregate function placement

ID Name Discussed in
14 Using aggregate function outside SELECT or HAVING [^taipalus_errors2018]
15 Aggregate functions cannot be nested [^taipalus_errors2018]

Illegal or insufficient grouping

ID Name Discussed in
16 Extraneous or omitted grouping column [^taipalus_errors2018]
17 HAVING without GROUP BY [^taipalus_errors2018]

Invalid subqueries

ID Name Discussed in
26 Too many columns in subquery [^taipalus_errors2018] [^ponzini_proposal2026]
106 Missing quantifier [^ponzini_proposal2026]

Common syntax error

ID Name Discussed in
18 Confusing function with function parameter [^taipalus_errors2018]
19 Using WHERE twice [^taipalus_errors2018]
20 Omitting the FROM clause [^taipalus_errors2018]
21 Comparison with NULL [^taipalus_errors2018]
22 Omitted semicolon [^taipalus_errors2018]
38 Additional semicolon [^taipalus_errors2018]
23 Date time field overflow [^taipalus_errors2018]
24 Duplicate clause [^taipalus_errors2018]
25 Undefined correlation name [^taipalus_errors2018]
27 Confused table names with column names [^taipalus_errors2018]
30 Confused the order of keywords (e.g., FROM customer SELECT fee) [^taipalus_errors2018]
32 Confused the syntax of keywords (e.g., LIKE ('A,' 'B')) [^taipalus_errors2018]
33 Omitting commas [^taipalus_errors2018]
107 Unmatched brackets [^ponzini_proposal2026]
108 Curly or square brackets [^ponzini_proposal2026]
36 Nonstandard keywords or standard keywords in wrong context [^taipalus_errors2018]
37 Nonstandard operators (e.g., &&, || or ==) [^taipalus_errors2018]
109 Different tuples in set operation [^ponzini_proposal2026]

Semantic errors

Definition A semantic error occurs when a SQL query is syntactically valid and executable, but its evaluation is semantically flawed regardless of the data demand, producing a result that is always meaningless.

Key properties

  • The query executes successfully.
  • The result set is intrinsically invalid (e.g., always empty or logically inconsistent).
  • The error can be detected without reference to the intended task.
  • The behavior holds for any possible database instance.

Pedagogical interpretation These errors often signal misconceptions about logical conditions, boolean reasoning, or the meaning of operators in SQL.

Inconsistent expression

ID Name Discussed in
40 Implied, tautological or inconsistent expression [^taipalus_errors2018]
41 DISTINCT in SUM or AVG [^taipalus_errors2018]
42 DISTINCT that might remove important duplicates [^taipalus_errors2018]
45 Mixed a >0 with IS NOT NULL or empty string with NULL [^taipalus_errors2018]

Inconsistent join

ID Name Discussed in
46 NULL in IN/ANY/ALL subquery [^taipalus_errors2018]
47 Join condition on unmatchable column [^taipalus_errors2018]

Duplicate rows

ID Name Discussed in
49 Many duplicates [^taipalus_errors2018]

Redundant column output

ID Name Discussed in
50 Constant column output [^taipalus_errors2018]
51 Duplicate column output [^taipalus_errors2018]

Logic errors

Definition A logic error occurs when a SQL query is syntactically and semantically valid, but does not satisfy the given data demand.

Key properties

  • The query executes successfully.
  • A result set is produced.
  • The result does not match the expected outcome defined by the data demand.
  • Detection requires comparison with at least one correct reference query or specification.

Pedagogical interpretation Logic errors reflect misunderstandings of the problem requirements, relational reasoning, or the mapping between natural language requests and SQL constructs.

Operator error

ID Name Discussed in
39 AND instead of OR [^taipalus_errors2018] [^ponzini_proposal2026]
52 OR instead of AND [^taipalus_errors2018]
53 Extraneous NOT operator [^taipalus_errors2018]
54 Missing NOT operator [^taipalus_errors2018]
55 Substituting existence negation with <> [^taipalus_errors2018]
57 Incorrect comparison operator or incorrect value compared [^taipalus_errors2018]

Join error

ID Name Discussed in
58 Incorrect table reference [^taipalus_errors2018]
62 Missing table reference [^taipalus_errors2018] [^ponzini_proposal2026]
59 Extraneous table reference [^taipalus_errors2018]
60 Join condition on incorrect column [^taipalus_errors2018]
61 Join condition with incorrect comparison operator [^taipalus_errors2018]
48 Missing join condition [^taipalus_errors2018] [^ponzini_proposal2026]
104 Condition on OUTER JOIN [^taipalus_errors2018] [^ponzini_proposal2026]

Nesting error

ID Name Discussed in
63 Improper nesting of expressions [^taipalus_errors2018]
64 Improper nesting of subqueries [^taipalus_errors2018]

Expression error

ID Name Discussed in
65 Extraneous quotes [^taipalus_errors2018]
66 Missing expression [^taipalus_errors2018]
68 Extraneous expression [^taipalus_errors2018]
67 Expression on incorrect column [^taipalus_errors2018]
69 Expression in incorrect clause [^taipalus_errors2018]
43 Wildcards without LIKE [^taipalus_errors2018] [^ponzini_proposal2026]
110 Wrong wildcard [^ponzini_proposal2026]
111 Invalid wildcard [^ponzini_proposal2026]

Projection error

ID Name Discussed in
70 Extraneous column in SELECT [^taipalus_errors2018]
71 Missing column from SELECT [^taipalus_errors2018]
72 Missing DISTINCT from SELECT [^taipalus_errors2018]
73 Missing AS from SELECT [^taipalus_errors2018]
74 Missing column from ORDER BY clause [^taipalus_errors2018]
75 Incorrect column in ORDER BY clause [^taipalus_errors2018]
77 Incorrect ordering of rows [^taipalus_errors2018]

Clause error

ID Name Discussed in
112 Missing WHERE clause [^ponzini_proposal2026]
113 Missing GROUP BY clause [^ponzini_proposal2026]
114 Missing HAVING clause [^ponzini_proposal2026]
115 Missing ORDER BY clause [^ponzini_proposal2026]
116 Missing LIMIT clause [^ponzini_proposal2026]
117 Missing OFFSET clause [^ponzini_proposal2026]
118 Extraneous WHERE clause [^ponzini_proposal2026]
119 Extraneous GROUP BY clause [^ponzini_proposal2026]
120 Extraneous HAVING clause [^ponzini_proposal2026]
76 Extraneous ORDER BY clause [^taipalus_errors2018] [^ponzini_proposal2026]
121 Extraneous LIMIT clause [^ponzini_proposal2026]
122 Extraneous OFFSET clause [^ponzini_proposal2026]
123 Incorrect LIMIT [^ponzini_proposal2026]
124 Incorrect OFFSET [^ponzini_proposal2026]

Function error

ID Name Discussed in
80 Incorrect function [^taipalus_errors2018]
78 DISTINCT as function parameter where not applicable [^taipalus_errors2018]
79 Missing DISTINCT from function parameter [^taipalus_errors2018]
81 Incorrect column as function parameter [^taipalus_errors2018]

Complications

Definition A complication occurs when a SQL query satisfies the data demand, but does so in an unnecessarily complex, redundant, or non-idiomatic way.

Key properties

  • The query returns a correct result set.
  • One or more components are redundant, superfluous, or replaceable by simpler constructs.
  • Removing or simplifying these components does not change the result.
  • Detection requires knowledge of the data demand.

Pedagogical interpretation Complications often indicate partial understanding or overgeneralization of SQL constructs, and provide opportunities for feedback focused on readability, efficiency, and idiomatic query formulation.

Complication

ID Name Discussed in
82 Unnecessary complication [^taipalus_errors2018]
83 Unnecessary DISTINCT in SELECT clause [^taipalus_errors2018]
84 Unnecessary table reference [^taipalus_errors2018]
85 Unused correlation name [^taipalus_errors2018]
86 Tables have the same data [^taipalus_errors2018] [^ponzini_proposal2026]
125 Correlation name identical to table name [^ponzini_proposal2026]
87 Unnecessarily general comparison operator [^taipalus_errors2018]
88 LIKE without wildcards [^taipalus_errors2018]
89 Unnecessarily complicated SELECT in EXISTS subquery [^taipalus_errors2018]
90 IN/EXISTS can be replaced by comparison [^taipalus_errors2018]
91 Unnecessary aggregate function [^taipalus_errors2018]
92 Unnecessary DISTINCT in aggregate function [^taipalus_errors2018]
93 Unnecessary argument of COUNT [^taipalus_errors2018]
94 Unnecessary GROUP BY in EXISTS subquery [^taipalus_errors2018]
95 GROUP BY with singleton groups [^taipalus_errors2018]
96 GROUP BY with only a single group [^taipalus_errors2018]
97 GROUP BY can be replaced with DISTINCT [^taipalus_errors2018]
98 UNION can be replaced by OR [^taipalus_errors2018]
99 Unnecessary column in ORDER BY clause [^taipalus_errors2018]
100 ORDER BY in subquery [^taipalus_errors2018]
101 Inefficient HAVING [^taipalus_errors2018]
102 Inefficient UNION [^taipalus_errors2018]
103 Condition in the subquery can be moved up [^taipalus_errors2018]
105 OUTER JOIN can be replaced by INNER JOIN [^taipalus_errors2018]
126 Unused CTE [^ponzini_proposal2026]

Deprecated error types

The following error types have been deprecated and should not be used for new classifications. They are retained in the taxonomy for backward compatibility and historical reference.

ID Name Discussed in Deprecated in
1 Omitting correlation names [^taipalus_errors2018] [^ponzini_proposal2026]
28 Restriction in SELECT clause [^taipalus_errors2018] [^ponzini_proposal2026]
29 Projection in WHERE clause [^taipalus_errors2018] [^ponzini_proposal2026]
31 Confusing the logic of keywords [^taipalus_errors2018] [^ponzini_proposal2026]
34 Curly, square or unmatched brackets [^taipalus_errors2018] [^ponzini_proposal2026]
44 Incorrect wildcard [^taipalus_errors2018] [^ponzini_proposal2026]
56 Putting NOT in front of incorrect IN/EXISTS [^taipalus_errors2018] [^ponzini_proposal2026]

Notes

Classification principles

  • The four top-level categories are mutually exclusive as primary labels, although a single query may exhibit multiple issues.
  • Classification prioritizes observable query behavior over inferred intent.

References

[^taipalus_errors2018]: Taipalus, Toni, Mikko Siponen, and Tero Vartiainen. "Errors and complications in SQL query formulation." ACM Transactions on Computing Education (TOCE) 18.3 (2018): 1-29. https://doi.org/10.1145/3231712

[^miedema_identifying2022]: Miedema, Daphne, Efthimia Aivaloglou, and George Fletcher. "Identifying SQL misconceptions of novices: Findings from a think-aloud study." ACM Inroads 13.1 (2022): 52-65. https://dx.doi.org/10.1145/3514214

[^ponzini_proposal2026]: Ponzini, Davide, Giovanna Guerrini, and Barbara Catania. "A Proposal for Revising SQL Error Taxonomies Based on Automated Detection." (2026). DataEd 2026. https://ceur-ws.org/Vol-4192/DataEd-paper9.pdf

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sql_error_taxonomy-2.0.0.tar.gz (36.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sql_error_taxonomy-2.0.0-py3-none-any.whl (13.0 kB view details)

Uploaded Python 3

File details

Details for the file sql_error_taxonomy-2.0.0.tar.gz.

File metadata

  • Download URL: sql_error_taxonomy-2.0.0.tar.gz
  • Upload date:
  • Size: 36.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for sql_error_taxonomy-2.0.0.tar.gz
Algorithm Hash digest
SHA256 415a94e38668072e60aff1707d1692d46a5edead985bd95f693642cd21b3c9bb
MD5 abf6db571df45a565794a8b30e0c6431
BLAKE2b-256 1af443129e850a8e3e2a7d9a85bdf27d3f6e7a5feacbd1cfac28e6e21e25d3b3

See more details on using hashes here.

File details

Details for the file sql_error_taxonomy-2.0.0-py3-none-any.whl.

File metadata

File hashes

Hashes for sql_error_taxonomy-2.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 ef0a1556b6eed814cb2aa5c508ffb9fc89fad242725f6257096fe250d9012b03
MD5 7ae64373031d945a799e72e68cbb46a6
BLAKE2b-256 1dfcc252d2fb52df7042f199869a977dc599776cdea387c05f053d16598916f4

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page