Skip to main content

This project analyses SQL statements and labels possible errors or complications.

Project description

SQL Error Taxonomy

This repository provides a SQL error taxonomy aimed at supporting uniform error classification in educational and automated settings.

It is based on the taxonomy proposed by Taipalus et al. [^taipalus_errors2018], with revisions introduced in our work to address ambiguities and limitations encountered when applying the taxonomy in automated SQL error-detection pipelines. The revised taxonomy refines error definitions, clarifies labels, and improves suitability for algorithmic classification.

Definitions are grounded in observable properties of queries, with criteria chosen to remain suitable for automated detection while preserving pedagogical interpretability. No assumptions about student intent are made at this level.

Top-level categories

SQL errors can be organized into four categories:

  1. Syntax errors: the query contains invalid SQL syntax and cannot be executed. No result set is produced.

    • Example: SELECT WHERE *;
    • This query is invalid since it's missing which columns to select, the entire FROM clause, and it has an invalid condition in the WHERE clause.
  2. Semantic errors: the query produces a result set which is always useless, regardless of the data present in the database, and regardless of its data demand.

    • Example: SELECT * FROM table WHERE 1=0;
    • Regardless of the data demand, this query will always return an empty set, since the condition is always false.
  3. Logic errors: the query produces a valid set, but it does not satisfy its data demand.

    • Example: SELECT cID FROM customer WHERE city = 'Turin';
    • Data demand: Select the IDs of customers who live in Genoa
    • This query is valid, but it does not satisfy the data demand, since it lists the IDs of customers who live in Turin, instead of those who live in Genoa.
  4. Complications: the query satisfies its data demand, but in an overly complex way.

    • Example: SELECT city FROM store GROUP BY city;
    • Data demand: List all cities in which stores are located, without repetitions.
    • Even though the query returns the correct result, using GROUP BY (instead of SELECT DISTINCT) just to remove duplicate values is both less efficient and harder to read.

Taxonomy

You can use the links in the tables below to see detailed information on each error.

For each query, we will be referencing the following schema, adapted from Miedema et Al. [^miedema_identifying2022]. Underlined attributes collectively form the primary key for each table.

Table name Attributes
customer cID, cName, street, city, age
store sID, sName, street, city
product pID, pName, suffix
inventory sID, pID, date, quantity, unit_price

Updates and maintenance

Each error type is assigned a unique identifier which is will never be changed or removed, but new IDs may be added in the future.

Existing taxonomies can be updated by running again the error categorization module on the same set of queries, and reclassifying any errors that have changed category or type.

Syntax errors

Definition A syntax error occurs when a SQL query violates the syntactic or typing rules of the SQL language and cannot be executed by the DBMS.

Key properties

  • The DBMS rejects the query at parse time or during static validation.
  • No result set is produced.
  • The error can be detected without knowledge of the data demand.
  • The error is independent of the database instance contents.

Pedagogical interpretation Syntax errors typically reflect difficulties with SQL grammar, clause structure, or expression formation, and often arise in early stages of learning.

Ambiguous database object

ID Name
1 Ambiguous column
2 Ambiguous function

Undefined database object

ID Name
3 Undefined column
4 Undefined function
5 Undefined parameter
6 Undefined object
7 Invalid schema name
8 Misspellings
9 Synonyms
10 Omitting quotes around character data

Data type mismatch

ID Name
11 Failure to specify column name twice
12 IS where not applicable
13 Data type mismatch

Illegal aggregate function placement

ID Name
14 Using aggregate function outside SELECT or HAVING
15 Aggregate functions cannot be nested

Illegal or insufficient grouping

ID Name
16 Extraneous or omitted grouping column
17 Strange HAVING: HAVING without GROUP BY

Invalid subqueries

ID Name
18 Too many columns in subquery
19 Missing quantifier

Common syntax error

ID Name
20 Confusing function with function parameter
21 Using WHERE twice
22 Omitting the FROM clause
23 Comparison with NULL
24 Omitting the semicolon
25 Date time field overflow
26 Duplicate clause
27 Using an undefined correlation name
28 Confusing table names with column names
29 Confusing the order of keywords (e.g., FROM customer SELECT fee)
30 Confusing the syntax of keywords (e.g., LIKE ('A,' 'B'))
31 Omitting commas
32 Unmatched brackets
33 Curly or square brackets
34 Nonstandard keywords or standard keywords in wrong context
35 Nonstandard operators (e.g., &&, || or ==)
36 Additional semicolon
37 Different tuples in set operation

Semantic errors

Definition A semantic error occurs when a SQL query is syntactically valid and executable, but its evaluation is semantically flawed regardless of the data demand, producing a result that is always meaningless.

Key properties

  • The query executes successfully.
  • The result set is intrinsically invalid (e.g., always empty or logically inconsistent).
  • The error can be detected without reference to the intended task.
  • The behavior holds for any possible database instance.

Pedagogical interpretation These errors often signal misconceptions about logical conditions, boolean reasoning, or the meaning of operators in SQL.

Inconsistent expression

ID Name
38 Implied, tautological or inconsistent expression
39 DISTINCT in SUM or AVG
40 DISTINCT that might remove important duplicates
41 Mixing a >0 with IS NOT NULL or empty string with NULL

Inconsistent join

ID Name
42 NULL in IN/ANY/ALL subquery
43 Join condition on unmatchable column

Duplicate rows

ID Name
44 Many duplicates

Redundant column output

ID Name
45 Constant column output
46 Duplicate column output

Logic errors

Definition A logic error occurs when a SQL query is syntactically and semantically valid, but does not satisfy the given data demand.

Key properties

  • The query executes successfully.
  • A result set is produced.
  • The result does not match the expected outcome defined by the data demand.
  • Detection requires comparison with at least one correct reference query or specification.

Pedagogical interpretation Logic errors reflect misunderstandings of the problem requirements, relational reasoning, or the mapping between natural language requests and SQL constructs.

Operator error

ID Name
47 AND instead of OR
48 OR instead of AND
49 Extraneous NOT operator
50 Missing NOT operator
51 Substituting existence negation with <>
52 Incorrect comparison operator or incorrect value compared

Join error

ID Name
53 Incorrect table reference
54 Missing table reference
55 Extraneous table reference
56 Join condition on incorrect column
57 Join condition with incorrect comparison operator
58 Omitting a join condition
59 Condition on OUTER JOIN

Nesting error

ID Name
60 Improper nesting of expressions
61 Improper nesting of subqueries

Expression error

ID Name
62 Extraneous quotes
63 Missing expression
64 Extraneous expression
65 Expression on incorrect column
66 Expression in incorrect clause
67 Wildcards without LIKE
68 Wrong wildcard
69 Invalid wildcard

Projection error

ID Name
70 Extraneous column in SELECT
71 Missing column from SELECT
72 Missing DISTINCT from SELECT
73 Missing AS from SELECT
74 Missing column from ORDER BY clause
75 Incorrect column in ORDER BY clause
76 Incorrect ordering of rows

Clause error

ID Name
77 Missing WHERE clause
78 Missing GROUP BY clause
79 Missing HAVING clause
80 Missing ORDER BY clause
81 Missing LIMIT clause
82 Missing OFFSET clause
83 Extraneous WHERE clause
84 Extraneous GROUP BY clause
85 Extraneous HAVING clause
86 Extraneous ORDER BY clause
87 Extraneous LIMIT clause
88 Extraneous OFFSET clause
89 Incorrect LIMIT
90 Incorrect OFFSET

Function error

ID Name
91 Incorrect function
92 DISTINCT as function parameter where not applicable
93 Missing DISTINCT from function parameter
94 Incorrect column as function parameter

Complications

Definition A complication occurs when a SQL query satisfies the data demand, but does so in an unnecessarily complex, redundant, or non-idiomatic way.

Key properties

  • The query returns a correct result set.
  • One or more components are redundant, superfluous, or replaceable by simpler constructs.
  • Removing or simplifying these components does not change the result.
  • Detection requires knowledge of the data demand.

Pedagogical interpretation Complications often indicate partial understanding or overgeneralization of SQL constructs, and provide opportunities for feedback focused on readability, efficiency, and idiomatic query formulation.

Complication

ID Name
95 Unnecessary complication
96 Unnecessary DISTINCT in SELECT clause
97 Unnecessary table reference
98 Unused correlation name
99 Tables have the same data
100 Correlation name identical to table name
101 Unnecessarily general comparison operator
102 LIKE without wildcards
103 Unnecessarily complicated SELECT in EXISTS subquery
104 IN/EXISTS can be replaced by comparison
105 Unnecessary aggregate function
106 Unnecessary DISTINCT in aggregate function
107 Unnecessary argument of COUNT
108 Unnecessary GROUP BY in EXISTS subquery
109 GROUP BY with singleton groups
110 GROUP BY with only a single group
111 GROUP BY can be replaced with DISTINCT
112 UNION can be replaced by OR
113 Unnecessary column in ORDER BY clause
114 ORDER BY in subquery
115 Inefficient HAVING
116 Inefficient UNION
117 Condition in the subquery can be moved up
118 OUTER JOIN can be replaced by INNER JOIN
119 Unused CTE

Notes

Classification principles

  • The four top-level categories are mutually exclusive as primary labels, although a single query may exhibit multiple issues.
  • Classification prioritizes observable query behavior over inferred intent.

Identifier design

  • Error identifiers are symbolic rather than numeric to avoid implying a fixed ordering.
  • This design choice increases robustness to future revisions, additions, or reclassification, without requiring systematic renumbering.

References

[^taipalus_errors2018]: Toni Taipalus, Mikko Siponen, and Tero Vartiainen. 2018. Errors and Complications in SQL Query Formulation. ACM Trans. Comput. Educ. 18, 3, Article 15 (September 2018), 29 pages. https://doi.org/10.1145/3231712

[^miedema_identifying2022]: Daphne Miedema, Efthimia Aivaloglou, and George Fletcher. Identifying SQL misconceptions of novices: Findings from a think-aloud study. ACM Inroads 13.1 (2022): 52-65. https://dx.doi.org/10.1145/3514214

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sql_error_taxonomy-1.1.2.tar.gz (35.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sql_error_taxonomy-1.1.2-py3-none-any.whl (12.1 kB view details)

Uploaded Python 3

File details

Details for the file sql_error_taxonomy-1.1.2.tar.gz.

File metadata

  • Download URL: sql_error_taxonomy-1.1.2.tar.gz
  • Upload date:
  • Size: 35.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for sql_error_taxonomy-1.1.2.tar.gz
Algorithm Hash digest
SHA256 b587e15e3906d0554e4fbff4e10962e928d4c1f2f5811bb50ed1025816c12297
MD5 50a3a3466682babe2453bc196da98683
BLAKE2b-256 e519ea68138361fe24afd44ec6ba464320f2ba05c6bd7899b4e0b479342d577c

See more details on using hashes here.

File details

Details for the file sql_error_taxonomy-1.1.2-py3-none-any.whl.

File metadata

File hashes

Hashes for sql_error_taxonomy-1.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 01ed3ad913ff28055bda9cf918e07f73bdabfc63fc0bde850b08e0059e9adf20
MD5 aa55d5dc6333a5a0b030efd168ffce1f
BLAKE2b-256 d9906092b3f7b3203c04e475e86c78631b36ba01f87b1a4f20ccc339475fc3d0

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page