SQL — order of query execution

Important to know before optimizing query performance

Ihor Lukianov
5 min readSep 24, 2023

To maximize your query’s speed on any SQL engine, it’s essential to have an understanding of the SQL execution order. Even though you can work without this knowledge, I recommend reading this article to gain a quick understanding of it.

The SQL engine doesn’t follow the same order as you define in your query, so it’s crucial to remember that. For example, although we begin with a SELECT statement, the engine won’t start with this command. In this article, we’ll examine a complex query step by step to comprehend how the SQL engine operates behind the scenes.

Important: all examples are done on PostgreSQL, so the syntaxes might vary from engine to engine. Still, this concept is applicable to all other types of SQL engines.

Define the Query

For this example, I’d like to discuss a typical query used in real-world workflows. Suppose we have a database for cars with a table for different models, and each model has its own engine specifications listed in a separate table. To help illustrate this, we can create tables for this scenario.

DROP TABLE IF EXISTS cars, engines;
CREATE TABLE cars (
manufacturer VARCHAR(64),
model VARCHAR(64),
country VARCHAR(64),
engine_name VARCHAR(64),
year INT
);
CREATE TABLE engines (
name VARCHAR(64),
horse_power INT
);

INSERT INTO cars
VALUES
('BMW', 'M4', 'Germany', 'S58B30T0-353', 2021),
('BMW', 'M4', 'Germany', 'S58B30T0-375', 2021),
('Chevrolet', 'Corvette', 'USA', 'LT6', 2023),
('Chevrolet', 'Corvette', 'USA', 'LT2', 2023),
('Audi', 'R8', 'Germany', 'DOHC FSI V10-5.2-456', 2019),
('McLaren', 'GT', 'UK', 'M840TE', 2019),
('Mercedes', 'AMG C 63 S E', 'Germany', 'M139L', 2023);

INSERT INTO engines
VALUES
('S58B30T0-353', 473),
('S58B30T0-375', 510),
('LT6', 670),
('LT2', 495),
('DOHC FSI V10-5.2-456', 612),
('M840TE', 612),
('M139L', 469);

In order to achieve our objective of identifying the two most powerful cars from Germany, we’ll be looking at modern automobiles that are no older than eight years. To attain this, we will use well-known SQL statements such as SELECT, FROM, JOIN, WHERE, GROUP BY, HAVING, ORDER BY, and LIMIT.

SELECT
cars.manufacturer
, cars.model
, cars.country
, cars.year
, MAX(engines.horse_power) as maximum_horse_power
FROM cars
JOIN engines
ON cars.engine_name = engines.name
WHERE cars.year > 2015 AND cars.country = 'Germany'
GROUP BY cars.manufacturer, cars.model, cars.country, cars.year
HAVING MAX(engines.horse_power)> 200
ORDER BY maximum_horse_power DESC
LIMIT 2

The output of this query is the following:

Query output — two most powerful German cars from our sample database

Now when we have our query, let’s understand how the engine orders them when executing. Here is the order:

  1. FROM
  2. JOIN (and ON)
  3. WHERE
  4. GROUP BY
  5. HAVING
  6. SELECT
  7. ORDER BY
  8. LIMIT
Source: https://blog.bytebytego.com/p/ep50-visualizing-a-sql-query

It’s important to note that before executing the query, the SQL engine creates an execution plan to reduce resource consumption. This plan offers valuable details such as estimated costs, joining algorithms, order of operations, and more. This is a comprehensive output, and it can be accessed if needed.

Step-by-step

FROM and JOIN

FROM cars
JOIN engines

When starting a SQL query, the engine requires knowledge of which tables to use. This is accomplished by beginning with a FROM statement. You can add more tables using the JOIN keyword, as long as they share a common column that will be used in the query. It’s a straightforward process that you should keep in mind.

ON

ON cars.engine_name = engines.name

Next in the sequence comes ON, where we define how to join different tables. This process also involves using pre-defined indices, such as B-tree and Bitmap, to speed up calculations. It’s important to note that there are several types of indices that can help in this case. These two stages require a considerable amount of processing, so it’s crucial to focus and begin optimizing at this point.

WHERE

WHERE cars.year > 2015 AND cars.country = 'Germany'

When sorting through our data, it’s important to keep in mind that using the WHERE clause only with indexed columns can enhance performance, especially when dealing with large tables. Additionally, it may be beneficial to filter data in subqueries or CTEs before the WHERE statement in some scenarios to further boost performance.

However, it’s important to note that many issues regarding query performance are beyond the scope of this article. I recommend delving deeper into such problems and experimenting with various techniques to write faster queries.

GROUP BY and HAVING

GROUP BY cars.manufacturer, cars.model, cars.country, cars.year
HAVING MAX(engines.horse_power)> 200

Next in the sequence, we need to follow the query’s specified order accordingly. After that, we have to determine all the necessary aggregations that we have to carry out. When it comes to the HAVING clause, it’s intriguing because we cannot employ an alias from the SELECT line. This is because the SQL engine is still unaware of this definition.

SELECT

SELECT
cars.manufacturer
, cars.model
, cars.country
, cars.year
, MAX(engines.horse_power) as maximum_horse_power

Once all the necessary steps were completed, we proceeded to execute the SELECT statement. At this point, we simply specified the columns to include in the final output. It’s important to keep in mind that many operations, such as merging and aggregating, have already been completed by this stage.

ORDER BY and LIMIT

ORDER BY maximum_horse_power DESC
LIMIT 2

Once we execute the final commands, we become aware of the aliases that we mentioned in the SELECT statement. As a result, we can utilize the alias maximum_horse_power instead of the function name, although we can still use the latter. It’s best to steer clear of sorting a large amount of data output as it can take up a significant amount of time.

Conclusions

To truly understand SQL and optimize the query process, it’s important to note that the SQL engine doesn’t work in the same order as the query. This can help you optimize merging and filtering data in the earlier steps. Subsequent articles will cover indices and measuring query performance, which are crucial for effective optimization.

I can be found on Linkedin and I am looking forward to connecting with you. Let’s engage in discussions about the intricate world of data science and data engineering.

--

--

Ihor Lukianov

Data Engineer @Namecheap. Interest in Data Engineering and NLP/NLU problems.