SQL — Cursors and Their Usage in Databases

An introduction to working with the cursor in PostgreSQL and MS SQL Server

Ihor Lukianov
Stackademic

--

When it comes to working with structured data, SQL is still the most important language to use. Although we have multiple options to perform data processing, like using Python as the main language and its useful frameworks, many operations can be performed easily on the database side.

However, SQL is a declarative language and has a few weaknesses, such as working with iterations over a table or part of the data. Today, I’m introducing you to a great tool inside any SQL dialect — the cursor. To put it simply, the cursor allows us to consume the results while the database engine is generating them, thus opening up many possible solutions. So, let’s start with the basics.

I use the dataset from the NASDAQ stock screener, which was also used in the previous article on SQL indexes. You can also directly access the CSV file with the dataset from my GitHub. The same dataset is used for both PostgreSQL and MS SQL Server examples.

Why do we need cursors in SQL?

Although cursors are not commonly used in SQL statements, they are still a crucial concept to understand. When working with a database from Python, you will undoubtedly use a cursor to execute your statements. Therefore, it is important to comprehend how it works on the SQL side. You may have used the following code structure previously to access the database.

import psycopg2

conn = psycopg2.connect()
cur = conn.cursor()

cur.execute("SELECT * FROM test_table;")
res = cur.fetchall()

cur.close()
conn.close()

In certain cases, you may want to process rows individually or in batches, particularly if you are dealing with large amounts of data and memory capacity is a concern. You can fetch a small number of rows at a time to avoid any potential memory issues. In Postgres, for-loops automatically use cursors during processing.

Additionally, you can return a cursor reference from the function. This provides an efficient way to output a large number of rows from the function, allowing the caller to use them as needed.

Basic cursor operations

Working with a cursor is a simple process that involves four main operations. These logical steps provide you with the functionality to work with data in a loop-like structure. The four operations are cursor declaration (DECLARE), opening the newly created cursor (OPEN), fetching data from the cursor (FETCH), and finally closing it once you no longer need it (CLOSE). Let me explain each of these operations in detail.

Cursor operations in one diagram. Source.

Declaring cursor

To begin with, we need to declare a new cursor. This can be done by specifying the name and processing the query. For instance, consider the following query where we want to select the top 5 companies with the highest volume of stocks that entered the market in 2021 (ipoyear). The only difference between the two syntaxes is that in the SQL Server case, we use TOP 5 instead of LIMIT 5.

-- PostgreSQL
DECLARE
-- Declare cursor
stock_cursor CURSOR FOR
SELECT symbol
FROM public.nasdaq_stocks
WHERE ipoyear = 2021
ORDER BY volume
LIMIT 5;

-- MS SQL Server
DECLARE
stock_cursor CURSOR FOR
SELECT TOP 5 symbol
FROM dbo.nasdaq_data
WHERE ipoyear = 2021
ORDER BY volume;

It’s worth mentioning that there are some options available for additional functionality. One particularly popular option is SCROLL, which allows you to move back through the cursor data. You can easily include this keyword in your declaration to enable this functionality.

-- for both cases
DECLARE
stock_cursor SCROLL CURSOR FOR
SELECT symbol
FROM nasdaq_stocks;

I believe that in most cases, a bounded cursor is used which involves a query in its declaration. In both cases, we use the FOR keyword. However, there is also an option to create an unbounded cursor. I could not find a practical use case for this type since it still involves some form of query, so we will not discuss it further.

Opening

Once we declare our cursor, we need to open it to start working with it. This can be done using the OPEN keyword. It’s important to note that when we open the cursor, a new data structure is created on the database side, which allows us to access data from the cursor. However, in the case of Postgres, this step is automatically done for us, so there’s no need to perform it as a separate step.

OPEN stock_cursor;

Fetching data

We have reached the most important step of our topic which is fetching the data. This is where the cursor comes in handy as it allows us to fetch data in batches or by a single row, and perform different logic based on a certain condition. As with the previous operation, we can use the FETCH keyword to retrieve the desired amount of data. Let’s begin with a basic example and get the first rows from the cursor.

-- PostgreSQL
DECLARE
-- Declare cursor
stock_cursor CURSOR FOR
SELECT symbol, volume
FROM public.nasdaq_stocks
ORDER BY volume DESC;

FETCH NEXT FROM stock_cursor;

-- SQL SERVER
DECLARE
-- Declare cursor
stock_cursor CURSOR FOR
SELECT symbol, volume
FROM dbo.nasdaq_data
ORDER BY volume DESC;

OPEN stock_cursor; -- the only difference - open the cursor
FETCH NEXT FROM stock_cursor;
-- MCOM, 345306502

It’s not useful to get only one row from the results, so we can use the LIMIT statement to retrieve more. Now, let’s perform fetching in a loop, which is possible for both SQL syntaxes. However, there are some differences between them. For example, in PostgreSQL, we use a variable to process each row in each iteration. In this example, I simply notify that the row has been processed.

DO $$ 
DECLARE
-- Declare variables
row_data public.nasdaq_stocks%ROWTYPE;
-- Declare cursor
symbol_cursor CURSOR FOR
SELECT symbol, volume
FROM public.nasdaq_stocks
WHERE ipoyear = 2021
ORDER BY volume DESC;
BEGIN
-- Loop to fetch rows
FOR row_data IN symbol_cursor
LOOP
-- Process the row
RAISE NOTICE 'Processing row: %', row_data;

-- Add your processing logic here
END LOOP;
CLOSE symbol_cursor;

END $$;
Rows processed one by one.

Let’s now explain the same logic for SQL Server. One of the great features of T-SQL is the @@FETCH_STATUS variable, which stops the iteration once all the data from the cursor has been processed. Additionally, we use variables to obtain data from each iteration.

GO
-- declaring variables and cursor
DECLARE @Symbol VARCHAR(50), @Volume INT;
DECLARE
stock_cursor CURSOR FOR
SELECT symbol, volume
FROM dbo.nasdaq_data
WHERE ipoyear = 2021
ORDER BY volume DESC;

OPEN stock_cursor; -- don't forget to open for SQL Server

-- Fetch first row before iterating
FETCH NEXT FROM stock_cursor
INTO @Symbol, @Volume;

WHILE @@FETCH_STATUS = 0
BEGIN
-- print to see the output
PRINT @Symbol + ' - ' + CAST(@Volume AS VARCHAR);

-- fetch the following row
FETCH NEXT FROM stock_cursor
INTO @Symbol, @Volume;
END

-- close the cursor in the end
CLOSE stock_cursor;
DEALLOCATE stock_cursor;
GO
Like in a previous example, all the data was processed separately.

You can find various useful features for cursors in T-SQL, and I recommend exploring the documentation. Different types of cursors might benefit your use cases.

Closing

As a final step, you need to close your cursor and deallocate it from memory for SQL Server. Add the CLOSE statement at the end of your transaction.

-- PostgreSQL
CLOSE stock_cursor;

-- MS SQL Server
CLOSE stock_cursor;
DEALLOCATE stock_cursor;

Possible alternatives

Although it may be a useful approach, implementing and working with cursors can be more complex in general. Therefore, I suggest exploring simpler and more common options before resorting to using cursors. Some other options to consider are:

  • Common Table Expressions (CTEs) — these allow you to write complex queries using a usual syntax. You may be able to achieve your desired functionality without the need for iterations.
  • Window functions — these provide easy ways to categorize or aggregate your rows, particularly on a micro level.
  • Recursive queries — while not necessarily the easiest option, these could be helpful for cases involving hierarchical data.

Conclusions

Hopefully, you now have a basic understanding of the cursor and its potential use cases. However, the functionality of the cursor is vast and cannot be fully covered in one article. By understanding these examples, you should be able to delve deeper into the documentation for your specific database system. Keep in mind that if you have a complex workflow that requires precise control on a row-by-row basis, you should consider using a cursor.

I can be found on LinkedIn and I am looking forward to connecting with you. Let’s engage in discussions about the intricate world of data science and data engineering.

Stackademic

Thank you for reading until the end. Before you go:

--

--

Data Engineer @Namecheap. Interest in Data Engineering and NLP/NLU problems.