Could not complete cursor operation because the table schema changed after the cursor was declared.

Msg 16943, Level 16, State 4, Line 16
Could not complete cursor operation because the table schema changed after the cursor was declared.

Adding or removing nonclustered indexes seems like a low-risk performance tuning operation, right? Under normal circumstances that is true: you are not modifying the underlying data, nor are you changing the table schema. I found out the hard way today that it can cause rather nasty errors like the one above if it is performed while a cursor loop is running in another session.


I will show you in a quick demo. Let’s start by creating a table and inserting some data.


create table temp
(
    id int not null identity(1, 1) primary key
    , data varchar(10) not null default('test')
)
go

insert temp default values
go 50

We will then kick off a simple cursor looping through the table.

declare cur cursor
for
    select id from temp

open cur

declare @id int

fetch cur into @id

while @@fetch_status = 0
begin
    print @id
    waitfor delay '00:00:01' -- slow the loop down so there is time to create an index in another session
    
    fetch cur into @id
end

close cur
deallocate cur


While the session is running, we will pretend that we have no knowledge of the running cursor code and have identified a performance tuning opportunity to add a nonclustered index to the table. We will do just that, but in a separate session window.


create index idx_temp on temp (data)


Once the index is created, we switch back to the session window with the cursor, where we should see the error in the Messages tab.


1
2
3
4
5
Msg 16943, Level 16, State 4, Line 16
Could not complete cursor operation because the table schema changed after the cursor was declared.

So there you have it: I've just demonstrated that no matter how seemingly harmless an operation is, there is always a potential risk of unintended side effects.

Using computed columns to solve performance problems

Sometimes when you are working with third-party vendor code, or maintaining legacy code, things can get tricky, especially when a query doesn't perform. Modifying the query by adding hints is one way to address the issue, but more often than not you will find your hands are tied and you are unable to change a single line of code. I had to deal with something like that once and was fortunate enough to have computed columns at my disposal.

A computed column, although easy to understand, is probably not the first tool people associate with performance tuning. Yet it can be a very powerful weapon in your tuning arsenal. I will demonstrate with an example.

Say you are working with a table and a view like the ones below.

use tempdb
go

if object_id('trade_id_lookup') is not null
    drop table trade_id_lookup
go

create table trade_id_lookup
(
    trade_id bigint not null identity(1, 1) primary key
    , system_id int not null
    , portfolio sysname not null
    , product_id int not null
)
go

if object_id('v_trade_id_lookup') is not null
    drop view v_trade_id_lookup
go

create view v_trade_id_lookup
as
select
    cast(trade_id as nvarchar) as 'trade_id'
    , system_id
    , portfolio
    , product_id
from trade_id_lookup
go

insert trade_id_lookup (system_id, portfolio, product_id)
values (1, 'portfolio', 9)
go 10000

The view is then used throughout the system for looking up trade attributes given a trade_id.

select system_id, portfolio, product_id
from dbo.v_trade_id_lookup
where trade_id = '999'

It all looks quite innocent until you start to interrogate the execution plan:

[Execution plan showing a clustered index scan on trade_id_lookup]

A clustered index scan, why? Due to the CAST expression embedded in the view, the optimizer is not able to perform an index seek; it has to convert the value for every single row instead. If you hover over the index scan icon, the predicate will show something like:

(CONVERT(nvarchar(30),[tempdb].[dbo].[trade_id_lookup].[trade_id],0)=CONVERT_IMPLICIT(nvarchar(4000),[@1],0))
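
For contrast, and purely as a sketch against the same demo table, the equivalent lookup written directly against the base table keeps the predicate sargable, so the optimizer should be able to seek on the primary key because nothing needs to be converted on the column side:

-- Demo only: filtering on the bigint column itself, no CONVERT on the column side,
-- which allows a seek on the clustered primary key.
select system_id, portfolio, product_id
from dbo.trade_id_lookup
where trade_id = 999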

Bear in mind that I intentionally stripped away all the non-essential code for the demo. Imagine looking at a large execution plan filled with joins and index scans; it is not an easy task to spot something like this without looking at the code inside the view. This is one of the reasons I hate deeply nested views: they are a product of the object-oriented paradigm and do not belong in the database world. Or maybe it's because of my inability to understand INCEPTION.

Now that we know what's causing the performance issue, what can we do about it? One quick answer would be to remove the CAST expression inside the view. I wish it were that simple; remember, we are working with code whose intent we don't necessarily understand. Modifying queries, even a seemingly simple line, is fraught with danger and should be your last resort.

So we can change neither the queries nor the view. Then what? Fortunately, I could modify the underlying table, provided that no business logic was affected. This is where a computed column comes in handy.

alter table trade_id_lookup
add trade_id_string as cast(trade_id as nvarchar)
go 
create index idx_louie on trade_id_lookup (trade_id_string)
go

By copying the CAST logic from the view into a computed column, we achieved two things:

  1. An index can now be created on the converted trade_id.
  2. No code has to be modified and it all just magically works.

Executing the SELECT query again yields the execution plan below; see how the index seek is now in use?

[Execution plan showing an index seek on idx_louie followed by a key lookup]

The key lookup is a by-product of the demo; you can optimize the query further by adding the referenced columns to the INCLUDE list of the index, as sketched below.
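
For example, a covering version of the index might look like the following sketch; the index name is made up for illustration, and the INCLUDE columns are simply the ones the query selects:

-- Sketch: cover the query so the index seek no longer needs a key lookup.
create index idx_trade_id_string_covering
    on trade_id_lookup (trade_id_string)
    include (system_id, portfolio, product_id)
go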

–oo00oo–

To understand a bit more about how the optimizer utilises indexes on computed columns, please refer to Paul White's answer to the question "Index on Persisted Computed column needs key lookup to get columns in the computed expression".

How a wrong join made the query MUCH slower

I was monitoring a bunch of batch jobs one day and noticed that one job was taking much longer than usual. Normally the job completes in around 3 minutes, but this time it had been running for 3 hours and was still going.

After running a series of troubleshooting commands like “sp_who2 active” and “dbcc inputbuffer()”, I relied on the query below to identify the runaway query.

select 
    s.session_id  as 'session_id'
    , s.status                  as 'session_status'
    , r.status                  as 'request_status'
    , s.last_request_start_time as 'last_request_start_time'
    , db_name(r.database_id)    as 'database_name'
    , t.text                    as 'command'
    , r.wait_type               as 'wait_type'
    , r.blocking_session_id     as 'blocking_session_id'
    , s.login_name              as 'login_name'
    , s.host_name               as 'host_name'
    , s.program_name            as 'program_name'
    , p.query_plan              as 'query_plan'
from sys.dm_exec_sessions s
left join sys.dm_exec_requests r     on s.session_id = r.session_id
outer apply sys.dm_exec_sql_text(r.sql_handle) t
outer apply sys.dm_exec_query_plan(r.plan_handle) p
where s.is_user_process = 1
and s.status = 'running'
and s.session_id != @@spid

Both blocking_session_id and wait_type were NULL; the only useful information that could be extracted from the query was the execution plan:

[Execution plan showing a nested loops join with table scans on both inputs]

Before I go on and explain the execution plan, I will provide some background information. The table was joined against itself and it had 3.8 million rows. Worst of all, the table didn't have any indexes on it at all, not even a clustered index. Blessed with such insight, it was not difficult to see that the execution plan was sub-optimal. The infamous table scans were due to the lack of indexes; unforgivable, but nonetheless not the main offender. The nested loops join was the real culprit killing the query. How so? For every one of the 3.8 million rows on the outer side, it was scanning through 3.8 million rows on the inner side, as illustrated by the arithmetic below:

3,800,000 x 3,800,000 = 14,440,000,000,000

Adding an index to the table (and fixing statistics at the same time) was the most sensible approach for improving the performance on this query. However, the aim of this post is to demonstrate the effect of a different join operator on the query.

I was able to locate a historical execution plan which was generated by the query optimizer. The hash join in the execution plan made it obvious why the query never had any problems in the past.

[Historical execution plan showing a hash match join]

That was the same as saying:

3,800,000 + 3,800,000 = 7,600,000

What an astronomical difference! Once the root cause was identified, it was quite easy to work out what to do about it. A solution could be any of the following (sketched in code after the list):

  • Updating the statistics on the table;
  • Implementing a join hint;
  • Or applying the most effective performance tuning technique: adding an index.
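
Purely as a sketch (the real table is not shown in this post, so dbo.big_table and key_col are made-up names), the three options could look something like this:

-- 1. Refresh statistics so the optimizer gets realistic row counts.
update statistics dbo.big_table with fullscan
go

-- 2. Band-aid: force the hash join with a query hint (requires touching the query).
select a.key_col
from dbo.big_table a
join dbo.big_table b
    on a.key_col = b.key_col
option (hash join)
go

-- 3. Preferred fix: give the join column an index so the join can seek instead of scan.
create clustered index cix_big_table_key on dbo.big_table (key_col)
go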

An index was added in the end and the job was completed in no time.

If you would like to get a deeper understanding of the join operators or query execution in general, I highly recommend Craig Freeman’s blog. Below is what he wrote about hash join:

When it comes to physical join operators, hash join does the heavy lifting. While nested loops join works well with relatively small data sets and merge join helps with moderately sized data sets, hash join excels at performing the largest joins. Hash joins parallelize and scale better than any other join and are great at maximizing throughput in data warehouses.

Not dropping temp tables is better for performance

I have wondered about this one for some time but have never got to the bottom of it: whether or not to drop temp tables at the end of stored procedures. I never thought it really mattered to performance, given the beefy machines we get to command these days.

I have asked a few fellow DBAs, and the response was usually along the lines of “yeah, you should drop temp tables, because it is a good practice.” That’s just not good enough for me.

After reading Paul White’s blog post one day, an idea came to mind: I could use sys.fn_dblog to see if dropping temp tables made any difference. I then tried out a simple test and reached a conclusion, for my own peace of mind.

I will show you the test I did and the findings. The code is pasted below if you would like to try it out yourself.

-- Create two procs, one with DROP TABLE and one without.
use tempdb
go

create proc dbo.DropTempTable
as
    create table #demo (i int)
    drop table #demo
go

create proc dbo.NoDropTempTable
as
    create table #demo (i int)
go

Once the procs are created, you can then run the example.

checkpoint -- in order to reset the fn_dblog output
exec dbo.DropTempTable -- Run the first proc
select
    operation,
    [transaction name],
    [transaction id]
from sys.fn_dblog(null, null)
go

checkpoint
exec dbo.NoDropTempTable -- Run the second proc
select
    operation,
    [transaction name],
    [transaction id]
from sys.fn_dblog(null, null)
go

The example above should be self-explanatory. What I want to show you is the output from fn_dblog:

[fn_dblog output: identical log records for both procs, apart from an extra transaction for the DROP TABLE in the first proc]

As you can see, the transaction log entries were identical except for the extra transaction required to drop the temp table in the first proc. Looking at the result, it is reasonable to conclude that NOT dropping the temp table is better for performance because SQL Server has less work to do.

To me that settled it: why wouldn't I write less code if it is better for performance?

–oo00oo–
For more info, please refer to the following links:
Explicitly dropping Temporary Tables and performance – Myth or Fact?