Database Blocking -- Why?

Erland Sommarskog · Jan 10, 2009

Jonathan said:
This code was working just fine all day. But, suddenly, the database query
in the catch block was timing out, causing the second, outter, catch
statement to run.

If I set CommandTimeout to 0, it blocks indefinitely. While it was
blocked, I ran sp_who2 and the results can be seen at
http://www.fileparade.com/DBBlock.jpg.

It shows the UPDATE is blocked by the SELECT. But why does a SELECT block?

It's funny, there is a long thread, but no one has answered one seemingly
puzzling question, and no one has mentioned the solution which might be
the easiest to implement.

Why did this work before, and then stopped working one day? The answer
to this is that although you read one row at a time in your code, the
client API will get a bunch of rows at a time from SQL Server. So as long
as the SqlClient could get all rows in the result set in one go, you were
fine, because by the time you performed the UPDATE the locks from the
SELECT were released.

But the day when the result set grow in size, or something else changed
that caused ADO .Net to get only some of the rows, and now the UPDATE was
blocked, because SQL Server still held the locks, as the SELECT was
still running.

You may ask how to control the buffering in ADO .Net, but I don't think
SqlClient exposes such a knob, and trying to avoid the issue this way
is a dead end anyway.

The simplest solution is probably to employ a feature known as MARS,
Multiple Active Result Sets. MARS was designed exactly for the scenario
you have. With MARS you can retrieve rows from a SELECT query and on the
same connection run UPDATES against the table you are updating. To
use MARS, you need to say something in the connection string, please refer
to MSDN for details.

Just like snapshot isolation, MARS will take a toll on the version store
in tempdb, but since enabling snapshot isolation affects everything that
goes on in the database that has a much wider repercussion than using
MARS.

But before you go ahead and implement MARS, I should say that I'm fairly
lukewarm to this feature. I believe it works very well for a program such
as yours, but MARS offers many possibilties to shoot yourself in the foot.
One thing that is important to understand is that when there are multiple
commands executing on the same connection, they are not executing in
parallel, but the statements are interleaved.

Nevertheless, I think MARS is a better solution than using any of the
snapshot models that were suggested in the thread, and it's definitely
a better choice then READ COMMITTED SNAPSHOT. Both snapshot models
requires settings be made the database owner, and if the database is
later on reconfigured to no longer use snapshot, your program will fail.
If you rely on READ COMMITTED SNAPSHOT being in effect, your program
will just hang. If you issue a SET TRANSACTION ISOLATION LEVEL SNAPSHOT,
your program will at least fail loud and clear if snapshot isolation
has been disabled. On the other hand, there is no risk that anyone
configures MARS out of the picture, but that is solely in your code.

There are still more solutions to consider. One is to look if it is
possible to index the table such that the lock on a row has been released
when you update it. Then again, if some change the index, the program
will fail.

Another possibility is to do the buffering yourself. Read a reasonable
amount of rows into a dataset with the .Fill mehtod. Now you can easily
update the rows on the same connection without using MARS.

Yet another possibility would be to keep the reader as you have now,
but defer the updating until you have done all rows, and then update
them all at the end. If you are on SQL 2008, you would pass the data
for the update in a table-valued parameter; On SQL 2005 you could use
an XML string that you shred in SQL Server.

--
Erland Sommarskog, SQL Server MVP, (e-mail address removed)

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

Jesús López · Jan 11, 2009

There are other options:

1) Enclose the whole operation in a transaction scope. Two or more
connections enclosed in a transaction scope do not block each other because
the connections are binded to a single transaction.

2) Bind the connections. You can use sp_bind stored procedure to bind two
connections to a single transactions. This has the same effect as 1)

3) Rewrite the entire operation to be done using just T-SQL in a single
batch. You are reading rows from a table and updating rows in a table, I'm
pretty sure the entire operation can be done entirely by a single SQL batch
that uses set based operations, instead of reading from a datareader and row
by row processing. I don't know what exactly you want to perform (you did
not show us the entire code), but it seems that it's something like this:

SET ROWCOUNT <the number of rows you want to update>

UPDATE Table
SET
Status =
CASE
WHEN <criteria for setting status = 1> THEN 1,
WHEN <criteria for setting status = 2> THEN 2,
ELSE Status
END,
Comment =
CASE
WHEN <condition 1> THEN <value 1>
WHEN <condition 2> THEN <value 2>
ELSE Comment
END
WHERE <criteria for updating>

SET ROWCOUNT 0

I thing the best approach is 3)

Regards:

Jesús López

Jesús López · Jan 11, 2009

I have reread the original post and I think the query would be something
like this:

SET ROWCOUNT @RowsToUpdate
UPDATE URLs
SET
Status =
CASE
WHEN <error condition> THEN 2
ELSE 1
END,
PollCount =
CASE
WHEN <error condition> THEN PollCount + 1
ELSE PollCount
END,
Comment =
CASE
WHEN <error condition> THEN 'Custom error'
ELSE 'Success'
END
WHERE
Status = 0 AND PollCount < 5

SET ROWCOUNT 0

Regards:

Jesús López

Jesús López · Jan 11, 2009

Hi Plamen,

Thanks for pointing this out, I had not noticed this.

I used SET ROWCOUNT because TOP does not work with UPDATE statements in SQL
Server 2000.

Anyway, the query can be rewritten in the following way for SQL Server 2005
and later:

UPDATE TOP (@RowsToUpdate) URLs
SET
Status =
CASE
WHEN <error condition> THEN 2
ELSE 1
END,
PollCount =
CASE
WHEN <error condition> THEN PollCount + 1
ELSE PollCount
END,
Comment =
CASE
WHEN <error condition> THEN 'Custom error'
ELSE 'Success'
END
WHERE
Status = 0 AND PollCount < 5

It can also be rewritten in the following way with similar (but not equals)
effects:

UPDATE URLs
SET
Status =
CASE
WHEN <error condition> THEN 2
ELSE 1
END,
PollCount =
CASE
WHEN <error condition> THEN PollCount + 1
ELSE PollCount
END,
Comment =
CASE
WHEN <error condition> THEN 'Custom error'
ELSE 'Success'
END
WHERE
ID IN (
SELECT TOP <RowsToUpdate> ID
FROM Urls
WHERE Status = 0 AND PollCount < 5
ORDER BY Url
)

I'd like to highlight that the idea of performing the whole process by just
one single T-SQL batch remains valid, and I think it is the best approach.

Regards:

Jesús López

Jonathan Wood · Jan 22, 2009

Sorry for the slow reply. I got wrapped up in other stuff and wanted total
focus on this subject.

Why did this work before, and then stopped working one day? The answer
to this is that although you read one row at a time in your code, the
client API will get a bunch of rows at a time from SQL Server. So as long
as the SqlClient could get all rows in the result set in one go, you were
fine, because by the time you performed the UPDATE the locks from the
SELECT were released.

But the day when the result set grow in size, or something else changed
that caused ADO .Net to get only some of the rows, and now the UPDATE was
blocked, because SQL Server still held the locks, as the SELECT was
still running.

I can't answer why the problem suddenly started, but the data was about 120K
rows from the start. So I'm not certain it had anything to do with a growing
data set.

But before you go ahead and implement MARS, I should say that I'm fairly
lukewarm to this feature. I believe it works very well for a program such
as yours, but MARS offers many possibilties to shoot yourself in the foot.
One thing that is important to understand is that when there are multiple
commands executing on the same connection, they are not executing in
parallel, but the statements are interleaved.

I can look into this. However, I am concerned about taking any more of a
toll on any data because the database is substantial. I also am wondering
why this is off by default and what would, therefore, be the ramifications
for all other queries that use the same connection string.

I'm not against a second table for this purpose as it addresses some
additonal problems I need to solve. However, I still need to flag which
items in the second table have completed as I go, so it kind of puts me back
in the same place I started.

Nevertheless, I think MARS is a better solution than using any of the
snapshot models that were suggested in the thread, and it's definitely
a better choice then READ COMMITTED SNAPSHOT. Both snapshot models
requires settings be made the database owner, and if the database is
later on reconfigured to no longer use snapshot, your program will fail.
If you rely on READ COMMITTED SNAPSHOT being in effect, your program
will just hang. If you issue a SET TRANSACTION ISOLATION LEVEL SNAPSHOT,
your program will at least fail loud and clear if snapshot isolation
has been disabled. On the other hand, there is no risk that anyone
configures MARS out of the picture, but that is solely in your code.

Well, my eyes start glazing over with some of these options. That may be one
of the reasons I'm leaning towards simply creating a second table

There are still more solutions to consider. One is to look if it is
possible to index the table such that the lock on a row has been released
when you update it. Then again, if some change the index, the program
will fail.

I had wondered about reading the next row before doing the update associated
with the previous row. But it sounds from your comments that this may not
resolve the issue if it may be locking several rows at a time.

Another possibility is to do the buffering yourself. Read a reasonable
amount of rows into a dataset with the .Fill mehtod. Now you can easily
update the rows on the same connection without using MARS.

I've been using SqlReader for this. Is there a way to release the lock on
the current row (if reading the next row won't do it)?

And if I changed to use a DataSet, how to I ensure the lock is released
after reading a block of rows?

Yet another possibility would be to keep the reader as you have now,
but defer the updating until you have done all rows, and then update
them all at the end. If you are on SQL 2008, you would pass the data
for the update in a table-valued parameter; On SQL 2005 you could use
an XML string that you shred in SQL Server.

Yeah, these are the type of ideas I'm leaning to. As mentioned, one approach
is using in another table. But I still need to flag items in the table to
track which have completed.

Thanks for your comments.

Jonathan

Jonathan Wood · Jan 22, 2009

Thanks for your comments.

I didn't quite get your point but I've printed out your post and will study
it more closely.

Jonathan

Jesús López said:
Hi Plamen,

Thanks for pointing this out, I had not noticed this.

I used SET ROWCOUNT because TOP does not work with UPDATE statements in
SQL Server 2000.

Anyway, the query can be rewritten in the following way for SQL Server
2005 and later:

UPDATE TOP (@RowsToUpdate) URLs
SET
Status =
CASE
WHEN <error condition> THEN 2
ELSE 1
END,
PollCount =
CASE
WHEN <error condition> THEN PollCount + 1
ELSE PollCount
END,
Comment =
CASE
WHEN <error condition> THEN 'Custom error'
ELSE 'Success'
END
WHERE
Status = 0 AND PollCount < 5

It can also be rewritten in the following way with similar (but not
equals) effects:

UPDATE URLs
SET
Status =
CASE
WHEN <error condition> THEN 2
ELSE 1
END,
PollCount =
CASE
WHEN <error condition> THEN PollCount + 1
ELSE PollCount
END,
Comment =
CASE
WHEN <error condition> THEN 'Custom error'
ELSE 'Success'
END
WHERE
ID IN (
SELECT TOP <RowsToUpdate> ID
FROM Urls
WHERE Status = 0 AND PollCount < 5
ORDER BY Url
)

I'd like to highlight that the idea of performing the whole process by
just one single T-SQL batch remains valid, and I think it is the best
approach.

Regards:

Jesús López

Erland Sommarskog · Jan 22, 2009

Jonathan said:
I can't answer why the problem suddenly started, but the data was about
120K rows from the start. So I'm not certain it had anything to do with
a growing data set.

Hm, it could also be a case of different query plans. Some query
plans gives you row at once, whereas others are able to give you rows
as they are found.

I can look into this. However, I am concerned about taking any more of a
toll on any data because the database is substantial. I also am wondering
why this is off by default and what would, therefore, be the ramifications
for all other queries that use the same connection string.

You should probably only use MARS when you really need it. Else there
could be surprises. And since you set MARS in the connection string,
this means that other queries uses another connection string.

MARS if off by default, because, well, it was a new feature added in
SQL 2005, so having it off is more compatible.

MARS also takes a little more resources on the version store in SQL Server,
and as I indicated, it is a feature that you easily can get lost with.

I had wondered about reading the next row before doing the update
associated with the previous row. But it sounds from your comments that
this may not resolve the issue if it may be locking several rows at a
time.

Yes, just reading the next row is not likely to help. As I said there
is buffering that you have little control over. However, what you can
do is to exhaust the reader, close it and then perform the update.

And if I changed to use a DataSet, how to I ensure the lock is released
after reading a block of rows?

When you use .Fill, the DataAdapater uses an SqlReader and reads all
rows, and then closes the reader. Once the reader is closed, there no
locks any more as the statement is now completed.

--
Erland Sommarskog, SQL Server MVP, (e-mail address removed)

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

Jonathan Wood · Jan 23, 2009

Hm, it could also be a case of different query plans. Some query
plans gives you row at once, whereas others are able to give you rows
as they are found.

Yes, I was playing with my queries and code at this time, so perhaps it's
something along those lines.

Yes, just reading the next row is not likely to help. As I said there
is buffering that you have little control over. However, what you can
do is to exhaust the reader, close it and then perform the update.

[...]

When you use .Fill, the DataAdapater uses an SqlReader and reads all
rows, and then closes the reader. Once the reader is closed, there no
locks any more as the statement is now completed.

Interesting. So I could either use a data adapter to fill a dataset, or I
could use a data reader to read N rows and then close it. Either one should
eliminate any locking?

If so, perhaps this approach is the easiest then. I'll just have to do my
processing in chunks of N rows at a time. If I understood you correctly,
this was suggested by one of your ideas.

Thanks.

Jonathan

William Vaughn \(MVP\) · Jan 23, 2009

RE MARS: I have yet to see a single implementation that solved any
performance or scalability problem. MARS is complex and does not do what
many think it does. IMHO it's a useless feature that was implemented so that
Microsoft could be compared with Oracle that supports full-duplex
connections. It has several ugly side-effects and can crumble into ashes if
pushed too hard. MARS: just say no.

--
__________________________________________________________________________
William R. Vaughn
President and Founder Beta V Corporation
Author, Mentor, Dad, Grandpa
Microsoft MVP
(425) 556-9205 (Pacific time)
Hitchhiker’s Guide to Visual Studio and SQL Server (7th Edition)
http://betav.com http://betav.com/blog/billva
____________________________________________________________________________________________

Cor Ligthert[MVP] · Jan 23, 2009

Bill,

This opinion from you saves me time when I once would get the idea looking
better to it.

Thanks,

Cor'

Erland Sommarskog · Jan 23, 2009

Jonathan said:
Interesting. So I could either use a data adapter to fill a dataset, or
I could use a data reader to read N rows and then close it. Either one
should eliminate any locking?

The whole point is that the SELECT statement must run to an end. And you
know that this happens when you close the reader, or when you get no
more rows back. With Fill, the reader is openend and closed for you,
so that you don't have to worry.

--
Erland Sommarskog, SQL Server MVP, (e-mail address removed)

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

Database Blocking -- Why?

Erland Sommarskog

Jesús López

Jesús López

Jesús López

Jonathan Wood

Jonathan Wood

Erland Sommarskog

Jonathan Wood

William Vaughn \(MVP\)

Cor Ligthert[MVP]

Erland Sommarskog