C#, ADO.NET and MS-SQL

Thread starter: DJ van Vliet

Hi All,

Question regarding the processes between C#, ADO.NET and MS-SQL

I need to understand the complexities of using the disconnected method
(ADO.NET) of data retrieval, and the subsequent update of the database,
with regard to transactions.

E.G. - Stock Item Purchase Order.

Header:
Ledger, Supplier, Order Number

Details:
Store, Item, Qty, UnitPrice

Header Entry:
database: BeginTransaction
1. Enter/Select Ledger
   Store all ledger details on the client when loading the app,
   or
   use a search form, allowing the user to enter details. Use a Reader to
   retrieve matching Ledgers.
2. Enter/Select/Create Supplier
   Store all Supplier details - not practical?
   or
   use a search form, allowing the user to enter details. If the user has the
   proper authority, a new Supplier may be entered. Use a reader to retrieve
   matching Suppliers.

Detail Entry:
Basically the same principle for the Store and Item codes

Accept Order:
database: EndTransaction

It does not seem (to me) that the DBAdapter will be useful as I need to
obtain data as the user enters/selects the Ledger, Supplier, Store and
Item fields. I cannot store potential data because I have no idea what
will be selected (the Ledger and Store fields could be stored as they
are fairly static).
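
For what it's worth, here is a minimal sketch of the reader-based lookup
mentioned under Header Entry, assuming an illustrative Ledger table with
LedgerCode and LedgerName columns (all names here are assumptions, not a
prescribed schema):

using System;
using System.Data;
using System.Data.SqlClient;

class LedgerSearch
{
    // Returns only the ledgers matching what the user typed, instead of
    // caching the whole table on the client.
    static void PrintMatchingLedgers(string connectionString, string typed)
    {
        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand(
            "SELECT LedgerCode, LedgerName FROM Ledger " +
            "WHERE LedgerName LIKE @pattern ORDER BY LedgerName", conn))
        {
            cmd.Parameters.Add("@pattern", SqlDbType.NVarChar, 60).Value =
                typed + "%";
            conn.Open();
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    Console.WriteLine("{0}  {1}",
                        reader["LedgerCode"], reader["LedgerName"]);
            }
        }   // connection closed here - no transaction needed for a lookup
    }
}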

Should I design the above process using my own version of storage and
not the DBAdapter? If so, I am looking at a real-time system as
opposed to the ADO.NET (disconnected) model.

I hope my question is not as clear as mud.

Dougie
 
Well, maybe not mud but pretty cloudy. Please don't take this the wrong
way, but if I'm understanding you I'd say you need to get a good ADO.NET
book and go through it, as you describe the whole process of interacting
with the database in a UI in a very unconventional way. I don't mean to be
smug, but there are so many points that I wouldn't know where to start in
responding.

For example
It does not seem (to me) that the DBAdapter will be useful as I need to
obtain data as the user enters/selects the Ledger, Supplier, Store and
Item fields.

There are a gazillion ADO.NET-based applications that do that.
 
Thanks for the reply,

I have a copy of MS ADO.NET step by step and Chris Sells' Windows forms
programming in C#

The problem is that all these books and articles show pieces of an
application - e.g. a DataGrid or a Combo/ListBox containing lookup
tables. What happens if there are 35,000 records that could be
selected? I do not want to bring all the records across. This means an
interactive connection to the database while the user is busy capturing
- not exactly a disconnected method!

Let me try and explain it better. (Maybe a short example.)

The user requires a new purchase order.

Known:
Layout of Order header, a supplier is required and Items will be ordered.

I need to build up a DataAdapter/DataSet as the new order is being captured.
Two tables - Header and Details.

Header: selection of Ledger and Supplier.
Details: selection of Store and Items

Will the DataAdapter/DataSet be in a position to use the Update, Insert
and Delete commands to commit changes back to the database?
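
One hedged sketch of what that could look like, assuming illustrative
OrderHeader/OrderDetail tables and using SqlCommandBuilder to generate the
Insert/Update/Delete commands (the builder only works for simple
single-table SELECTs that include a primary key):

using System.Data;
using System.Data.SqlClient;

class OrderSaver
{
    // Commits a DataSet holding "Header" and "Details" tables back to the
    // database. Table and column names are illustrative.
    static void SaveOrder(string connStr, DataSet order)
    {
        using (SqlConnection conn = new SqlConnection(connStr))
        {
            SqlDataAdapter headerAdapter = new SqlDataAdapter(
                "SELECT OrderNumber, Ledger, Supplier FROM OrderHeader", conn);
            SqlDataAdapter detailAdapter = new SqlDataAdapter(
                "SELECT OrderNumber, Store, Item, Qty, UnitPrice " +
                "FROM OrderDetail", conn);

            // The builders derive INSERT/UPDATE/DELETE statements from
            // each adapter's SELECT.
            SqlCommandBuilder headerBuilder = new SqlCommandBuilder(headerAdapter);
            SqlCommandBuilder detailBuilder = new SqlCommandBuilder(detailAdapter);

            conn.Open();
            headerAdapter.Update(order, "Header");   // parents first (FK)
            detailAdapter.Update(order, "Details");
        }
    }
}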
 
Hello Dougie,

First off, a transaction should surround a set of operations on a database
where the database would be updated. In the example below, you begin the
transaction around "reading" the database and updating a local dataset... no
updates to the database. Do NOT start the transaction at the beginning of a
data entry form! Collect all your changes, wrap them together, and submit
them at once (DBAdapters are good for this). Otherwise, use small
transactions that keep the database consistent (like adding a supplier and
details as one transaction, and adding an order with details as another).
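
A sketch of what "collect all your changes and submit them at once" can
look like; the transaction lives only for the few milliseconds of the
Update calls, never while the form is open. This assumes adapters whose
Insert commands have been set up explicitly, and the table names are
illustrative:

using System.Data;
using System.Data.SqlClient;

class OrderCommitter
{
    static void Commit(string connStr, DataSet order,
                       SqlDataAdapter headerAdapter, SqlDataAdapter detailAdapter)
    {
        using (SqlConnection conn = new SqlConnection(connStr))
        {
            conn.Open();
            SqlTransaction tx = conn.BeginTransaction();
            try
            {
                // Every adapter command must be enlisted in the transaction.
                // (Only InsertCommand shown; Update/Delete commands would
                // need the same treatment.)
                headerAdapter.InsertCommand.Connection = conn;
                headerAdapter.InsertCommand.Transaction = tx;
                detailAdapter.InsertCommand.Connection = conn;
                detailAdapter.InsertCommand.Transaction = tx;

                headerAdapter.Update(order, "Header");
                detailAdapter.Update(order, "Details");
                tx.Commit();        // both tables committed, or neither
            }
            catch
            {
                tx.Rollback();
                throw;
            }
        }
    }
}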

Secondly, I don't know how large or complex your application is, but as more
scalability is needed, a couple of compromises usually occur:
1) Move less data to the client during a "fill-out-the-form" operation, and
2) Make the message from the client to the server more complete (in the
sense that this message will contain all of the information that is needed
to either insert, update, or delete the records when it gets to the server).

Architecturally, I'd recommend the following:

1) Download nothing to the end user (browser client) that isn't absolutely
necessary to fill out the form.

2) Provide a more restricted search capability than a "show everything" kind
of list. E.g. if the user needs to select from 10,000 suppliers, allow a
user to select a small set of "default" suppliers or a set that they are
working on "today", or to query on suppliers by a simple category or search
taxonomy, so that they can pick from only 20 or so. Store the result set in
a DataSet for convenience, or just a collection, whatever works for you.

3) Allow the user to create a supplier on the client without checking
credentials. If a user doesn't have authority, respond with an error
message after they post the form. (a) they won't make that mistake twice,
and (b) you are spending buckets of time solving a problem that humans don't
face... people know when they don't have the right to do something. YAGNI

4) You are right not to try to anticipate the data. That would be foolish.
Just allow the user to enter data that either matches a known value (like a
supplier code) or look up a supplier and select one to use.

5) You are talking like you want real scalability (1,000+ concurrent users).
If this is the case, place middleware on the database server. Don't have
the ASP.NET web server talk to ADO at all. Have the web app communicate
with the middleware on the db server. Communicate using MSMQ or some other
async mechanism. Use the concept of cooperating independent objects (see my
article:
http://www.ftponline.com/vsm/2004_06/magazine/departments/guestop/). This
is +massive overkill+ if you have less than 100 concurrent users. You just
didn't say in your message.
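
If that middleware route were taken, a minimal sketch of the async hand-off
described in point 5, using System.Messaging; the queue path, message shape,
and type names are all illustrative assumptions:

using System;
using System.Messaging;  // requires a reference to System.Messaging.dll

class OrderSubmitter
{
    const string QueuePath = @".\private$\orders";  // illustrative queue name

    // Client side: fire-and-forget submission of the completed order.
    public static void Submit(OrderMessage order)
    {
        if (!MessageQueue.Exists(QueuePath))
            MessageQueue.Create(QueuePath);

        using (MessageQueue queue = new MessageQueue(QueuePath))
        {
            queue.Formatter = new XmlMessageFormatter(
                new Type[] { typeof(OrderMessage) });
            queue.Send(order, "New purchase order");
        }
    }
}

// The message carries everything the server needs to apply the change.
public class OrderMessage
{
    public string LedgerCode;
    public string SupplierCode;
    public OrderLine[] Lines;
}

public class OrderLine
{
    public string StoreCode;
    public string ItemCode;
    public int Qty;
    public decimal UnitPrice;
}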

I hope this helps,
--- Nick Malik
Solutions Architect, MCSD
 
Nick Malik said:
First off, a transaction should surround a set of operations on a database
where the database would be updated. In the example below, you begin the
transaction around "reading" the database and updating a local dataset... no
updates to the database.

Is it not possible that transactions around just reads are useful too?
Consider the following situation:

Table A:
EmployeeID Name Department

Table B:
JobID EmployeeID DueDate

(I'll assume it's obvious what the point of the tables is.)

Now, you want to read all the jobs. If you don't use transactions, then
between reading table A and reading table B, another client could come
and (atomically, with a transaction) add an employee to table A and a
job for them in table B. That means that you end up reading a job where
you've not read the related employee record.

I *believe* (although I haven't tested this) that if you enter a
transaction for reading, that should make sure you see *all* the tables
as they were when you started the transaction (or maybe when you
perform the first operation within the transaction). That would mean
that in this case, you don't see the new job or the new employee.

(Reading the tables in the opposite order, B then A, would take care of
that particular example - but another client could be *deleting*
records from both B and A, leaving the same problem.)
 
Jon, I recently did some investigation into this very topic. It turns out,
no, read transactions won't prevent the kind of data "error" you describe,
as Sql Server read transactions (as per ANSI) only lock each row as the
cursor reads it. Depending on the isolation level, it may release the lock
immediately afterward or keep it until the query is done. So it only blocks
writes to rows it's already read anyway, and that's only when a higher
isolation level is set. The main (if only?) point of Sql read transactions
seems to be so that the query will be repeatable within the transaction.

And BTW, read cursors automatically get the same isolation level a
transaction would have by default, so it's completely redundant to wrap a
read in a transaction if you don't explicitly bump up the isolation level.

Apparently Oracle works the way you describe, by using a sort of change log
and using it to retrieve the state of each requested row as of the time you
started the transaction. I was told that's one of the reasons why
performance tuning is so critical and demanding for Oracle (you can imagine
the work required to accomplish such a thing).

I would imagine this has been a barrier to Sql Server's use in areas like
financial, where read integrity as you described is everything to the
application.

Yukon is supposed to have a similar option per Feb 2004 MSDN magazine.
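
For reference, the isolation level in ADO.NET is chosen when the
transaction starts. A sketch against Jon's two tables (table and column
names are illustrative): RepeatableRead keeps the read locks until commit
but, per the above, still doesn't stop new rows being inserted; Serializable
would be needed for that.

using System.Data;
using System.Data.SqlClient;

class ConsistentRead
{
    static DataSet ReadEmployeesAndJobs(string connStr)
    {
        DataSet ds = new DataSet();
        using (SqlConnection conn = new SqlConnection(connStr))
        {
            conn.Open();
            // Default is ReadCommitted; bump it explicitly.
            SqlTransaction tx =
                conn.BeginTransaction(IsolationLevel.RepeatableRead);
            try
            {
                SqlCommand readA = new SqlCommand(
                    "SELECT EmployeeID, Name, Department FROM EmployeeTable",
                    conn, tx);
                new SqlDataAdapter(readA).Fill(ds, "A");

                SqlCommand readB = new SqlCommand(
                    "SELECT JobID, EmployeeID, DueDate FROM JobTable",
                    conn, tx);
                new SqlDataAdapter(readB).Fill(ds, "B");

                tx.Commit();   // releases the accumulated read locks
            }
            catch
            {
                tx.Rollback();
                throw;
            }
        }
        return ds;
    }
}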
 
Daniel Billingsley said:
Jon, I recently did some investigation into this very topic. It turns out,
no, read transactions won't prevent the kind of data "error" you describe,
as Sql Server read transactions (as per ANSI) only lock each row as the
cursor reads it.

Yuk. That's really horrible...
Apparently Oracle works the way you describe, by using a sort of change log
and using it to retrieve the state of each requested row as of the time you
started the transaction. I was told that's one of the reasons why
performance tuning is so critical and demanding for Oracle (you can imagine
the work required to accomplish such a thing).
Yup.

I would imagine this has been a barrier to Sql Server's use in areas like
financial, where read integrity as you described is everything to the
application.

I would have thought integrity would be vital to pretty much *every*
application though!
Yukon is supposed to have a similar option per Feb 2004 MSDN magazine.

Goodo - I'll see if I can have a look at that.
 
I apologize for my answer. I was trying to rush out of here and just
shouldn't have answered.

But it's still true that you've asked some very BIG architecture-level
questions.

Perhaps if you asked specific questions one by one it would be better.

As for this example below - I'm not sure what you mean by "in a position".
There are lots of different ways to accomplish what you describe. Some
people/models avoid datasets altogether and use business object classes to
gather and validate the information. They may fire a stored procedure - so
there would never actually be a dataset at all.

As for the 35,000 record issue - that's a general design problem which has
also been solved numerous ways. One way is instead of having a combo box
(where listing 35,000 choices wouldn't be real helpful) there is a search
button next to the field that opens up a more sophisticated search form. As
someone else mentioned, you could also use some sort of filtering to narrow
the 35,000 records down to a small bunch the user is *likely* to select from
in the first place. Maybe a combination of the two.
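
A sketch of that search-form query for the 35,000-record case:
parameterized and capped, so only a screenful of candidates ever crosses
the wire (the Supplier table and column names are illustrative):

using System.Data;
using System.Data.SqlClient;

class SupplierSearch
{
    static DataTable Search(string connStr, string nameFragment)
    {
        DataTable results = new DataTable("Suppliers");
        using (SqlConnection conn = new SqlConnection(connStr))
        using (SqlCommand cmd = new SqlCommand(
            "SELECT TOP 20 SupplierCode, SupplierName FROM Supplier " +
            "WHERE SupplierName LIKE @frag ORDER BY SupplierName", conn))
        {
            cmd.Parameters.Add("@frag", SqlDbType.NVarChar, 60).Value =
                nameFragment + "%";
            // Fill opens and closes the connection itself.
            new SqlDataAdapter(cmd).Fill(results);
        }
        return results;
    }
}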

But the point is there's nothing inherent in ADO.NET that really prohibits
or limits your ability to make it work however you want.
 
Jon Skeet said:
I would have thought integrity would be vital to pretty much *every*
application though!

I've given this a bit of thought too, and that's true to a point.

To play the advocate, I would say that any read is obviously potentially
"invalid" the moment you complete it. That is, any query is really only a
snapshot at *some* point. In a fully normalized database I think it would
be a rare query indeed that would produce results that were horribly wrong.
I mean if it includes a row that was added since the query began it's the
same as if you had just started your query a bit later.

You might think there would be a problem if you had, say a Total column in
the order header which you'd expect to always add up to the totals of the
OrderDetail records - but such redundancy MUST be controlled very carefully
(from what I can tell it is not technically a violation of any normal
form).. namely by updating both tables only in the context of a
transaction... in which case your read is fine because that update
transaction would be blocked until the read was finished (if you used a high
enough isolation level on the read).

I hope that made some sense. I believe in the real world it's only in certain
types of applications, namely ones where the instant of the snapshot IS
critical. Financial, stock types of applications come to mind, but for
typical CRUD business applications I think it's not really a big deal.

Here's a great article written by IBM (same concurrency model as Sql Server)
on the topic:

http://www.xs4all.nl/~gertjans/usenet/microsoft.public.sqlserver.programming/readconsistency.pdf

Guess we're getting way off newsgroup topic here so maybe best to continue
offline?
 
Thanks for the reply,

I am going through the two books and making notes.

I have been programming for a 'good' number of years (no commercial Web
or OO development).

The issues I have are that 1) I need to understand how the language
(.NET) processes work so that I can get the best out of the development
system/language I wish to use, and most importantly, to design the
system 'correctly' (according to the language spec) in the first place.
I know that there will be many iterations, but hopefully no rewrites
from scratch

The issue I have is that ADO.NET applies a disconnected architecture
(open, read, close, open, update database, close), but when capturing data,
e.g. order entry, each time the user enters a supplier or stock item, the
database has to be accessed - even if you then add the row to the
RowSet, there is still massive interaction with the database -
open, read, close on each item entered by the operator.

To me this is not good from a 'disconnected' architecture point of view.

As they say in South Africa - Is my thinks okay 'of sit ek die pot mis'
(am I completely on the wrong track)

Dougie
 
I think the disconnected model is pretty standard these days and if nothing
else is being pushed strongly by the trend towards more disconnected
systems. Web UI, web services, etc. all demand a disconnected data access
model.

I don't think the interaction with the database is "massive":

First, you have the option of waiting to do the validation or addition of
that selection, like you do on a web UI. Not a great idea for a WinForm in
my opinion, but the option is there.

Second, you're talking about a very small query if it's just to verify the
existence of the selection.

And third, connection pooling will likely kick in such that the actual
connection to the database is kept alive between actions. Take out all the
"close" and "open" except the first one in your example below and it looks
like (open, read, update). So, the "disconnected" ADO.NET doesn't really
look any different than what you're comparing it negatively to.
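
To make the pooling point concrete: with "Pooling=true" (the SqlClient
default), Close() just returns the connection to the pool, so the
open/read/close per entry is far cheaper than it sounds. A sketch of a
per-entry existence check (the Item table name is illustrative):

using System.Data;
using System.Data.SqlClient;

class ItemLookup
{
    // Called each time the operator enters an item code. Open() normally
    // just checks a connection out of the pool; Close() returns it.
    static object GetDescription(string connStr, string itemCode)
    {
        using (SqlConnection conn = new SqlConnection(connStr))
        using (SqlCommand cmd = new SqlCommand(
            "SELECT Description FROM Item WHERE ItemCode = @code", conn))
        {
            cmd.Parameters.Add("@code", SqlDbType.NVarChar, 20).Value =
                itemCode;
            conn.Open();
            return cmd.ExecuteScalar();   // null => unknown item code
        }
    }
}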
 
Daniel Billingsley said:
I've given this a bit of thought too, and that's true to a point.

To play the advocate, I would say that any read is obviously potentially
"invalid" the moment you complete it.

I don't think that's obvious - it should be valid within the context of
the same transaction (IMO).
That is, any query is really only a snapshot at *some* point.

That I agree with. The question is whether you can make two queries
which are part of the same transaction read data from the same
snapshot.
In a fully normalized database I think it would
be a rare query indeed that would produce results that were horribly wrong.
I mean if it includes a row that was added since the query began it's the
same as if you had just started your query a bit later.

Only if you're only performing a single query though. The problem is
when you want a snapshot comprising several queries. Yes, you can do
joins, but that's not always an appropriate answer.

I would imagine the example I gave earlier would be fairly common - two
related tables where the primary key of one is the foreign key of the
other. If you take two separate snapshots you just *can't* guarantee
consistency, which is the point (well, one of the points) of
transactions in the first place.
You might think there would be a problem if you had, say a Total column in
the order header which you'd expect to always add up to the totals of the
OrderDetail records - but such redundancy MUST be controlled very carefully
(from what I can tell it is not technically a violation of any normal
form).. namely by updating both tables only in the context of a
transaction... in which case your read is fine because that update
transaction would be blocked until the read was finished (if you used a high
enough isolation level on the read).

Ah - that's the solution then. If the updating transaction is blocked
until all the reads in the querying transaction are complete,
everything's okay. Not sure why I missed that before. It's a pain for
performance though...
I hope that made some sense. I believe in the real world it's only in certain
types of applications, namely ones where the instant of the snapshot IS
critical. Financial, stock types of applications come to mind, but for
typical CRUD business applications I think it's not really a big deal.

Only if you think CRUD business apps don't require consistency ;)
Here's a great article written by IBM (same concurrency model as Sql Server)
on the topic:

http://www.xs4all.nl/~gertjans/usenet/microsoft.public.sqlserver.programming/readconsistency.pdf

Thanks very much - I'll have a look at that later.
Guess we're getting way off newsgroup topic here so maybe best to continue
offline?

I think I've probably reached the limit of where I was going with this
thread, so I've posted this article here rather than mailing you - I
suspect even if you've got something else useful to add, I haven't!
 
Hi Daniel,

I am not trying to be negative, I'm trying to put the complete model in
perspective so that I can start my design phase.

It is just that in theory it is a 'disconnected' model, but it works
like a client/server model, except that the connection pool maintains
the connection - always a shared open connection.

The benefit on the client machine is the fully functional mini
relational database - fantastic. I am still going through the processes
that are involved in keeping the local and server databases in sync and
will be asking for more help.

Thanks so far!

Dougie
 
Just for clarification...

Jon Skeet said:
I don't think that's obvious - it should be valid within the context of
the same transaction (IMO).

I meant "invalid" in the sense of not necessarily still representing the
"real" state of the data. If I hang on to a query for even 1 minute on a
volatile enough table I've got a lot of junk. I'm just saying that's
theoretically true even if the elapsed time is 1 second.
That I agree with. The question is whether you can make two queries
which are part of the same transaction read data from the same
snapshot.

You can sort of do that with the current MS concurrency mechanism. The higher
isolation levels *leave* the row locks on as the cursor passes over each
one. They block writes on those rows, but not other reads. They also don't
block inserts, so there's still potentially a problem on a second query.
Only if you're only performing a single query though. The problem is
when you want a snapshot comprising several queries. Yes, you can do
joins, but that's not always an appropriate answer.

See previous answer. Perhaps I was misleading in my first response as I was
really only considering single queries.
Ah - that's the solution then. If the updating transaction is blocked
until all the reads in the querying transaction are complete,
everything's okay. Not sure why I missed that before. It's a pain for
performance though...

Yep. And gridlock potential. That's why the Oracle mechanism seems pretty
cool, but it comes with its own issues.
Only if you think CRUD business apps don't require consistency ;)

Just consistent "enough" IMO. If I ask for a customer's orders and someone
slips one in in the middle of my query running it's not much different than
if they slipped one in immediately after my query finished. Unless I'm
going to leave a transaction open for the entire time the user is
interacting with the UI this is still a potential problem that the Oracle
mechanism doesn't address. Which is why I say, from a pragmatic standpoint,
once you've dealt with that potential and guarded your redundancies like I
mentioned before the Sql concurrency mechanism is most likely good enough
for most apps.
 
Just consistent "enough" IMO. If I ask for a customer's orders and someone
slips one in in the middle of my query running it's not much different than
if they slipped one in immediately after my query finished.

There's a big difference if you end up with a customer order but you
don't have a customer entry for that customer due to them being a new
customer with that order and you having fetched the customer table
before fetching the order table, however. I don't know about you, but I
certainly don't want to have to be checking for table consistency (all
foreign keys being valid etc) when the database is supposed to be doing
that for me - to me, if I fetch inconsistent data, that represents an
*error* condition, not one that might reasonably happen without any
other problems.
Unless I'm going to leave a transaction open for the entire time the user is
interacting with the UI this is still a potential problem that the Oracle
mechanism doesn't address. Which is why I say, from a pragmatic standpoint,
once you've dealt with that potential and guarded your redundancies like I
mentioned before the Sql concurrency mechanism is most likely good enough
for most apps.

If you use the high isolation level which ensures consistent reads,
that's fine - but without that, it's asking for race conditions, IMO.
 
Jon Skeet said:
There's a big difference if you end up with a customer order but you
don't have a customer entry for that customer due to them being a new
customer with that order and you having fetched the customer table
before fetching the order table, however. I don't know about you, but I
certainly don't want to have to be checking for table consistency (all
foreign keys being valid etc) when the database is supposed to be doing
that for me - to me, if I fetch inconsistent data, that represents an
*error* condition, not one that might reasonably happen without any
other problems.

Don't see how that could happen. The database can do the checks, but
outdated data can't be protected against by either concurrency mechanism.
With Oracle or MS you still have to handle a delete failing because someone
else has deleted the record while you were thinking about it, unless you
want to do pessimistic locking which would cause major performance and
gridlock issues.
 
Daniel Billingsley said:
Don't see how that could happen. The database can do the checks, but
outdated data can't be protected against by either concurrency mechanism.
With Oracle or MS you still have to handle a delete failing because someone
else has deleted the record while you were thinking about it, unless you
want to do pessimistic locking which would cause major performance and
gridlock issues.

If you don't allow any updates to the tables you're looking at while
your set of queries is executing, you should always be able to see
consistent data. It's one thing for a delete to fail - it's another
thing to only see *one* delete in a transacted pair, due to having read
one table before another.
 
There's a big difference if you end up with a customer order but you
don't have a customer entry for that customer...
With Oracle or MS you still have to handle a delete failing because someone
else has deleted the record while you were thinking about it...

I disagree.

In a disconnected model, you would not care.

That is because in a disconnected model, the database does not create the
foreign key field that connects the customer order to the customer record...
the application does.

So you get your two queries. In the customer table query, you find the
customer record. (then the customer record is deleted from the customer
table behind your back). Then, in the order header table, you don't find
the order header (hmmm: in a disconnected model, why would you look for it?)

So you create a new order with order lines.

You pass back, to the data layer running on the server, three things:
the customer record that you looked up,
the order header record, and
the order detail record.

The data layer does an (insert/update) on the customer record (their address
may have changed). In this case, the (insert/update) operation inserts the
record. The customer id is a GUID, so you don't have to wait for this
number to propagate back to the front end. The data layer doesn't assign
this. The business layer does. The customer doesn't see it.

The order header and order detail are added normally.

Who cares if a row is deleted, or even updated, when you are "not looking"?
That's not the database's job. Just put it back.
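
A hedged sketch of that insert-or-update ("just put it back") step in the
data layer, with the business layer assigning the key up front via
Guid.NewGuid() so nothing has to round-trip; the SQL and all names are
illustrative:

using System;
using System.Data;
using System.Data.SqlClient;

class CustomerWriter
{
    // Business layer, earlier: customer.ID = Guid.NewGuid();
    static void Upsert(SqlConnection conn, SqlTransaction tx,
                       Guid customerId, string name)
    {
        // If the row was deleted "behind our back", this re-inserts it.
        string sql =
            "IF EXISTS (SELECT 1 FROM Customer WHERE CustomerID = @id) " +
            "    UPDATE Customer SET Name = @name WHERE CustomerID = @id " +
            "ELSE " +
            "    INSERT INTO Customer (CustomerID, Name) VALUES (@id, @name)";
        using (SqlCommand cmd = new SqlCommand(sql, conn, tx))
        {
            cmd.Parameters.Add("@id", SqlDbType.UniqueIdentifier).Value =
                customerId;
            cmd.Parameters.Add("@name", SqlDbType.NVarChar, 100).Value = name;
            cmd.ExecuteNonQuery();
        }
    }
}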

Look, contention happens in the paper world too. If I send two forms to
Human Resources in my company, both containing updates to my benefit plans,
and on one form, I update my mailing address, I have no idea if my mailing
address will be correct in their records when it is done. This is an
exception condition, and honestly, in a well run company, I'd expect the HR
person to call me on the phone and ask what mailing address is current.

In a disconnected model, you would NEVER leave open a transaction during a
U/I operation.

There are better ways.

--- Nick Malik
 
It is just that in theory it is a 'disconnected' model, but it works
like a client/server model, except that the connection pool maintains
the connection - always a shared open connection.

Hello again,

The tools support a disconnected model, but you have to be willing to
implement it that way.

I do not see why you would need to touch the database table that you are
updating between the initial query and the update.

If you are complaining that you have to look up data from the other tables
to fill a user interface, and that seems "client server" to you, then...
don't do that.

Make the user type a value and, when the data comes back to the back end,
match it up with the database. (not your friendliest interface, but this is
how mainframe work was done for 30 years, so don't think it's untenable.)
That's why I suggested a "search" function. If the user types in a value
that they know to be uniquely identifying their top customer (like a
human-friendly customer id), you can look up the actual record in the data
layer and make the correct join.

The point of a disconnected model is that you don't have to hit the database
at all before you write the new record to it, if you don't want to. It
isn't to help you write the interface... but if you write an interactive
interface, you will need to get the data from somewhere.

I don't see what the problem is all about.

--- Nick
 
There's no way around changes happening in the database while the user is
doing the UI thing, and God help your users if you don't care.
 