XML to database?


Gary

I have a requirement to read element values (there can be thousands of them)
from an XML file and eventually write them all to a SQL Server database.
I was thinking we could first write all the values from the XML to some data
structure (an ArrayList, etc.) and then write from that data structure to the
DB with a single database call.

Which would be the best data structure to achieve this? Any
pointers/suggestions are appreciated.
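
For reference, the buffering step described above might look something like
this (the file path and the <value> element name are invented for
illustration; a List<string> does the job an ArrayList would):

using System.Collections.Generic;
using System.Xml;

// load the file and pull every <value> element's text into a list
XmlDocument doc = new XmlDocument();
doc.Load("input.xml");

List<string> values = new List<string>();
foreach (XmlNode node in doc.SelectNodes("//value"))
{
    values.Add(node.InnerText);
}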
 

Nicholas Paldino [.NET/C# MVP]

Gary,

I would let SQL Server do it. You can pass the XML to a stored
procedure and use the sp_xml_preparedocument system stored procedure, along
with the OPENXML function, to insert into your table. For more information,
check out the section of SQL Server 2005 Books Online titled "Using FOR XML
and OPENXML to Publish and Process XML Data", located at:

http://technet.microsoft.com/en-us/library/ms191268.aspx
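
As a rough sketch of the shape this takes (the proc, table and element names
are invented, assuming XML like <xml><value>...</value>...</xml>;
connectionString and xmlText are assumed to exist already):

using System.Data;
using System.Data.SqlClient;

// The stored-procedure side could look something like:
//
//   CREATE PROCEDURE usp_ImportValues @xml nvarchar(max)
//   AS
//   BEGIN
//       DECLARE @doc int;
//       EXEC sp_xml_preparedocument @doc OUTPUT, @xml;
//
//       INSERT INTO dbo.ImportedValues (Value)
//       SELECT Value
//       FROM   OPENXML(@doc, '/xml/value', 2)
//              WITH (Value varchar(100) 'text()');
//
//       EXEC sp_xml_removedocument @doc;
//   END
//
// and the C# side simply hands the whole document across in one call:
using (SqlConnection conn = new SqlConnection(connectionString))
using (SqlCommand cmd = new SqlCommand("usp_ImportValues", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.Add("@xml", SqlDbType.NVarChar, -1).Value = xmlText;
    conn.Open();
    cmd.ExecuteNonQuery();
}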
 

sloan

That will definitely work (Peter's idea).

The caveat is that it will end up being a "row by row" update.

OPENXML allows a one-db-hit, bulk (set-based) approach (see my other
post).

The development goals need to be understood: does the OP want the fastest way
to implement, or the best-performing one? (I don't know the answer.)

...

One of the key things about the set-based approach over row-by-row is that
index maintenance can be expensive.
With the set-based approach the indexes are updated once, after one big
insert or update, rather than after every single row.


The OP does mention "in the thousands", so a set-based/bulk insert would be
the way I would go here.
You can also throttle the number of records that go into the dataset; say,
every 1000 records in the dataset gets sent to the db.

I've handled 4 MB XML files with no issues using the OPENXML method; I
haven't tested the upper limits, so I don't know what happens beyond that.

But most of my stuff is under a third of a meg, which works well with
OPENXML, IMHO.


You can also throw some BEGIN TRAN logic into the usp (stored procedure) so
you get a very nice "all or nothing" capability.
 

Marc Gravell

I think this is a good approach (I have used something similar in a
bulk-data, semi-structured scenario) - but just to highlight: SQL Server 2005
(and above) has the "xml" data type, which allows you to handle xml
(shred, query, build, update, index, etc.) /much/ more efficiently than
OPENXML can. Of course, if it is just storage/retrieval then go with
"varbinary(max)" and have done with it (varbinary avoids the codepage
issues associated with [n]varchar).

As an example, you can pass a single string over to an SP, cast it to
xml in the database and tear it apart with XPath; this minimises round
trips.
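
For the xml-type route, the C# side can also pass the document as a typed xml
parameter rather than plain text - a minimal sketch (the proc name is
invented; connectionString and xmlText are assumed to exist):

using System.Data;
using System.Data.SqlClient;

// The proc body could shred the parameter with something like:
//   SELECT r.v.value('.', 'varchar(100)')
//   FROM   @xml.nodes('/xml/value') AS r(v)
using (SqlConnection conn = new SqlConnection(connectionString))
using (SqlCommand cmd = new SqlCommand("usp_ImportValuesXml", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    // a plain string is accepted for an xml parameter; SqlXml works too
    cmd.Parameters.Add("@xml", SqlDbType.Xml).Value = xmlText;
    conn.Open();
    cmd.ExecuteNonQuery();
}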

Marc
 

Marc Gravell

I think this is a good approach
(sorry, I was responding to Nicholas's post re sp_xml_preparedocument;
not sure if it will appear that way in all clients ;-p)
 

sloan

Yeah, my suggestion (in another post) will work with either 2000 or 2005.

If the OP gives the version of SQL Server, that would open up a few more
options (to which you are alluding).
 

Marc Gravell

I've handled 4 MB XML files with no issues using the OPENXML method; I
haven't tested the upper limits, so I don't know what happens beyond that.

Just for info; for big imports, I have successfully used a hybrid
XmlReader / IDataReader to present the first-level elements (which are
frequently the most common) as successive rows in a data-reader - i.e.

<xml><row>...</row><row>...</row><row>...</row></xml>

would get parsed as 3* rows in an IDataReader, but it works while /
streaming/ rather than having to load it all into memory first. In
particular you can pass your spoof IDataReader directly to an
SqlBulkCopy to pump it (again, streaming) into a staging table (with a
key to identify the batch), before finally invoking an SP to commit
the whole lot into the real tables. This allows you to limit any
transaction to just the final merge, avoiding long locks during the
import process (especially important as I was getting chunks from a
remote client). You could also stream into a single record using
streamed BLOB/CLOB methods, but the xml-reader approach allows you
more flexibility to apply a schema and manipulate the data at the same
time.

Just sharing for info... I have posted the hybrid reader previously
(this forum) if anybody wants it.

(*=obviously the real number was significantly larger!)
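
The real hybrid reader has been posted previously, as noted above; purely to
illustrate the shape, a minimal one-column sketch over <value> elements might
look like the following (the staging table name is invented, and the members
SqlBulkCopy isn't expected to need simply throw):

using System;
using System.Data;
using System.Data.SqlClient;
using System.Xml;

// Streams each <value> element as a one-column row; only the members that
// SqlBulkCopy is expected to touch do real work.
sealed class XmlValueReader : IDataReader
{
    private readonly XmlReader _xml;
    private string _current;

    public XmlValueReader(XmlReader xml) { _xml = xml; }

    public bool Read()
    {
        // ReadElementContentAsString() leaves the reader on the node after
        // the end tag, so check the current node before advancing again
        while (true)
        {
            if (_xml.NodeType == XmlNodeType.Element && _xml.Name == "value")
            {
                _current = _xml.ReadElementContentAsString();
                return true;
            }
            if (!_xml.Read()) return false;
        }
    }

    public int FieldCount { get { return 1; } }
    public object GetValue(int i) { return (object)_current ?? DBNull.Value; }
    public int GetValues(object[] values) { values[0] = GetValue(0); return 1; }
    public string GetName(int i) { return "Value"; }
    public int GetOrdinal(string name) { return 0; }
    public string GetString(int i) { return _current; }
    public bool IsDBNull(int i) { return _current == null; }
    public Type GetFieldType(int i) { return typeof(string); }
    public string GetDataTypeName(int i) { return "nvarchar"; }
    public object this[int i] { get { return GetValue(i); } }
    public object this[string name] { get { return GetValue(GetOrdinal(name)); } }

    public void Close() { _xml.Close(); }
    public void Dispose() { Close(); }
    public bool IsClosed { get { return _xml.ReadState == ReadState.Closed; } }
    public int Depth { get { return 0; } }
    public int RecordsAffected { get { return -1; } }
    public bool NextResult() { return false; }
    public DataTable GetSchemaTable() { return null; }

    // not expected to be needed by SqlBulkCopy for this sketch
    public bool GetBoolean(int i) { throw new NotSupportedException(); }
    public byte GetByte(int i) { throw new NotSupportedException(); }
    public long GetBytes(int i, long o, byte[] b, int bo, int len) { throw new NotSupportedException(); }
    public char GetChar(int i) { throw new NotSupportedException(); }
    public long GetChars(int i, long o, char[] b, int bo, int len) { throw new NotSupportedException(); }
    public IDataReader GetData(int i) { throw new NotSupportedException(); }
    public DateTime GetDateTime(int i) { throw new NotSupportedException(); }
    public decimal GetDecimal(int i) { throw new NotSupportedException(); }
    public double GetDouble(int i) { throw new NotSupportedException(); }
    public float GetFloat(int i) { throw new NotSupportedException(); }
    public Guid GetGuid(int i) { throw new NotSupportedException(); }
    public short GetInt16(int i) { throw new NotSupportedException(); }
    public int GetInt32(int i) { throw new NotSupportedException(); }
    public long GetInt64(int i) { throw new NotSupportedException(); }
}

// pump it, streaming, into a staging table (assumes dbo.StagingValues has a
// single compatible (n)varchar column):
using (XmlReader xml = XmlReader.Create("input.xml"))
using (SqlConnection conn = new SqlConnection(connectionString))
{
    conn.Open();
    using (SqlBulkCopy bcp = new SqlBulkCopy(conn))
    {
        bcp.DestinationTableName = "dbo.StagingValues";
        bcp.WriteToServer(new XmlValueReader(xml));
    }
    // then call the SP that merges the staging rows into the real tables
}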

Marc
 

sloan

Yeah, I kinda do a similar thing.

I use this approach when I have business rules for the imported data.

I use an IDataReader on the source data.
If a row passes the business-rule test, I add it to a (strongly typed)
dataset.

Every X rows (that pass the test), I send ds.GetXml() to the usp, which uses
the OPENXML method.

I then start with a new (strongly typed) ds and repeat until the IDataReader
is done reading.

I usually set X to 1000, but I keep it configurable.

The key is the business rules. This lets me write .NET code to verify each
row before it goes into the (strongly typed) ds; a row that fails never
reaches the db at all.

So it's kind of the same concept, but not exactly. Both approaches are
memory-footprint aware, though.

Sometimes I log the bad rows, sometimes I don't care about them.
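
Roughly, that loop might look like the following (a plain DataSet stands in
for the strongly typed one; the business rule, the source reader and the usp
name are all placeholders):

using System;
using System.Data;
using System.Data.SqlClient;

static void ImportWithRules(IDataReader sourceReader, string connectionString)
{
    DataSet batch = new DataSet("xml");
    DataTable rows = batch.Tables.Add("row");
    rows.Columns.Add("Value", typeof(string));

    int batchSize = 1000;   // the "X" above - keep it configurable
    while (sourceReader.Read())
    {
        string value = sourceReader.GetString(0);
        if (!PassesBusinessRules(value))
            continue;                         // bad rows never reach the db
        rows.Rows.Add(value);

        if (rows.Rows.Count == batchSize)
        {
            SendBatch(connectionString, batch.GetXml());   // one set-based hit
            rows.Clear();
        }
    }
    if (rows.Rows.Count > 0)
        SendBatch(connectionString, batch.GetXml());       // flush the rest
}

// placeholder for whatever .NET validation applies to a row
static bool PassesBusinessRules(string value)
{
    return !string.IsNullOrEmpty(value);
}

// hands one batch to a usp that shreds it with OPENXML; here the XML is
// shaped like <xml><row><Value>...</Value></row>...</xml>
static void SendBatch(string connectionString, string xml)
{
    using (SqlConnection conn = new SqlConnection(connectionString))
    using (SqlCommand cmd = new SqlCommand("usp_ImportValueBatch", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.Add("@xml", SqlDbType.NVarChar, -1).Value = xml;
        conn.Open();
        cmd.ExecuteNonQuery();
    }
}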
 
