XML to database?


Gary

I have a requirement to read element values (there can be thousands of them)
from an XML file and eventually write them all to a SQL Server database.
I was thinking we could first write all the values from the XML to some data
structure (an ArrayList, etc.) and then write from that data structure to the
DB with a single database call.

Which would be the best data structure to achieve this? Any
pointers/suggestions are appreciated.
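
For reference, the buffering step described above might look something like
this (the file path and the <value> element name are invented for
illustration; a List<string> does the job an ArrayList would):

using System.Collections.Generic;
using System.Xml;

// load the file and pull every <value> element's text into a list
XmlDocument doc = new XmlDocument();
doc.Load("input.xml");

List<string> values = new List<string>();
foreach (XmlNode node in doc.SelectNodes("//value"))
{
    values.Add(node.InnerText);
}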
 

Nicholas Paldino [.NET/C# MVP]

Gary,

I would let SQL Server do it. You can pass the XML to a stored
procedure and use the sp_xml_preparedocument system stored procedure, along
with the OPENXML function, to insert into your table. For more information,
check out the section of SQL Server 2005 Books Online titled "Using FOR XML
and OPENXML to Publish and Process XML Data", located at:

http://technet.microsoft.com/en-us/library/ms191268.aspx
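
As a rough sketch of the shape this takes (the proc, table and element names
are invented, assuming XML like <xml><value>...</value>...</xml>;
connectionString and xmlText are assumed to exist already):

using System.Data;
using System.Data.SqlClient;

// The stored-procedure side could look something like:
//
//   CREATE PROCEDURE usp_ImportValues @xml nvarchar(max)
//   AS
//   BEGIN
//       DECLARE @doc int;
//       EXEC sp_xml_preparedocument @doc OUTPUT, @xml;
//
//       INSERT INTO dbo.ImportedValues (Value)
//       SELECT Value
//       FROM   OPENXML(@doc, '/xml/value', 2)
//              WITH (Value varchar(100) 'text()');
//
//       EXEC sp_xml_removedocument @doc;
//   END
//
// and the C# side simply hands the whole document across in one call:
using (SqlConnection conn = new SqlConnection(connectionString))
using (SqlCommand cmd = new SqlCommand("usp_ImportValues", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.Add("@xml", SqlDbType.NVarChar, -1).Value = xmlText;
    conn.Open();
    cmd.ExecuteNonQuery();
}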
 

sloan

That will definitely work (Peter's idea).

The caveat is that it will end up being a "row by row" update.

OPENXML allows a one-db-hit, bulk (set-based) approach (see my other
post).

The development goals need to be understood: does the OP want the fastest way
to implement, or the best-performing one? (I don't know the answer.)

...

One of the key things about the set-based approach over row-by-row is that
index maintenance can be expensive.
With the set-based approach the indexes are updated once, after one big
insert or update, rather than after every single row.


The OP does mention "in the thousands", so a set-based/bulk insert would be
the way I would go here.
You can also throttle the number of records that go into the dataset; say,
every 1000 records in the dataset gets sent to the db.

I've handled 4 MB XML files with no issues using the OPENXML method; I
haven't tested the upper limits, so I don't know what happens beyond that.

But most of my stuff is under a third of a meg, which works well with
OPENXML, IMHO.


You can also throw some BEGIN TRAN logic into the usp (stored procedure) so
you get a very nice "all or nothing" capability.
 

Marc Gravell

I think this is a good approach (I have used something similar in a
bulk-data, semi-structured scenario) - but just to highlight: SQL Server 2005
(and above) has the "xml" data type, which allows you to handle xml
(shred, query, build, update, index, etc.) /much/ more efficiently than
OPENXML can. Of course, if it is just storage/retrieval then go with
"varbinary(max)" and have done with it (varbinary avoids the codepage
issues associated with [n]varchar).

As an example, you can pass a single string over to an SP, cast it to
xml in the database and tear it apart with XPath; this minimises round
trips.
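
For the xml-type route, the C# side can also pass the document as a typed xml
parameter rather than plain text - a minimal sketch (the proc name is
invented; connectionString and xmlText are assumed to exist):

using System.Data;
using System.Data.SqlClient;

// The proc body could shred the parameter with something like:
//   SELECT r.v.value('.', 'varchar(100)')
//   FROM   @xml.nodes('/xml/value') AS r(v)
using (SqlConnection conn = new SqlConnection(connectionString))
using (SqlCommand cmd = new SqlCommand("usp_ImportValuesXml", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    // a plain string is accepted for an xml parameter; SqlXml works too
    cmd.Parameters.Add("@xml", SqlDbType.Xml).Value = xmlText;
    conn.Open();
    cmd.ExecuteNonQuery();
}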

Marc
 

Marc Gravell

I think this is a good approach
(sorry, I was responding to Nicholas's post re sp_xml_preparedocument;
not sure if it will appear that way in all clients ;-p)
 

sloan

Yeah, my suggestion (in another post) will work with either 2000 or 2005.

If the OP gives the version of SQL Server, that would open up a few more
options (to which you are alluding).
 

Marc Gravell

I've handled 4 MB XML files with no issues using the OPENXML method; I
haven't tested the upper limits, so I don't know what happens beyond that.

Just for info; for big imports, I have successfully used a hybrid
XmlReader / IDataReader to present the first-level elements (which are
frequently the most common) as successive rows in a data-reader - i.e.

<xml><row>...</row><row>...</row><row>...</row></xml>

would get parsed as 3* rows in an IDataReader, but it works while /
streaming/ rather than having to load it all into memory first. In
particular you can pass your spoof IDataReader directly to an
SqlBulkCopy to pump it (again, streaming) into a staging table (with a
key to identify the batch), before finally invoking an SP to commit
the whole lot into the real tables. This allows you to limit any
transaction to just the final merge, avoiding long locks during the
import process (especially important as I was getting chunks from a
remote client). You could also stream into a single record using
streamed BLOB/CLOB methods, but the xml-reader approach allows you
more flexibility to apply a schema and manipulate the data at the same
time.

Just sharing for info... I have posted the hybrid reader previously
(this forum) if anybody wants it.

(*=obviously the real number was significantly larger!)
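
The real hybrid reader has been posted previously, as noted above; purely to
illustrate the shape, a minimal one-column sketch over <value> elements might
look like the following (the staging table name is invented, and the members
SqlBulkCopy isn't expected to need simply throw):

using System;
using System.Data;
using System.Data.SqlClient;
using System.Xml;

// Streams each <value> element as a one-column row; only the members that
// SqlBulkCopy is expected to touch do real work.
sealed class XmlValueReader : IDataReader
{
    private readonly XmlReader _xml;
    private string _current;

    public XmlValueReader(XmlReader xml) { _xml = xml; }

    public bool Read()
    {
        // ReadElementContentAsString() leaves the reader on the node after
        // the end tag, so check the current node before advancing again
        while (true)
        {
            if (_xml.NodeType == XmlNodeType.Element && _xml.Name == "value")
            {
                _current = _xml.ReadElementContentAsString();
                return true;
            }
            if (!_xml.Read()) return false;
        }
    }

    public int FieldCount { get { return 1; } }
    public object GetValue(int i) { return (object)_current ?? DBNull.Value; }
    public int GetValues(object[] values) { values[0] = GetValue(0); return 1; }
    public string GetName(int i) { return "Value"; }
    public int GetOrdinal(string name) { return 0; }
    public string GetString(int i) { return _current; }
    public bool IsDBNull(int i) { return _current == null; }
    public Type GetFieldType(int i) { return typeof(string); }
    public string GetDataTypeName(int i) { return "nvarchar"; }
    public object this[int i] { get { return GetValue(i); } }
    public object this[string name] { get { return GetValue(GetOrdinal(name)); } }

    public void Close() { _xml.Close(); }
    public void Dispose() { Close(); }
    public bool IsClosed { get { return _xml.ReadState == ReadState.Closed; } }
    public int Depth { get { return 0; } }
    public int RecordsAffected { get { return -1; } }
    public bool NextResult() { return false; }
    public DataTable GetSchemaTable() { return null; }

    // not expected to be needed by SqlBulkCopy for this sketch
    public bool GetBoolean(int i) { throw new NotSupportedException(); }
    public byte GetByte(int i) { throw new NotSupportedException(); }
    public long GetBytes(int i, long o, byte[] b, int bo, int len) { throw new NotSupportedException(); }
    public char GetChar(int i) { throw new NotSupportedException(); }
    public long GetChars(int i, long o, char[] b, int bo, int len) { throw new NotSupportedException(); }
    public IDataReader GetData(int i) { throw new NotSupportedException(); }
    public DateTime GetDateTime(int i) { throw new NotSupportedException(); }
    public decimal GetDecimal(int i) { throw new NotSupportedException(); }
    public double GetDouble(int i) { throw new NotSupportedException(); }
    public float GetFloat(int i) { throw new NotSupportedException(); }
    public Guid GetGuid(int i) { throw new NotSupportedException(); }
    public short GetInt16(int i) { throw new NotSupportedException(); }
    public int GetInt32(int i) { throw new NotSupportedException(); }
    public long GetInt64(int i) { throw new NotSupportedException(); }
}

// pump it, streaming, into a staging table (assumes dbo.StagingValues has a
// single compatible (n)varchar column):
using (XmlReader xml = XmlReader.Create("input.xml"))
using (SqlConnection conn = new SqlConnection(connectionString))
{
    conn.Open();
    using (SqlBulkCopy bcp = new SqlBulkCopy(conn))
    {
        bcp.DestinationTableName = "dbo.StagingValues";
        bcp.WriteToServer(new XmlValueReader(xml));
    }
    // then call the SP that merges the staging rows into the real tables
}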

Marc
 

sloan

Yeah, I kinda do a similar thing.

I use this approach when I have business rules for the imported data.

I use an IDataReader on the source data.
If a row passes the business-rule test, I add it to a (strongly typed)
dataset.

Every X rows (that pass the test), I send ds.GetXml() to the usp, which uses
the OPENXML method.

I then start with a new (strongly typed) ds and repeat until the IDataReader
is done reading.

I usually set X to 1000, but I keep it configurable.

The key is the business rules. This lets me write .NET code to verify each
row before it goes into the (strongly typed) ds; a row that fails never
reaches the db at all.

So it's kind of the same concept, but not exactly. Both approaches are
memory-footprint aware, though.

Sometimes I log the bad rows, sometimes I don't care about them.
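
Roughly, that loop might look like the following (a plain DataSet stands in
for the strongly typed one; the business rule, the source reader and the usp
name are all placeholders):

using System;
using System.Data;
using System.Data.SqlClient;

static void ImportWithRules(IDataReader sourceReader, string connectionString)
{
    DataSet batch = new DataSet("xml");
    DataTable rows = batch.Tables.Add("row");
    rows.Columns.Add("Value", typeof(string));

    int batchSize = 1000;   // the "X" above - keep it configurable
    while (sourceReader.Read())
    {
        string value = sourceReader.GetString(0);
        if (!PassesBusinessRules(value))
            continue;                         // bad rows never reach the db
        rows.Rows.Add(value);

        if (rows.Rows.Count == batchSize)
        {
            SendBatch(connectionString, batch.GetXml());   // one set-based hit
            rows.Clear();
        }
    }
    if (rows.Rows.Count > 0)
        SendBatch(connectionString, batch.GetXml());       // flush the rest
}

// placeholder for whatever .NET validation applies to a row
static bool PassesBusinessRules(string value)
{
    return !string.IsNullOrEmpty(value);
}

// hands one batch to a usp that shreds it with OPENXML; here the XML is
// shaped like <xml><row><Value>...</Value></row>...</xml>
static void SendBatch(string connectionString, string xml)
{
    using (SqlConnection conn = new SqlConnection(connectionString))
    using (SqlCommand cmd = new SqlCommand("usp_ImportValueBatch", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.Add("@xml", SqlDbType.NVarChar, -1).Value = xml;
        conn.Open();
        cmd.ExecuteNonQuery();
    }
}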
 
