XMLDocument.Load throws random exception on large doc loads

David

Here is a puzzle for you all:
(Framework 1.1)

I have a very large XML document that I try to load into
the XmlDocument class by passing a URL to the Load method.
With exactly the same data, this randomly throws an
XmlException: Cannot have a DTD declaration outside of a
DTD. It does not always report the same line, either. If
I make the XML document smaller, the exception is never
thrown. I have split the data into smaller sections, and
all of the small sections load, so the data itself is
fine.

I can work around the problem by loading the data into a
string via System.Net.HttpWebRequest and then calling
XmlDocument.LoadXml rather than Load; this always
works.
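
A minimal sketch of the workaround described above, assuming the response can be decoded with StreamReader's default settings. The module and function names (LoadXmlWorkaround, LoadFromUrl) are made up for illustration and are not from the original app. It is written without Using blocks so that it should also compile on Framework 1.1.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Xml

Module LoadXmlWorkaround
    ' Hypothetical helper: reads the whole response into a string first,
    ' then parses it, so the XML parser never reads from the network stream.
    Function LoadFromUrl(ByVal url As String) As XmlDocument
        Dim request As WebRequest = WebRequest.Create(url)
        Dim response As WebResponse = request.GetResponse()
        Dim xmlText As String
        Try
            ' StreamReader honours a byte order mark and otherwise assumes UTF-8;
            ' pass an explicit Encoding if the server sends something else.
            Dim reader As New StreamReader(response.GetResponseStream())
            xmlText = reader.ReadToEnd()
        Finally
            ' Always release the connection, even if reading fails
            response.Close()
        End Try

        Dim doc As New XmlDocument
        doc.LoadXml(xmlText)
        Return doc
    End Function
End Module
```

One caveat with this approach: LoadXml parses an already-decoded string, so the encoding named in the XML declaration is not used to decode the bytes. If the feed is not UTF-8, the StreamReader would need the matching Encoding.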

So.... What's the deal? Is there a bug with the 1.1
framework XMLDocument.Load method?

Here is a snippet of the code details if someone wishes,
but there really isn't much to it to cause the problem:

Dim oXML As New Xml.XmlDocument

Try
    oXML.Load(strXMLUrl)
Catch ex As Exception
    'an exception is thrown at least 1 out of every 10
    'or so times...
End Try
 
Jon Skeet [C# MVP]

So.... What's the deal? Is there a bug with the 1.1
framework XMLDocument.Load method?

Here is a snippet of the code details if someone wishes,
but there really isn't much to it to cause the problem:

Dim oXML As New Xml.XmlDocument

Try
    oXML.Load(strXMLUrl)
Catch ex As Exception
    'an exception is thrown at least 1 out of every 10
    'or so times...
End Try

How sure are you that strXMLUrl is actually correct? How are you
creating it, exactly?
 
David

I created a test console app to clarify the problem. I
have tried the URL hard-coded, and the console app also
allows it to be passed in as a command-line argument. I am
100% sure the URL is correct; it is not programmatically
generated.

Here is the code for my test app and the test script.
<LONG> NOTE: THE URL PROVIDED IN THE TEST MAY BLOCK
YOUR IP AFTER MANY REPEATED REQUESTS, SO THE SCRIPT MAY
EVENTUALLY FAIL FOR YOU EVERY TIME (this prevents data
scraping).

CODE
=====
Imports System
Imports System.IO
Imports System.Net

Module Module1
    ' Usage: xmldocloader.exe tries url
    Sub Main(ByVal args() As String)
        Dim oXML2 As New Xml.XmlDocument
        Dim iTry As Integer
        Dim iErrCount As Integer = 0

        For iTry = 1 To CInt(args(0))
            Try
                Console.WriteLine("Try" & iTry)
                ' Load straight from the URL; this is the call that fails intermittently
                oXML2.Load(args(1))
            Catch ex As Exception
                iErrCount += 1
                Console.WriteLine(" --->Load Error:" & iTry & " " & ex.Message)
                Console.WriteLine("")
            End Try
        Next
        Console.WriteLine(iErrCount & " failures in " & args(0) & " tries.")
    End Sub
End Module





TEST SCRIPT
===========
Exe Syntax: xmldocloader.exe tries url
Example: xmldocloader.exe 10 "http://pview.findlaw.com/cmd/multiview?wld_ids=1079637_1|2097283_1|2176725_1"

Run the exe with each of the test data URLs below. The
number listed before each URL is the number of data
elements in its wld_ids query parameter. Following that
number is the number of failures per 50 iterations of the
XmlDocument.Load call, each resulting in the exception:
"This is a DTD declaration outside of a DTD"

As a general rule, the more items in the query string
(and so the larger the resulting XML document), the more
frequent the exceptions per 50 iterations.

The line number and position reported with the exception
(checked against an XML document that loaded successfully)
always point to text that begins with <! but is either a
comment (<!--Some comment-->) or the beginning of a CDATA
section (<![CDATA[Betty is a member in the Atlanta
office.]]>).
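
For anyone who wants to collect the same line and position details, here is a minimal sketch that catches XmlException specifically and prints its LineNumber and LinePosition properties. The module name and the single url argument are assumptions for illustration, not part of the test app above.

```vbnet
Imports System
Imports System.Xml

Module ReportXmlErrorPosition
    ' Usage (hypothetical exe name): reportpos.exe url
    Sub Main(ByVal args() As String)
        Dim doc As New XmlDocument
        Try
            doc.Load(args(0))
            Console.WriteLine("Loaded without error.")
        Catch ex As XmlException
            ' LineNumber and LinePosition report where the parser stopped
            Console.WriteLine("Line " & ex.LineNumber & ", position " & ex.LinePosition & ": " & ex.Message)
        End Try
    End Sub
End Module
```

The reported position can then be compared against a copy of the document that loaded successfully.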

If the data from a successful load is saved to a local
file and Load is then called with a file path rather than
a URL, it NEVER fails.

If the XmlDocument is loaded with LoadXml() from a string,
it NEVER fails.
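
The local-file observation suggests a second possible workaround: download the response to a temporary file first, then call Load with the file path. The sketch below is untested against this server, and its names (LoadViaTempFile, LoadViaFile) are hypothetical. It relies on WebClient.DownloadFile and Path.GetTempFileName.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Xml

Module LoadViaTempFile
    ' Hypothetical helper: copies the response to disk, then parses the file,
    ' so XmlDocument.Load reads from a local path instead of the URL.
    Function LoadViaFile(ByVal url As String) As XmlDocument
        Dim tempPath As String = Path.GetTempFileName()
        Try
            Dim client As New WebClient
            client.DownloadFile(url, tempPath)

            Dim doc As New XmlDocument
            doc.Load(tempPath)
            Return doc
        Finally
            ' Remove the temporary copy whether or not parsing succeeded
            File.Delete(tempPath)
        End Try
    End Function
End Module
```

Unlike the string-based approach, this keeps the raw bytes intact, so the parser still decodes the document according to its own XML declaration.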

============================
TEST DATA
============================
20 - Failed 2 out of 50 times
"http://pview.findlaw.com/cmd/multiview?
wld_ids=1079637_1|2097283_1|2176725_1|1980000_1|3302931_1|
2275687_1|2522288_1|3083156_1|2279751_1|3083164_1|1121614_
1|3232986_1|2886699_1|1728981_1|1714277_1|2205374_1|323372
4_1|1795695_1|1425295_1|3239657_1"

30 - Failed 2 out of 50 times
"http://pview.findlaw.com/cmd/multiview?
wld_ids=1079637_1|2097283_1|2176725_1|1980000_1|3302931_1|
2275687_1|2522288_1|3083156_1|2279751_1|3083164_1|1121614_
1|3232986_1|2886699_1|1728981_1|1714277_1|2205374_1|323372
4_1|1795695_1|1425295_1|3239657_1|3176944_1|1794349_1|2912
631_1|3083400_1|3233428_1|1005301_1|3083408_1|1705662_1|16
70735_1|3182975_1"

40 - Failed 2 out of 50 times
"http://pview.findlaw.com/cmd/multiview?
wld_ids=1079637_1|2097283_1|2176725_1|1980000_1|3302931_1|
2275687_1|2522288_1|3083156_1|2279751_1|3083164_1|1121614_
1|3232986_1|2886699_1|1728981_1|1714277_1|2205374_1|323372
4_1|1795695_1|1425295_1|3239657_1|3176944_1|1794349_1|2912
631_1|3083400_1|3233428_1|1005301_1|3083408_1|1705662_1|16
70735_1|3182975_1|2586826_1|1520406_1|1855311_1|1241114_1|
2081508_1|2769025_1|2148110_1|2363475_1|2767493_1|3302932_
1"

50 - Failed 1 out of 50 times
"http://pview.findlaw.com/cmd/multiview?
wld_ids=1079637_1|2097283_1|2176725_1|1980000_1|3302931_1|
2275687_1|2522288_1|3083156_1|2279751_1|3083164_1|1121614_
1|3232986_1|2886699_1|1728981_1|1714277_1|2205374_1|323372
4_1|1795695_1|1425295_1|3239657_1|3176944_1|1794349_1|2912
631_1|3083400_1|3233428_1|1005301_1|3083408_1|1705662_1|16
70735_1|3182975_1|2586826_1|1520406_1|1855311_1|1241114_1|
2081508_1|2769025_1|2148110_1|2363475_1|2767493_1|3302932_
1|2282789_1|2574204_1|3232520_1|1091792_1|3176984_1|330293
3_1|2397693_1|1600569_1|1102762_1|1158034_1"

60 - Failed 5 out of 50 times
"http://pview.findlaw.com/cmd/multiview?
wld_ids=1079637_1|2097283_1|2176725_1|1980000_1|3302931_1|
2275687_1|2522288_1|3083156_1|2279751_1|3083164_1|1121614_
1|3232986_1|2886699_1|1728981_1|1714277_1|2205374_1|323372
4_1|1795695_1|1425295_1|3239657_1|3176944_1|1794349_1|2912
631_1|3083400_1|3233428_1|1005301_1|3083408_1|1705662_1|16
70735_1|3182975_1|2586826_1|1520406_1|1855311_1|1241114_1|
2081508_1|2769025_1|2148110_1|2363475_1|2767493_1|3302932_
1|2282789_1|2574204_1|3232520_1|1091792_1|3176984_1|330293
3_1|2397693_1|1600569_1|1102762_1|1158034_1|3306985_1|3302
934_1|1077372_1|3177004_1|1714295_1|1204955_1|1950543_1|15
56446_1|3070770_1|1074991_1"

70 - Failed 2 out of 50 times
"http://pview.findlaw.com/cmd/multiview?
wld_ids=1079637_1|2097283_1|2176725_1|1980000_1|3302931_1|
2275687_1|2522288_1|3083156_1|2279751_1|3083164_1|1121614_
1|3232986_1|2886699_1|1728981_1|1714277_1|2205374_1|323372
4_1|1795695_1|1425295_1|3239657_1|3176944_1|1794349_1|2912
631_1|3083400_1|3233428_1|1005301_1|3083408_1|1705662_1|16
70735_1|3182975_1|2586826_1|1520406_1|1855311_1|1241114_1|
2081508_1|2769025_1|2148110_1|2363475_1|2767493_1|3302932_
1|2282789_1|2574204_1|3232520_1|1091792_1|3176984_1|330293
3_1|2397693_1|1600569_1|1102762_1|1158034_1|3306985_1|3302
934_1|1077372_1|3177004_1|1714295_1|1204955_1|1950543_1|15
56446_1|3070770_1|1074991_1|1882735_1|3233007_1|3308649_1|
1714293_1|3177032_1|3083486_1|1122847_1|3233785_1|2365842_
1|1585983_1"

80 - Failed 2 out of 50 times
"http://pview.findlaw.com/cmd/multiview?
wld_ids=1079637_1|2097283_1|2176725_1|1980000_1|3302931_1|
2275687_1|2522288_1|3083156_1|2279751_1|3083164_1|1121614_
1|3232986_1|2886699_1|1728981_1|1714277_1|2205374_1|323372
4_1|1795695_1|1425295_1|3239657_1|3176944_1|1794349_1|2912
631_1|3083400_1|3233428_1|1005301_1|3083408_1|1705662_1|16
70735_1|3182975_1|2586826_1|1520406_1|1855311_1|1241114_1|
2081508_1|2769025_1|2148110_1|2363475_1|2767493_1|3302932_
1|2282789_1|2574204_1|3232520_1|1091792_1|3176984_1|330293
3_1|2397693_1|1600569_1|1102762_1|1158034_1|3306985_1|3302
934_1|1077372_1|3177004_1|1714295_1|1204955_1|1950543_1|15
56446_1|3070770_1|1074991_1|1882735_1|3233007_1|3308649_1|
1714293_1|3177032_1|3083486_1|1122847_1|3233785_1|2365842_
1|1585983_1|1857537_1|1158026_1|1901154_1|1034452_1|192003
9_1|1743492_1|3233789_1|3083529_1|1350058_1|2303604_1"

90 - Failed 15 out of 50 times
"http://pview.findlaw.com/cmd/multiview?
wld_ids=1079637_1|2097283_1|2176725_1|1980000_1|3302931_1|
2275687_1|2522288_1|3083156_1|2279751_1|3083164_1|1121614_
1|3232986_1|2886699_1|1728981_1|1714277_1|2205374_1|323372
4_1|1795695_1|1425295_1|3239657_1|3176944_1|1794349_1|2912
631_1|3083400_1|3233428_1|1005301_1|3083408_1|1705662_1|16
70735_1|3182975_1|2586826_1|1520406_1|1855311_1|1241114_1|
2081508_1|2769025_1|2148110_1|2363475_1|2767493_1|3302932_
1|2282789_1|2574204_1|3232520_1|1091792_1|3176984_1|330293
3_1|2397693_1|1600569_1|1102762_1|1158034_1|3306985_1|3302
934_1|1077372_1|3177004_1|1714295_1|1204955_1|1950543_1|15
56446_1|3070770_1|1074991_1|1882735_1|3233007_1|3308649_1|
1714293_1|3177032_1|3083486_1|1122847_1|3233785_1|2365842_
1|1585983_1|1857537_1|1158026_1|1901154_1|1034452_1|192003
9_1|1743492_1|3233789_1|3083529_1|1350058_1|2303604_1|3177
618_1|1090165_1|3177052_1|3302935_1|3250431_1|2433475_1|25
22654_1|3302936_1|1102909_1|2287346_1"

100 - Failed 22 out of 50 times
"http://pview.findlaw.com/cmd/multiview?
wld_ids=1079637_1|2097283_1|2176725_1|1980000_1|3302931_1|
2275687_1|2522288_1|3083156_1|2279751_1|3083164_1|1121614_
1|3232986_1|2886699_1|1728981_1|1714277_1|2205374_1|323372
4_1|1795695_1|1425295_1|3239657_1|3176944_1|1794349_1|2912
631_1|3083400_1|3233428_1|1005301_1|3083408_1|1705662_1|16
70735_1|3182975_1|2586826_1|1520406_1|1855311_1|1241114_1|
2081508_1|2769025_1|2148110_1|2363475_1|2767493_1|3302932_
1|2282789_1|2574204_1|3232520_1|1091792_1|3176984_1|330293
3_1|2397693_1|1600569_1|1102762_1|1158034_1|3306985_1|3302
934_1|1077372_1|3177004_1|1714295_1|1204955_1|1950543_1|15
56446_1|3070770_1|1074991_1|1882735_1|3233007_1|3308649_1|
1714293_1|3177032_1|3083486_1|1122847_1|3233785_1|2365842_
1|1585983_1|1857537_1|1158026_1|1901154_1|1034452_1|192003
9_1|1743492_1|3233789_1|3083529_1|1350058_1|2303604_1|3177
618_1|1090165_1|3177052_1|3302935_1|3250431_1|2433475_1|25
22654_1|3302936_1|1102909_1|2287346_1|1619609_1|1670696_1|
1102960_1|3234523_1|1115374_1|2741855_1|1111152_1|1102827_
1|3233581_1|3182966_1"
 
