MSHTML asp.net web application SLOW


DotNetShadow

Hi Guys,

I have been reading a lot about using the MSHTML object in .NET
applications, both Windows and UI-less. I have read the walkall samples
and tried various techniques with MSHTML. My biggest problem is running
MSHTML in an ASP.NET application and looping over the elements in the
document.

Example:

a) Load a site (http://www.amazon.com) into an MSHTML HTMLDocument
b) Loop over the IHTMLElementCollection (~1300 elements for Amazon)
c) Access properties of each node, such as tagName,
currentStyle.fontSize, etc.

I have found the following problems:

1) Using CType to cast to the appropriate types is slower than the
late-binding approach, such as collection(i).tagName.

2) Looping over 1300 elements takes 5 - 8 seconds. The same code in the
equivalent VB.NET Windows application takes only 0.5 seconds.

The following is the code I use as a sample aspx page:

' ========[ 3 - 4 seconds ]========================================
Dim htmldocument As New mshtml.HTMLDocument

' strString holds the URL to load
Dim doc As mshtml.IHTMLDocument2 = _
    CType(htmldocument.createDocumentFromUrl(strString, ""), _
    mshtml.IHTMLDocument2)

Dim startTime As DateTime = Now

' Poll every 100 ms until the document is complete or 4 seconds pass
Do Until htmldocument.readyState = "complete" OrElse _
        Now.Subtract(startTime).TotalSeconds >= 4
    System.Threading.Thread.Sleep(100)
Loop


' ========[ 5 - 8 seconds ]========================================
' pageDoc is the loaded document (doc in the snippet above)
Dim mycollection As mshtml.IHTMLElementCollection = _
    pageDoc.body.all
Dim colLength As Int32 = mycollection.length

Dim result As New StringBuilder("")

Dim a As String
Dim b As String
Dim c As String
Dim d As String
Dim e As String
Dim myInnerText As String

For i As Integer = 0 To colLength - 1

    ' Late-bound property reads on each element
    a = mycollection.item(i).tagName
    b = mycollection.item(i).currentStyle.fontFamily
    c = mycollection.item(i).currentStyle.fontSize
    d = mycollection.item(i).currentStyle.fontWeight
    e = mycollection.item(i).currentStyle.fontStyle

    ' Get TAG node

    myInnerText = CStr(mycollection.item(i).innerHTML)

    result.Append(a)
    result.Append(" - ")
    result.Append(b)
    result.Append(" - ")
    result.Append(c)
    result.Append(" - ")
    result.Append(d)
    result.Append(" - ")
    result.Append(e)

Next

Any help on speeding this up would be greatly appreciated. I have read
that marshalling could be what is killing me, but what can I do about
it?
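One thing that may reduce the marshalling cost: each mycollection.item(i) and each .currentStyle in the loop above is a separate COM interop call, so fetching the element and its style object once per iteration roughly halves the number of calls per element. A minimal, unmeasured sketch, assuming Option Strict Off and a reference to the Microsoft.mshtml interop assembly; the names ElementDump and DumpFonts are made up for illustration:

```vbnet
Option Strict Off ' late binding, as in the original loop
Imports System.Text

Module ElementDump
    ' Hypothetical helper: builds "tag - family - size - weight - style"
    ' per element, fetching the element and its style only once each.
    Function DumpFonts(ByVal elements As mshtml.IHTMLElementCollection) As String
        Dim result As New StringBuilder()
        Dim count As Integer = elements.length

        For i As Integer = 0 To count - 1
            Dim el As Object = elements.item(i)    ' one item() call per element
            Dim style As Object = el.currentStyle  ' one currentStyle call per element

            result.Append(CStr(el.tagName))
            result.Append(" - ")
            result.Append(CStr(style.fontFamily))
            result.Append(" - ")
            result.Append(CStr(style.fontSize))
            result.Append(" - ")
            result.Append(CStr(style.fontWeight))
            result.Append(" - ")
            result.Append(CStr(style.fontStyle))
            result.Append(Environment.NewLine)
        Next

        Return result.ToString()
    End Function
End Module
```

This only trims the call count; it does not address the threading model the calls run under, which may matter more in ASP.NET.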

Regards DotNetShadow
 
Hi,

Just an idea: if using MSHTML is faster from WinForms than from WebForms,
try taking it out of WebForms by using remoting or threading.

Natty Gur[MVP]

blog : http://weblogs.asp.net/ngur
Mobile: +972-(0)58-888377
 
Hi Natty,

The major problem was solved by making the ASP.NET page call a thread
that was STA, since MSHTML is a single-threaded component, so your
suggestion was partially right. However, I did run into this strange
problem:

Dim doc As mshtml.IHTMLDocument2 = _
    CType(htmldocument.createDocumentFromUrl(strString, ""), _
    mshtml.IHTMLDocument2)

seems to fail in the STA model, so I had to use the WebClient object
and write the data into the document object, like this:

' Requires Imports System.IO and Imports System.Net
Dim client As New WebClient

' Add a user agent header in case the
' requested URI contains a query.
client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; " & _
    "MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)")

' Download the page source as a string
Dim data As Stream = _
    client.OpenRead("http://www.microsoft.com")
Dim reader As New StreamReader(data)
Dim html As String = reader.ReadToEnd()
data.Close()
reader.Close()

' Write the downloaded HTML into a fresh MSHTML document
Dim doc As New mshtml.HTMLDocument
Dim doc2 As mshtml.IHTMLDocument2 = CType(doc, mshtml.IHTMLDocument2)
doc2.open()
doc2.write(html)
doc2.close()

Is there any reason why doing it the original way, with
htmldocument.createDocumentFromUrl, would cause a null reference
exception or return an empty document with the URL about:blank?
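For anyone following along, the STA approach described above can be sketched roughly as follows. This is only an outline under stated assumptions: the class name StaWorker and the ParsePage placeholder are hypothetical, and the real MSHTML load-and-loop code would go where the placeholder is. It uses Thread.SetApartmentState, which exists from .NET 2.0 onward; on .NET 1.x the thread's ApartmentState property is set instead.

```vbnet
Imports System
Imports System.Threading

' Hypothetical wrapper that runs one unit of work on a dedicated STA thread.
Public Class StaWorker
    Private _url As String
    Private _result As String
    Private _failure As Exception

    Public Sub New(ByVal url As String)
        _url = url
    End Sub

    ' Placeholder (assumption): the MSHTML work would replace this body.
    Private Function ParsePage(ByVal url As String) As String
        Return "parsed " & url
    End Function

    ' Thread entry point; captures any exception so the caller can see it.
    Private Sub Run()
        Try
            _result = ParsePage(_url)
        Catch ex As Exception
            _failure = ex
        End Try
    End Sub

    ' Starts the STA thread, blocks until it finishes, returns the result.
    Public Function Execute() As String
        Dim worker As New Thread(New ThreadStart(AddressOf Run))
        worker.SetApartmentState(ApartmentState.STA) ' must be set before Start
        worker.Start()
        worker.Join()

        If Not _failure Is Nothing Then
            Throw New ApplicationException("STA worker failed", _failure)
        End If
        Return _result
    End Function
End Class
```

A page would call it along the lines of Dim text As String = New StaWorker(strString).Execute(). Another option that may be worth trying is AspCompat="true" in the @ Page directive, which makes ASP.NET run the page itself on an STA thread.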

Regards DotNetShadow
 
