Interpreting JavaScript within C#


Logician

Does anyone have any idea how this is done?

I am writing a C# bot to grab data from sites, but some sites have
extensive JavaScript navigation. This means I have to read the script
and effectively run it within C#.

I have one example from a book (HTTP Programming for Bots Using C#).
The problem I have is understanding how to set up the package, and then
how to process the JavaScript code on the site without, in effect,
copying the code by hand.

JScript.Eval E = new JScript.Eval();
string expression = TextBox1.Text;
try
{
    // Hand the expression to the JScript engine and show the result.
    TextBox1.Text = E.DoEval(expression);
}
catch (Exception ex)
{
    // eval() throws on malformed script; show the message instead of crashing.
    TextBox1.Text = ex.Message;
}

JScript is defined as:

package JScript
{
    class Eval
    {
        // Wraps the global eval() so it can be called from C#.
        public function DoEval(expr : String) : String
        {
            return eval(expr);
        }
    }
}
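(The setup step here is presumably the usual JScript.NET one: compile the
JScript source into a class library with the Framework's jsc.exe compiler and
reference the resulting DLL from the C# project. A sketch, assuming the
snippet above is saved as Eval.js; the output name is arbitrary:

jsc /target:library /out:JScriptEval.dll Eval.js

The C# project then references JScriptEval.dll, with Microsoft.JScript.dll
available at run time.)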
 

Nicholas Paldino [.NET/C# MVP]

Logician,

If this is in the context of an HTML page, then why not use an HTML
document host like MSHTML to execute the javascript? You can then access
the document object model (DOM) after the javascript is executed, as well as
set values on the page and see how the page reacts.
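A minimal sketch of that approach, using the WinForms WebBrowser control
(which hosts the MSHTML engine); the URL is only a placeholder:

using System;
using System.Windows.Forms;

class BotHost : Form
{
    private readonly WebBrowser browser = new WebBrowser();

    public BotHost()
    {
        browser.Dock = DockStyle.Fill;
        browser.ScriptErrorsSuppressed = true;
        Controls.Add(browser);
        // DocumentCompleted fires once the page has loaded and its scripts have run.
        browser.DocumentCompleted += delegate
        {
            // The fully built DOM is now available for inspection.
            HtmlDocument doc = browser.Document;
            Console.WriteLine(doc.Title);
        };
        browser.Navigate("http://example.com/");   // placeholder URL
    }

    [STAThread]
    static void Main()
    {
        Application.Run(new BotHost());
    }
}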
 

Ignacio Machin (.NET/C# MVP)

Hi,

Logician said:
Does anyone have any idea how this is done?

I am writing a c# bot to grab data from sites, but some sites have
extensive Javascript navigation. This means I have to read the script
and effectively run it within c#.

I do not think that it will solve your problem. I think that the best way
you can do this is by using a WebBrowser control and loading the page inside
the control. Then you can parse the document and get the data you need.

With today's dynamic web pages, parsing the DOM is the only way to know for
sure.
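As a sketch of that parsing step (inside a DocumentCompleted handler like the
one shown earlier in the thread), walking the anchors is often enough to
surface scripted navigation; OuterHtml shows any onclick handlers inline:

// Inside a DocumentCompleted handler: walk the parsed DOM for data.
foreach (HtmlElement link in browser.Document.GetElementsByTagName("a"))
{
    // OuterHtml includes any onclick attribute, which is where
    // script-driven navigation usually hides.
    Console.WriteLine(link.OuterHtml);
}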
 

Logician

Nicholas Paldino [.NET/C# MVP] said:
If this is in the context of an HTML page, then why not use an HTML
document host like MSHTML to execute the javascript? You can then access
the document object model (DOM) after the javascript is executed, as well as
set values on the page and see how the page reacts.
I'm not sure how that works. It is an HTML page, but one generated from
user input, e.g. a menu click. The bot has to read the JS code and then
interpret it to get the new URL, then effectively visit the URL. But
as the URL is built at run time, this is fairly hard.

I checked out some sites, e.g. grohe.co.uk, and saw that not even Google
has indexed much of them.

Logician

Ignacio Machin (.NET/C# MVP) said:
I do not think that it will solve your problem. I think that the best way
you can do this is by using a WebBrowser control and loading the page inside
the control. Then you can parse the document and get the data you need.

With today's dynamic web pages, parsing the DOM is the only way to know for
sure.

I will think about it. The problem with that method is that the pages
on some sites, e.g. grohe.co.uk, are generated at run time based on user
input. So the bot has to simulate user input and then get the results
from the JavaScript referenced on the pages. Those sites have just a
few static pages, and the rest is all built from JavaScript.
 

Logician

Ignacio Machin (.NET/C# MVP) said:
I think that the best way you can do this is by using a WebBrowser control
and loading the page inside the control. Then you can parse the document
and get the data you need.

BTW, these JS-based sites do not work with JavaScript turned off, so
the actual web design is bad.
 

Nicholas Paldino [.NET/C# MVP]

Logician,

Right, and using MSHTML, you would load the page, and then your bot
would manipulate the DOM to set the appropriate input. Once you do that,
the HTML engine will interpret the javascript and then you can access the
DOM to get whatever elements were changed to have the new URL.
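A sketch of that flow against the WebBrowser/MSHTML DOM, assuming the page
has already loaded; the element id here is hypothetical:

// Find the menu element (id is hypothetical) and fire its click handler.
HtmlElement menuItem = browser.Document.GetElementById("nav_item_1");
if (menuItem != null)
{
    menuItem.InvokeMember("click");   // runs the element's onclick script
}

// The engine has now executed the handler; inspect what changed.
Console.WriteLine(browser.Url);   // did the page navigate, or was the DOM rewritten in place?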



Logician

Nicholas Paldino [.NET/C# MVP] said:
Right, and using MSHTML, you would load the page, and then your bot
would manipulate the DOM to set the appropriate input. Once you do that,
the HTML engine will interpret the javascript and then you can access the
DOM to get whatever elements were changed to have the new URL.

I will think about it.

Many of the sites do not have URLs as such; they are effectively
opening JS windows on the client and then using data access to build up
the data. So the bot needs to simulate the user input and then read the
new screen. This is different to finding a new URL, which assumes the
data is stored as HTML (or other markup) somewhere statically and
needs to be accessed. Here the data is stored in a database only, and
everything is generated.
 

Ignacio Machin (.NET/C# MVP)

Hi,

Logician said:
BTW, these JS-based sites do not work with JavaScript turned off, so
the actual web design is bad.

Well, realize that any site done with AJAX needs JavaScript. IMHO, the days
of a browsing experience without JS support are numbered.
 

Nicholas Paldino [.NET/C# MVP]

That's the thing, the javascript has to store the URL somewhere, and
that is usually an element in the HTML (even if it is dynamic). Even if it
is stored in some array in memory and then accessed in a click handler or
something like that, you will have to execute the javascript in the page and
then access the script parser in order to get those values, which MSHTML
will allow you to do.
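With the WinForms wrapper over MSHTML, HtmlDocument.InvokeScript is the usual
door into the page's script engine. A sketch; the function name, argument,
and variable below are all hypothetical stand-ins for whatever the page
actually defines:

// Call a function defined by the page's own script and capture its result.
object result = browser.Document.InvokeScript("buildUrl", new object[] { "someArg" });
if (result != null)
{
    browser.Navigate(result.ToString());
}

// Values held only in script memory can often be pulled out via the page's eval.
object stored = browser.Document.InvokeScript("eval", new object[] { "urlArray[0]" });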



Logician

Ignacio Machin (.NET/C# MVP) said:
Well, realize that any site done with AJAX needs JavaScript. IMHO, the days
of a browsing experience without JS support are numbered.

I know that a lot of corporate users have firewalls that stop any
client scripts running on local computers. This in effect stops them
using JavaScript-based sites. It is done mainly for security reasons,
as the computers are networked. So that presents one issue for the
JavaScript advocates.
 

Logician

Nicholas Paldino [.NET/C# MVP] said:
That's the thing, the javascript has to store the URL somewhere, and
that is usually an element in the HTML (even if it is dynamic). Even if it
is stored in some array in memory and then accessed in a click handler or
something like that, you will have to execute the javascript in the page and
then access the script parser in order to get those values, which MSHTML
will allow you to do.


I'll think about this.
Look at http://www.grohe.co.uk/t/25_731.html for an example of a page
that does little before opening a JS window.

Key code for the example: <a href="#" onclick="advanced_matrix('open', '6027693');">

My bot has to understand advanced_matrix and then produce the new page.
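Following the MSHTML suggestion above, one sketch is to let the page's own
copy of advanced_matrix run rather than re-implementing it: load the page in
a WebBrowser control and, once DocumentCompleted fires, invoke the function
with the arguments taken from that onclick:

// After DocumentCompleted has fired for http://www.grohe.co.uk/t/25_731.html:
browser.Document.InvokeScript("advanced_matrix", new object[] { "open", "6027693" });

// If the function opens a pop-up, hook the WebBrowser's NewWindow event
// beforehand to intercept it; if it rewrites the current document instead,
// re-read browser.Document here to see the generated markup.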
 
