Interpreting JavaScript within C#


Logician

Does anyone have any idea how this is done?

I am writing a C# bot to grab data from sites, but some sites have
extensive JavaScript navigation. This means I have to read the script
and effectively run it within C#.
I have one example from a book (HTTP programming for bots using C#).
The problem I have is understanding how to set up the package and then
how to process the JavaScript code on the site without, in effect,
copying the code.

JScript.Eval E = new JScript.Eval();
string expression = TextBox1.Text;  // read the expression to evaluate
try
{
    // DoEval hands the string to JScript.NET's eval() and returns the result
    TextBox1.Text = E.DoEval(expression);
}
catch (Exception ex)
{
    // eval throws on malformed script; show the error rather than crash
    TextBox1.Text = ex.Message;
}

JScript is defined as follows (this is JScript.NET, compiled into a
separate library and referenced from the C# project):

package JScript
{
    class Eval
    {
        public function DoEval(expr : String) : String
        {
            return eval(expr);
        }
    }
}
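(For reference: rather than compiling a separate JScript.NET library,
one rough alternative sketch is to call the JScript.NET engine directly
from C# via the Microsoft.JScript assembly. The VsaEngine API is marked
obsolete, so treat this as a quick experiment, not a supported approach:

// Add references to Microsoft.JScript.dll and Microsoft.Vsa.dll.
using Microsoft.JScript;
using Microsoft.JScript.Vsa;

class JsEvalDemo
{
    static void Main()
    {
        // Create the (deprecated) JScript.NET engine and evaluate
        // an expression string, just like DoEval above.
        VsaEngine engine = VsaEngine.CreateEngine();
        object result =
            Microsoft.JScript.Eval.JScriptEvaluate("1 + 2 * 3", engine);
        System.Console.WriteLine(result);  // prints 7
    }
}
)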
 
Logician,

If this is in the context of an HTML page, then why not use an HTML
document host like MSHTML to execute the JavaScript? You can then access
the document object model (DOM) after the JavaScript is executed, as well
as set values on the page and see how the page reacts.
 
Hi,

Logician said:
Does anyone have any idea how this is done?

I am writing a C# bot to grab data from sites, but some sites have
extensive JavaScript navigation. This means I have to read the script
and effectively run it within C#.

I do not think that it will solve your problem. I think that the best way
you can do this is by using a WebBrowser control and loading the page
inside the control. Then you can parse the document and get the data you
need.

With today's dynamic web pages, parsing the DOM is the only way to know
for sure.
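For example, a minimal console-style sketch (the URL is a placeholder,
error handling is omitted, and in practice you may want to host the
control on a hidden form). The WebBrowser control wraps the same MSHTML
engine mentioned above:

using System;
using System.Windows.Forms;

class DomDumper
{
    [STAThread]
    static void Main()
    {
        var browser = new WebBrowser { ScriptErrorsSuppressed = true };
        browser.DocumentCompleted += (s, e) =>
        {
            // By now MSHTML has run the page's scripts, so the DOM
            // reflects whatever they generated.
            foreach (HtmlElement link in
                     browser.Document.GetElementsByTagName("a"))
                Console.WriteLine(link.GetAttribute("href"));
            Application.ExitThread();
        };
        browser.Navigate("http://example.com/");  // placeholder URL
        Application.Run();  // message loop so MSHTML can do its work
    }
}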
 
Logician,

If this is in the context of an HTML page, then why not use an HTML
document host like MSHTML to execute the JavaScript? You can then access
the document object model (DOM) after the JavaScript is executed, as well
as set values on the page and see how the page reacts.
I'm not sure how that works. It is an HTML page, but generated from
user input, e.g. a menu click. The bot has to read the JS code and then
interpret it to get the new URL, then effectively visit the URL. But
as the URL is built at run time this is fairly hard.

I checked out some sites, e.g. grohe.co.uk, and saw that not even Google
has indexed much of them.
 
Hi,

I do not think that it will solve your problem. I think that the best way
you can do this is by using a WebBrowser control and loading the page
inside the control. Then you can parse the document and get the data you
need.

With today's dynamic web pages, parsing the DOM is the only way to know
for sure.

I will think about it. The problem with that method is that the pages
on some sites, e.g. grohe.co.uk, are generated at run time based on user
input. So the bot has to simulate user input and then get the results
from the JavaScript referenced on the pages. Those sites have just a
few static pages, and the rest is all built from JavaScript.
 
Hi,

BTW, these JS-based sites do not work with JavaScript turned off, so
the actual web design is bad.
 
Logician,

Right, and using MSHTML, you would load the page, and then your bot
would manipulate the DOM to set the appropriate input. Once you do that,
the HTML engine will interpret the JavaScript and then you can access the
DOM to get whatever elements were changed to have the new URL.
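For example, roughly as below, using the WebBrowser control that hosts
MSHTML (the element id here is hypothetical):

// Inside a DocumentCompleted handler, once the page has loaded:
HtmlElement menuLink = browser.Document.GetElementById("menu42");
if (menuLink != null)
{
    // Fires the anchor's onclick handler; MSHTML runs the page's
    // JavaScript just as it would for a real user.
    menuLink.InvokeMember("click");
}
// Afterwards, re-read browser.Document to pick up the changes.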


--
- Nicholas Paldino [.NET/C# MVP]
- (e-mail address removed)

 
Logician,

Right, and using MSHTML, you would load the page, and then your bot
would manipulate the DOM to set the appropriate input. Once you do that,
the HTML engine will interpret the JavaScript and then you can access the
DOM to get whatever elements were changed to have the new URL.


I will think about it.

Many of the sites do not have URLs as such; they are effectively
opening JS windows on the client and then using data access to build up
data. So the bot needs to simulate the user input and then read the
new screen. This is different to finding a new URL, which assumes the
data is stored as HTML (or other markup) somewhere statically and
needs to be accessed. The data is stored in a database only, and
everything is generated.
 
Hi,

Logician said:
BTW, these JS-based sites do not work with JavaScript turned off, so
the actual web design is bad.

Well, realize that any site done with AJAX needs JavaScript. IMHO, the
days of a browsing experience without JS support are numbered.
 
That's the thing: the JavaScript has to store the URL somewhere, and
that is usually an element in the HTML (even if it is dynamic). Even if
it is stored in some array in memory and then accessed in a click handler
or something like that, you will have to execute the JavaScript in the
page and then access the script engine in order to get those values,
which MSHTML will allow you to do.
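A minimal sketch of pulling a value back out of the page's script, again
via the WebBrowser control; "getCurrentUrl" is a hypothetical page
function, substitute whatever the site actually defines:

// Call a function defined by the page's own script and capture its
// return value after the page has loaded.
object result = browser.Document.InvokeScript("getCurrentUrl");
if (result != null)
    Console.WriteLine(result.ToString());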


--
- Nicholas Paldino [.NET/C# MVP]
- (e-mail address removed)

 
Hi,

Well, realize that any site done with AJAX needs JavaScript. IMHO, the
days of a browsing experience without JS support are numbered.

I know that a lot of corporate users have firewalls that stop any
client scripts running on local computers. This in effect stops them
using JavaScript-based sites. This is done mainly for security reasons,
as the computers are networked. So that presents one issue for the
JavaScript advocates.
 
That's the thing: the JavaScript has to store the URL somewhere, and
that is usually an element in the HTML (even if it is dynamic). Even if
it is stored in some array in memory and then accessed in a click handler
or something like that, you will have to execute the JavaScript in the
page and then access the script engine in order to get those values,
which MSHTML will allow you to do.


I'll think about this.
Look at http://www.grohe.co.uk/t/25_731.html for an example of a page
that does little before opening a JS window.

Key code for the example:
<a href="#" onclick="advanced_matrix('open', '6027693');">

My bot has to understand advanced_matrix and then produce the new
page.
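For reference, a sketch of how that specific case might look with the
WebBrowser/MSHTML approach discussed above; advanced_matrix and its
arguments come from the page itself, everything else is an assumption
(and if the function opens a popup window, you would still need to get
hold of that window's document):

// Once the page has loaded, call the page's own function directly
// rather than re-implementing it in C#.
browser.Document.InvokeScript("advanced_matrix",
                              new object[] { "open", "6027693" });

// Or find the anchor and simulate the click that would invoke it.
foreach (HtmlElement a in browser.Document.GetElementsByTagName("a"))
{
    if (a.OuterHtml != null && a.OuterHtml.Contains("advanced_matrix"))
    {
        a.InvokeMember("click");
        break;
    }
}
// Either way, inspect the DOM afterwards to read the generated page.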
 