PC Review



OK....forget the code...how about some pointers?

 
 
Jim
 
      22nd Jan 2006
I am trying to write an HTTP/HTTPS proxy server in VB.Net 2005.

But, I don't really even know how the internal workings of a proxy should
act. Does anyone have anything on the protocols used in handling requests by
proxies?

I have Googled my eyes out....and found nothing.

(BTW, props to Google for refusing the ridiculous request for their search
results.)

Jim


 
Jochen Kalmbach [MVP]
 
      22nd Jan 2006
Hi Jim!
> I am trying to write an HTTP/HTTPS proxy server in VB.Net 2005.
>
> But, I don't really even know how the internal workings of a proxy should
> act. Does anyone have anything on the protocols used in handling requests by
> proxies?


http://en.wikipedia.org/wiki/HTTP
http://www.ietf.org/rfc/rfc1945.txt
http://www.ietf.org/rfc/rfc2616.txt
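As a concrete taste of what RFC 2616 asks of a proxy: section 5.1.2 says a request sent to a proxy carries the absolute URI (e.g. GET http://www.example.com/a/b?x=1 HTTP/1.1). The proxy then opens its own connection to that host and normally forwards the request with just the path and query, plus a Host header (section 14.23). Below is a minimal Python sketch of that rewrite step only. The function name to_origin_form is hypothetical, and it assumes plain http:// targets; HTTPS goes through CONNECT instead.

```python
from urllib.parse import urlsplit


def to_origin_form(request_line: str) -> tuple[str, int, str]:
    """Rewrite a proxy-style request line for the origin server (sketch).

    "GET http://www.example.com/a?x=1 HTTP/1.1" becomes
    ("www.example.com", 80, "GET /a?x=1 HTTP/1.1").
    """
    method, target, version = request_line.split(" ", 2)
    parts = urlsplit(target)
    if parts.scheme != "http" or not parts.hostname:
        raise ValueError(f"expected an absolute http:// URI, got {target!r}")
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # Port defaults to 80 when the URI does not name one.
    return parts.hostname, parts.port or 80, f"{method} {path} {version}"
```

The proxy would connect to (host, port), send the rewritten line, and still include a Host header naming the original host.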



--
Greetings
Jochen

My blog about Win32 and .NET
http://blog.kalmbachnet.de/
 
Jim
 
      22nd Jan 2006

Jochen Kalmbach [MVP] wrote:
> Hi Jim!
>> I am trying to write an HTTP/HTTPS proxy server in VB.Net 2005.
>>
>> But, I don't really even know how the internal workings of a proxy should
>> act. Does anyone have anything on the protocols used in handling requests
>> by proxies?
>
> http://en.wikipedia.org/wiki/HTTP
> http://www.ietf.org/rfc/rfc1945.txt
> http://www.ietf.org/rfc/rfc2616.txt


I guess what I need to find is a tool that will let me see both sides of the
conversation that a browser has with a server.

For example.....All webpages are rendered in HTML, at their most basic
level. So, does the browser get the HTML as a single file, then the CSS
file, then files referred to in the HTML file (like graphics and such)?

I will read the RFCs completely to see if my questions are answered there.

Thanks for the links!

Jim


 
Jochen Kalmbach [MVP]
 
      22nd Jan 2006
Hi Jim!
>> http://en.wikipedia.org/wiki/HTTP
>> http://www.ietf.org/rfc/rfc1945.txt
>> http://www.ietf.org/rfc/rfc2616.txt
>
> I guess what I need to find is a tool that will let me see both sides of the
> conversation that a browser has with a server.


No. Such a tool might give you a hint... but what you really need to understand is the
HTTP protocol!

> For example.....All webpages are rendered in HTML, at their most basic
> level. So, does the browser get the HTML as a single file, then the CSS
> file, then files referred to in the HTML file (like graphics and such)?


There are no "files" in the HTTP protocol... only streams of data... and every
"link" (stylesheet, image, script) is fetched with its own request...

> I will read the RFCs completely to see if my questions are answered there.


It *will* answer your questions. That's what RFCs are for.
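To make the "every link has its own request" point concrete: the HTML page is one response, and everything it references is a separate request the browser issues afterwards (typically several in parallel, so there is no fixed order). The Python sketch below lists the sub-resources a page references. It only looks at img, script, iframe and stylesheet link tags, which is a simplifying assumption; CSS files, for example, can pull in further images and fonts.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SubResourceCollector(HTMLParser):
    """Collects absolute URLs of resources a page pulls in with extra requests."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute values may be None
        if tag in ("img", "script", "iframe") and attrs.get("src"):
            self.urls.append(urljoin(self.base_url, attrs["src"]))
        elif (tag == "link" and (attrs.get("rel") or "").lower() == "stylesheet"
              and attrs.get("href")):
            self.urls.append(urljoin(self.base_url, attrs["href"]))


def sub_resources(html: str, base_url: str) -> list[str]:
    """Return the URLs a browser would request after fetching this page."""
    collector = SubResourceCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.urls
```

Each URL in the result would be a GET of its own, and for a proxy, a separate request to handle (and possibly filter).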

--
Greetings
Jochen

My blog about Win32 and .NET
http://blog.kalmbachnet.de/
 
Mike Labosh
 
      22nd Jan 2006
IMHO, it seems like there is not enough info from the OP:

At first, it seemed as if he were trying to write a proxy server (Why not
simply get ISA?)

If this is the question, then your code will need to bind to the network
adapter(s) on the localhost so that it can filter / proxy the traffic as
it comes through. For best results, look at the Microsoft ISA Server SDK.

If you are trying to write your own proxy server from scratch, I personally
think that there are not enough drugs, but here's where you would start:
Make a Windows Service that listens on TCP Sockets and allows the user to
specify at runtime what ports should be opened or closed.
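For a rough idea of that starting point, here is a Python sketch using the standard socketserver module, where ThreadingTCPServer gives each client connection its own thread so several browsers can be served at once. This is a skeleton, not a proxy. The handler only reads the request head and answers 501 as a placeholder, port 8080 is an arbitrary assumption, and the Windows Service wrapper and runtime port configuration are not shown.

```python
import socketserver
import threading


class ProxyHandler(socketserver.StreamRequestHandler):
    """Handles one client connection; the server runs each on its own thread."""

    def handle(self):
        # Read the request line, then skip header lines up to the blank line.
        request_line = self.rfile.readline(8192).decode("iso-8859-1").rstrip("\r\n")
        while self.rfile.readline(8192) not in (b"\r\n", b"\n", b""):
            pass
        # Placeholder: a real proxy would connect upstream and forward here.
        body = f"received: {request_line}\n".encode("iso-8859-1")
        self.wfile.write(b"HTTP/1.0 501 Not Implemented\r\n"
                         b"Content-Type: text/plain\r\n"
                         b"Content-Length: " + str(len(body)).encode("ascii")
                         + b"\r\n\r\n" + body)


class ThreadedProxyServer(socketserver.ThreadingTCPServer):
    daemon_threads = True       # open connections don't block shutdown
    allow_reuse_address = True  # allow quick restarts on the same port


def serve_in_background(host: str = "127.0.0.1", port: int = 8080) -> ThreadedProxyServer:
    """Start listening on (host, port) and serve clients on a background thread."""
    server = ThreadedProxyServer((host, port), ProxyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```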

If you are trying to intercept the HTML that a browser requests, have a look
at HTTPEXPL.EXE, one of the samples that ships on Visual Studio 97
Enterprise Edition, Disk 1. (I will now pause while you all go scrambling
through your collections) Source code is included. It displays web pages
in one pane, and the HTML Source in another, so you can see how it comes
down. [For everyone else in here] I am uncertain if this unsupported free
sample is distributable, so I choose not to post the code. But I think it
should be distributable because it's an interesting piece of work. Perhaps
an MVP can give us the OK?

If you are trying to determine the order of download when a browser makes
the request (images, CSS, etc.), simply write a test that connects to port
80, makes a request, and then carefully examines the stream that comes down.
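A minimal Python version of such a test might look like the following, assuming a server that answers plain HTTP on port 80 (www.example.com is only a placeholder host). It sends an HTTP/1.0 request, which keeps the reply simple because RFC 2616 forbids servers from sending transfer-codings such as chunked to HTTP/1.0 clients. It then reads until the server closes the connection. The reply contains only the HTML page itself; the images and stylesheets it mentions would each need a request of their own.

```python
import socket


def fetch_raw(host: str, path: str = "/", port: int = 80) -> bytes:
    """Send a bare HTTP/1.0 GET and return everything the server sends back."""
    request = (f"GET {path} HTTP/1.0\r\n"
               f"Host: {host}\r\n"
               "Connection: close\r\n\r\n").encode("ascii")
    chunks = []
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(request)
        while True:
            chunk = sock.recv(65536)
            if not chunk:  # server closed the connection: response complete
                break
            chunks.append(chunk)
    return b"".join(chunks)


def split_response(raw: bytes) -> tuple[str, dict[str, str], bytes]:
    """Split a raw response into status line, headers and body.

    Simplified: repeated headers (e.g. Set-Cookie) keep only the last value.
    """
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body
```

Usage: status, headers, body = split_response(fetch_raw("www.example.com"))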

--
Peace & happy computing,

Mike Labosh, MCSD MCT
"Escriba coda ergo sum." -- vbSensei

BG
 
      22nd Jan 2006

Jim wrote:
> Jochen Kalmbach [MVP] wrote:
>> Hi Jim!
>>> I am trying to write an HTTP/HTTPS proxy server in VB.Net 2005.
>>>
>>> But, I don't really even know how the internal workings of a proxy
>>> should act. Does anyone have anything on the protocols used in handling
>>> requests by proxies?
>>
>> http://en.wikipedia.org/wiki/HTTP
>> http://www.ietf.org/rfc/rfc1945.txt
>> http://www.ietf.org/rfc/rfc2616.txt
>
> I guess what I need to find is a tool that will let me see both sides of
> the conversation that a browser has with a server.

Try ethereal
http://www.ethereal.com/


 
john smith
 
      22nd Jan 2006
Mike Labosh wrote:
> If you are trying to write your own proxy server from scratch, I personally
> think that there are not enough drugs, but here's where you would start:
> Make a Windows Service that listens on TCP Sockets and allows the user to
> specify at runtime what ports should be opened or closed.


That about sums it up. Something like that can't be a single synchronous
process either; it needs multithreaded or async calls (many simultaneous
requests from several clients). And that's assuming good knowledge of
networking and the protocols involved (things like TCP, HTTP and SSL). I
don't think I'd want to do this (plus the extra real-time filtering
overhead - hopefully just URL filtering and not content) in VB.Net either.
 
Carl Daniel [VC++ MVP]
 
      22nd Jan 2006
Mike Labosh wrote:
> I am uncertain if this unsupported free sample is distributable, so I
> choose not to post the code. But I think it should be distributable
> because it's an interesting piece of work. Perhaps an MVP can give
> us the OK?


MVPs have no authority to give any such OK.

That said, in my experience unsupported free samples have always been
distributable, but the EULA from VS97 would be the definitive source of an
answer to that question.

-cd


 
Jim
 
      23rd Jan 2006
Maybe I haven't been clear enough.....I will try again.

I want to have a VB.Net 2005 coded proxy that is multi-threaded to accept
more than one client at a time.

I want to be able to scan HTML pages for objectionable content (be it adult
subject matter or advertisements) and remove it from the HTML before the
HTML is handed to the browser.
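A very rough Python sketch of that kind of rewriting. It assumes the proxy already has the complete, uncompressed text/html body in hand. A real proxy would also have to undo gzip Content-Encoding and fix up Content-Length after editing. BLOCKED_HOSTS and its entry are hypothetical. Regular expressions are a known-fragile way to edit HTML and are used here only to keep the sketch short.

```python
import re
from urllib.parse import urlsplit

# Hypothetical list of ad/objectionable hosts, for illustration only.
BLOCKED_HOSTS = {"ads.example.net"}

# <img ... src="..."> (no closing tag) and <script|iframe src="...">...</script|iframe>
IMG_RE = re.compile(r"""<img\b[^>]*\bsrc\s*=\s*["']([^"']*)["'][^>]*>""", re.IGNORECASE)
PAIRED_RE = re.compile(
    r"""<(script|iframe)\b[^>]*\bsrc\s*=\s*["']([^"']*)["'][^>]*>.*?</\1\s*>""",
    re.IGNORECASE | re.DOTALL,
)


def _is_blocked(url: str) -> bool:
    # Relative URLs have no host, so they are never treated as blocked here.
    return (urlsplit(url).hostname or "") in BLOCKED_HOSTS


def strip_blocked(html: str) -> str:
    """Drop img/script/iframe elements whose src points at a blocked host."""
    html = PAIRED_RE.sub(lambda m: "" if _is_blocked(m.group(2)) else m.group(0), html)
    return IMG_RE.sub(lambda m: "" if _is_blocked(m.group(1)) else m.group(0), html)
```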

I am only interested in scanning HTTP/HTTPS. I do not want to intercept any
other data streams.

This proxy may run as a server on the web for people to point to as they so
desire to remove unwanted elements from their surfing, or it may be freely
distributed to run locally on a person's PC or network.

Is that a better explanation?

Jim


 
john smith
 
      23rd Jan 2006
It's not that you haven't been clear (except a few minor details).

If there's any explanation needed, it's why you even want to do this.
For yourself (instead of using an existing proxy)? As a commercial
product (good luck with that)? Is it supposed to have some special
feature? Because right now it seems like you're almost reinventing the
wheel, and without any real reasons (and perhaps not using the best
language/tools for the job either; most such projects are usually in
MFC/C++ or such)...

Honestly, I don't think you're seeing the full extent of such a project.
Running a TcpListener as a service doesn't seem hard at first, but
then you have to consider other things:

-Filtering: you want to do 2 types of filtering, by address and by
content. For the address you will need some block lists (by IP and
domain name). Matching against these can be costly (in terms of time spent),
especially as the list grows (they're usually very long). You will have
to benchmark various ways to do this, and profile and optimize the code
a lot. Content filtering is only going to be harder (and
perhaps slower). There may be some challenges you didn't expect either
(tricks like people percent-escaping some characters, as in s%65xsite.com,
and whatnot).
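One way to keep address checks cheap even with very long lists, sketched in Python under the assumption that the list holds domain names. First normalize the host: percent-decoding defeats the s%65xsite.com trick, since %65 is "e". Then look up the host and each parent domain in a set. That is a handful of hash lookups per request regardless of list size. IP-based blocking is not shown, and ads.example.com is a made-up entry.

```python
from urllib.parse import unquote, urlsplit


def normalize_host(url: str) -> str:
    """Percent-decode, lower-case and drop a trailing dot from the URL's host."""
    host = urlsplit(url).hostname or ""
    return unquote(host).lower().rstrip(".")


def is_blocked(url: str, blocked_domains: set[str]) -> bool:
    """True if the host, or any domain it sits under, is on the block list."""
    labels = normalize_host(url).split(".")
    # "x.ads.example.com" checks x.ads.example.com, ads.example.com, example.com, com
    return any(".".join(labels[i:]) in blocked_domains for i in range(len(labels)))
```

Matching on whole labels means "badads.example.com" is not caught by an "ads.example.com" entry, which a naive substring or endswith test would get wrong.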

-You will have to do everything you can to keep latency to a minimum
(on top of the added network latency you can't get rid of, there's all your
normal processing and filtering). On the other hand, you will want
to keep CPU load to a minimum on the computer serving as a proxy as the
traffic/requests increase (not an easy thing to do here, considering
the language/platform choice too). Keeping memory usage down may be a
significant problem too (lots of requests at the same time, lots of data
to check against, etc.).
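On the latency and memory points, a common technique is to pass data through in fixed-size chunks instead of reading a whole response before sending it on. A Python sketch, assuming two already-connected sockets; the 64 KB buffer size is an arbitrary choice.

```python
import socket


def relay(src: socket.socket, dst: socket.socket, bufsize: int = 64 * 1024) -> int:
    """Copy bytes from src to dst until src closes; return the byte count.

    recv() returns whatever has arrived (up to bufsize), so data is passed
    on as soon as it shows up and memory use stays near bufsize per
    direction, however large the response is.
    """
    total = 0
    while True:
        chunk = src.recv(bufsize)
        if not chunk:  # peer closed its sending side
            break
        dst.sendall(chunk)
        total += len(chunk)
    return total
```

The catch is that HTML you intend to content-filter generally has to be buffered (or parsed incrementally) before it is passed on. Streaming like this suits images and other content the proxy is not going to inspect.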

-Update/Maintain block lists. That alone is more than a full-time job. I
don't know how you expect to make and update these... Many companies
employ several people just to maintain these for their own products.

-Distributing updates. Yes, you'll need some mechanism to distribute
your updates. At least for the block lists, but preferably also for
the program (without requiring reboots or downtime). Don't expect to
never have to update your program; it WILL happen, so you have to plan
accordingly.
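For the block-list part, one low-downtime approach is to have the running proxy notice a changed list file and swap in a new in-memory copy without a restart. A Python sketch, assuming a text file with one domain per line and # comments. That file format is an assumption, as is the idea that something (e.g. a timer) calls reload_if_changed periodically. Getting the new file onto each machine is a separate problem not shown here.

```python
import os
import threading


class BlockList:
    """Domain block list that can be reloaded from disk while in use."""

    def __init__(self, path: str):
        self.path = path
        self._signature = None                  # (mtime_ns, size) of last load
        self._domains: frozenset[str] = frozenset()
        self._lock = threading.Lock()           # one reload at a time
        self.reload_if_changed()

    def reload_if_changed(self) -> bool:
        """Re-read the file if it changed; return True if a reload happened."""
        with self._lock:
            st = os.stat(self.path)
            signature = (st.st_mtime_ns, st.st_size)
            if signature == self._signature:
                return False
            with open(self.path, encoding="utf-8") as f:
                entries = (line.strip().lower() for line in f)
                domains = frozenset(e for e in entries if e and not e.startswith("#"))
            # Single reference swap (atomic in CPython): lookups on other
            # threads see the old set or the new one, never a half-built list.
            self._domains = domains
            self._signature = signature
            return True

    def __contains__(self, domain: str) -> bool:
        return domain.lower() in self._domains
```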

-Maintenance should be very minimal. That includes updating. IT
departments nowadays don't want extra overhead/things to babysit/look
after. Which also brings me to the next point:

-Stability. This thing should be rock stable - never EVER crash. Same
story for memory leaks. This is *critical* but it won't be exactly easy.
When it crashes (or starts consuming a lot of resources), people have to
go fix the problem (see previous point), users (clients) get frustrated
with your product (and understandably start looking at the competitor's
offerings), etc. This will require you to write excellent code with
great error handling, write unit tests (and code coverage) to ensure
your code works 100% as intended, but also a LOT of load testing.
Ideally you'd want code reviews too. Trying to find memory leaks or
finding out why your app consumes so much memory is rarely easy, quick
or fun... And your reputation is almost on the line (you don't want
people to say "Jim [or whoever] writes buggy junk! it always crashes!"
do you?) If you plan to sell it, you will also need to support it...

-Features: people will want more features (especially if you wanted
someone to pay for it). After all, tons of very good free proxies have
loads of extra features... From supporting other protocols (like FTP,
WebDAV, etc), reporting features, caching (DNS and pages to accelerate
things and reduce bandwidth usage), tons of options/configuration (perhaps
things like allowing/blocking specific HTTP verbs), etc.
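As an example of the DNS caching mentioned above, here is a small Python sketch of a cache that keeps each lookup for a fixed number of seconds. The fixed TTL is an assumption made for simplicity. socket.gethostbyname does not report the record's real TTL, so a production cache would need a resolver that does. The resolver and clock parameters exist only so the class can be exercised without the network.

```python
import socket
import time
from typing import Callable


class DnsCache:
    """Remembers host -> IPv4 address lookups for a fixed time."""

    def __init__(self, ttl_seconds: float = 300.0,
                 resolver: Callable[[str], str] = socket.gethostbyname,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.resolver = resolver
        self.clock = clock
        self._entries: dict[str, tuple[str, float]] = {}  # host -> (addr, expiry)

    def resolve(self, host: str) -> str:
        now = self.clock()
        entry = self._entries.get(host.lower())
        if entry is not None and entry[1] > now:
            return entry[0]  # still fresh: no DNS query
        address = self.resolver(host)
        self._entries[host.lower()] = (address, now + self.ttl)
        return address
```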

-Coding time: to do all this properly (especially with all the testing,
debugging, optimization and all) will take you maybe not forever, but
still a very long time (especially if you're alone, and even more if you
start counting all the other required overhead - things like designing,
extensive documentation, planning, meetings, refactoring, etc). All this
coding time multiplied by the average programmer's hourly wage will be a
*LOT* of money in any case. Most likely more than it cost to buy any big
commercial offering (like ISA Server). Definitely more than it would
cost to buy a cheap computer (basic Dell plus some more RAM) and throw
something like linux+squid on it (a good, free, time-tested, stable,
full-featured, well documented, set-and-forget solution with support
basically), or some appliance made from it, or another similar solution.
Even as an internal IT dept project it would still be too costly...

-Also, you mention scanning HTTPS, which is SSL encrypted BTW. Normal
proxies don't filter/understand HTTPS traffic; they just pass it through. If
you want to filter it too, you will also need to build some SSL gateway
that will handle the SSL handshakes and everything else. More fun!
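For reference, this is how HTTPS normally reaches a proxy. The browser sends a CONNECT request naming only the host and port; the method is reserved in RFC 2616 section 9.9 and described in RFC 2817 section 5.2. The proxy opens a TCP connection there, replies with a 2xx status, and from then on copies the encrypted bytes in both directions without being able to read them. Below is a Python sketch of the first step. parse_connect is a hypothetical helper, and only the request line is handled.

```python
def parse_connect(request_line: str) -> tuple[str, int]:
    """Return (host, port) from a 'CONNECT host:port HTTP/1.x' request line."""
    method, target, version = request_line.split()
    if method.upper() != "CONNECT":
        raise ValueError("not a CONNECT request")
    host, sep, port = target.rpartition(":")
    if not sep or not port.isdigit():
        raise ValueError(f"bad CONNECT target {target!r}")
    return host.strip("[]"), int(port)  # strip brackets from IPv6 literals


# After connecting to (host, port), the proxy answers the client with this and
# then relays bytes both ways until either side closes.
CONNECT_OK = b"HTTP/1.1 200 Connection established\r\n\r\n"
```

Everything after CONNECT_OK is opaque TLS to the proxy, which is why filtering HTTPS content requires terminating SSL at the proxy rather than simply tunnelling it.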

-We're assuming prior (excellent) knowledge of the
language/platform/framework, TCP/HTTP protocols, various RFCs, etc.

And this is just the very tip of the iceberg. Hopefully it helps you
realize what such a project encompasses. You make it sound like it
should be really trivial, but it isn't.
 