Jim said:
Maybe I haven't been clear enough.....I will try again.
I want to have a VB.Net 2005 coded proxy that is multi-threaded to accept
more than one client at a time.
I want to be able to scan html pages for objectionable content (be it adult
subject matter or advertisements) and remove them from the HTML shown in the
browser before the HTML is given to the browser.
I am only interested in scanning HTTP/HTTPS. I do not want to intercept any
other data streams.
This proxy may run as a server on the web for people to point to as they so
desire to remove unwanted elements from their surfing, or it may be freely
distributed to run locally on a person's PC or network.
Is that a better explanation?
Jim
It's not that you haven't been clear (apart from a few minor details).
If any explanation is needed, it's why you want to do this at all. For
yourself (instead of using an existing proxy)? As a commercial product
(good luck with that)? Is it supposed to have some special feature?
Right now it looks like you're reinventing the wheel without any real
reason, and perhaps without the best language/tools for the job either;
such projects are usually written in C++/MFC or similar...
Honestly, I don't think you're seeing the full extent of such a project.
Running a TcpListener as a service doesn't seem hard at first, but then
you have to consider other things:
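(The listener part really is the quick bit. Here's a sketch, in Python
for brevity since the shape matches VB.Net's TcpListener with one
thread per accepted client; the actual filtering/forwarding is stubbed
out as an echo:)

```python
# Python sketch of a one-thread-per-client accept loop (the project
# itself is VB.Net 2005, where TcpListener plus a worker thread per
# client has the same shape). Filtering/forwarding is stubbed out.
import socket
import threading

def handle_client(conn: socket.socket, addr) -> None:
    # A real proxy would parse the HTTP request, filter it, and forward
    # it upstream; here we just echo one read back and close.
    with conn:
        data = conn.recv(4096)
        conn.sendall(data)

def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen()
    while True:
        conn, addr = listener.accept()
        # One thread per client; production code would cap this with a pool.
        threading.Thread(target=handle_client, args=(conn, addr),
                         daemon=True).start()
```

(That much is an afternoon's work. The list that follows is the rest.)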
-Filtering; you want to do two types of filtering: by address and by
content. For addresses you will need block lists (by IP and by domain
name). Parsing these can be costly (in terms of time spent), especially
as the lists grow (they're usually very long). You will have to
benchmark various ways to do this, then profile and optimize the code a
lot. Content filtering will only be harder (and perhaps slower). There
may also be challenges you didn't expect, like people percent-encoding
characters in URLs (s%65xsite.com and the like) to slip past naive
matching.
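(To illustrate the escaping trick: a minimal Python sketch of
normalizing a URL before matching it against a block list. The listed
domains are made-up examples, not a real block list:)

```python
# Normalize a URL before block-list matching so percent-encoding tricks
# ("%65" is just 'e') don't slip past a naive string comparison.
from urllib.parse import unquote, urlsplit

BLOCKED_DOMAINS = {"sexsite.com", "ads.example.net"}  # hypothetical entries

def is_blocked(url: str) -> bool:
    # Decode %XX escapes and lowercase first, then extract the hostname.
    # (Simplified; real code has to handle many more evasions.)
    host = urlsplit(unquote(url.lower())).hostname or ""
    # Match the host itself and every parent domain against the list.
    parts = host.split(".")
    return any(".".join(parts[i:]) in BLOCKED_DOMAINS
               for i in range(len(parts)))
```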
-You will have to do everything you can to keep latency to a minimum
(on top of the added network latency you can't get rid of, there's all
your normal processing and filtering). On the other hand, you will want
to keep CPU load on the proxy machine to a minimum as traffic increases
(not an easy thing here, considering the language/platform choice too).
Keeping memory usage down may be a significant problem as well (lots of
simultaneous requests, lots of data to check against, etc.)
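(A small illustration of why you have to benchmark: with a synthetic
100,000-entry list, the choice of block-list data structure alone
changes per-lookup cost by orders of magnitude. The entries here are
fabricated; only the shape of the comparison matters:)

```python
# Synthetic benchmark: membership test against a 100,000-entry Python
# list versus the same entries in a hash set. The list scan is O(n)
# per request and grows with the block list; the set lookup stays flat.
import timeit

entries = ["blocked%d.example" % i for i in range(100_000)]
as_list = entries
as_set = set(entries)

probe = "blocked99999.example"   # worst case: last element of the list
slow = timeit.timeit(lambda: probe in as_list, number=100)
fast = timeit.timeit(lambda: probe in as_set, number=100)
```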
-Updating/maintaining block lists. That alone is more than a full-time
job. I don't know how you expect to build and update these... Many
companies employ several people just to maintain such lists for their
own products.
-Distributing updates. Yes, you'll need some update mechanism, at least
for the block lists but preferably also for the program itself (without
requiring reboots or downtime). Don't expect to never have to update
your program; it WILL happen, so you have to plan accordingly.
-Maintenance should be very minimal, and that includes updating. IT
departments nowadays don't want extra overhead or more things to
babysit. Which also brings me to the next point:
-Stability. This thing should be rock solid - never EVER crash. Same
story for memory leaks. This is *critical*, but it won't be exactly
easy. When it crashes (or starts consuming a lot of resources), people
have to go fix the problem (see the previous point), users get
frustrated with your product (and understandably start looking at
competitors' offerings), and so on. This will require you to write
excellent code with great error handling, write unit tests (with code
coverage) to ensure your code works 100% as intended, and do a LOT of
load testing. Ideally you'd want code reviews too. Tracking down memory
leaks, or finding out why your app consumes so much memory, isn't
always easy, quick, or fun... And your reputation is on the line (you
don't want people saying "Jim [or whoever] writes buggy junk! it always
crashes!", do you?) If you plan to sell it, you will also need to
support it...
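(What "unit tests" means concretely here, with a stand-in filter
function; the `class="ad"` marker and `strip_ads` are both hypothetical,
and real filtering would parse the HTML rather than match substrings:)

```python
# A stand-in content filter plus the kind of regression tests every
# filtering rule needs, so a later update can't silently break it.
import unittest

def strip_ads(html: str) -> str:
    # Drop any line carrying the (made-up) ad marker.
    return "\n".join(line for line in html.splitlines()
                     if 'class="ad"' not in line)

class StripAdsTest(unittest.TestCase):
    def test_removes_marked_lines(self):
        html = '<p>keep</p>\n<div class="ad">junk</div>'
        self.assertEqual(strip_ads(html), "<p>keep</p>")

    def test_leaves_clean_html_alone(self):
        self.assertEqual(strip_ads("<p>ok</p>"), "<p>ok</p>")
```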
-Features: people will want more features (especially if you expect
anyone to pay for it). After all, tons of very good free proxies ship
with loads of extras: support for other protocols (FTP, WebDAV, etc.),
reporting features, caching (DNS and pages, to speed things up and
reduce bandwidth usage), and tons of options/configuration (perhaps
things like allowing/blocking specific HTTP verbs), etc.
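(Even that last "simple" option implies request parsing plus the
configuration plumbing to drive it. A hypothetical sketch, in Python
for illustration:)

```python
# Hypothetical HTTP verb filter: the allowed set would come from user
# configuration, which is exactly the plumbing that multiplies the work.
ALLOWED_VERBS = {"GET", "HEAD", "POST"}  # example policy, not a recommendation

def verb_allowed(request_line: str) -> bool:
    # The verb is the first token of the request line, e.g. "GET / HTTP/1.1".
    verb = request_line.split(" ", 1)[0].upper()
    return verb in ALLOWED_VERBS
```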
-Coding time: doing all this properly (especially with all the testing,
debugging, and optimization) will take maybe not forever, but still a
very long time (especially if you're alone, and even more once you
count the other required overhead: design, extensive documentation,
planning, meetings, refactoring, etc.). All that coding time multiplied
by the average programmer's hourly wage is a *LOT* of money in any
case. Most likely more than any big commercial offering (like ISA
Server) costs, and definitely more than buying a cheap computer (a
basic Dell plus some extra RAM) and throwing something like Linux+Squid
on it (a good, free, time-tested, stable, full-featured, well-
documented, set-and-forget solution with support, basically), or an
appliance built from it, or a similar solution. Even as an internal IT
department project it would still be too costly...
-Also, you mention scanning HTTPS, which is SSL-encrypted, by the way.
Normal proxies don't filter or understand HTTPS traffic; they just
tunnel it through untouched. If you want to filter it too, you will
also need to build an SSL gateway that handles the SSL handshakes (and
everything else that comes with sitting in the middle of an encrypted
connection). More fun!
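(Concretely: for HTTPS the browser sends a CONNECT request, and without
an SSL gateway, parsing that one line and then blindly relaying the
encrypted bytes is all your proxy can do. A Python sketch of that
parse:)

```python
# What a non-intercepting proxy sees of HTTPS: a "CONNECT host:port"
# request line. Everything after the tunnel opens is opaque ciphertext
# unless you act as an SSL man-in-the-middle gateway.
def parse_connect(request_line: str):
    """Return (host, port) from "CONNECT host:port HTTP/x.y", else None."""
    parts = request_line.strip().split()
    if len(parts) != 3 or parts[0].upper() != "CONNECT":
        return None
    host, sep, port = parts[1].rpartition(":")
    if not sep or not port.isdigit():
        return None
    return (host, int(port))
```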
-We're assuming prior (excellent) knowledge of the
language/platform/framework, TCP/HTTP protocols, various RFCs, etc.
And this is just the very tip of the iceberg. Hopefully it helps you
realize what such a project encompasses. You make it sound like it
should be really trivial, but it isn't.