Toff McGowen
Just ask Google. They offer this kind of functionality as a service
already... well, they do for the People's Republic of China, at least.
tm
Jim said:
...
john smith said:
It's not that you haven't been clear (except for a few minor details).
If there's any explanation needed, it's why you even want to do this. For
yourself (instead of using an existing proxy)? As a commercial product
(good luck with that)? Is it supposed to have some special feature?
Because right now it seems like you're almost reinventing the wheel, and
without any real reasons (and perhaps not using the best language/tools
for the job either; most such projects are usually in MFC/C++ or ...).
I am sorry for the vagueness concerning the end freeware product, but that
is necessary at this time.
Honestly, I don't think you're seeing the full extent of such a project.
Running a TcpListener as a service doesn't seem hard at first, but then
you gotta consider other things:
-Filtering; you want to do 2 types of filtering: by address and by
content. For the address you will need some block lists (by IP and domain
name). Matching requests against these can be costly (in terms of time
spent), especially as the lists grow (they're usually very long). You will
have to benchmark various ways to do this, and profile and optimize your
code a lot. Content filtering is only going to be harder (and perhaps
slower). There may be some challenges you didn't expect either (tricks
like people percent-escaping characters, as in s%65xsite.com, and whatnot).
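To make the address-filtering point concrete, here is a minimal Python sketch. It is not code from this thread; the block list contents and the names normalize_host and is_blocked are hypothetical. It decodes percent-escapes in the host before the lookup, so s%65xsite.com is treated as sexsite.com (%65 is "e"), and it checks the host and each parent domain against a set. Each check then costs one set lookup per domain label, no matter how long the list grows. IP block lists and content filtering are left out.

```python
from typing import AbstractSet
from urllib.parse import unquote, urlsplit

# Hypothetical block list; a real one would be loaded from a file and be far longer.
BLOCKED_DOMAINS = frozenset({"sexsite.com", "ads.example.net"})


def normalize_host(url: str) -> str:
    """Return the URL's host, lowercased, with percent-escapes decoded."""
    host = urlsplit(url).hostname or ""
    # Undo escaping tricks such as s%65xsite.com -> sexsite.com, then lowercase
    # again because an escape like %45 decodes to an uppercase "E".
    return unquote(host).lower().rstrip(".")


def is_blocked(url: str, blocked: AbstractSet[str] = BLOCKED_DOMAINS) -> bool:
    """True if the host, or any parent domain of it, is on the block list."""
    labels = normalize_host(url).split(".")
    # For www.sexsite.com try "www.sexsite.com", then "sexsite.com", then "com":
    # one set lookup per label, independent of the block list's size.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in blocked:
            return True
    return False


if __name__ == "__main__":
    print(is_blocked("http://s%65xsite.com/index.html"))  # True
    print(is_blocked("http://www.SEXSITE.com/"))          # True
    print(is_blocked("http://notsexsite.com/"))           # False
```

Splitting on dots instead of using endswith() is deliberate: notsexsite.com must not match an entry for sexsite.com.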
I don't expect the code that tracks the ads and adult content to be easy to
code. But, that's the part I look forward to most of all. I like a
challenge.
"Ahhh!" you might say..."Then why don't you accept the proxy portion as a
challenge?" Basically because the underlying portions will be so
challenging - as you have correctly pointed out. I'd rather devote my time
to those portions of the task, if possible.
-You will have to do everything you can to keep latency to a minimum (from
the added network latency you can't get rid of, plus all your normal
processing and filtering). And on the other hand, you will want to keep
CPU load to a minimum on the computer serving as a proxy as the
traffic/requests increase (not an easy thing to do here, considering the
language/platform choice too). Keeping memory usage down may be a
significant problem too (lots of requests at the same time, lots of data
to check against, etc.).
I think I have already worked out an algorithm that will do nicely in these areas.
-Update/Maintain block lists. That alone is more than a full time job. I
don't know how you expect to make and update these... Many companies
employ several people just to maintain these for their own products.
Also already pseudo-coded.
-Distributing updates. Yes, you'll need some mechanism for distributing them,
at least for the block lists but preferably also for the
program (without requiring reboots or downtime). Don't expect to never
have to update your program, it WILL happen, so you have to plan
accordingly.
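One common way to update a block list "without requiring reboots or downtime" is to build the new list completely and then publish it with a single reference assignment, so lookups in progress see either the old list or the new one, never a half-loaded mix. A minimal Python sketch, assuming the list is a plain text file with one domain per line and # comments; the file format and the BlockList class are hypothetical. Getting the new file onto each machine is a separate problem and is not shown.

```python
import threading
from pathlib import Path


class BlockList:
    """Hypothetical domain block list that can be reloaded while lookups continue."""

    def __init__(self, path: str) -> None:
        self._path = Path(path)
        self._reload_lock = threading.Lock()  # serializes reloads only; lookups never take it
        self._domains: frozenset = frozenset()
        self.reload()

    def reload(self) -> int:
        """Re-read the file, publish the new list and return its size."""
        with self._reload_lock:
            text = self._path.read_text(encoding="utf-8")
            fresh = frozenset(
                line.strip().lower()
                for line in text.splitlines()
                if line.strip() and not line.strip().startswith("#")
            )
            # Publish with one reference assignment (atomic in CPython). If reading
            # the file raised above, this line is never reached and the old list
            # stays in effect.
            self._domains = fresh
            return len(fresh)

    def __contains__(self, domain: str) -> bool:
        # The published frozenset is never mutated, so readers need no lock.
        return domain.lower() in self._domains
```

Typical use would be `"ads.example.net" in blocklist` on every request, and `blocklist.reload()` whenever the updater has written a new file. A failed reload (missing or unreadable file) raises, but the proxy keeps filtering with the last good list.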
Got that planned too.
-Maintenance should be very minimal. That includes updating. IT
departments nowadays don't want extra overhead/things to babysit/look
after.
I agree. It will be completely hands-off after the installation.
Self-repairing and self-updating.
Which also brings me to the next point:
-Stability. This thing should be rock stable - never EVER crash.
While nobody can guarantee that... that will be the goal.
Same story for memory leaks. This is *critical* but it won't be exactly
easy. When it crashes (or starts consuming a lot of resources), people have
to go fix the problem (see previous point), users (clients) get frustrated
with your product (and understandably start looking at the competitor's
offerings), etc. This will require you to write excellent code with great
error handling, write unit tests (and code coverage) to ensure your code
works 100% as intended, but also a LOT of load testing. Ideally you'd want
code reviews too. Tracking down memory leaks, or finding out why your app
consumes so much memory, isn't always easy, quick, or fun... And your
reputation is almost on the line (you don't want people to say "Jim [or
whoever] writes buggy junk! it always crashes!" do you?) If you plan to
sell it, you will also need to support it...
I agree completely. That's why I want to concentrate on the core services
and not the surrounding proxy services.
I have enough to do without re-inventing the proxy wheel.
It will be freeware and it will be supported.
-Features: people will want more features (especially if you wanted
someone to pay for it). After all, tons of very good free proxies have
loads of extra features: support for other protocols (like FTP, WebDAV,
etc.), reporting, caching (DNS and pages, to speed things up and reduce
bandwidth usage), tons of options/configuration (perhaps things like
allowing/blocking specific HTTP verbs), etc.
It will be free.....always free.
-Coding time: to do all this properly (especially with all the testing,
debugging, optimization and all) will take you maybe not forever, but
still a very long time (especially if you're alone, and even more if you
start counting all the other required overhead - things like designing,
extensive documentation, planning, meetings, refactoring, etc). All this
coding time multiplied by the average programmer's hourly wage will be a
*LOT* of money in any case, most likely more than it would cost to buy any
big commercial offering (like ISA Server). Definitely more than it would cost
to buy a cheap computer (basic Dell plus some more RAM) and throw
something like linux+squid on it (a good, free, time-tested, stable,
full-featured, well documented, set-and-forget solution with support
basically), or some appliance made from it, or another similar solution.
Even as an internal IT dept project it would still be too costly...
It will be a costly undertaking. But, I have taken that into consideration.
-Also, you mention scanning HTTPS, which is SSL encrypted BTW. Normal
proxies don't filter/understand HTTPS traffic, they just pass it. If you
want this too, that means you will also need to make some SSL gateway that
will handle the SSL handshakes and everything else. More fun!
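For context on the pass-through behavior: a browser configured to use a proxy for HTTPS sends a plain-text CONNECT request that names only the host and port, and everything after the proxy's reply is encrypted. Without an SSL gateway, the proxy can therefore still filter by address, but not by content. A minimal Python sketch of that decision, using a hypothetical block set and the hypothetical helper names parse_connect and connect_reply; the byte relaying that follows a 200 reply is not shown.

```python
from typing import Optional, Tuple

# Hypothetical block list for illustration.
BLOCKED_HOSTS = frozenset({"sexsite.com"})


def parse_connect(request_line: str) -> Optional[Tuple[str, int]]:
    """Return (host, port) from e.g. 'CONNECT example.com:443 HTTP/1.1', else None."""
    parts = request_line.strip().split()
    # HTTP method names are case-sensitive, so only "CONNECT" is accepted.
    if len(parts) != 3 or parts[0] != "CONNECT" or not parts[2].startswith("HTTP/"):
        return None
    host, sep, port = parts[1].rpartition(":")
    if not sep or not host or not port.isdigit():
        return None
    # Strip the brackets from IPv6 literals such as [::1]:443.
    return host.strip("[]").lower(), int(port)


def connect_reply(request_line: str) -> str:
    """Choose the status line the proxy sends back before (maybe) tunnelling."""
    target = parse_connect(request_line)
    if target is None:
        return "HTTP/1.1 400 Bad Request\r\n\r\n"
    host, _port = target
    if host in BLOCKED_HOSTS:
        return "HTTP/1.1 403 Forbidden\r\n\r\n"
    # After this reply the proxy only relays encrypted bytes; it cannot read them.
    return "HTTP/1.1 200 Connection Established\r\n\r\n"
```

The host name is the only thing the proxy gets to see here, which is why inspecting HTTPS content requires terminating SSL in the proxy itself.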
I have SSL components that should make this much easier to incorporate.
-We're assuming prior (excellent) knowledge of the
language/platform/framework, TCP/HTTP protocols, various RFCs, etc.
And this is just the very tip of the iceberg. Hopefully it helps you
realize what such a project encompasses. You make it sound like it should
be really trivial, but it isn't.
You are right. This is a complex project, but the core proxy services are
not. That's why I hoped to get some assistance with the basic proxy
services. Then I can concentrate on the more complex portions of the code.
Thanks for your post!
Jim