You will probably be dealing with 2 different protocols:
HTTP for normal, non-secure traffic
HTTPS for secure (SSL-encrypted) traffic
If you want to write an HTTP proxy, it really is not that difficult:
1. Accept a socket connection from the browser
2. Parse the HTTP request (ex. GET http://www.amazon.com:80/index.html HTTP/1.0)
3. Connect to the actual destination (www.amazon.com, port 80)
4. Send the request on to that server (GET /index.html HTTP/1.0)
5. Read the reply back from that server
6. Send the reply back to the client
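Here is a rough Python sketch of the six HTTP steps above. It is minimal and single-threaded, and it rests on several assumptions. Requests are plain HTTP/1.0 GETs with no body. The server closes the connection after replying. The listening port is a hypothetical 8080. All function names are made up for illustration. The only change made to the request is rewriting the absolute URL in the first line to a plain path. All other headers pass through untouched.

```python
import socket
from urllib.parse import urlsplit


def parse_proxy_request_line(line):
    """Step 2: split 'GET http://host:port/path HTTP/1.0' into its parts."""
    method, target, version = line.split(" ")
    url = urlsplit(target)
    port = url.port or 80          # plain HTTP defaults to port 80
    path = url.path or "/"
    if url.query:
        path += "?" + url.query
    return method, url.hostname, port, path, version


def rewrite_request_line(method, path, version):
    """Step 4: the real server expects just the path, not the full URL."""
    return f"{method} {path} {version}"


def handle_one_request(client):
    """Steps 2-6 for one request; no keep-alive and no error handling."""
    data = b""
    while b"\r\n\r\n" not in data:          # read until the headers end
        chunk = client.recv(4096)
        if not chunk:
            return
        data += chunk
    head, _, rest = data.partition(b"\r\n")   # rest = remaining headers
    method, host, port, path, version = parse_proxy_request_line(
        head.decode("latin-1"))
    new_head = rewrite_request_line(method, path, version).encode("latin-1")
    with socket.create_connection((host, port)) as upstream:   # step 3
        upstream.sendall(new_head + b"\r\n" + rest)
        while True:   # steps 5-6: assumes the server closes when done
            chunk = upstream.recv(4096)
            if not chunk:
                break
            client.sendall(chunk)


def serve(listen_port=8080):
    """Step 1: accept connections, one at a time (hypothetical port)."""
    with socket.create_server(("127.0.0.1", listen_port)) as server:
        while True:
            client, _ = server.accept()
            with client:
                handle_one_request(client)
```

Because it handles one connection at a time and assumes the server closes after each reply, this sketch only demonstrates the flow. Real browsers will quickly break both assumptions.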
This is a high-level list of what needs to be done, but it will give you an
idea of what it takes.
Now, if you are using HTTPS (secure), you are in for a lot more work
because you have to deal with the whole SSL side of things. When a browser
makes an SSL request through a proxy, it sends a different kind of request
called CONNECT. This tells the proxy server the host name (or IP) and port
that the encrypted data should be sent to.
Ex. (simplified, but it will give you an idea):
1. Browser connects to proxy
2. Browser sends CONNECT www.amazon.com:443 to proxy
3. Proxy parses the host and port from the CONNECT request
4. Proxy connects to that host and port, then replies to the browser with "200 Connection established"
5. Browser sends encrypted data to proxy
6. Proxy sends encrypted data to server
7. Proxy reads encrypted reply from server
8. Proxy sends encrypted reply back to client
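Here is a hedged Python sketch of the CONNECT steps above. The proxy reads only the plain-text CONNECT line. After that it copies bytes in both directions without ever looking inside them. A few assumptions apply. The browser waits for the 200 reply before it sends any TLS data. IPv6 literals like [::1]:443 are not handled. No 502 error is sent if the upstream connection fails. The function names are hypothetical.

```python
import socket
import threading


def parse_connect_target(line):
    """'CONNECT www.amazon.com:443 HTTP/1.1' -> ('www.amazon.com', 443)."""
    method, target, _version = line.split(" ")
    if method != "CONNECT":
        raise ValueError("not a CONNECT request")
    host, _, port = target.rpartition(":")
    return host, int(port)


def pipe(src, dst):
    """Copy bytes one way until src closes; the data stays encrypted."""
    try:
        while True:
            chunk = src.recv(4096)
            if not chunk:
                break
            dst.sendall(chunk)
    except OSError:
        pass
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)   # tell the other side we are done
        except OSError:
            pass


def handle_connect(client):
    """Steps 2-8; the caller is responsible for closing `client`."""
    data = b""
    while b"\r\n\r\n" not in data:        # read the CONNECT request headers
        chunk = client.recv(4096)
        if not chunk:
            return
        data += chunk
    first_line = data.split(b"\r\n", 1)[0].decode("latin-1")
    host, port = parse_connect_target(first_line)
    with socket.create_connection((host, port)) as upstream:
        client.sendall(b"HTTP/1.1 200 Connection established\r\n\r\n")
        # One thread per direction: browser->server and server->browser.
        t = threading.Thread(target=pipe, args=(client, upstream))
        t.start()
        pipe(upstream, client)
        t.join()
```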
Notice that the data is encrypted and the proxy does not know what the data
is. That is the whole point of SSL: nobody in the middle is supposed to be
able to read it. If you want to see the decrypted data, you can perform all
of the SSL handshaking with the browser yourself, BUT the browser will warn
the user that something is not right. The browser can tell that whoever it
is handshaking with is not www.amazon.com, because the SSL server
certificate presented does not match the site (it was not issued for
www.amazon.com by a certificate authority the browser trusts).
So how can you make a proxy? Well, this is not a simple 1-2 day project
if you want it to work correctly all the time. You need to take several
things into account when designing your proxy, and if you do not, you will
run into trouble.
1. Multiple requests at the same time. Most browsers make several requests
to a server at once by opening multiple socket connections. This helps
speed up download times. Make sure you can handle multiple requests at
the same time.
2. Keep-alive connections. Can you handle multiple requests/replies, one
right after the other, on the same connection? One socket connection could
carry hundreds of requests/replies before the browser closes it.
3. Closed connections. What about only 1 request and reply per socket
connection? You cannot force the client or server to keep a connection
open after a request/reply is finished, so you must cope with either side
closing it.
4. HTTP parsing is a must. If you are going to examine the data being sent,
you will need to learn the HTTP protocol (or at least how it works) so you
can extract your data. Is it in a cookie, in POST data, in part of the URL,
or somewhere else?
5. HTML parsing may be required. If the data you are looking for is not in
the HTTP headers, then it is in the message body (for example, the HTML page
the server sends back). You will need to figure out how to parse/extract
that data as well, and it could be encoded (for example, gzip-compressed or
sent in chunks).
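For point 1 (multiple requests at the same time), one common approach is to give each accepted connection its own thread. A slow download on one socket then cannot block the others. This is a minimal Python sketch, assuming `handler` is any function that takes one connected socket. The name `serve_concurrently` is hypothetical.

```python
import socket
import threading


def _run(handler, client):
    with client:          # always close the socket, even if handler raises
        handler(client)


def serve_concurrently(server_sock, handler):
    """Accept connections forever; each one is handled on its own thread."""
    while True:
        client, _addr = server_sock.accept()
        threading.Thread(target=_run, args=(handler, client),
                         daemon=True).start()
```

A thread per connection is the simplest option to get right. An event loop such as `selectors` or `asyncio` scales better, but it needs more code.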
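Points 2 and 3 (keep-alive and closed connections) both come down to the same two questions. Where does one message end? Does the connection stay open afterwards? Below is a hedged Python sketch with two helpers. The first splits the bytes read from one connection into complete requests by using `Content-Length`. Chunked request bodies are left out as a simplification. The second applies the default persistence rules: HTTP/1.1 stays open unless `Connection: close`, and HTTP/1.0 closes unless `Connection: keep-alive`. Both function names are hypothetical.

```python
def split_requests(buffer):
    """Return (complete_requests, leftover_bytes) from one connection's data.
    Only Content-Length bodies are handled in this sketch."""
    requests = []
    while True:
        header_end = buffer.find(b"\r\n\r\n")
        if header_end == -1:
            break                       # headers incomplete; read more
        head = buffer[:header_end].decode("latin-1")
        length = 0
        for line in head.split("\r\n")[1:]:
            name, _, value = line.partition(":")
            if name.strip().lower() == "content-length":
                length = int(value.strip())
        total = header_end + 4 + length
        if len(buffer) < total:
            break                       # body incomplete; read more
        requests.append(buffer[:total])
        buffer = buffer[total:]
    return requests, buffer


def connection_stays_open(version, headers):
    """headers: dict with lower-cased names."""
    tokens = {t.strip().lower()
              for t in headers.get("connection", "").split(",")}
    if version == "HTTP/1.1":
        return "close" not in tokens
    return "keep-alive" in tokens
```

Even when these rules say the connection persists, either side may still close it at any time. In Python you notice this when `recv()` returns `b""`. At that point the proxy should close the matching connection on the other side.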
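For point 4 (HTTP parsing), this Python sketch pulls out the usual places data hides in one raw request: the URL query string, cookies, and form-encoded POST data. It relies on the standard library's `http.cookies.SimpleCookie` and `urllib.parse.parse_qs`. The function name `parse_request` is hypothetical. It assumes one complete request with unique header names, where the last duplicate wins.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlsplit


def parse_request(raw):
    """Extract URL, query, cookies and form fields from one raw request."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("latin-1").split("\r\n")
    method, target, _version = lines[0].split(" ")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    cookie = SimpleCookie()
    cookie.load(headers.get("cookie", ""))
    form = {}
    content_type = headers.get("content-type", "")
    if content_type.startswith("application/x-www-form-urlencoded"):
        form = parse_qs(body.decode("latin-1"))
    url = urlsplit(target)
    return {
        "method": method,
        "path": url.path,
        "query": parse_qs(url.query),
        "cookies": {k: m.value for k, m in cookie.items()},
        "form": form,
    }
```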
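For point 5 (HTML parsing and encoded bodies), the body often has to be unwrapped before you can search it. First undo `Transfer-Encoding: chunked`, then `Content-Encoding: gzip`, and only then parse the HTML. This is a hedged Python sketch built on `gzip` and `html.parser.HTMLParser`. As an arbitrary example of "your data" it collects `<input name=... value=...>` fields. It assumes the page is UTF-8 instead of reading the declared charset, and it ignores chunk trailers. The names are hypothetical.

```python
import gzip
from html.parser import HTMLParser


def dechunk(body):
    """Undo 'Transfer-Encoding: chunked' (trailers are ignored)."""
    out = b""
    while True:
        size_line, _, body = body.partition(b"\r\n")
        size = int(size_line.split(b";")[0], 16)   # drop chunk extensions
        if size == 0:
            return out
        out += body[:size]
        body = body[size + 2:]          # skip chunk data and its CRLF


class InputFieldFinder(HTMLParser):
    """Collect name -> value for every <input> tag that has a name."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            a = dict(attrs)
            if a.get("name"):
                self.fields[a["name"]] = a.get("value") or ""


def extract_fields(headers, body):
    """headers: dict with lower-cased names; body: raw response body bytes."""
    if headers.get("transfer-encoding", "").lower() == "chunked":
        body = dechunk(body)
    if headers.get("content-encoding", "").lower() == "gzip":
        body = gzip.decompress(body)
    finder = InputFieldFinder()
    finder.feed(body.decode("utf-8", errors="replace"))  # assumes UTF-8
    return finder.fields
```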
I am sure there are things I have missed, but this should be enough to get
you thinking and on the right track. There are probably several existing
solutions that already perform the proxy functions, but the extra step of
"sniffing" the data will still need to be added by you.
Hope this helps!