Personal HTTP Proxy
Learning Objectives
This assignment gives you a chance to learn about one of the most popular application protocols on the Internet, the Hypertext Transfer Protocol (HTTP) version 1.0. It also introduces the Java sockets API for client/server application development.
Overview
An HTTP proxy acts as a mediator between a client (for example, a web browser on a user's computer) and a server (for example, a web site). In the simplest case, the client sends all its requests to the proxy instead of sending them directly to the server. The proxy then opens a connection to the server and passes on the client's request. The proxy receives the reply from the server and sends that reply back to the client. Notice that the proxy acts as both an HTTP client (to the remote server) and an HTTP server (to the original client).
In this assignment, you will implement a personal HTTP proxy based on HTTP 1.0 (defined in detail in RFC 1945).
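The relay step in the overview is a byte-copy loop between two sockets' streams. The following minimal Java sketch covers that step only. The class name `ProxyRelay` and the method `relay` are hypothetical and not part of the assignment.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper: copies everything the server sends back to the client.
class ProxyRelay {
    // Reads from 'in' until end of stream, writing every byte to 'out'.
    // Returns the number of bytes copied.
    static long relay(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        // read() returns -1 once the server closes its side of the connection.
        // An HTTP/1.0 response without Content-Length ends exactly this way.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }
}
```

In a proxy, `in` would be the server socket's `getInputStream()` and `out` would be the client socket's `getOutputStream()`. Copying raw bytes rather than text lines keeps binary objects such as images intact.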
Uses of an HTTP Proxy
HTTP proxies are used for the following reasons:
- Web Cache: A web cache can substantially reduce the response time for client requests by saving a copy of the pages it fetches. Web caches can also substantially reduce traffic on an institution's access link to the Internet, because the proxy does not need to contact the web server each time a cached page is requested.
- Content Filtering and Transformation: In the simplest case the proxy merely fetches a resource without inspecting it, but nothing limits a proxy to blindly fetching and serving files. The proxy can inspect the requested URL and selectively block access to certain domains. It can reformat web pages (for instance, by stripping out images to make a page easier to display on a handheld or other limited-resource client), or perform other transformations and filtering.
- Privacy: Normally, web servers log all incoming requests for resources. This information typically includes at least the IP address of the client, the browser or other client program that they are using (called the User-Agent), the date and time, and the requested file. If a client does not wish to have this personally identifiable information recorded, routing HTTP requests through a proxy is one solution. All requests coming from clients using the same proxy appear to come from the IP address and User-Agent of the proxy itself, rather than the individual clients. If a number of clients use the same proxy (say, an entire business or university), it becomes much harder to link a particular HTTP transaction to a single computer or individual.
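A simple domain blocklist illustrates the content-filtering use above. This is a minimal sketch that assumes a hypothetical `DomainFilter` class and a hand-maintained set of blocked domains. Filtering is not one of the graded requirements.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical filter: decides whether a requested host should be blocked.
class DomainFilter {
    private final Set<String> blockedDomains;

    DomainFilter(Set<String> blockedDomains) {
        this.blockedDomains = blockedDomains;
    }

    // A host is blocked if it equals a blocked domain or is a subdomain of one,
    // e.g. "ads.example.com" matches the blocked domain "example.com".
    // Host names are compared case-insensitively.
    boolean isBlocked(String host) {
        String h = host.toLowerCase(Locale.ROOT);
        for (String domain : blockedDomains) {
            String d = domain.toLowerCase(Locale.ROOT);
            if (h.equals(d) || h.endsWith("." + d)) {
                return true;
            }
        }
        return false;
    }
}
```

A proxy using such a filter might answer a blocked request with `403 Forbidden`, a status code defined in RFC 1945. Comparing against `"." + domain` prevents a rule for `example.com` from also matching `notexample.com`.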
Assignment Requirements & Grading
- Upload the web proxy file (Java source file or WinRAR archive) to IVLE (Projects -> Assignment 2 GROUPS and SUBMISSION - 2006/2007s2 -> Group X -> Project Folder) before the due date given below.
- The web proxy program should accept the port number as a command-line argument.
- The web proxy should be able to accept HTTP requests, forward them to remote servers, and return the objects to the clients. If the URL in the HTTP request does not specify a port number, the web proxy should assume the default port 80. The proxy should answer an invalid request from the client with an appropriate error code:
  - 400 Bad Request when the request format or syntax is invalid.
  - 405 Method Not Allowed when the request uses a method other than GET, POST or HEAD.
  (3 points)
- The web proxy should store a copy of each fetched object on its own disk, together with the value of the object's Last-Modified header. (3 points)
- If a client requests an object again, the web proxy should send a conditional GET to the server to find out whether the object has changed since the proxy last obtained it. If it has not changed, the proxy should send the object to the client from its disk with appropriate HTTP headers. Otherwise, the proxy should fetch a new copy from the web server and send that copy to the client. (3 points)
- Well-written (good abstraction, error checking, readability) and well-commented code. (1 point)
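Handling a request starts with parsing the request line, defaulting to port 80, and choosing between 400 and 405. The following Java sketch shows one possible approach. The `RequestLine` class and its fields are hypothetical. Rejecting abs_path requests (which a proxy should not normally receive) with 400 is an assumption made for illustration. The sketch uses `java.net.URI` to split the absoluteURI into host, port and path.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Set;

// Hypothetical parser for a proxy request line such as
// "GET http://www.comp.nus.edu.sg/~cs2105/rss.xml HTTP/1.0".
class RequestLine {
    // Methods accepted by this sketch; only GET is required by the assignment.
    static final Set<String> ALLOWED_METHODS = Set.of("GET", "POST", "HEAD");

    final int errorCode;  // 0 if the line is acceptable, otherwise 400 or 405
    final String method;
    final String host;
    final int port;       // 80 when the URL does not name a port
    final String path;    // abs_path (plus query) to forward to the origin server

    private RequestLine(int errorCode, String method, String host, int port, String path) {
        this.errorCode = errorCode;
        this.method = method;
        this.host = host;
        this.port = port;
        this.path = path;
    }

    private static RequestLine error(int code) {
        return new RequestLine(code, null, null, -1, null);
    }

    static RequestLine parse(String line) {
        // Request-Line = Method SP Request-URI SP HTTP-Version
        String[] parts = line.trim().split(" +");
        if (parts.length != 3 || !parts[2].startsWith("HTTP/")) {
            return error(400);  // malformed request line
        }
        if (!ALLOWED_METHODS.contains(parts[0])) {
            return error(405);  // well-formed, but the method is not supported
        }
        try {
            URI uri = new URI(parts[1]);
            // Requests sent to a proxy carry an absoluteURI; this sketch rejects anything else.
            if (!"http".equalsIgnoreCase(uri.getScheme()) || uri.getHost() == null) {
                return error(400);
            }
            int port = (uri.getPort() == -1) ? 80 : uri.getPort();
            String path = uri.getRawPath();
            if (path == null || path.isEmpty()) {
                path = "/";
            }
            if (uri.getRawQuery() != null) {
                path += "?" + uri.getRawQuery();
            }
            return new RequestLine(0, parts[0], uri.getHost(), port, path);
        } catch (URISyntaxException e) {
            return error(400);  // the Request-URI itself is not valid
        }
    }
}
```

When the proxy forwards the request to the origin server, it would place the `path` field in the request line, not the full absoluteURI. An error reply can be as short as `HTTP/1.0 400 Bad Request` followed by an empty line.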
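One simple layout for the cache stores each response in a file named after a hash of its URL, with the Last-Modified value in a companion file. This is a sketch under those assumptions. The following are illustrative choices, not part of the specification:
- the `DiskCache` class,
- the `.resp` and `.lm` file suffixes,
- storing the full raw response (status line, headers and body) so that it can later be replayed with its headers.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical on-disk cache: each URL maps to two files in 'dir',
// <key>.resp (the raw response) and <key>.lm (its Last-Modified value).
class DiskCache {
    private final Path dir;

    DiskCache(Path dir) throws IOException {
        this.dir = Files.createDirectories(dir);
    }

    // Hashes the URL so characters such as '/', ':' and '?' never reach the file system.
    private static String key(String url) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(url.getBytes(StandardCharsets.UTF_8))) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support SHA-256.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    // Stores (or overwrites) the response and its Last-Modified value.
    void put(String url, String lastModified, byte[] response) throws IOException {
        String k = key(url);
        Files.write(dir.resolve(k + ".resp"), response);
        Files.write(dir.resolve(k + ".lm"), lastModified.getBytes(StandardCharsets.UTF_8));
    }

    // Returns the stored Last-Modified value, or null if the URL is not cached.
    String lastModified(String url) throws IOException {
        Path p = dir.resolve(key(url) + ".lm");
        return Files.exists(p) ? new String(Files.readAllBytes(p), StandardCharsets.UTF_8) : null;
    }

    // Returns the stored raw response for a cached URL.
    byte[] response(String url) throws IOException {
        return Files.readAllBytes(dir.resolve(key(url) + ".resp"));
    }
}
```

Servers are not required to send Last-Modified. A proxy built this way might therefore skip caching responses that lack it, because there would be no date to put in a later conditional GET.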
Administrative Matters
You are required to
form your own team of 2 members and register your names in IVLE (Project folder
-> Assignment
groups) during week 9
(12 Mar – 16 Mar). You can register in one of the available groups which range
from group 1 to group 70. Each team member should contribute equally and
understand every part of the application. Assessment is based on team work and
individual contribution. Assessment methods include peer evaluation, demo and
interview.
Recommended platform:
Java (You can use other platforms).
Due for submission:
Week 11: 30-Mar-2007 Fri 11.59 PM
Demo/Interview: Week 12 (02 Apr – 05 Apr). Arrange a time and venue with your tutor.
Additional Info (from FAQ):
- If the client's request (from the browser) contains an 'If-Modified-Since' header, you do not need to handle or process this header.
- You need to handle only the GET method (POST/HEAD/SOAP are optional).
- You can assume one object per request.
- No need to handle dynamic web pages.
- For testing, you may use one of the course websites, since most of them are static.
- To test a "304 Not Modified" response, try accessing this URL with a conditional GET: http://www.comp.nus.edu.sg/~cs2105/rss.xml
- When you send a conditional GET (If-Modified-Since:), use the date returned in the "Last-Modified" header. Most web servers compare the two dates for EQUALITY (instead of GREATER THAN or LESS THAN). With any other date, the conditional GET will always produce "200 OK".
- Web servers are not required to include a "Last-Modified" header with the pages they return.
- A GET request line has the form: GET Request-URI HTTP/Version
  The Request-URI is a Uniform Resource Identifier that identifies the resource to which the request applies:
  Request-URI = absoluteURI | abs_path
  Which of the two forms is used depends on the nature of the request. The absoluteURI form is only allowed when the request is being made to a proxy.
- For testing, you may use your browser or telnet. For example, assuming your proxy is running on localhost and listening on port 8088:
    telnet localhost 8088
    Trying 127.0.0.1...
    Connected to localhost.localdomain (127.0.0.1).
    Escape character is '^]'.
    GET http://comp.nus.edu.sg/~cs2105/rss.xml HTTP/1.0
  After typing the request line, press Enter twice, because an empty line marks the end of the request. If your proxy is working correctly, the terminal should display the headers and contents of rss.xml.
- For simplicity, this assignment deals only with version 1.0 of the HTTP protocol, defined in detail in RFC 1945. The clients connecting to your proxy server will use HTTP/1.0 in their requests.
- If you write a single-threaded proxy
server, you will probably see some problems when you use your proxy with a
standard web browser. Because a web browser like Firefox or IE issues
multiple HTTP requests for each URL you request (for instance, to download
images and other embedded content), a single-threaded proxy will likely miss
some requests, resulting in missing images or other minor errors. That's OK.
You are not required to use threading in this assignment. As long as your
proxy works correctly for a simple HTML document (like, for instance, this
assignment page) and follows the RFC, you can still receive all the points
for this assignment.
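The following sketch combines the conditional GET from the requirements with the advice to reuse the exact Last-Modified string. `ConditionalGet`, `build` and `statusCode` are hypothetical names. RFC 1945 does not define the `Host` header. The sketch sends it anyway, on the assumption that the origin server may host several sites on one IP address and needs the header to select the right one.

```java
// Hypothetical helpers for revalidating a cached object with a conditional GET.
class ConditionalGet {
    // Builds an HTTP/1.0 request for 'path' on 'host'. The If-Modified-Since value
    // is copied verbatim from the cached Last-Modified header, because many servers
    // only answer 304 when the two dates are exactly equal.
    static String build(String host, String path, String lastModified) {
        return "GET " + path + " HTTP/1.0\r\n"
                + "Host: " + host + "\r\n"  // HTTP/1.1 header, needed for virtual hosting
                + "If-Modified-Since: " + lastModified + "\r\n"
                + "\r\n";                   // empty line ends the request
    }

    // Extracts the status code from a status line such as "HTTP/1.0 304 Not Modified".
    // Returns -1 if the line is malformed.
    static int statusCode(String statusLine) {
        String[] parts = statusLine.trim().split(" +", 3);
        if (parts.length < 2 || !parts[0].startsWith("HTTP/")) {
            return -1;
        }
        try {
            return Integer.parseInt(parts[1]);
        } catch (NumberFormatException e) {
            return -1;
        }
    }
}
```

If `statusCode` returns 304, the proxy could replay the cached response from disk. On 200, it would relay the new response to the client and replace the cache entry.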
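Threading is optional. If you want the proxy to cope with a browser's parallel requests, though, a thread-per-connection accept loop is one common structure. This sketch also reads the listening port from the command line, as the requirements ask. `ThreadedProxyServer`, `parsePort` and `handle` are hypothetical names. `handle` reads the request and then sends a placeholder `501 Not Implemented` reply where the real processing would go.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical skeleton: listens on the port given on the command line and
// serves each client connection on its own thread.
class ThreadedProxyServer {
    // Parses the port argument; returns -1 if it is missing, non-numeric or out of range.
    static int parsePort(String[] args) {
        if (args.length < 1) {
            return -1;
        }
        try {
            int port = Integer.parseInt(args[0]);
            return (port >= 1 && port <= 65535) ? port : -1;
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    public static void main(String[] args) throws IOException {
        int port = parsePort(args);
        if (port == -1) {
            System.err.println("Usage: java ThreadedProxyServer <port>");
            System.exit(1);
        }
        try (ServerSocket server = new ServerSocket(port)) {
            while (true) {
                Socket client = server.accept();
                // One thread per connection, so one slow request (e.g. a large image)
                // does not hold up the browser's other requests.
                new Thread(() -> handle(client)).start();
            }
        }
    }

    static void handle(Socket client) {
        try (Socket c = client;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(c.getInputStream(), StandardCharsets.ISO_8859_1));
             OutputStream out = c.getOutputStream()) {
            String requestLine = in.readLine();
            System.out.println("Request: " + requestLine);
            // Consume header lines up to the empty line that ends the request.
            String header;
            while ((header = in.readLine()) != null && !header.isEmpty()) {
                // headers are ignored in this placeholder
            }
            // Placeholder reply: real code would parse requestLine, check the cache,
            // contact the origin server and relay its response.
            out.write("HTTP/1.0 501 Not Implemented\r\n\r\n".getBytes(StandardCharsets.ISO_8859_1));
            out.flush();
        } catch (IOException e) {
            System.err.println("Connection error: " + e.getMessage());
        }
    }
}
```

Example run: `java ThreadedProxyServer 8088`, then connect with `telnet localhost 8088` and type a request line followed by an empty line.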