Programming Assignments for Best-Peer Project

Programming Assignments for CS 6203

Background and Project Requirements

Note that the programming assignments are built around the BestPeer P2P project, which is in the re-implementation phase. The codes that you develop must therefore work with the existing system. You can work in a group of 2-3 persons, and projects will be allocated on the first come first serve basis. If you have any questions or problems regarding the implementation, you are welcome to consult Dr Xu Linhao (xulh@comp.nus.edu.sg).

If you program the project with JAVA, please pay attention to the followings:

1. All rules mentioned in [1] and [2] must be followed strictly, and all algorithms must be described in the report and comments.

2. All methods/variables must be commented.

3. If you need help on how to write codes properly, please unzip src.zip file in the Java SDK directory and read the source codes.

4. Please use Java SDK 1.5 to develop the project.

If you program the project with C/C++, please pay attention to the followings:

1. All rules (naming convention and coding style) mentioned in [4] must be followed strictly, and all algorithms must be described in the report and comments.

2. All methods/variables must be commented.

3. It is recommended to use the STL (Standard Template Library).

Report that explains the design of the algorithms, and flow of the program must be submitted together with the program. The assignment is due by 15^th November 2006. Total mark: 40%. Marks will be given for good program structure, efficient codes, robustness of the program, innovation and documentation/report.

Programming Assignments

Name	Instant Message Module
Goal	Providing the facility between remote users (i.e., client peers) to exchange messages and upload/download files
Precondition	Relatively independent of the current version of the Best-Peer project; the basic socket communication framework in the current version can be used but should take place “java.net.Socket” with “java.net.DatagramSocket”
Requirements	(1) The network bandwidth at each peer side should be used efficiently and the implementation should be lightweight; (2) Interface must be well defined with detailed comments; (3) Separating GUI from underlying operation, i.e., using MCV model.
Specification	(1) User & Group Management: register a user with his email account; remove a user from system; create a user group; add a user to a group; remove a user from a user group; (2) Message Communication: Send a message to a user; receive messages from other users (use UDP connection here); store all received messages of a remote user into a file locally and if the file size exceeds a predefined-threshold, create a new one for recording new messages (this can be implemented by “java.util.logging”); (3) Data Transfer: allow users to upload/download files to/from remote users (use TCP connection here); the received file is stored in the predetermined directory; allow users to specify what kind of file types he wants to accept, and filter all insecure file types during file transferring (4) Bootstrap: A bootstrap server should be built first to maintain all online peers (this can use existing code in the Best-Peer project), but different USER_ACCOUNT relation table should be maintained by a database (e.g., MySQL) for evaluating the validness of any user; (5) Online Status Monitor: the bootstrap server needs to monitor the online status of each peer periodically, and perform the corresponding operations: peer joining – when a peer joins the network, the bootstrap server needs to update all online peers, which have the friend relationship with it, for this event and then all those online peers will update the online status of the peer. At the same time, the bootstrap server needs also tell the peer which friend peers are online; peer leaving – when a peer leaves the network, the bootstrap server should notify all friend peers to update the online status of the peer; peer failure – if the bootstrap server sends PING request to a peer several times but without any response from it, the bootstrap server will regard the peer failure and then notify all friend peers about this event; bootstrap server failure – when the bootstrap server is down, it will notify all peers to depart from the system; or if any peer sends PING request to the bootstrap server several times but without response, it will regards the bootstrap server failure and then leave the network by sending LEAVE request to all online friend peers;
Package Name	sg.edu.nus.im
Memo	Google Talk and MSN are two good running examples

Name	Network Simulation Module
Goal	Providing the facility to create the new instances of bootstrap server, super peer and client peer, and allow them to enter and leave the system with some well-known probability distribution
Precondition	Relatively independent of the current version of the Best-Peer project; before programming, should understand the basic interface of how to new client peer, super peer, bootstrap server and the message type defined in class “sg.edu.nus.protocol.MsgType”;
Requirements	(1) The implementation must use event-driven model and should be lightweight; (2) Interface must be well defined with detailed comments;
Specification	(1) Defining events: use Factory pattern (mentioned in [4]) to create network events for the simulator; the pre-defined network event (needs to be implemented) includes events for creating the instances of different peers, events for peers joining, leaving, events for sharing and un-sharing local files, events for creating keywords queries; (2) Event generator: generate network events with different distribution, such as Gaussian distribution, exponential distribution, Poisson distribution, and put all events into a Priority Queue (study how to use this data structure to simulate events); (3) Simulator: get network events from the Priority Queue and execute the corresponding operation by invoking the current APIs (may be not suitable for invoking but can do a simple simulation if necessary); “java.lang.reflect” should be used for generating new instance of peers; (4) Visualization: provide a GUI panel to display the dynamic behavior of the peer instances; (5) Logging System status: for each step, all information (must include error and exception messages) must be recorded into a log file (using “java.util.logging”;
Package Name	sg.edu.nus.simulate

Name	Online Peer Monitor Module
Goal	Providing the facility monitoring the behavior of all peers, including peers joining, departure and failure
Precondition	Dependent on the current version of the Best-Peer project; the basic socket communication framework in the current version can be used but should take place “java.net.Socket” with “java.net.DatagramSocket”; The GUI is ready and can be used directly;
Requirements	(1) The network bandwidth at each peer side should be used efficiently and the implementation should be lightweight; (2) The interface used to invoke monitor operation should be well defined;
Specification	(1) Peer departure: detect the “super peer departure” event, and then update the necessary information on both Bootstrap Server GUI and Super Peer GUI; detect the “client peer departure” event, and then update the necessary information on the Super Peer GUI; detect the “bootstrap server departure” event, and then broadcast this event to all super peers for their departure behaviors. At the same time, for each super peer, it must force its client peers to leave the network; (2) Peer failure: detect the “super peer failure” event by polling PING message to each super peer periodically, and then update the necessary information on both the GUIs of both bootstrap server and super peer and invoke “stabilization” operation (interface is defined and can be invoked directly) at the super peer side; detect the “client peer failure” event by polling PING message to each client peer periodically, and then update the necessary information on the super peer’s GUI; detect the “bootstrap server failure” event by sending PING message to the bootstrap server periodically, and then notify all super peers and their client peers to leave the network; (3) Peer join: detect the “client peer join” event, and then update the necessary information on the Super Peer GUI; detect the “super peer join” event, and then update the necessary information on both Bootstrap Server GUI and Super Peer GUI;
Package Name	sg.edu.nus.monitor
Memo	Similar to how to monitor peer’s online status in MSN and Google Talk

Name	Bootstrap Synchronization Module
Goal	Providing the facility for bootstrap servers to synchronize the online super peer list with one another, and therefore each client peer or super peer can join the network by using one of bootstrap server; this makes the system more robust and reduce burden at bootstrap side;
Precondition	Relatively independent of the current version of the Best-Peer project; the basic socket communication framework can be used
Requirements	The network bandwidth at each bootstrap server side should be used efficiently and the implementation should be lightweight;
Specification	(1) Information maintained by bootstrap server: each bootstrap server must maintain two list: one for all online super peers and the other for some online bootstrap servers (see bootstrap organization); (2) Bootstrap organization: All bootstrap server in the system should be organized as a small tapestry system, and a bootstrap server is assumed to be there forever as the root bootstrap; if the root bootstrap server is down, then a new one should be selected out to act as a new root bootstrap server; (3) Add new bootstrap server: a new bootstrap server can join the system by accessing an existing one, and then copy the super peer list from it; (4) Detecting bootstrap server status: each bootstrap server needs to monitor all other bootstrap server’s online status and updates the online bootstrap server list; (5) Communication between bootstrap servers: the communication among all bootstrap servers are employed tapestry-liked multicast (prefix-based multicast);
Package Name	sg.edu.nus.btsync

Name	IR-based Schema Matching Module
Goal	Providing the facility to discover the matched relation database with the shared schema; returning the matched schema to users for further query processing
Precondition	(1) A client user can share his local small database by exporting the schema to other users; a schema management module at each client peer side is responsible for maintaining all shared schema; and a schema management module at each super peer side is responsible for maintaining all shared schemas by all client peers belonging to it; (2) Each peer has installed a relational database; MYSQL is recommended;
Requirements	(1) Constructing the classification according to any existing methods, e.g. Bayesian classifier or C 4.5; (2) Creating Schema mapping among the schemas in the same category;
Specification	(1) Forming Category: classify the sample schemas according to existing methods (Bayesian classifier or C 4.5), and construct the hierarchical classification, like Yahoo; (2) Category organization: The categories in the hierarchical classification is maintained by one super peer, containing its classified rules, its parent category, first child category, next sibling category, and thus it can find any other categories; (3) Building schema mapping: Creating the schema mapping among the schemas in the same category, that is, which relation maps which other relation, and which attributes map which other attributes. The local peer must maintain the mapping information, so that, the peer can reformulate a query according to them, while super peer maintain which schema is classified into certain category. (4) Add schema into category: for a new schema, classified it into one of the existing category, or create a new category if there is no category related to it. (5) Query reformulation: for a new query from any peer, the posed peer reformulates it according to its schema mapping information, and thus its relevant peer can understand and thus answer it. (6) Query processing: when a peer receives a reformulate query, it reformulates the query according to its own schema, answers the reformulated query, and finally returns the answer to the posed peer. (7) Result visualization: the posed peer integrates all the answers from its relevant peers and visualizes the results;
Package Name	sg.edu.nus.sm

Name	Relation Data Management Module
Goal	Providing the facility for client peers to share schema of local small database to other peers, including all interfaces to operate relation database when insert, delete, update data and perform IR-based retrieval
Precondition	(1) A client user has installed a relation database, e.g., MySQL, at the client peer side; (2) The current version of the Best-Peer project;
Requirements	The interface used to export schema should be well defined;
Specification	(1) Displaying schema: access the backend database system and visualize all fields and their necessary information (e.g., data type) to users; (2) Selecting fields to be exported: users can determine which schema and which fields to be shared by toggling corresponding item, based on the visualization of the database; (3) Meta-data management: all meta-data of the shared table and fields shared should be stored in a system-defined file with system specified data structure; (4) Transferring exported schema to super peer: upload all meta-data of the shared schema and fields to super peer; super peer then indexes the meta-data with an inverted index whose structure is <Field, Data Type, Description, Table, Peer ID>. Notice that the BATON protocol will relay some meta-data to the corresponding super peers in terms of “Field” value and this is implemented by BATON; (5) Processing SQL with exported schema: given a simple SQL query, the client peer first executes the query locally and then sends the query to super peer. After the super peer receives the query, it will use the index with relation schema to determine which schema may contain the desired data for the query. If having match, then the schema information will be passed to the client peer for displaying. According to the returned schema, the user determine which tables (fields) should be queried and what kind of operation will be executed (e.g., join); Finally, the query will be sent to the corresponding peers and the results will be returned to visualize; (6) Data Management: If user determines to catch the query results, all results should be cached to a temporary table with the query schema. The content of the temporary table is not shared with other peers; All shared relation data at local side cannot be updated or deleted by remote users;
Package Name	sg.edu.nus.db

Name	Index Management Module
Goal	Providing the facility for super peers to store and retrieval the index at the super peer side; the index is organized as the inverted index; the content to be indexed are coming from all shared data from all client peers that belong to the corresponding super peer
Precondition	(1) The local files are indexed by Lucene; (2) The current version of the Best-Peer project;
Requirements	(1) Synchronization is very important for this part. So the code must be well design for this requirement with lock if necessary; (2) The interfaces used to manage index should be well defined;
Specification	(1) Partitioning index range: giving an option to user to specify how many index terms will be stored in a separated index partition; in other words, to improve the system performance and reduce the lock time on the index, we suggest to divide the whole index range into several non-overlap segments (for example, [a-c], [c-h], [h-q] and [r-z]) and each segment is stored in a separated file; (2) Insert index items: when a super peer receives new index items from its client peer, it will open the corresponding index partition, insert the index items and finally force the change to disk; (3) Update index items: when a client peer updates its local index, the new index items will be uploaded to its super peer. The super peer should open the corresponding index partition, update old items by new ones; (4) Remove index items: when a client peer deletes index items, the super peer’s index should be updated simultaneously; (5) Load index: when processing keywords query, the corresponding index partition should be loaded from disk to memory, if necessary. System should provide the interface for loading all index partition and for loading a subset of index partition according keywords set;
Package Name	sg.edu.nus.store
Memo	Refer to the source code of Lucene 1.9.1 (at site of www.apache.org)

Reference

1. Sun Microsystem. Code Conventions for the Java Programming Language. http://java.sun.com/docs/codeconv/, 1999.

2. Sun Microsystem. How to Write Doc Comments for the Javadoc Tool. http://java.sun.com/j2se/javadoc/writingdoccomments/, 2000.

3. Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison Wesley Inc., 1994.

4. Diomidis Spinellis. Code Reading: The Open Source Perspective. Addison Wesley Inc., 2003.