Problem II - The Citation Graph

The problem involves analyzing the citation graph of computer science. Each paper is a node in the graph, and every citation is a directed edge from the citing paper to the cited paper. The data source is CiteSeer. You should download the CiteSeer archive. The archive contains most of the records but not all of them, so each group should devise its own way of completing the missing information. As a warm-up, you are expected to find the shortest undirected path (a path that ignores edge direction) between the two papers (Bryant, 311874) and (RSA, 28289).
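One possible way to approach the warm-up is a breadth-first search over the graph with edge directions ignored. The sketch below is only an illustration under stated assumptions: it assumes the citations have already been parsed into (citing, cited) pairs, the function names build_undirected and shortest_path are hypothetical, and the toy edge list is invented for the example and is not CiteSeer data.

```python
from collections import deque

def build_undirected(edges):
    """Build an adjacency map that ignores citation direction."""
    adj = {}
    for citing, cited in edges:
        adj.setdefault(citing, set()).add(cited)
        adj.setdefault(cited, set()).add(citing)
    return adj

def shortest_path(adj, source, target):
    """Return one shortest path as a list of node IDs, or None if unreachable."""
    if source not in adj or target not in adj:
        return None
    parent = {source: None}          # doubles as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk parents back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in adj[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None

# Invented toy edges (citing, cited); the IDs are placeholders, not real records.
toy_edges = [(10, 11), (12, 11), (12, 13), (14, 13)]
toy_adj = build_undirected(toy_edges)
toy_path = shortest_path(toy_adj, 10, 13)
print(toy_path)
```

On the real data, the same call would be made with 311874 and 28289 as the source and target. Because the archive is incomplete, a result of None may reflect missing records rather than a true absence of a path.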

The main part of the project is to divide the graph into 16 clusters. Each cluster is to be built around a highly cited paper in some area of computer science. You should build the clusters so that the number of cross edges, that is, edges whose endpoints lie in different clusters, is minimal. The hope is that, by minimizing the number of cross edges, the clusters will correspond to the different areas of computer science.
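As a starting point only, the sketch below shows one simple baseline: pick the most cited papers as seeds, grow the clusters outward from the seeds with a multi-source breadth-first search, and count the resulting cross edges. This is not a method that minimizes cross edges; it merely gives a number to improve on. The function names are hypothetical, the edges are assumed to be (citing, cited) pairs, and the toy data is invented. Note also that the most cited papers may fall in the same area, so choosing the 16 seeds will likely need more care than this.

```python
from collections import Counter, deque

def top_cited(edges, k):
    """Return the k papers with the highest in-degree (citation count)."""
    counts = Counter(cited for _, cited in edges)
    return [paper for paper, _ in counts.most_common(k)]

def assign_clusters(edges, seeds):
    """Label each reachable paper with the index of the seed that reaches it first."""
    adj = {}
    for u, v in edges:               # ignore direction when growing clusters
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    cluster = {seed: i for i, seed in enumerate(seeds)}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for neighbor in adj.get(node, ()):
            if neighbor not in cluster:
                cluster[neighbor] = cluster[node]
                queue.append(neighbor)
    return cluster

def count_cross_edges(edges, cluster):
    """Count edges whose two endpoints carry different cluster labels."""
    return sum(
        1 for u, v in edges
        if u in cluster and v in cluster and cluster[u] != cluster[v]
    )

# Invented toy graph: two small groups joined by the single edge (2, 5).
toy_edges = [(1, 0), (2, 0), (2, 1), (4, 3), (5, 3), (5, 4), (2, 5)]
seeds = top_cited(toy_edges, 2)
cluster = assign_clusters(toy_edges, seeds)
cross = count_cross_edges(toy_edges, cluster)
print(seeds, cross)
```

For the project itself, k would be 16, and papers unreachable from every seed would still need to be assigned to some cluster.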

If you have ideas for other interesting questions to investigate regarding this graph, you are welcome to raise them.