Problem II  The Citation Graph
This problem involves analyzing the citation graph of computer
science. Each paper is a node in the graph, and every citation is a
directed edge from the citing paper to the cited paper. As the data
source we use citeseer. You should download the citeseer archive.
This archive contains most of the records, but not all of them. Each
group should devise its own way of completing the missing information.
As a warm-up, you are expected to find the shortest (undirected) path
between the two papers (Bryant, 311874) and (RSA, 28289).
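
One possible way to approach the warm-up is a breadth-first search over
the graph with edge directions ignored. The sketch below is only an
illustration, not a required method. It assumes the citations have
already been extracted into a list of (citing, cited) pairs; the
function name and the toy node ids are hypothetical and do not come
from the citeseer data.

```python
from collections import deque

def shortest_undirected_path(edges, source, target):
    """Return one shortest path from source to target, ignoring edge
    direction, as a list of nodes; return None if no path exists."""
    if source == target:
        return [source]
    # Build an undirected adjacency map from directed (citing, cited) pairs.
    adj = {}
    for citing, cited in edges:
        adj.setdefault(citing, set()).add(cited)
        adj.setdefault(cited, set()).add(citing)
    if source not in adj or target not in adj:
        return None
    # Standard BFS; parent[] records how each node was first reached.
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

# Toy example (hypothetical ids): 1 cites 2, 3 cites 2, 3 cites 4.
toy_edges = [(1, 2), (3, 2), (3, 4)]
print(shortest_undirected_path(toy_edges, 1, 4))
```

On the real data the same call would take the two paper ids, 311874 and
28289, as source and target, provided the edge list uses those ids as
node labels.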
The main part of the project is to divide the graph into 16 clusters.
Every cluster is to be built around a highly cited paper in some area
of computer science. You should build the clusters so that the number
of cross edges (edges whose endpoints lie in different clusters) is
minimal. The hope behind this project is that, by minimizing the
number of cross edges, the clusters will correspond to the different
areas of computer science.
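
As a starting point, a simple baseline is to grow all clusters
simultaneously from the seed papers, assigning every paper to the seed
that reaches it first, and then to count the resulting cross edges. The
sketch below illustrates that baseline only. It does not guarantee a
minimal number of cross edges, and each group is expected to improve on
it. The function names and the toy graph are hypothetical, and the
input is again assumed to be a list of (citing, cited) pairs.

```python
from collections import deque

def grow_clusters(edges, seeds):
    """Multi-source BFS: label each reachable node with the seed that
    reaches it first, treating edges as undirected."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    label = {s: s for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in label:
                label[nxt] = label[node]
                queue.append(nxt)
    return label

def count_cross_edges(edges, label):
    """Count edges whose two endpoints carry different cluster labels.
    Edges touching an unlabeled (unreachable) node are skipped."""
    return sum(
        1
        for u, v in edges
        if u in label and v in label and label[u] != label[v]
    )

# Toy example: a chain a -> b -> c -> d with seeds "a" and "d".
toy_edges = [("a", "b"), ("b", "c"), ("c", "d")]
labels = grow_clusters(toy_edges, ["a", "d"])
print(labels, count_cross_edges(toy_edges, labels))
```

The cross-edge count gives a single number for comparing different
clusterings, so any refinement (for example, moving a paper to the
cluster where most of its neighbors are) can be judged by whether it
lowers that count.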
If you have ideas as to other interesting questions that you would like
to investigate regarding this graph, you are welcome to raise them.
Here is a Perl program that interacts with citeseer: cite.

The following papers from citeseer's most cited documents are the seeds ("anchor nodes")
of the 16 clusters: