Results and Visualizations

By adjusting the probability threshold, the number of nodes per cluster can be varied. We have found that the value of $10^{-4}$ is optimal for visualization purposes. The cluster sizes were on the order of $500-1000$ with. This is an amount of information that can be interpreted by a human via visualization. On the following picture the seed nodes are marked with large black circles. The seed nodes are surrounded by color-coded nodes. Each color corresponds to a different clusters. The inter-cluster edges are drawn with black, the intra-cluster edges are invisible. Depending on your SVG viewer the nodes can be clicked on and you will be taken to the corresponding Citeseer website.

There are many interconnections between the yellow cluster and the pink cluster. The seed paper in the yellow cluster treats about wavelets and wavelets is a common tool used in computer graphics (the pink cluster). Another observation that we can make is that a certain number of nodes are very highly connected and link to many different clusters.

The computational complexity of the algorithm is $O(t*m*i)$, where $t$ is the number of seeds, $i$ is the number of iterations of the eigenvector algorithm and $m$ is the number of edges in the graph. Uniformly random walk is more efficient than the algorithm that in [15], with complexity $O(n^2\log(n))$, but does not guarantee contiguity.

Ali Salehi 2005-04-29