Results and Evaluation

The results all exhibit the same phenomenon: one seed node gathers most other nodes into its cluster, ending with one big partition and 12 other small ones. Around 500,000 nodes are joined in one cluster which differs across different runs of the algorithm. The sizes of the other clusters range from 1 to 11,000. There are around 400,000 isolated nodes that remain unclustered because they do not have a path to any of the seed nodes, and hence no choice of edge contractions would lead them to a cluster. The resulting cut has a min-cut value of around 330,000, and a normalized value of 940.

These unbalanced results are due to a feedback effect through which larger clusters tend to grow even larger. Hence the cluster that grows most in the first few iteration, ends up dominating for the remainder of the run. Some seed nodes are more likely to dominate than others, depending on their degree. Seed node 2643 (Numerical Analysis) does not recruit any node into its cluster in every run of the algorithm. Upon examining the edge list. we see this is due to the fact that seed node 2643 actually has a degree of 1.

To lessen the effects of feedback, we introduce changes into the algorithm by giving less weight to joined edges. For two edges of weights $x$ and $y$, the edge resulting from merging these two edges has a weight of $\sqrt{x^2 + y^2}$ instead of the standard $x+y$. This change leads to some better solutions. Min-cut value is reduced to 180,000 with a normalized value of 300.

The problem with this approach though is that, due to the fact that the implementation uses integer weight values, in the first contractions, two edges of weight 1 mergerd into an edge of weight 2 rather than 1.4, thus limiting the effects of the reduced edge weights, which ideally are more important early on in the algorithm.

We also try a variant of the algorithm where edge weights do not accumulate. This constant weight variant performed better than the pervious versions, but only in terms of producing a balanced partition. The resulting partition had more edges in the min cut but produced a better normalized min cut because of the balanced sizes of the clusters.

Algorithm Min-Cut Value Normalized Min-Cut
Randomized Min-Cut 330906 943.5
Randomized Min-Cut with Reduced Weights 183229 314.4
Randomized Min-Cut with Constant Weights 684837 115.2

Ali Salehi 2005-04-29