The choice of data structure representation of the graph has the most significant impact on efficiency of Randomized Min-Cut Algorithm. We tried three approaches: adjacency list, binary decision tree and hash tables.
In our first implementation, we use an adjacency list representation of the graph using singly linked lists for individual edge lists. Though not the wisest choice when the goal is efficiency, it is what we needed to quickly get a feel of the scale of the computational requirements at hand. With as number of nodes, as number of edges, and as the average node degree, the cost of the resulting implementation is on the order of . The estimated runtime is over 20 hours for a reduced version of the graph where context information is not included.
In the second version, we replace linked lists with binary decision trees using the java.util.TreeSet class. The theoratical complexity of the implementation is thus reduced to . A good improvement considering is alot larger than and , but not good enough. A single run of the algorithm on the reduced graph exceeds a few hours.
We then try hash tables using the built-in java.util.HashMap. Since the hash table is used to store the edge information, the result is constant time edge operations. Algorithmic complexity is dropped to , and with it running time is dropped to around 40 minutes for the reduced graph.
For the full graph, we face the different problem of memory utilization. The memory consumption of the three data structures we used increases with the efficiency of the data structure: linked lists consume manageable amounts of memory, trees use more, and hash tables cause swapping to occur, rendering the implementation virtually useless. At this point however, the server at MTC (mtcserver.epfl.ch) has become available to us, and since it has alot more memory than our own machines, the problem is solved. We modify our code, anyway, to use the implementation of hash tables found in the GNU Trove package which is more efficient than the built-in Java HashMap in both memory consumption and speed .
The final implementation, run on the MTC server, produces a partition for the full graph in around 2 hours.