Citation Network Partitioning

Khaled Bachour, Eda Baykan, Wojciech Galuba, Ali Salehi

Abstract:

We consider a problem of clustering publications of the citation network given seed papers as an input. The algorithm in its most general form is NP-hard. Given the Citeseer citation data, we solve the problem by using the randomized contraction heuristic and provide an analytical bounds on the number of necessary runs. In addition we experimentally demonstrate the scale-free and small-world properties of the network. We propose two heuristics for solving the clustering problem based on phenomena occuring in complex networks: epidemic spreading and random walks.





Ali Salehi 2005-04-29