All the algorithms presented here could be modified to use edge weights
between the papers. The weights can be generated from information
about the papers, such as authors, co-citation, or similarity measures
based on information retrieval techniques (e.g. the vector model). In a similar
manner, the same information can also be used to add new edges to the
graph, especially at poorly connected nodes, and to remove edges from
highly interconnected nodes, so that only the most relevant links are
kept in the graph. Flattening the degree distribution in this way could
potentially improve the performance of the algorithms.
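
The weighting and pruning idea above can be sketched as follows. This is only an illustration under stated assumptions, not a method evaluated here: the weight is taken to be the Jaccard overlap of the author sets of the two papers, and the names `jaccard`, `weight_edges`, `prune_out_edges` and the parameter `max_out` are hypothetical. Any other similarity measure (co-citation counts, vector model cosine similarity) could be substituted for `jaccard`.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def weight_edges(edges, authors):
    """Assign each citation edge (u, v) a weight from author overlap.

    `edges` is an iterable of (citing, cited) pairs and `authors` maps a
    paper id to its list of authors. Both structures are assumptions
    made for this sketch.
    """
    return {
        (u, v): jaccard(authors.get(u, ()), authors.get(v, ()))
        for u, v in edges
    }


def prune_out_edges(weights, max_out):
    """Keep at most `max_out` highest-weight outgoing edges per paper."""
    by_source = {}
    for (u, v), w in weights.items():
        by_source.setdefault(u, []).append((w, v))
    kept = {}
    for u, targets in by_source.items():
        # Sort by descending weight; break ties by target id for determinism.
        targets.sort(key=lambda t: (-t[0], str(t[1])))
        for w, v in targets[:max_out]:
            kept[(u, v)] = w
    return kept
```

Pruning only the outgoing edges of each paper is one possible choice; capping in-degree, or adding edges between highly similar but unlinked papers, would follow the same pattern.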
Another data source that could be used is the logs of page visits on the Citeseer website. HTTP sessions could be viewed as walks on the citation graph, and the edges traversed most often could provide vital hints about the relationships between the papers.
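
A minimal sketch of how such traversal counts might be collected is given below. It assumes that the logs have already been split into sessions, each an ordered list of visited paper ids, and that consecutive visits matching a citation edge in either direction count as one traversal of that edge. The function name `count_traversals` and the input format are hypothetical; no actual log format is implied.

```python
from collections import Counter


def count_traversals(sessions, citation_edges):
    """Count how often each citation edge is followed within sessions.

    `sessions` is an iterable of lists of paper ids in visit order and
    `citation_edges` is an iterable of (citing, cited) pairs. A step
    between two papers is credited to the directed citation edge that
    links them, whichever way the visitor moved; steps that match no
    citation edge are ignored.
    """
    edges = set(citation_edges)
    counts = Counter()
    for session in sessions:
        for a, b in zip(session, session[1:]):
            if (a, b) in edges:
                counts[(a, b)] += 1
            elif (b, a) in edges:
                counts[(b, a)] += 1
    return counts
```

The resulting counts could then serve directly as edge weights, or be combined with the similarity-based weights discussed above.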