Graph coherence issue
Group 1 (presented by Grégory T.)
They checked the coherence of the two Citeseer databases: Web-based and OAI.
They propose an explanation, verified on numerous examples (about 30
papers across three fields of research): context ids and paper ids seem
to be mixed up. For example, if a document A (id: 3311) references a
paper B in the HTML version, then B is indeed listed as referenced by A,
since this version is coherent. In the OAI version, on the other hand,
B is listed as referenced by the context with id 3311, which belongs to
another field.
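The kind of coherence check described above can be sketched as follows. This is only a minimal illustration, assuming the ref and isRefBy links have already been loaded into dictionaries mapping a paper id to a set of ids; the function name `find_incoherent_links` and the toy ids other than 3311 are hypothetical and not taken from the Citeseer data.

```python
from typing import Dict, List, Set, Tuple


def find_incoherent_links(
    refs: Dict[int, Set[int]],
    is_ref_by: Dict[int, Set[int]],
) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Compare ref links with isRefBy links.

    Returns two sorted lists of (citing, cited) pairs:
    - ref links that have no matching isRefBy link,
    - isRefBy links that have no matching ref link.
    """
    missing_is_ref_by = []
    for citing, targets in refs.items():
        for cited in targets:
            if citing not in is_ref_by.get(cited, set()):
                missing_is_ref_by.append((citing, cited))

    missing_ref = []
    for cited, sources in is_ref_by.items():
        for citing in sources:
            if cited not in refs.get(citing, set()):
                missing_ref.append((citing, cited))

    return sorted(missing_is_ref_by), sorted(missing_ref)


# Toy data (hypothetical): papers 3311 and 7 both reference paper 42.
refs = {3311: {42}, 7: {42}}

# Coherent case: isRefBy mirrors the ref links exactly.
coherent = {42: {3311, 7}}

# Incoherent case: 3311 was replaced by an unrelated id (9001).
broken = {42: {7, 9001}}

print(find_incoherent_links(refs, coherent))
print(find_incoherent_links(refs, broken))
```

In a coherent database both lists are empty; any pair that shows up is a candidate for the id mix-up described above.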
- Web-based: fairly accurate in representing reality, for both ref
links and isRefBy links.
- OAI: isRefBy links are (almost) always wrong while ref links are
Moreover, they noticed that most papers have no references at all.
They provide plots as shown below.
Group 2 (presented by Christophe)
- They corroborate the findings of Group 1.
- They note that the graph is very sparse and that the XML files
contain only links to papers contained in Citeseer (the dark blue links
in the Citations paragraph of the HTML version).
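The observations that most papers have no references and that the graph is very sparse can be quantified with a few counts. The sketch below assumes the ref links are available as a dictionary from paper id to a set of referenced ids, and that the total number of papers is known; the function name `reference_stats` and the toy numbers are hypothetical.

```python
from typing import Dict, Set


def reference_stats(refs: Dict[int, Set[int]], n_papers: int) -> Dict[str, float]:
    """Summarize how sparse a reference graph is.

    `refs` maps a paper id to the set of papers it references.
    Papers absent from `refs` are counted as having no references.
    """
    n_edges = sum(len(targets) for targets in refs.values())
    n_with_refs = sum(1 for targets in refs.values() if targets)
    n_without_refs = n_papers - n_with_refs
    # Density of a directed graph without self-loops: edges / (n * (n - 1)).
    density = n_edges / (n_papers * (n_papers - 1)) if n_papers > 1 else 0.0
    return {
        "edges": n_edges,
        "papers_without_refs": n_without_refs,
        "fraction_without_refs": n_without_refs / n_papers if n_papers else 0.0,
        "density": density,
    }


# Toy data (hypothetical): 5 papers, only two of which cite anything.
stats = reference_stats({1: {2, 3}, 2: {3}, 3: set()}, n_papers=5)
print(stats)
```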
Group 3 (presented by Ali)
They focus on the partitioning algorithm problem.
Graph partitioning issue
- Contexts are useful for clustering
- But we are going to use a graph using only references.
- Marc proposes to contact Citeseer to make them aware of this issue
They focus only on the graph coherence
Group 2 (presented by Alex)
- There is one big strongly connected component (SCC) in the graph
with references only.
Weakly Connected Components (WCC)
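For readers unfamiliar with the two notions, here is a minimal sketch of how SCCs and WCCs of a reference graph could be computed. It is not the group's code: it assumes the graph is given as a list of nodes and a list of (citing, cited) pairs, uses Kosaraju's two-pass algorithm for SCCs and a plain traversal of the undirected version for WCCs, and both function names are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def weakly_connected_components(nodes: Iterable, edges: Iterable[Edge]) -> List[Set]:
    """Components of the graph when edge directions are ignored."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        component, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    component.add(v)
                    stack.append(v)
        components.append(component)
    return components


def strongly_connected_components(nodes: Iterable, edges: Iterable[Edge]) -> List[Set]:
    """Kosaraju's algorithm, written iteratively to avoid deep recursion."""
    fwd, rev = defaultdict(list), defaultdict(list)
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)

    # Pass 1: record nodes in order of DFS completion on the forward graph.
    seen, order = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(fwd[start]))]
        while stack:
            u, neighbours = stack[-1]
            for v in neighbours:
                if v not in seen:
                    seen.add(v)
                    stack.append((v, iter(fwd[v])))
                    break
            else:
                order.append(u)
                stack.pop()

    # Pass 2: traverse the reversed graph in decreasing completion order.
    seen, components = set(), []
    for start in reversed(order):
        if start in seen:
            continue
        seen.add(start)
        component, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in rev[u]:
                if v not in seen:
                    seen.add(v)
                    component.add(v)
                    stack.append(v)
        components.append(component)
    return components


# Toy graph (hypothetical): a 3-cycle, two extra papers, one isolated paper.
nodes = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (2, 3), (3, 1), (3, 4), (5, 4)]
sccs = strongly_connected_components(nodes, edges)
wccs = weakly_connected_components(nodes, edges)
print(max(sccs, key=len), max(wccs, key=len))
```

On the toy graph the largest SCC is the 3-cycle, while the largest WCC also absorbs papers 4 and 5, which are attached to it by one-way citations only.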
Group 3 (presented by ...)
They explored the data structures available for the algorithm. They
tried linked lists with a random algorithm, but got very poor
performance: a single node merge took more than 1 second. Tom
suggests using hash tables instead.
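To illustrate why hash tables help here, the following sketch stores an undirected graph as a dictionary of dictionaries (neighbour to edge multiplicity), so that merging one node into another costs time proportional to the degree of the removed node rather than a scan of long lists. This is an assumption about how the suggestion could be applied, not the group's implementation; `merge_nodes` is a hypothetical name.

```python
from typing import Dict

Graph = Dict[int, Dict[int, int]]  # node -> {neighbour: edge multiplicity}


def merge_nodes(adj: Graph, keep: int, drop: int) -> None:
    """Merge node `drop` into node `keep`, in place.

    Edges between `keep` and `drop` disappear (they would become
    self-loops); parallel edges are accumulated as multiplicities.
    """
    for neighbour, weight in adj.pop(drop).items():
        if neighbour == drop:
            continue  # ignore a self-loop on the removed node
        del adj[neighbour][drop]
        if neighbour == keep:
            continue  # the keep-drop edge is discarded
        adj[keep][neighbour] = adj[keep].get(neighbour, 0) + weight
        adj[neighbour][keep] = adj[neighbour].get(keep, 0) + weight


# Toy graph (hypothetical): a triangle 1-2-3 plus a pendant node 4.
graph: Graph = {
    1: {2: 1, 3: 1},
    2: {1: 1, 3: 1},
    3: {1: 1, 2: 1, 4: 1},
    4: {3: 1},
}
merge_nodes(graph, keep=1, drop=2)
print(graph)
```

After merging node 2 into node 1, the two edges that led to node 3 are represented as a single entry with multiplicity 2.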
Group 2 (presented by Eda)
Eda contacted several researchers who have worked on this problem:
- David Karger (MIT): has an implementation, but he makes use of a
- Others who have no implementation of their algorithms [2,3,4]
She also found new publications with interesting results and some
useful libraries for this problem:
- Metis library
- Ledas library
- CPLEX (linear programming)
Whether the graph is directed or undirected doesn't matter. It could
lead to some differences only in the case of two papers citing each
other. We should not focus on this problem, since a precise algorithm on
a modified graph may achieve better results than a heuristic on the
Finally, he announces that this
project is due on April 28, since we are a bit late.
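The remark about directed versus undirected graphs can be checked on a given edge list: when the directions are dropped, the only edges that collapse are those between two papers citing each other. The sketch below is an illustration under that reading; `to_undirected` and the toy edges are hypothetical.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[int, int]


def to_undirected(edges: Iterable[Edge]) -> Tuple[Set[Edge], Set[Edge]]:
    """Drop edge directions from a citation graph.

    Returns the set of undirected edges, each stored as (small id, large id),
    and the subset that came from two papers citing each other.
    """
    directed = set(edges)
    undirected, mutual = set(), set()
    for u, v in directed:
        if u == v:
            continue  # a paper citing itself carries no information here
        pair = (min(u, v), max(u, v))
        undirected.add(pair)
        if (v, u) in directed:
            mutual.add(pair)
    return undirected, mutual


# Toy data (hypothetical): papers 1 and 2 cite each other.
directed_edges = [(1, 2), (2, 1), (2, 3), (4, 3)]
undirected_edges, mutual_pairs = to_undirected(directed_edges)
print(len(directed_edges), len(undirected_edges), mutual_pairs)
```

The directed and undirected edge counts differ exactly by the number of mutual pairs, so counting those pairs gives a quick estimate of how much the choice could matter.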
[1] Rounding Algorithms for a Geometric Embedding of Minimum Multiway Cut. Karger et al., 1999 (1.3438 performance ratio)
[2] An Improved Approximation Algorithm for Multiway Cut. Calinescu et al., 2000 (1.5 - 1/k performance ratio)
[3] A 2-Approximation Algorithm for the Directed Multiway Cut Problem. Naor et al. (approximation factor of 2)
[4] Multiway Cuts in Directed Graphs and Node Weighted Graphs. Garg et al. (2 log(k) performance ratio)