3: RANK AGGREGATION
Ravi Kumar Yahoo! Research Sunnyvale, CA ravikumar@
May 29, 2008
University of Rome
Outline of lecture
Metasearch problem and rank aggregation
Voting and social choice
Kemeny-optimal and approximate aggregation
Simple voting algorithms
Median rank aggregation and implications
Improved algorithms
Heuristics and results
Other approaches to metasearch
Distance metrics for IR applications
Metasearch
For a given query, combine the results from different search engines
Why metasearch?
Coverage: Search engines may not overlap much
Consensus ranking: Get the best out of several ranking heuristics
Spam resistance: Hard to fool many search engines
Query robustness: Work for both broad-topic and specific queries
Feedback: Reflects the effectiveness of a particular search engine
Combining ranking functions
[Figure: ranking signals (links, anchor text, page title, URL, last modified date, text) feed into a single aggregate ranking]
Similarity search in databases
Given a collection of n database elements (each a d-tuple of attributes) and, at run time, a query element q (another d-tuple of attributes), find the database element that best matches q
Each of the d attributes is a voter
Database elements = candidates
Each voter ranks all candidates
Database elements are ranked by voter i based on similarity to the query q in attribute i
Find the top winners of this election by aggregation
Basic theme: Rank aggregation
Input: n candidates and k voters
Preferential voting: Each voter gives a (partial) list of the candidates in order of preference
[Figure: example preference lists, one column of candidate numbers per voter]
Goal: Produce a good consensus ordering of all n candidates
Déjà vu: Voting/elections
Voting
Political decision making, jury decisions, pooling expert opinions, …
More than balancing subjective opinions
Seek the truth
Find the “best” candidate, second “best”, …
What is “best”?
Does majority opinion represent the (objectively) best?
Voting in CS: Some scenarios
Meta-search
Aggregating ranking functions in search engines
Comparing search engine quality
Spam reduction
Nearest-neighbor and similarity search
Multi-criteria selection (e.g., travel, restaurant)
Word association techniques (AND queries)
CS vs SC (social choice)
Small number of voters
Large number of candidates
Algorithmic efficiency
Input could be partial lists/top k lists
Output might have to be a ranking
Desiderata (CS)
Simple algorithm
Fast algorithm (near-linear time)
Provable quality of solution
If approximation, factor should be independent of number of candidates/voters
Borda’s proposal (1770)
Election by order of merit
Jean-Charles Borda
First place is worth 1 point, second place is worth 2 points, ...
Candidate’s score = Sum of points
Borda winner: Lowest-scoring candidate
E.g., MVP in MLB
Condorcet’s proposal (1785)
Partition candidates into A, B
If for every a ∈ A and b ∈ B, a majority ranks a ahead of b, then the aggregation must place all elements of A ahead of all elements of B
Marie J. A. N. Caritat, Marquis de Condorcet
Condorcet winner: A candidate who defeats every other candidate in a pairwise majority-rule election
Condorcet ≠ Borda
(6) A B C
(4) B C A
[Figure: majority graph on A, B, C]
Borda scores: A (1*6 + 3*4 = 18), B (2*6 + 1*4 = 16), C (3*6 + 2*4 = 26)
B is the Borda winner
Condorcet criterion: A beats both B and C in pairwise majority
A is the Condorcet winner
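The Borda and Condorcet outcomes for this profile can be checked with a small sketch. The function names `borda_scores` and `condorcet_winner` and the `(count, ranking)` input format are assumptions made for illustration, not part of the lecture.

```python
# Profile from the slide: 6 voters rank A > B > C, 4 voters rank B > C > A.
PROFILE = [(6, "ABC"), (4, "BCA")]

def borda_scores(profile):
    """Weighted sum of 1-based positions per candidate (lower is better)."""
    candidates = sorted(profile[0][1])
    return {c: sum(n * (r.index(c) + 1) for n, r in profile) for c in candidates}

def condorcet_winner(profile):
    """Candidate who wins every pairwise strict majority, or None if none exists."""
    candidates = sorted(profile[0][1])
    total = sum(n for n, _ in profile)
    for c in candidates:
        if all(
            2 * sum(n for n, r in profile if r.index(c) < r.index(o)) > total
            for o in candidates
            if o != c
        ):
            return c
    return None

scores = borda_scores(PROFILE)
print(scores)                       # {'A': 18, 'B': 16, 'C': 26}
print(min(scores, key=scores.get))  # B
print(condorcet_winner(PROFILE))    # A
```

On the cyclic profile A B C, B C A, C A B (one voter each), `condorcet_winner` returns `None`.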
Condorcet paradox
A B C
B C A
C A B
[Figure: majority graph on A, B, C forming a cycle]
Condorcet winner may not exist!
Black (1950s): Choose the Condorcet winner; if none, choose the Borda winner
Copeland (1951): Choose the candidate with the highest outdegree – indegree in the majority graph
Many other voting schemes
Plurality vote
  Candidate with the most first positions is the winner
Instant runoff vote
  If there is a majority winner, choose it
  Otherwise, eliminate the least popular candidate and repeat
  President of Ireland, Australian parliament, many US university student elections
Single-transferable vote
  Malta, Republic of Ireland, Australian Senate
…
Arrow’s theorem (1951)
The following are irreconcilable:
Every result must be achievable somehow
Monotonicity: Ranking higher should not hurt a candidate
Independence of irrelevant alternatives: Changes in rankings of “irrelevant” alternatives should have no impact on the ranking of the “relevant” subset
Non-dictatorship
Conclusion: No satisfactory rank aggregation function
Borda vs. Condorcet debate
Borda
  Score-based
  Consistent: if two separate sets of voters yield the same ranking, then their union yields the same ranking
  Theorem: No score-based method is Condorcet
Condorcet
  Majority-based
  Meets Arrow’s criteria when the “independence of irrelevant alternatives” criterion is modified
  Winner may not exist
Kemeny’s proposal (1959)
Axiomatic approach
“Distance” between two preference orderings
Distance = number of pairwise disagreements
Obtain the ordering that is “least distant” from the individual orderings
Theorem [Young Levenglick 1988]: Kemeny’s rule is the unique preference function that is neutral, consistent, and Condorcet
Reconciles Borda and Condorcet
Satisfies additional properties (Pareto, anonymity)
Maximum likelihood interpretation: [Young 1988]
Metrics on permutations
Domain: $[n] = \{1, 2, \ldots, n\}$
$\sigma \in S_n$
$\sigma(i) < \sigma(j)$ means that “$\sigma$ ranks $i$ above $j$”
Kendall τ distance
Spearman’s footrule distance
Kendall τ distance
$K(\sigma, \tau)$ = number of pairs $(i, j)$ such that $\sigma$ ranks $(i, j)$ in one order and $\tau$ ranks them in the opposite order
Bubble-sort distance
$K$ is a metric
$K$ is right invariant: $K(\sigma, \tau) = K(\sigma \tau^{-1}, 1)$
E.g.,
A B C D
B D A C
Number of disagreements: 3 (AB, AD, CD)
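A minimal sketch of the Kendall τ distance, counting discordant pairs directly in quadratic time. The function name `kendall_tau` and the representation of a ranking as a string of candidate labels are assumptions for illustration.

```python
from itertools import combinations

def kendall_tau(order_a, order_b):
    """Number of candidate pairs that the two rankings order oppositely."""
    pos_a = {c: i for i, c in enumerate(order_a)}
    pos_b = {c: i for i, c in enumerate(order_b)}
    return sum(
        1
        for x, y in combinations(order_a, 2)
        # A negative product means x and y appear in opposite relative order.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )

print(kendall_tau("ABCD", "BDAC"))  # 3
```

A merge-sort based inversion count would bring this down to O(n log n); the quadratic version is kept here because it mirrors the definition.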
Spearman’s footrule distance
$F(\sigma, \tau) = \sum_{i=1}^{n} |\sigma(i) - \tau(i)|$
$F$ is a metric ($L_1$-norm)
$F$ is right invariant: $F(\sigma, \tau) = F(\sigma \tau^{-1}, 1)$
E.g.,
A B C D
B D A C
shift(A) = 2, shift(B) = 1, etc., so footrule distance: 6
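The footrule distance for the example can be reproduced with a short sketch. The function name `footrule` and the string representation of rankings are assumptions for illustration.

```python
def footrule(order_a, order_b):
    """Sum over candidates of the absolute difference between their positions."""
    pos_b = {c: i for i, c in enumerate(order_b)}
    return sum(abs(i - pos_b[c]) for i, c in enumerate(order_a))

print(footrule("ABCD", "BDAC"))  # 6  (shifts: A=2, B=1, C=1, D=2)
```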
There are several others, but…
Many of the other metrics are computationally expensive (some NP-hard, some not known to be polynomial-time computable, etc.) [Diaconis; Group Representations in Probability and Statistics]
Also, these two are perhaps the most natural for many applications
Diaconis-Graham inequality
$K(\sigma, \tau) \le F(\sigma, \tau) \le 2 K(\sigma, \tau)$
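As a sanity check rather than a proof, the inequality can be verified exhaustively for small n. The helper names below are assumptions for illustration; rankings are tuples listing the items in order.

```python
from itertools import combinations, permutations

def kendall(p, q):
    """Kendall tau distance between two orderings of the same items."""
    pos_p = {c: i for i, c in enumerate(p)}
    pos_q = {c: i for i, c in enumerate(q)}
    return sum(
        1
        for x, y in combinations(p, 2)
        if (pos_p[x] - pos_p[y]) * (pos_q[x] - pos_q[y]) < 0
    )

def footrule(p, q):
    """Spearman footrule distance between two orderings of the same items."""
    pos_q = {c: i for i, c in enumerate(q)}
    return sum(abs(i - pos_q[c]) for i, c in enumerate(p))

def check_diaconis_graham(n):
    """True if K <= F <= 2K holds for every pair of orderings of n items."""
    items = range(n)
    return all(
        kendall(p, q) <= footrule(p, q) <= 2 * kendall(p, q)
        for p in permutations(items)
        for q in permutations(items)
    )

print(check_diaconis_graham(5))  # True
```

By right invariance it would suffice to fix one of the two orderings to the identity; both are enumerated here only to keep the check literal.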
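$F(\sigma) \le 2 K(\sigma)$
Write $F(\sigma) = F(\sigma, 1)$ and $K(\sigma) = K(\sigma, 1)$, where 1 is the identity; by right invariance this loses no generality
$F(\sigma) = \sum_i |\sigma(i) - i|$
$= \sum_i |\sum_j ([\sigma(i) > \sigma(j)] - [i > j])|$
$\le \sum_i \sum_j |[\sigma(i) > \sigma(j)] - [i > j]|$
$= 2 \sum_{i, j} [\sigma(i) > \sigma(j),\ i < j]$ (each inversion is counted once as $(i, j)$ and once as $(j, i)$)
$= 2 K(\sigma)$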
$K(\sigma) \le F(\sigma)$
$[i; j]$ = inversion: $i < j$, $\sigma(i) > \sigma(j)$
Type 1 inversion if $\sigma(i) \ge j$ $\Rightarrow$ $i < j \le \sigma(i)$ $\Rightarrow$ $\forall i$, $\#\{ j \mid [i; j] \text{ is a type 1 inversion} \} \le \sigma(i) - i$
Type 2 inversion if $\sigma(i) \le j$ $\Rightarrow$ $\sigma(j) < \sigma(i) \le j$ $\Rightarrow$ $\forall j$, $\#\{ i \mid [i; j] \text{ is a type 2 inversion} \} \le j - \sigma(j)$
Each inversion is type 1, or type 2, or both
$K(\sigma) \le$ #(type 1 inversions) + #(type 2 inversions)
$\le \sum_{i : \sigma(i) > i} (\sigma(i) - i) + \sum_{j : j > \sigma(j)} (j - \sigma(j))$
$\le F(\sigma)$
Optimal aggregation
Given a metric $d(\cdot, \cdot)$ and input permutations $\sigma_1, \ldots, \sigma_k$, find a permutation $\pi^*$ such that $\sum_{i=1}^{k} d(\sigma_i, \pi^*)$ is minimized
Kemeny (Kendall) optimal aggregation: $d = K$
Spearman footrule optimal aggregation: $d = F$
Kemeny optimal aggregation
Theorem [Bartholdi Tovey Trick 1989]: Kemeny optimal aggregation is NP-hard
Theorem: Kemeny optimal aggregation is NP-hard even for 4 lists
Reduction using feedback arc set
c-approximate aggregation
Given a metric $d(\cdot, \cdot)$ and input permutations $\sigma_1, \ldots, \sigma_k$, find a permutation $\pi$ such that $\sum_{i=1}^{k} d(\sigma_i, \pi) \le c \cdot \sum_{i=1}^{k} d(\sigma_i, \pi^*)$
Trivial approximation
Theorem: A $2(1 - 1/k)$-approximation can be computed easily
Proof: $K$, $F$ are metrics and simple geometry
$\pi^*$ = optimal aggregation w.r.t. $d(\cdot, \cdot)$
$i^* = \arg\min_i \sum_j d(\sigma_i, \sigma_j)$
$\sum_j d(\sigma_j, \sigma_{i^*}) \le (1/k) \sum_{j, j'} d(\sigma_j, \sigma_{j'})$
$\le (1/k) \sum_{j, j'} (d(\sigma_j, \pi^*) + d(\pi^*, \sigma_{j'}))$
$\le 2 \sum_j d(\sigma_j, \pi^*)$
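The rule in the proof, picking the input list closest to all the others, can be sketched as follows. The names `best_input` and `footrule` are assumptions for illustration, and the three example lists are the ones used elsewhere in the lecture.

```python
def footrule(order_a, order_b):
    """Spearman footrule distance between two rankings given as strings."""
    pos_b = {c: i for i, c in enumerate(order_b)}
    return sum(abs(i - pos_b[c]) for i, c in enumerate(order_a))

def best_input(rankings, dist):
    """Return the input ranking whose total distance to all inputs is smallest."""
    return min(rankings, key=lambda r: sum(dist(r, s) for s in rankings))

rankings = ["ABCD", "BDAC", "CDBA"]
print(best_input(rankings, footrule))  # BDAC  (total distances: 14, 12, 14)
```

Any metric can be passed as `dist`; the argument in the proof only uses the triangle inequality.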
Footrule optimal aggregation
Theorem [DKNS]: F-optimal aggregation can be computed in polynomial time
Proof: Via minimum cost perfect matching
[Figure: bipartite graph between elements $1, \ldots, n$ and positions $1, \ldots, n$; the edge from element $a$ to position $p$ has weight $\sum_{i=1}^{k} |\sigma_i(a) - p|$]
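The matching formulation can be illustrated with a sketch that builds the element-to-position cost matrix and then, purely for illustration, solves the assignment by brute force over all permutations. That brute-force step is an assumption made to keep the example dependency-free and is only viable for tiny n; a polynomial-time solver such as the Hungarian algorithm would replace it in practice. All names are hypothetical.

```python
from itertools import permutations

def footrule_cost_matrix(rankings):
    """cost[a][p] = sum over voters of |position of a in that ranking - p| (0-based)."""
    elements = sorted(rankings[0])
    n = len(elements)
    cost = {
        a: [sum(abs(r.index(a) - p) for r in rankings) for p in range(n)]
        for a in elements
    }
    return elements, cost

def footrule_optimal_bruteforce(rankings):
    """Minimum-cost assignment of elements to positions, by exhaustive search."""
    elements, cost = footrule_cost_matrix(rankings)
    best_total, best_order = None, None
    for order in permutations(elements):
        total = sum(cost[a][p] for p, a in enumerate(order))
        if best_total is None or total < best_total:
            best_total, best_order = total, list(order)
    return best_total, best_order

total, order = footrule_optimal_bruteforce(["ABCD", "BDAC", "CDBA"])
print(total)  # 12
```

The total cost of a perfect matching equals the summed footrule distance of the corresponding permutation to the inputs, which is why a minimum-cost matching gives a footrule-optimal aggregation.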
2-approximation to K-optimum
Use Diaconis-Graham inequality
$\pi$ = footrule optimal aggregation
$\pi^*$ = Kendall-optimal aggregation
$\sum_i K(\sigma_i, \pi) \le \sum_i F(\sigma_i, \pi) \le \sum_i F(\sigma_i, \pi^*) \le 2 \sum_i K(\sigma_i, \pi^*)$
Heuristic: Median rank aggregation
Given $\sigma_1, \ldots, \sigma_k$, $\mu'(i) = \mathrm{median}(\sigma_1(i), \ldots, \sigma_k(i))$
Order $\mu'$ to obtain a permutation $\mu$
E.g.,
A B C D
B D A C
C D B A
$\mu'(A) = 3$, $\mu'(B) = 2$, $\mu'(C) = 3$, $\mu'(D) = 2$
$\mu$ = B D A C
Median ranking is used in Olympic figure skating
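A minimal sketch of median rank aggregation on the example above. The function name is an assumption, and ties in the median are broken alphabetically here, which is one arbitrary choice that happens to reproduce the slide's output.

```python
from statistics import median

def median_rank_aggregation(rankings):
    """Return (median rank per candidate, candidates ordered by median rank)."""
    candidates = sorted(rankings[0])
    med = {c: median(r.index(c) + 1 for r in rankings) for c in candidates}
    # sorted() is stable, so candidates with equal medians stay in alphabetical order.
    order = sorted(candidates, key=lambda c: med[c])
    return med, order

med, order = median_rank_aggregation(["ABCD", "BDAC", "CDBA"])
print(med)    # {'A': 3, 'B': 2, 'C': 3, 'D': 2}
print(order)  # ['B', 'D', 'A', 'C']
```

For an even number of voters `statistics.median` averages the two middle ranks; using the lower or upper median instead would be an equally reasonable convention.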
Median rank aggregation
Theorem [DKNS]: If the median ranks of the candidates are unique (i.e., form a permutation), then this permutation is a footrule optimal aggregation
What about using the median itself for ranking, even if it is not unique?
Median is a good approximation
Theorem [FKMSV]: Median rank aggregation is a 3-approximation to footrule optimal aggregation
Consistent permutations
Given $\sigma' = \sigma'_1, \ldots, \sigma'_n$ where $\sigma'_i \in \mathbb{R}$, call a permutation $\sigma \in S_n$ consistent with $\sigma'$ if $\sigma'_i < \sigma'_j \Rightarrow \sigma(i) < \sigma(j)$
Consistency lemma: If $\sigma$ is consistent with $\sigma'$, then for any other permutation $\tau$, $F(\sigma, \sigma') \le F(\tau, \sigma')$
Proof of consistency lemma
Fact: $a' \le b'$ and $a < b$ $\Rightarrow$ $|a - a'| + |b - b'| \le |a - b'| + |a' - b|$
If $\tau \ne \sigma$, apply this fact repeatedly to differing pairs until $\tau$ becomes $\sigma$
Each time $F(\tau, \sigma')$ can only improve
Median lemma
Fact: Given $x_1, \ldots, x_n$ where $x_i \in \mathbb{R}$, $\mathrm{median}(x_1, \ldots, x_n) = \arg\min_y \sum_i |x_i - y|$
Median lemma: Given permutations $\sigma_1, \ldots, \sigma_k$, let $\mu'$ denote their median function. Then, for any permutation $\tau$, $\sum_i F(\mu', \sigma_i) \le \sum_i F(\tau, \sigma_i)$
Proof of median theorem
Let $\tau$ be any permutation
$\sum_i F(\mu, \sigma_i) \le \sum_i F(\mu, \mu') + \sum_i F(\mu', \sigma_i)$ (triangle)
$\le \sum_i F(\tau, \mu') + \sum_i F(\mu', \sigma_i)$ (consistency)
$\le \sum_i F(\tau, \sigma_i) + 2 \sum_i F(\mu', \sigma_i)$ (triangle)
$\le \sum_i F(\tau, \sigma_i) + 2 \sum_i F(\tau, \sigma_i)$ (median)
$= 3 \sum_i F(\tau, \sigma_i)$
Merits of median
Simple to implement
Admits instance-optimal algorithms [FLN]: among all algorithms that do sequential and random access to pre-sorted preference orders, the run time of this median-finding algorithm is optimal up to a factor of 2 on every instance
A good method for nearest-neighbor applications
Borda rank aggregation
Given $\sigma_1, \ldots, \sigma_k$, $\beta'(i) = \sigma_1(i) + \cdots + \sigma_k(i)$
Order $\beta'$ to obtain a permutation $\beta$
E.g.,
A B C D
B D A C
C D B A
$\beta'(A) = 8$, $\beta'(B) = 6$, $\beta'(C) = 8$, $\beta'(D) = 8$
$\beta$ = B A C D
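The Borda scores in the example can be reproduced with a short sketch. The function name is an assumption, and ties are broken alphabetically, which matches the ordering shown on the slide.

```python
def borda_aggregation(rankings):
    """Return (sum of 1-based ranks per candidate, candidates ordered by that sum)."""
    candidates = sorted(rankings[0])
    score = {c: sum(r.index(c) + 1 for r in rankings) for c in candidates}
    # Lower total rank is better; the stable sort keeps tied candidates alphabetical.
    order = sorted(candidates, key=lambda c: score[c])
    return score, order

score, order = borda_aggregation(["ABCD", "BDAC", "CDBA"])
print(score)  # {'A': 8, 'B': 6, 'C': 8, 'D': 8}
print(order)  # ['B', 'A', 'C', 'D']
```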
Borda is a good approximation
Theorem [FKMSV]: Borda rank aggregation is a 5-approximation to footrule optimal aggregation
Borda lemma: $\sum_i F(\beta', \sigma_i) \le 2 \sum_i F(\mu', \sigma_i)$
Prove this point-wise for every $j$ in the domain
Proof of Borda lemma
Here $\beta'$ is the average rank, i.e., the Borda sum scaled by $1/k$, which yields the same ordering
$\sum_i |\beta'(j) - \sigma_i(j)| = \sum_i |(1/k) \sum_{i'} \sigma_{i'}(j) - \sigma_i(j)|$
$= (1/k) \sum_i |\sum_{i'} (\sigma_{i'}(j) - \sigma_i(j))|$
$\le (1/k) \sum_{i, i'} |\sigma_{i'}(j) - \sigma_i(j)|$
$\le (1/k) \sum_{i, i'} (|\sigma_{i'}(j) - \mu'(j)| + |\sigma_i(j) - \mu'(j)|)$
$= 2 \sum_i |\sigma_i(j) - \mu'(j)|$
Proof of Borda theorem
Let $\tau$ be any permutation
$\sum_i F(\beta, \sigma_i) \le \sum_i F(\beta, \beta') + \sum_i F(\beta', \sigma_i)$ (triangle)
$\le \sum_i F(\tau, \beta') + \sum_i F(\beta', \sigma_i)$ (consistency)
$\le \sum_i F(\tau, \sigma_i) + 2 \sum_i F(\beta', \sigma_i)$ (triangle)
$\le \sum_i F(\tau, \sigma_i) + 4 \sum_i F(\mu', \sigma_i)$ (Borda)
$\le \sum_i F(\tau, \sigma_i) + 4 \sum_i F(\tau, \sigma_i)$ (median)
$= 5 \sum_i F(\tau, \sigma_i)$
Copeland rank aggregation
Given $\sigma_1, \ldots, \sigma_k$, $\Gamma(i, j) = \mathrm{majority}\{ \sigma_1(i) \text{ vs. } \sigma_1(j), \ldots, \sigma_k(i) \text{ vs. } \sigma_k(j) \}$
$\gamma'(i) = \sum_j \Gamma(i, j) - \sum_j \Gamma(j, i)$
Order $\gamma'$ to obtain a permutation $\gamma$
E.g.,
A B C D
B D A C
C D B A
[Figure: majority graph on A, B, C, D]
$\gamma'(A) = -1$, $\gamma'(B) = 3$, $\gamma'(C) = -1$, $\gamma'(D) = -1$
$\gamma$ = B A C D
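A sketch of Copeland scoring (pairwise wins minus pairwise losses under strict majority) on the same three lists. The function name is an assumption, and ties are broken alphabetically.

```python
def copeland_aggregation(rankings):
    """Return (wins minus losses per candidate, candidates ordered by that score)."""
    candidates = sorted(rankings[0])
    k = len(rankings)

    def beats(x, y):
        """True if a strict majority of voters rank x ahead of y."""
        return 2 * sum(1 for r in rankings if r.index(x) < r.index(y)) > k

    score = {
        c: sum(beats(c, o) for o in candidates if o != c)
        - sum(beats(o, c) for o in candidates if o != c)
        for c in candidates
    }
    # Higher score is better; the stable sort keeps tied candidates alphabetical.
    order = sorted(candidates, key=lambda c: -score[c])
    return score, order

score, order = copeland_aggregation(["ABCD", "BDAC", "CDBA"])
print(score)  # {'A': -1, 'B': 3, 'C': -1, 'D': -1}
print(order)  # ['B', 'A', 'C', 'D']
```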
Copeland is a good approximation
Theorem [FKMSV]: Copeland rank aggregation is a 6-approximation to Kendall optimal aggregation
Proof: As before, but using $K$ instead of $F$
Plurality method
Given $\sigma_1, \ldots, \sigma_k$, $\pi'(i) = \langle \ldots, \#\ j\text{-th place votes}, \ldots \rangle$
Lexicographically order $\pi'$ to obtain a permutation $\pi$
E.g.,
A B C D
B D A C
C D B A
$\pi'(A) = \langle 1\ 0\ 1\ 1 \rangle$, $\pi'(B) = \langle 1\ 1\ 1\ 0 \rangle$, $\pi'(C) = \langle 1\ 0\ 1\ 1 \rangle$, $\pi'(D) = \langle 0\ 2\ 0\ 1 \rangle$
$\pi$ = B A C D
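The place-count vectors and their lexicographic ordering can be sketched as follows. The function name is an assumption, and tied vectors keep alphabetical order.

```python
def plurality_aggregation(rankings):
    """Return (place-count vector per candidate, candidates in lexicographic order)."""
    candidates = sorted(rankings[0])
    n = len(candidates)
    votes = {c: [0] * n for c in candidates}
    for r in rankings:
        for place, c in enumerate(r):
            votes[c][place] += 1
    # Python compares lists lexicographically; larger vectors rank first.
    # sorted() stays stable with reverse=True, so ties remain alphabetical.
    order = sorted(candidates, key=lambda c: votes[c], reverse=True)
    return votes, order

votes, order = plurality_aggregation(["ABCD", "BDAC", "CDBA"])
print(votes["D"])  # [0, 2, 0, 1]
print(order)       # ['B', 'A', 'C', 'D']
```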
Plurality is not a good approximation
Theorem [FKMSV]: Plurality rank aggregation is not a good approximation to Kendall optimal aggregation
Proof: n candidates, k voters, n >> k
Voter lists (one column per voter, top choice first):
1  1  2  3  4  …  k-1
2  2  3  2  2  …  2
3  3  4  4  3  …  3
…
n  n  1  1  1  …  1
$\pi = 1\ 2\ \ldots\ n$, with $\sum_i F(\pi, \sigma_i) \ge (k-2)(n-1)$
$\beta = 2\ 3\ \ldots\ n\ 1$, with $\sum_i F(\beta, \sigma_i) \le k^3 + n$
As $n$ grows, ratio = $\Omega(k)$
Rank aggregation vs. min feedback arc set
Min feedback arc set (FAS): Find the smallest $E' \subseteq E$ such that the graph $(V, E - E')$ is acyclic
  $G = (V, E)$, directed
  Tournament: each pair $(i, j)$ has an edge
$V$ = candidates
Weight of $(i, j) \in E$ = fraction of voters who rank $i$ above $j$
Acyclic = linear ordering
New approximation algorithm
Theorem [Ailon, Charikar, Newman]: There is an 11/7-approximation algorithm for rank aggregation
Proof: Consists of combining two approximation algorithms
1. Pick the input closest to all other inputs (2-approx)
2. Construct a tournament and approximate FAS on a weighted tournament (2-approx)
3. Take the best of both solutions
FAS on unweighted tournaments
RQS(V, E)
  Pick a node u ∈ V at random
  V1 = { v | (v, u) in E }, E1 = edges in V1
  V2 = { v | (u, v) in E }, E2 = edges in V2
  Output RQS(V1, E1) ° u ° RQS(V2, E2)
Randomized quicksort!
Theorem [ACN]: RQS is a 3-approximation algorithm
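The RQS pseudocode translates almost line for line into a short sketch. The `beats(v, u)` callback stands for "edge (v, u) is in E", and the majority tournament built from the three example lists is an assumption added for the demonstration.

```python
import random

def rqs(nodes, beats, rng):
    """Randomized quicksort on a tournament; beats(v, u) means edge (v, u) exists."""
    if not nodes:
        return []
    pivot = rng.choice(nodes)
    # V1: nodes with an edge into the pivot; V2: nodes the pivot has an edge to.
    left = [v for v in nodes if v != pivot and beats(v, pivot)]
    right = [v for v in nodes if v != pivot and not beats(v, pivot)]
    return rqs(left, beats, rng) + [pivot] + rqs(right, beats, rng)

RANKINGS = ["ABCD", "BDAC", "CDBA"]

def majority_beats(v, u):
    """Edge (v, u) if a strict majority of the example voters rank v ahead of u."""
    return 2 * sum(1 for r in RANKINGS if r.index(v) < r.index(u)) > len(RANKINGS)

order = rqs(list("ABCD"), majority_beats, random.Random(0))
print(order[0])  # B
```

B beats every other candidate in this tournament, so it always ends up first whatever pivots are drawn; the order of A, C and D depends on the random choices because they form a cycle.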
The final word on this …
Theorem [Kenyon-Mathieu, Schudy]: There is a PTAS for rank aggregation
Heuristics: Markov chains
States = candidates
Transitions = function of voting preferences
Probabilistically switch to a better candidate
Final ranking = order of stationary probabilities
Advantages of Markov chains
Handling partial lists and top k lists by using available information to infer new ones
Handling uneven comparisons and list lengths
Motivation from PageRank: more wins is better, more wins against good players is even better
With O(nk) preprocessing, O(k) per step for about O(n) steps
Sample Markov chains
If the current state is candidate P, the next state is:
MC1: Choose uniformly from the multiset of all candidates that were ranked higher than or equal to P by some voter that ranked P
…
MC4: Choose uniformly a candidate Q from all candidates and switch if the majority preferred Q to P
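A rough sketch of MC4 for full lists, with the stationary distribution estimated by power iteration. The small `teleport` probability is an assumption added here so that the chain has a unique stationary distribution; it is not part of the rule stated above, and all function names are hypothetical.

```python
def mc4_transition_matrix(rankings, teleport=0.05):
    """Row-stochastic MC4 matrix over the candidates, mixed with uniform jumps."""
    candidates = sorted(rankings[0])
    n, k = len(candidates), len(rankings)

    def majority_prefers(q, p):
        return 2 * sum(1 for r in rankings if r.index(q) < r.index(p)) > k

    matrix = []
    for i, p in enumerate(candidates):
        row = [0.0] * n
        for j, q in enumerate(candidates):
            # Q is drawn uniformly; move to Q only if the majority prefers Q to P.
            if q != p and majority_prefers(q, p):
                row[j] = 1.0 / n
        row[i] = 1.0 - sum(row)  # otherwise stay at P
        # Assumed smoothing step, not in the slide: mix with a uniform jump.
        matrix.append([(1 - teleport) * x + teleport / n for x in row])
    return candidates, matrix

def stationary(matrix, iters=500):
    """Approximate the stationary distribution by repeated multiplication."""
    n = len(matrix)
    dist = [1.0 / n] * n
    for _ in range(iters):
        dist = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return dist

def mc4_ranking(rankings):
    """Candidates ordered by decreasing stationary probability."""
    candidates, matrix = mc4_transition_matrix(rankings)
    dist = stationary(matrix)
    return sorted(candidates, key=lambda c: -dist[candidates.index(c)])

print(mc4_ranking(["ABCD", "BDAC", "CDBA"])[0])  # B
```

In this example B is preferred by a majority to every other candidate, so nearly all of the stationary mass sits on B; A, C and D form a majority cycle and end up essentially tied.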
Metasearch results
Using top 100 from major search engines
Queries: affirmative action, alcoholism, …

            K       F
Borda       0.214   0.345
Footrule    0.111   0.167
MC1         0.130   0.213
MC2         0.128   0.210
MC3         0.114   0.183
MC4         0.104   0.149
Other approaches to Metasearch
Learning
  Support vector machines [Joachims 2002]
  Hedge algorithm, an iterative weight update [Cohen Schapire Singer 1999]
Condorcet fusion [Montague Aslam 2002]
  Finding Hamiltonian paths in Condorcet graphs
Bayesian [Aslam Montague 2001]
Equivalent distance measures
Distance measure = non-negative, symmetric, regular binary function
Two distance measures $d(\cdot, \cdot)$ and $D(\cdot, \cdot)$ are equivalent if there is a constant $b > 0$ such that for all $x, y$ in the domain, $D(x, y) \le d(x, y) \le b \cdot D(x, y)$
Theorem [FKS]: If $d$ is a metric, then $D$ satisfies an approximate triangle inequality
Theorem [FKS]: If there is a factor-$c$ aggregation algorithm with respect to $d$, then there is a factor-$(cb)$ aggregation algorithm with respect to $D$
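Comparing top k lists in IR
Define distance measures to compare top k lists
Useful in ranking search engine results, …
$\tau_1, \tau_2$ = top k lists, $D = \mathrm{dom}\, \tau_1 \cup \mathrm{dom}\, \tau_2$
$d_{\min}(\tau_1, \tau_2) = \min_{\sigma_1 \succeq \tau_1,\ \sigma_2 \succeq \tau_2,\ \sigma_1, \sigma_2 \in S_{|D|}} \{ d(\sigma_1, \sigma_2) \}$
$d_{\mathrm{avg}}(\tau_1, \tau_2) = \mathrm{avg} \ldots$
$d_{\mathrm{haus}}(\tau_1, \tau_2) = \max \{ \max_{\sigma_1} \min_{\sigma_2} \ldots \}$
Theorem [FKS]: The distance measures $K_{\min}$, $K_{\mathrm{avg}}$, $K_{\mathrm{haus}}$, $F_{\min}$, $F_{\mathrm{avg}}$, $F_{\mathrm{haus}}$ are all in the same equivalence class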
Comparing bucket orders
Bucket order = Linear order with “ties”
Profile-based measures
  K-profile($\sigma$): $p_{i,j} = 1$ if $\sigma(i) < \sigma(j)$, $0$ if $\sigma(i) = \sigma(j)$, $-1$ otherwise
  F-profile($\sigma$): $p_i$ = average position of $i$ in its bucket
  $d_{\mathrm{prof}}$ = $L_1$ distance between $d$-profiles
Hausdorff measures as before
Theorem [FKMSV]: The distance measures $K_{\mathrm{prof}}$, $K_{\mathrm{haus}}$, $F_{\mathrm{prof}}$, $F_{\mathrm{haus}}$ can be computed in polynomial time
Theorem [FKMSV]: The distance measures $K_{\mathrm{prof}}$, $K_{\mathrm{haus}}$, $F_{\mathrm{prof}}$, $F_{\mathrm{haus}}$ are all in the same equivalence class
Some open questions
Improve the constants for median, Borda
Is rank aggregation NP-hard for three lists?
Aggregating w.r.t. other metrics on permutations
  Borda is near-optimal w.r.t. Spearman’s rho
Practical algorithms for PTAS
Some references
Dwork, Kumar, Naor, Sivakumar. Rank aggregation methods for the web, WWW, 2001.
Fagin, Kumar, Sivakumar. Comparing top-k lists, SODA, 2003.
Fagin, Kumar, Mahdian, Sivakumar, Vee. Comparing partial rankings, PODS, 2004.
Ailon, Charikar, Newman. Aggregating inconsistent information: Ranking and clustering, STOC, 2005.
Kenyon-Mathieu, Schudy. How to rank with few errors: A PTAS for weighted feedback arc set on tournaments, STOC, 2007.
Thank you all!
ravikumar@