3: RANK AGGREGATION

Ravi Kumar, Yahoo! Research, Sunnyvale, CA

May 29, 2008

University of Rome


Outline of lecture

- Metasearch problem and rank aggregation
- Voting and social choice
- Kemeny-optimal and approximate aggregation
- Simple voting algorithms
- Median rank aggregation and implications
- Improved algorithms
- Heuristics and results
- Other approaches to metasearch
- Distance metrics for IR applications

Metasearch

For a given query, combine the results from different search engines


Why metasearch?

- Coverage: Search engines may not overlap much
- Consensus ranking: Get the best out of several ranking heuristics
- Spam resistance: Hard to fool many search engines
- Query robustness: Works for both broad-topic and specific queries
- Feedback: Reflects the effectiveness of a particular search engine

Combining ranking functions

[Diagram: signals such as links, anchor text, page title, text, URL, and last-modified date are combined into an aggregate ranking]

Similarity search in databases

Given a collection of n database elements (each a d-tuple of attributes) and, at run-time, a query element q (another d-tuple of attributes), find the database element that best matches q.

- Each of the d attributes is a voter
- Database elements = candidates
- Each voter ranks all candidates: voter i ranks the database elements by their similarity to the query q in attribute i
- Find the top winners of this election by aggregation

Basic theme: Rank aggregation

Input: n candidates and k voters
Preferential voting: Each voter gives a (partial) list of the candidates in order of preference

[Diagram: k columns of ranked candidate lists, one per voter]

Goal: Produce a good consensus ordering of all n candidates
Deja vu: Voting/elections

Voting

- Political decision making, jury decisions, pooling expert opinions, …
- More than balancing subjective opinions: seek the truth
- Find the "best" candidate, the second "best", …
- What is "best"? Does the majority opinion represent the (objectively) best?

Voting in CS: Some scenarios

- Meta-search
- Aggregating ranking functions in search engines
- Comparing search engine quality
- Spam reduction
- Nearest-neighbor and similarity search
- Multi-criteria selection (e.g., travel, restaurant)
- Word association techniques (AND queries)

CS vs. SC (social choice)

- Small number of voters
- Large number of candidates
- Algorithmic efficiency matters
- Input could be partial lists / top-k lists
- Output might have to be a full ranking

Desiderata (CS)

- Simple algorithm
- Fast algorithm (near-linear time)
- Provable quality of solution
- If approximation, the factor should be independent of the number of candidates/voters

Borda's proposal (1770)

Election by order of merit (Jean-Charles Borda)

- First place is worth 1 point, second place is worth 2 points, ...
- Candidate's score = sum of points
- Borda winner: the lowest-scoring candidate
- E.g., MVP voting in MLB

Condorcet's proposal (1785)

(Marie J. A. N. Caritat, Marquis de Condorcet)

Partition the candidates into A and B. If for every a ∈ A and b ∈ B the majority ranks a ahead of b, then the aggregation must place all elements of A ahead of all elements of B.

Condorcet winner: a candidate who defeats every other candidate in a pairwise majority-rule election.

Condorcet ≠ Borda

6 voters: A B C
4 voters: B C A

Borda scores: A (1*6 + 3*4 = 18), B (2*6 + 1*4 = 16), C (3*6 + 2*4 = 26)
B is the Borda winner

Condorcet criterion: A beats both B and C in a pairwise majority
A is the Condorcet winner

Condorcet paradox

Voter 1: A B C
Voter 2: B C A
Voter 3: C A B

A Condorcet winner may not exist! (Here the pairwise majorities form a cycle A → B → C → A.)

Black (1950s): Choose the Condorcet winner; if none exists, choose the Borda winner
Copeland (1951): Choose the candidate with the highest outdegree – indegree in the majority graph

Many other voting schemes

- Plurality vote
  - The candidate with the most first-place votes wins
- Instant runoff vote
  - If there is a majority winner, choose it
  - Otherwise, eliminate the least popular candidate and repeat
  - President of Ireland, Australian parliament, many US university student elections
- Single-transferable vote
  - Malta, Republic of Ireland, Australian Senate

Arrow's theorem (1951)

The following are irreconcilable:
- Every result must be achievable somehow
- Monotonicity: Ranking a candidate higher should not hurt that candidate
- Independence of irrelevant alternatives: Changes in the rankings of "irrelevant" alternatives should have no impact on the ranking of the "relevant" subset
- Non-dictatorship

Conclusion: there is no satisfactory rank aggregation function

Borda vs. Condorcet debate

Borda
- Score-based
- Consistent: if two separate sets of voters yield the same ranking, then their union yields the same ranking
- Theorem: Any score-based method is not Condorcet

Condorcet
- Majority-based
- Meets Arrow's criteria when the "independence of irrelevant alternatives" criterion is suitably modified
- Winner may not exist

Kemeny's proposal (1959)

Axiomatic approach
- "Distance" between two preference orderings
- Distance = number of pairwise disagreements
- Obtain the ordering that is "least distant" from the individual orderings

Theorem [Young, Levenglick 1988]: Kemeny's rule is the unique preference function that is neutral, consistent, and Condorcet
- Reconciles Borda and Condorcet
- Satisfies additional properties (Pareto, anonymity)
- Maximum likelihood interpretation [Young 1988]

Metrics on permutations

- Domain: [n] = { 1, 2, …, n }
- σ ∈ S_n; σ(i) < σ(j) means that "σ ranks i above j"
- Kendall τ distance
- Spearman's footrule distance

Kendall τ distance

K(σ, τ) = number of pairs (i, j) such that σ ranks (i, j) in one order and τ ranks them in the opposite order

- Bubble-sort distance
- K is a metric
- K is right invariant: K(σ, τ) = K(στ⁻¹, 1)

E.g., σ = A B C D, τ = B D A C
Number of disagreements: 3 (AB, AD, CD)
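For concreteness, here is a small Python sketch (not from the slides) that computes K by the O(n²) pair check in the definition; the function name kendall_tau is just illustrative.

```python
# Minimal sketch: Kendall tau distance between two rankings given as lists
# of items in preference order.
def kendall_tau(order1, order2):
    pos1 = {item: r for r, item in enumerate(order1)}   # item -> rank in order1
    pos2 = {item: r for r, item in enumerate(order2)}   # item -> rank in order2
    items = list(pos1)
    disagreements = 0
    for a in range(len(items)):
        for b in range(a + 1, len(items)):
            i, j = items[a], items[b]
            # count the pair (i, j) if the two rankings order it oppositely
            if (pos1[i] - pos1[j]) * (pos2[i] - pos2[j]) < 0:
                disagreements += 1
    return disagreements

print(kendall_tau(list("ABCD"), list("BDAC")))   # 3, as in the example above
```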

Spearman's footrule distance

F(σ, τ) = ∑_{i=1..n} |σ(i) – τ(i)|

- F is a metric (L1 norm)
- F is right invariant: F(σ, τ) = F(στ⁻¹, 1)

E.g., σ = A B C D, τ = B D A C
shift(A) = 2, shift(B) = 1, etc., so the footrule distance is 6
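A matching sketch (again not from the slides, names illustrative) for the footrule distance:

```python
# Minimal sketch: Spearman's footrule distance, the sum of absolute
# position differences over all items.
def footrule(order1, order2):
    pos1 = {item: r for r, item in enumerate(order1)}
    pos2 = {item: r for r, item in enumerate(order2)}
    return sum(abs(pos1[x] - pos2[x]) for x in pos1)

print(footrule(list("ABCD"), list("BDAC")))   # 6, as in the example above
```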

There are several others, but…

Many of the other metrics are computationally expensive (some NP-hard, some not known to be polynomial-time computable, etc.) [Diaconis, Group Representations in Probability and Statistics]

Also, these two are perhaps the most natural for many applications

Diaconis-Graham inequality

K(σ, τ) ≤ F(σ, τ) ≤ 2 K(σ, τ)

F(σ) ≤ 2 K(σ)

F(σ) = ∑_i |σ(i) – i|
     = ∑_i | ∑_j ( [σ(i) > σ(j)] – [i > j] ) |
     ≤ ∑_i ∑_j | [σ(i) > σ(j)] – [i > j] |
     = 2 ∑_{i < j} [σ(i) > σ(j)]
     = 2 K(σ)

K(σ) ≤ F(σ)

[i, j] is an inversion if i < j and σ(i) > σ(j)

- Type 1 inversion if σ(i) ≥ j: then i < j ≤ σ(i), so for every i, #{ j : [i, j] is a type 1 inversion } ≤ σ(i) – i
- Type 2 inversion if σ(i) ≤ j: then σ(j) < σ(i) ≤ j, so for every j, #{ i : [i, j] is a type 2 inversion } ≤ j – σ(j)
- Each inversion is type 1, or type 2, or both

K(σ) ≤ #(type 1 inversions) + #(type 2 inversions)
     ≤ ∑_{i : σ(i) > i} (σ(i) – i) + ∑_{j : j > σ(j)} (j – σ(j))
     ≤ F(σ)

Optimal aggregation

Given a metric d(⋅,⋅) and input permutations σ1, …, σk, find a permutation π* such that ∑_{i=1..k} d(σi, π*) is minimized

- Kemeny (Kendall) optimal aggregation: d = K
- Spearman footrule optimal aggregation: d = F

Kemeny optimal aggregation

Theorem [Bartholdi, Tovey, Trick 1989]: Kemeny optimal aggregation is NP-hard
Theorem: Kemeny optimal aggregation is NP-hard even for 4 lists
- Reduction using feedback arc set

c-approximate aggregation

Given a metric d(⋅,⋅) and input permutations σ1, …, σk, find a permutation π such that
∑_{i=1..k} d(σi, π) ≤ c ⋅ ∑_{i=1..k} d(σi, π*)
(where π* is the optimal aggregation)

Trivial approximation

Theorem: A 2(1 – 1/k)-approximation can be computed easily
Proof: K, F are metrics; simple geometry

π* = optimal aggregation w.r.t. d(⋅,⋅)
i* = arg min_i ∑_j d(σi, σj)

∑_j d(σj, σi*) ≤ (1/k) ∑_{j, j'} d(σj, σj')    (the minimum is at most the average)
              ≤ (1/k) ∑_{j, j'} ( d(σj, π*) + d(π*, σj') )
              ≤ 2 ∑_j d(σj, π*)
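The "pick the best input" step is a one-liner; this sketch (not from the slides) assumes some distance function such as the kendall_tau sketch above, and the name best_input_aggregation is illustrative.

```python
# Minimal sketch: return the input ranking sigma_{i*} minimizing the total
# distance to all inputs under a supplied distance function d.
def best_input_aggregation(rankings, d):
    return min(rankings, key=lambda s: sum(d(s, t) for t in rankings))

voters = [list("ABCD"), list("BDAC"), list("CDBA")]
print(best_input_aggregation(voters, kendall_tau))
```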

Footrule optimal aggregation

Theorem [DKNS]: F-optimal aggregation can be computed in polynomial time
Proof: Via minimum-cost perfect matching

[Diagram: bipartite graph with elements 1..n on one side and positions 1..n on the other; the cost of assigning element a to position p is ∑_{i=1..k} |σi(a) – p|]
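A hedged sketch of this matching construction, assuming NumPy/SciPy are available; footrule_optimal and the 0-based positions are illustrative choices, not the authors' code.

```python
# Minimal sketch of the reduction: footrule-optimal aggregation as a
# minimum-cost perfect matching between elements and positions.
import numpy as np
from scipy.optimize import linear_sum_assignment

def footrule_optimal(rankings):
    """rankings: list of k permutations of the same items (preference order)."""
    items = rankings[0]
    n = len(items)
    pos = [{x: r for r, x in enumerate(sigma)} for sigma in rankings]
    # cost[a][p] = sum over voters i of |sigma_i(a) - p|
    cost = np.array([[sum(abs(p_i[a] - p) for p_i in pos) for p in range(n)]
                     for a in items])
    rows, cols = linear_sum_assignment(cost)       # Hungarian algorithm
    order = [None] * n
    for a_idx, p in zip(rows, cols):
        order[p] = items[a_idx]                    # element a goes to position p
    return order

print(footrule_optimal([list("ABCD"), list("BDAC"), list("CDBA")]))
```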

2-approximation to K-optimum

Use the Diaconis-Graham inequality
π = footrule-optimal aggregation
π* = Kendall-optimal aggregation

∑_i K(σi, π) ≤ ∑_i F(σi, π) ≤ ∑_i F(σi, π*) ≤ 2 ∑_i K(σi, π*)

Heuristic: Median rank aggregation

Given σ1, …, σk, let µ'(i) = median( σ1(i), …, σk(i) )
Order by µ' to obtain a permutation µ

E.g.,
σ1 = A B C D
σ2 = B D A C
σ3 = C D B A

µ'(A) = 3, µ'(B) = 2, µ'(C) = 3, µ'(D) = 2
µ = B D A C

Median ranking is used in Olympic figure skating
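A small sketch (not from the slides; median_aggregation is an illustrative name) of this heuristic on the example above:

```python
# Minimal sketch: score each candidate by the median of its positions
# across voters, then sort by that score (ties broken by input order here).
from statistics import median

def median_aggregation(rankings):
    items = rankings[0]
    pos = [{x: r + 1 for r, x in enumerate(sigma)} for sigma in rankings]  # 1-based
    score = {x: median(p[x] for p in pos) for x in items}
    return sorted(items, key=lambda x: score[x])

# B D A C here, matching the slide (ties broken by input order)
print(median_aggregation([list("ABCD"), list("BDAC"), list("CDBA")]))
```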

Median rank aggregation

Theorem [DKNS]: If the median ranks of the candidates are unique (i.e., form a permutation), then this permutation is a footrule-optimal aggregation

What about using the median itself for ranking, even if it is not unique?

Median is a good approximation

Theorem [FKMSV]: Median rank aggregation is a 3-approximation to footrule optimal aggregation

Consistent permutations

Given σ' = (σ'_1, …, σ'_n) where σ'_i ∈ R, call a permutation σ ∈ S_n consistent with σ' if σ'_i < σ'_j ⇒ σ(i) < σ(j)

Consistency lemma: If σ is consistent with σ', then for any other permutation τ, F(σ, σ') ≤ F(τ, σ')

Proof of consistency lemma

Fact: a' ≤ b' and a < b ⇒ |a – a'| + |b – b'| ≤ |a – b'| + |a' – b|

If τ ≠ σ, apply this fact repeatedly to differing pairs until τ becomes σ
Each time, F(τ, σ') can only improve

Median lemma

Fact: Given x1, …, xn where x_i ∈ R, median(x1, …, xn) = arg min_y ∑_i |x_i – y|

Median lemma: Given permutations σ1, …, σk, let µ' denote their median function. Then, for any permutation τ, ∑_i F(µ', σi) ≤ ∑_i F(τ, σi)

Proof of median theorem

Let τ be any permutation
∑_i F(µ, σi) ≤ ∑_i F(µ, µ') + ∑_i F(µ', σi)     (triangle)
            ≤ ∑_i F(τ, µ') + ∑_i F(µ', σi)      (consistency)
            ≤ ∑_i F(τ, σi) + 2 ∑_i F(µ', σi)    (triangle)
            ≤ ∑_i F(τ, σi) + 2 ∑_i F(τ, σi)     (median lemma)
            = 3 ∑_i F(τ, σi)

Merits of median

- Simple to implement
- Admits instance-optimal algorithms [FLN]: among all algorithms that do sequential and random access to pre-sorted preference orders, the run-time of this median-finding algorithm is optimal up to a factor of 2 on every instance
- A good method for nearest-neighbor applications

Borda rank aggregation

Given σ1, …, σk, let β'(i) = σ1(i) + ⋯ + σk(i)
Order by β' to obtain a permutation β

E.g.,
σ1 = A B C D
σ2 = B D A C
σ3 = C D B A

β'(A) = 8, β'(B) = 6, β'(C) = 8, β'(D) = 8
β = B A C D
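A small sketch (not from the slides; borda_aggregation is an illustrative name) of this Borda variant on the same example:

```python
# Minimal sketch: beta'(i) = sum of i's positions across the voters;
# a lower total ranks the candidate higher.
def borda_aggregation(rankings):
    items = rankings[0]
    pos = [{x: r + 1 for r, x in enumerate(sigma)} for sigma in rankings]  # 1-based
    score = {x: sum(p[x] for p in pos) for x in items}
    return sorted(items, key=lambda x: score[x])

print(borda_aggregation([list("ABCD"), list("BDAC"), list("CDBA")]))  # B A C D
```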

Borda is a good approximation

Theorem [FKMSV]: Borda rank aggregation is a 5-approximation to footrule optimal aggregation

Borda lemma: ∑_i F(β', σi) ≤ 2 ∑_i F(µ', σi)
Prove this point-wise for every j in the domain

Proof of Borda lemma

(Take β'(j) = (1/k) ∑_{i'} σ_{i'}(j); dividing by k does not change the Borda ordering)

∑_i |β'(j) – σi(j)| = ∑_i | (1/k) ∑_{i'} σ_{i'}(j) – σi(j) |
                    = (1/k) ∑_i | ∑_{i'} ( σ_{i'}(j) – σi(j) ) |
                    ≤ (1/k) ∑_{i, i'} | σ_{i'}(j) – σi(j) |
                    ≤ (1/k) ∑_{i, i'} ( | σ_{i'}(j) – µ'(j) | + | σi(j) – µ'(j) | )
                    = 2 ∑_i | σi(j) – µ'(j) |

Proof of Borda theorem

Let τ be any permutation
∑_i F(β, σi) ≤ ∑_i F(β, β') + ∑_i F(β', σi)     (triangle)
            ≤ ∑_i F(τ, β') + ∑_i F(β', σi)      (consistency)
            ≤ ∑_i F(τ, σi) + 2 ∑_i F(β', σi)    (triangle)
            ≤ ∑_i F(τ, σi) + 4 ∑_i F(µ', σi)    (Borda lemma)
            ≤ ∑_i F(τ, σi) + 4 ∑_i F(τ, σi)     (median lemma)
            = 5 ∑_i F(τ, σi)

Copeland rank aggregation

Given σ1, …, σk, let Γ(i, j) = majority { σ1(i) vs. σ1(j), …, σk(i) vs. σk(j) }
γ'(i) = ∑_j Γ(i, j) – ∑_j Γ(j, i)   (outdegree minus indegree in the majority graph)
Order by γ' (descending) to obtain a permutation γ

E.g.,
σ1 = A B C D
σ2 = B D A C
σ3 = C D B A

γ'(A) = -1, γ'(B) = 3, γ'(C) = -1, γ'(D) = -1
γ = B A C D
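A sketch of Copeland scoring (not from the slides; copeland_aggregation is an illustrative name). Every ordered pair is checked, but only the pairwise-majority winner's iteration fires, so the final score equals outdegree minus indegree.

```python
# Minimal sketch: Copeland aggregation via pairwise majority contests.
def copeland_aggregation(rankings):
    items = rankings[0]
    pos = [{x: r for r, x in enumerate(sigma)} for sigma in rankings]
    score = {x: 0 for x in items}
    for i in items:
        for j in items:
            if i == j:
                continue
            wins = sum(1 for p in pos if p[i] < p[j])   # voters preferring i to j
            if 2 * wins > len(rankings):                # strict majority for i
                score[i] += 1
                score[j] -= 1
    return sorted(items, key=lambda x: -score[x])

print(copeland_aggregation([list("ABCD"), list("BDAC"), list("CDBA")]))  # B A C D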

Copeland is a good approximation

Theorem [FKMSV]: Copeland rank aggregation is a 6-approximation to Kendall optimal aggregation
Proof: As before, but using K instead of F

Plurality method

Given σ1, …, σk, let π'(i) = ⟨ …, # of j-th place votes for i, … ⟩
Lexicographically order by π' to obtain a permutation π

E.g.,
σ1 = A B C D
σ2 = B D A C
σ3 = C D B A

π'(A) = ⟨ 1 0 1 1 ⟩, π'(B) = ⟨ 1 1 1 0 ⟩, π'(C) = ⟨ 1 0 1 1 ⟩, π'(D) = ⟨ 0 2 0 1 ⟩
π = B A C D
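A sketch of this lexicographic plurality ordering (not from the slides; plurality_aggregation is an illustrative name):

```python
# Minimal sketch: each candidate gets a vector of j-th place vote counts;
# candidates are ordered lexicographically by that vector, largest first.
def plurality_aggregation(rankings):
    items = rankings[0]
    n = len(items)
    votes = {x: [0] * n for x in items}
    for sigma in rankings:
        for place, x in enumerate(sigma):
            votes[x][place] += 1          # x received one vote for this place
    return sorted(items, key=lambda x: votes[x], reverse=True)

print(plurality_aggregation([list("ABCD"), list("BDAC"), list("CDBA")]))  # B A C D
```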

Plurality is not a good approximation

Theorem [FKMSV]: Plurality rank aggregation is not a good approximation to Kendall optimal aggregation

Proof sketch: n candidates, k voters, n >> k
Voters 1 and 2 rank the candidates 1 2 3 … n; each of the remaining k – 2 voters puts some other candidate first and candidate 1 last

Plurality output: π = 1 2 … n, so ∑_i F(π, σi) ≥ (k – 2)(n – 1)
The ranking β = 2 3 … n 1 satisfies ∑_i F(β, σi) = O(n + k³)
As n → ∞, the ratio is Ω(k)

Rank aggregation vs. min feedback arc set

- G = (V, E), directed; tournament: each pair (i, j) has an edge
- Min feedback arc set (FAS): Find the smallest E' ⊆ E such that the graph (V, E – E') is acyclic
- V = candidates
- Edge (i, j) ∈ E weighted by the fraction of voters who rank i above j
- Acyclic = linear ordering

New approximation algorithm

Theorem [Ailon, Charikar, Newman]: There is an 11/7-approximation algorithm for rank aggregation
Proof: Combine two approximation algorithms
1. Pick the input closest to all other inputs (2-approx)
2. Construct a tournament and approximate FAS on a weighted tournament (2-approx)
3. Take the better of the two solutions

FAS on unweighted tournaments

RQS(V, E):
- Pick a node u ∈ V at random
- V1 = { v | (v, u) ∈ E }, E1 = edges within V1
- V2 = { v | (u, v) ∈ E }, E2 = edges within V2
- Output RQS(V1, E1) ∘ u ∘ RQS(V2, E2)

Randomized quicksort!
Theorem [ACN]: RQS is a 3-approximation algorithm
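A sketch of RQS (not from the slides). The tournament is assumed to be given as a function beats(u, v) returning True iff the edge goes u → v; rqs and beats are illustrative names.

```python
# Minimal sketch: pivot on a random node, split as in randomized quicksort.
import random

def rqs(nodes, beats):
    if not nodes:
        return []
    u = random.choice(nodes)
    left  = [v for v in nodes if v != u and beats(v, u)]   # v -> u: v goes before u
    right = [v for v in nodes if v != u and beats(u, v)]   # u -> v: v goes after u
    return rqs(left, beats) + [u] + rqs(right, beats)

# Example: majority tournament of the three voters used above
voters = [list("ABCD"), list("BDAC"), list("CDBA")]
pos = [{x: r for r, x in enumerate(s)} for s in voters]
beats = lambda i, j: sum(p[i] < p[j] for p in pos) * 2 > len(pos)
print(rqs(list("ABCD"), beats))
```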

The final word on this…

Theorem [Kenyon-Mathieu, Schudy]: There is a PTAS for rank aggregation

Heuristics: Markov chains

- States = candidates
- Transitions = function of voting preferences
- Probabilistically switch to a better candidate
- Final ranking = order of stationary probabilities

Advantages of Markov chains

- Handle partial lists and top-k lists, using available information to infer new comparisons
- Handle uneven comparisons and list lengths
- Motivation from PageRank: more wins is better, more wins against good players is even better
- With O(nk) preprocessing, O(k) per step for about O(n) steps

Sample Markov chains

If the current state is candidate P, the next state is:
- MC1: Choose uniformly from the multiset of all candidates that were ranked higher than or equal to P by some voter that ranked P
- …
- MC4: Choose a candidate Q uniformly from all candidates and switch to Q if the majority preferred Q to P
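A sketch of MC4 (not from the slides; mc4_aggregation, the transition matrix layout, and the power-iteration count are illustrative choices):

```python
# Minimal sketch: from candidate P, pick Q uniformly at random and move there
# if a majority of voters prefer Q to P, otherwise stay. Rank candidates by
# descending stationary probability, estimated by power iteration.
def mc4_aggregation(rankings, iters=200):
    items = rankings[0]
    n = len(items)
    pos = [{x: r for r, x in enumerate(s)} for s in rankings]
    idx = {x: i for i, x in enumerate(items)}
    P = [[0.0] * n for _ in range(n)]                # P[i][j] = Pr(move i -> j)
    for a in items:
        for b in items:
            if a != b and sum(p[b] < p[a] for p in pos) * 2 > len(rankings):
                P[idx[a]][idx[b]] = 1.0 / n          # majority prefers b to a
        P[idx[a]][idx[a]] = 1.0 - sum(P[idx[a]])     # otherwise stay at a
    pi = [1.0 / n] * n                               # start uniform, iterate pi <- pi P
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return sorted(items, key=lambda x: -pi[idx[x]])

print(mc4_aggregation([list("ABCD"), list("BDAC"), list("CDBA")]))
```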

Metasearch results

- Using the top 100 results from major search engines
- Queries: affirmative action, alcoholism, …

Method     K       F
Borda      0.214   0.345
Footrule   0.111   0.167
MC1        0.130   0.213
MC2        0.128   0.210
MC3        0.114   0.183
MC4        0.104   0.149

Other approaches to metasearch

- Support vector machines [Joachims 2002]
- Learning [Cohen, Schapire, Singer 1999]
  - Hedge algorithm: iterative weight update
- Condorcet fusion [Montague, Aslam 2002]
  - Finding Hamiltonian paths in Condorcet graphs
- Bayesian [Aslam, Montague 2001]

Equivalent distance measures

Distance measure = non-negative, symmetric, regular binary function

Two distance measures d(⋅, ⋅) and D(⋅, ⋅) are equivalent if there is a constant b > 0 such that for all x, y in the domain,
D(x, y) ≤ d(x, y) ≤ b ⋅ D(x, y)

Theorem [FKS]: If d is a metric, then D satisfies an approximate triangle inequality
Theorem [FKS]: If there is a factor-c aggregation algorithm with respect to d, then there is a factor-(cb) aggregation algorithm with respect to D

Comparing top k lists in IR

Define distance measures to compare top k lists
- Useful in ranking search engine results, …

τ1, τ2 = top k lists, D = dom(τ1) ∪ dom(τ2)

d_min(τ1, τ2) = min { d(σ1, σ2) : σ1, σ2 ∈ S_|D|, σ1 extends τ1, σ2 extends τ2 }
d_avg(τ1, τ2) = avg { … }
d_haus(τ1, τ2) = max { max_{σ1} min_{σ2} …, max_{σ2} min_{σ1} … }

Theorem [FKS]: The distance measures K_min, K_avg, K_haus, F_min, F_avg, F_haus are all in the same equivalence class

Comparing bucket orders

Bucket order = linear order with "ties"

Profile-based measures
- K-profile(σ): p_{i,j} = 1 if σ(i) < σ(j), 0 if σ(i) = σ(j), -1 otherwise
- F-profile(σ): p_i = average position of i in its bucket
- d_prof = L1 distance between the profiles
- Hausdorff measures as before

Theorem [FKMSV]: The distance measures K_prof, K_haus, F_prof, F_haus can be computed in polynomial time
Theorem [FKMSV]: The distance measures K_prof, K_haus, F_prof, F_haus are all in the same equivalence class

Some open questions

- Improve the constants for median, Borda
- Is rank aggregation NP-hard for three lists?
- Aggregating w.r.t. other metrics on permutations
  - Borda is near-optimal w.r.t. Spearman's rho
- Practical algorithms for the PTAS

Some references

- Dwork, Kumar, Naor, Sivakumar. Rank aggregation methods for the web. WWW, 2001.
- Fagin, Kumar, Sivakumar. Comparing top-k lists. SODA, 2003.
- Fagin, Kumar, Mahdian, Sivakumar, Vee. Comparing partial rankings. PODS, 2004.
- Ailon, Charikar, Newman. Aggregating inconsistent information: Ranking and clustering. STOC, 2005.
- Kenyon-Mathieu, Schudy. How to rank with few errors: A PTAS for weighted feedback arc set on tournaments. STOC, 2007.

Thank you all!
