Distributed MAP Inference for Undirected Graphical Models

Sameer Singh¹, Amarnag Subramanya², Fernando Pereira², Andrew McCallum¹

¹ University of Massachusetts, Amherst MA
² Google Research, Mountain View CA

Workshop on Learning on Cores, Clusters and Clouds (LCCC) Neural Information Processing Systems (NIPS) 2010

Motivation

• Graphical models are used in a number of information extraction tasks
• Recently, models are getting larger and denser
  • Coreference Resolution [Culotta et al. NAACL 2007]
  • Relation Extraction [Riedel et al. EMNLP 2010, Poon & Domingos EMNLP 2009]
  • Joint Inference [Finkel & Manning. NAACL 2009, Singh et al. ECML 2009]
• Inference is difficult, and approximations have been proposed
  • LP-Relaxations [Martins et al. EMNLP 2010]
  • Dual Decomposition [Rush et al. EMNLP 2010]
  • MCMC-Based [McCallum et al. NIPS 2009, Poon et al. AAAI 2008]

Without parallelization, these approaches have restricted scalability

Contributions:
1. Distribute MAP inference for a large, dense factor graph
   • 1 million variables, 250 machines
2. Incorporate sharding as variables in the model

Outline

1. Model and Inference
   • Graphical Models
   • MAP Inference
   • Distributed Inference
2. Cross-Document Coreference
   • Coreference Problem
   • Pairwise Model
   • Inference and Distribution
3. Hierarchical Models
   • Sub-Entities
   • Super-Entities
4. Large-Scale Experiments

Model and Inference

Coreference

Hierarchical Models

Large-Scale Experiments

Related Work

Conclusions

Factor Graphs

Represent a distribution over variables Y using factors ψ:

    p(Y = y) \propto \exp \sum_{y_c \subseteq y} \psi_c(y_c)

Note: the set of factors is different for every assignment Y = y (written \{\psi\}_y).

Sameer Singh (UMass, Amherst)

Distributed MAP Inference

LCCC, NIPS 2010 Workshop

1 / 19

[Figure: two assignments to the variables Y1, Y2, Y3, Y4 and the factors each one instantiates]

    y = 0110 (factor inputs 01, 11, 10, 00): \{\psi\}_{0110} = \{\psi_{12}, \psi_{23}, \psi_{34}, \psi_{14}\}
    y = 0111 (factor inputs 01, 11, 11, 11): \{\psi\}_{0111} = \{\psi_{12}, \psi_{23}, \psi_{34}, \psi_{24}\}

MAP Inference

We want to find the best configuration according to the model (MAP = maximum a posteriori):

    \hat{y} = \arg\max_y p(Y = y) = \arg\max_y \exp \sum_{y_c \subseteq y} \psi_c(y_c)

Computational bottlenecks:
1. Space of Y is usually enormous (exponential)
2. Even evaluating \sum_{y_c \subseteq y} \psi_c(y_c) for each y may be polynomial
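To make the first bottleneck concrete, here is a minimal sketch of exhaustive MAP search on a toy model. The factor table, the `agree` function, and the chain structure are hypothetical, and for simplicity the set of factors is fixed rather than depending on the assignment. The loop visits all 2^n binary assignments, which is why this only works for very small n.

```python
import itertools
import math

def score(y, factors):
    """Sum of psi_c(y_c) over all factors.

    `factors` maps a tuple of variable indices to a function of their values.
    """
    return sum(psi(*(y[i] for i in idx)) for idx, psi in factors.items())

def brute_force_map(num_vars, factors):
    """Enumerate all 2**num_vars binary assignments and return the best one."""
    best_y, best_s = None, -math.inf
    for y in itertools.product((0, 1), repeat=num_vars):
        s = score(y, factors)
        if s > best_s:
            best_y, best_s = y, s
    return best_y, best_s

# Hypothetical toy model: a chain of 4 binary variables that rewards
# neighbours for agreeing, plus a unary factor that prefers Y1 = 1.
def agree(a, b):
    return 1.0 if a == b else -1.0

factors = {
    (0, 1): agree,
    (1, 2): agree,
    (2, 3): agree,
    (0,): lambda a: 0.5 if a == 1 else 0.0,
}
y_hat, s_hat = brute_force_map(4, factors)
```

Since exp is monotonic, maximizing the sum of factor scores gives the same argmax as maximizing p(Y = y).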

MCMC for MAP Inference

Initial configuration y = y_0
for (num samples):
  1. Propose a change to y to get configuration y' (usually a small change)
  2. Acceptance probability: \alpha(y, y') = \min\left(1, \left(\frac{p(y')}{p(y)}\right)^{1/t}\right) (only involves computations local to the change)
  3. if Toss(\alpha): accept the change, y = y'
return y

The probability ratio is

    \frac{p(y')}{p(y)} = \exp\left( \sum_{y'_c \subseteq y'} \psi_c(y'_c) - \sum_{y_c \subseteq y} \psi_c(y_c) \right)
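The loop above can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' implementation: `mcmc_map`, `chain_score`, and `flip_one` are hypothetical names, the toy chain model is made up, and `delta_score` recomputes full scores where a real system would touch only the factors affected by the change.

```python
import math
import random

def mcmc_map(y0, propose, delta_score, num_samples, temperature=1.0, seed=0):
    """Metropolis-style search for a high-scoring configuration.

    propose(y, rng)        -> candidate configuration y_new
    delta_score(y, y_new)  -> log p(y_new) - log p(y)
    """
    rng = random.Random(seed)
    y = y0
    for _ in range(num_samples):
        y_new = propose(y, rng)
        # alpha = min(1, (p(y_new) / p(y)) ** (1 / t)), computed in log space.
        log_alpha = min(0.0, delta_score(y, y_new) / temperature)
        if rng.random() < math.exp(log_alpha):  # Toss(alpha)
            y = y_new
    return y

# Hypothetical toy model: +1 per agreeing neighbour pair, -1 otherwise,
# and +0.5 if the first variable is 1.
def chain_score(y):
    pairs = sum(1.0 if a == b else -1.0 for a, b in zip(y, y[1:]))
    return pairs + (0.5 if y[0] == 1 else 0.0)

def flip_one(y, rng):
    """Small change: flip a single randomly chosen binary variable."""
    i = rng.randrange(len(y))
    return y[:i] + (1 - y[i],) + y[i + 1:]

result = mcmc_map(
    (0, 1, 0, 1),
    flip_one,
    lambda y, y_new: chain_score(y_new) - chain_score(y),
    num_samples=2000,
    temperature=0.1,
)
```

A low temperature t makes downhill moves very unlikely, so the sampler behaves like a stochastic hill climber.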

Mutually Exclusive Proposals

Let \{\psi\}_y^{y'} be the set of factors used to evaluate a proposal y \to y', i.e.

    \{\psi\}_y^{y'} = (\{\psi\}_y \cup \{\psi\}_{y'}) - (\{\psi\}_y \cap \{\psi\}_{y'})

Consider two proposals y \to y_a and y \to y_b such that

    \{\psi\}_y^{y_a} \cap \{\psi\}_y^{y_b} = \{\}

Completely different sets of factors are required to evaluate these proposals, so the two proposals can be evaluated (and accepted) in parallel.
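The disjointness condition is a plain set computation, sketched below with factors represented as string labels. The function names are hypothetical. The factor sets for y and y_a are the 0110 and 0111 examples from the Factor Graphs slide, while `psi_yb` and `psi_yc` are made up for illustration.

```python
def proposal_factors(factors_y, factors_y_new):
    """Factors needed to evaluate y -> y_new: the symmetric difference of
    the two factor sets (union minus intersection)."""
    return set(factors_y) ^ set(factors_y_new)

def mutually_exclusive(factors_y, factors_ya, factors_yb):
    """True if proposals y -> ya and y -> yb need disjoint sets of factors."""
    return proposal_factors(factors_y, factors_ya).isdisjoint(
        proposal_factors(factors_y, factors_yb)
    )

psi_y = {"psi12", "psi23", "psi34", "psi14"}    # factor set for y = 0110
psi_ya = {"psi12", "psi23", "psi34", "psi24"}   # factor set for y = 0111
psi_yb = {"psi13", "psi23", "psi34", "psi14"}   # hypothetical proposal
psi_yc = {"psi12", "psi23", "psi34", "psi13"}   # hypothetical proposal

parallel_ok = mutually_exclusive(psi_y, psi_ya, psi_yb)
conflict = mutually_exclusive(psi_y, psi_ya, psi_yc)
```

Here y → y_a needs {psi14, psi24} and y → y_b needs {psi12, psi13}, so the pair can be evaluated independently. y → y_c also needs psi14, so it conflicts with y → y_a.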

Distributed Inference

[Figure: a Distributor feeds several parallel Inference workers, whose outputs go to a Combine step]
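One way to read the Distributor, Inference, Combine pipeline is as repeated rounds of sharding, local inference, and merging. The sketch below is an assumption about that control flow, not the authors' system: all names are hypothetical, the workers run sequentially in one process, and `merge_all` is a placeholder for a real local sampler.

```python
import random

def distribute(entities, num_workers, rng):
    """Distributor: randomly assign whole entities to workers."""
    shards = [[] for _ in range(num_workers)]
    for entity in entities:
        shards[rng.randrange(num_workers)].append(entity)
    return shards

def distributed_rounds(entities, local_inference, num_workers, num_rounds, seed=0):
    """Alternate between sharding, per-shard inference, and combining."""
    rng = random.Random(seed)
    for _ in range(num_rounds):
        shards = distribute(entities, num_workers, rng)
        # Each call only touches its own shard, so on a cluster these calls
        # could run on separate machines; here they run one after another.
        results = [local_inference(shard) for shard in shards]
        # Combine: concatenate the non-empty entities from every worker.
        entities = [entity for shard in results for entity in shard if entity]
    return entities

def merge_all(shard):
    """Placeholder local inference: merge every entity in the shard."""
    return [[m for entity in shard for m in entity]] if shard else []

singletons = [[1], [2], [3], [4], [5]]
out = distributed_rounds(singletons, merge_all, num_workers=2, num_rounds=3)
```

Because each worker only proposes changes among the entities it holds, proposals on different workers touch disjoint factors in a pairwise model.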

Coreference Problem

Mentions of "Kevin Smith" in news text, grouped by the entity they refer to:

• Author
  • ... The Physiological Basis of Politics,” by Kevin B. Smith, Douglas Oxley, Matthew Hibbing...
• Rapper
  • ...during the late 60's and early 70's, Kevin Smith worked with several local...
  • ...the term hip-hop is attributed to Lovebug Starski. What does it actually mean...
• Filmmaker
  • The filmmaker Kevin Smith returns to the role of Silent Bob...
  • Nothing could be more irrelevant to Kevin Smith's audacious "Dogma" than ticking off...
• Firefighter
  • Firefighter Kevin Smith spent almost 20 years preparing for Sept. 11. When he...
• Running back
  • Like Back in 2008, the Lions drafted Kevin Smith, even though Smith was badly...
  • ...shorthanded backfield in the wake of Kevin Smith's knee injury, and the addition of Haynesworth...
• Cornerback
  • ...were coming," said Dallas cornerback Kevin Smith. "We just didn't know when...
• Actor
  • BEIJING, Feb. 21— Kevin Smith, who played the god of war in the "Xena"...

Input Features

[Figure: five mentions m1, ..., m5]

Define similarity between mentions, \phi : M^2 \to \mathbb{R}
• \phi(m_i, m_j) > 0: m_i, m_j are similar
• \phi(m_i, m_j) < 0: m_i, m_j are dissimilar

We use cosine similarity of the context bag of words:

    \phi(m_i, m_j) = \mathrm{cosSim}(\{c\}_i, \{c\}_j) - b
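A minimal sketch of this feature, assuming whitespace-tokenized contexts and raw word counts. The bias value `b=0.5`, the function names, and the example contexts are hypothetical, since the slides do not give them.

```python
import math
from collections import Counter

def cos_sim(bag_i, bag_j):
    """Cosine similarity between two bags of words (Counter objects)."""
    dot = sum(count * bag_j[word] for word, count in bag_i.items())
    norm_i = math.sqrt(sum(c * c for c in bag_i.values()))
    norm_j = math.sqrt(sum(c * c for c in bag_j.values()))
    if norm_i == 0.0 or norm_j == 0.0:
        return 0.0
    return dot / (norm_i * norm_j)

def phi(context_i, context_j, b=0.5):
    """phi(mi, mj) = cosSim({c}_i, {c}_j) - b; b shifts the score so that
    weakly overlapping contexts count as dissimilar (phi < 0)."""
    return cos_sim(Counter(context_i), Counter(context_j)) - b

# Hypothetical context words around three mentions.
film_a = "filmmaker returns to the role of silent bob".split()
film_b = "audacious film dogma by the filmmaker".split()
nfl = "lions drafted running back knee injury".split()

same_topic = phi(film_a, film_b)
different_topic = phi(film_a, nfl)
```

With counts as weights the cosine lies in [0, 1], so phi lies in [-b, 1 - b].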

Graphical Model

The random variables in our model are entities (E) and mentions (M). For any assignment to these entities (E = e), we define the model score:

    p(E = e) \propto \exp\left\{ \sum_{m_i \sim m_j} \psi_a(m_i, m_j) + \sum_{m_i \nsim m_j} \psi_r(m_i, m_j) \right\}

where \psi_a(m_i, m_j) = w_a \phi(m_i, m_j) and \psi_r(m_i, m_j) = -w_r \phi(m_i, m_j). Here m_i \sim m_j denotes mentions in the same entity and m_i \nsim m_j mentions in different entities.

For the following configuration, with e_1 = \{m_1, m_2, m_3\} and e_2 = \{m_4, m_5\}:

    p(e_1, e_2) \propto \exp\{ w_a (\phi_{12} + \phi_{13} + \phi_{23} + \phi_{45}) - w_r (\phi_{15} + \phi_{25} + \phi_{35} + \phi_{14} + \phi_{24} + \phi_{34}) \}

1. Space of E is Bell Number(n) in the number of mentions
2. Evaluating the model score for each E = e is O(n^2)
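The pairwise score can be computed directly by looping over all mention pairs, which makes the O(n^2) cost visible. This is a sketch with hypothetical names and made-up similarity values; it returns the unnormalized log-score (the argument of exp), not a probability.

```python
import itertools

def model_score(entities, phi, w_a=1.0, w_r=1.0):
    """Unnormalized log-score of a clustering.

    entities: list of lists of mention ids (a partition of the mentions)
    phi:      function (mi, mj) -> similarity, assumed symmetric
    """
    entity_of = {m: k for k, ms in enumerate(entities) for m in ms}
    total = 0.0
    # Every unordered pair of mentions contributes exactly one factor.
    for mi, mj in itertools.combinations(sorted(entity_of), 2):
        if entity_of[mi] == entity_of[mj]:
            total += w_a * phi(mi, mj)   # affinity factor psi_a
        else:
            total -= w_r * phi(mi, mj)   # repulsion factor psi_r
    return total

# Hypothetical similarities: mentions 1-3 resemble each other, as do 4-5.
GROUP = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b"}

def phi(mi, mj):
    if GROUP[mi] != GROUP[mj]:
        return -0.4
    return 0.8 if GROUP[mi] == "a" else 0.6

good = model_score([[1, 2, 3], [4, 5]], phi)   # configuration from the slide
bad = model_score([[1, 2], [3, 4, 5]], phi)    # m3 placed in the wrong entity
```

Under these made-up values the slide's configuration scores higher than the alternative, because similar pairs are rewarded inside entities and dissimilar pairs are rewarded across them.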

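MCMC for MAP Inference

[Figure: proposal moving mention m3 from entity e1 = {m1, m2, m3} to entity e2 = {m4, m5}, giving configuration é]

    p(e) \propto \exp\{ w_a (\phi_{12} + \phi_{13} + \phi_{23} + \phi_{45}) - w_r (\phi_{15} + \phi_{25} + \phi_{35} + \phi_{14} + \phi_{24} + \phi_{34}) \}

    p(\acute{e}) \propto \exp\{ w_a (\phi_{12} + \phi_{34} + \phi_{35} + \phi_{45}) - w_r (\phi_{15} + \phi_{25} + \phi_{13} + \phi_{14} + \phi_{24} + \phi_{23}) \}

    \log \frac{p(\acute{e})}{p(e)} = w_a (\phi_{34} + \phi_{35} - \phi_{13} - \phi_{23}) - w_r (\phi_{13} + \phi_{23} - \phi_{34} - \phi_{35})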
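The log-ratio only involves pairs that contain the moved mention, so it can be computed without rescoring the whole clustering. A minimal sketch, with hypothetical names and similarity values:

```python
def move_delta(mention, source, target, phi, w_a=1.0, w_r=1.0):
    """log p(e') - log p(e) for moving `mention` from `source` to `target`.

    source, target: sets of mention ids; `source` contains `mention`.
    Pairs with the new entity switch from repulsion to affinity, and pairs
    with the old entity switch the other way, so each changed pair
    contributes (w_a + w_r) * phi with the appropriate sign.
    """
    gained = sum(phi(mention, m) for m in target if m != mention)
    lost = sum(phi(mention, m) for m in source if m != mention)
    return (w_a + w_r) * (gained - lost)

# Hypothetical similarity values for the pairs touched by moving m3.
PHI = {
    frozenset({1, 3}): 0.2,
    frozenset({2, 3}): -0.1,
    frozenset({3, 4}): 0.7,
    frozenset({3, 5}): 0.5,
}

def phi(mi, mj):
    return PHI[frozenset({mi, mj})]

delta = move_delta(3, source={1, 2, 3}, target={4, 5}, phi=phi, w_a=1.0, w_r=2.0)
```

The cost is proportional to the sizes of the two entities involved rather than to the square of the number of mentions, which is what keeps each proposal cheap.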

Mutually Exclusive Proposals

[Figure: example proposals over mentions m1, ..., m5 and entities e1, e2, e3]

Results: Accuracy versus Time

[Figure: accuracy (B3 F1 and Pairwise F1) versus wallclock running time (ms, axis up to 5e7), with curves labeled 1, 2, 5, 10 and 50]

Sub-Entities

• Consider an accepted move for a mention
• Ideally, similar mentions should also move to the same entity
• Default proposal function does not utilize this
• Good proposals become more rare with larger datasets
• Include Sub-Entity variables
• Model score is used to sample sub-entity variables
• Propose moves of mentions in a sub-entity simultaneously

Super-Entities

Random Distribution
• Random distribution may not assign similar entities to the same machine
• Probability that similar entities will be assigned to the same machine is small

Model-Based Distribution
• Augment model with Super-Entity variables
• Entities in the same super-entity are assigned to the same machine
• Model score is used to sample super-entity variables

Hierarchical Representation

[Figure: three levels: Sub-Entities, Entities, Super-Entities]

• Factors
  • Affinity factors between mentions in the same sub-entities, sub-entities in the same entities, and entities in the same super-entities
  • Repulsion factors are similarly symmetric across levels
• Sampling: fix variables of two levels, sample the remaining level

Evaluation: Accuracy versus Time

[Figure: accuracy (B3 F1 and Pairwise F1) versus wallclock running time (ms, axis up to 3e7), with curves labeled pairwise, super-entities, sub-entities and combined]

Preliminary Large-Scale Experiments

Data
• New York Times Annotated Corpus [Sandhaus LDC 2008]
  • 20 years of articles (1987-2007)
  • prune rare names (