Cross-Cutting Models of Lexical Semantics Joseph Reisinger and Raymond Mooney
Distributional Lexical Semantics
• Represent “meaning” as a point/vector in a high-dimensional space
• Word relatedness correlates with some distance metric
Almuhareb and Poesio (2004), Baroni and Lenci (2009), Bullinaria and Levy (2007), Erk (2007), Griffiths et al. (2007), Landauer and Dumais (1997), Moldovan (2006), Padó and Lapata (2007), Pantel and Pennacchiotti (2006), Sahlgren (2006), Turney and Pantel (2010)
[Figure: “bat” and “disco” plotted as points in the space, separated by a distance d]
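As a minimal sketch of the idea above, words can be stored as count vectors and compared with cosine similarity. The four context dimensions and all counts below are made up for illustration; the slides do not specify which metric or features are used at this point.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical co-occurrence counts over four invented context features:
# [baseball, cave, dance, music]
vectors = {
    "bat":   [8, 6, 0, 0],
    "disco": [0, 0, 7, 9],
    "club":  [5, 0, 6, 4],
}

sim_bat_club = cosine_similarity(vectors["bat"], vectors["club"])
sim_bat_disco = cosine_similarity(vectors["bat"], vectors["disco"])
print(f"bat~club: {sim_bat_club:.3f}  bat~disco: {sim_bat_disco:.3f}")
```

With these toy counts, “bat” and “disco” share no contexts, while “club” overlaps with both.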
Distributional Lexical Semantics
[Figure: “bat”, “club” and “disco” as points in the space; “club” is then split into two senses, club1 and club2]
“meaning violates the triangle inequality” Tversky and Gati (1982), Griffiths et al. (2007)
• Address metric violations by learning word sense clusters / making use of local context
• Can we build a model that captures this directly?
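The triangle-inequality point can be illustrated with a small check over pairwise dissimilarities. The numbers below are hypothetical judgements chosen for illustration, not data from the talk: “club” is close to both “bat” and “disco”, which are far from each other, so no single metric space can hold all three. Splitting “club” into two senses removes the violation.

```python
from itertools import permutations

def triangle_violations(dist):
    """Return triples (a, b, c) with d(a, c) > d(a, b) + d(b, c).

    `dist` maps frozenset({x, y}) to a dissimilarity and must cover every pair.
    """
    items = sorted({x for pair in dist for x in pair})

    def d(x, y):
        return dist[frozenset((x, y))]

    found = []
    for a, b, c in permutations(items, 3):
        # a < c skips the mirrored copy of each triple.
        if a < c and d(a, c) > d(a, b) + d(b, c) + 1e-12:
            found.append((a, b, c))
    return found

# Hypothetical judgements with a single point for "club".
single_sense = {
    frozenset(("bat", "club")): 0.2,
    frozenset(("club", "disco")): 0.2,
    frozenset(("bat", "disco")): 0.9,
}

# Same judgements after splitting "club" into two senses.
two_senses = {
    frozenset(("bat", "club1")): 0.2,
    frozenset(("bat", "club2")): 0.9,
    frozenset(("bat", "disco")): 0.9,
    frozenset(("club1", "club2")): 0.9,
    frozenset(("club1", "disco")): 0.9,
    frozenset(("club2", "disco")): 0.2,
}

print(triangle_violations(single_sense))  # one violating triple
print(triangle_violations(two_senses))    # no violations
```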
Cross-cutting Concept Organization
[Figure: food categories: breakfast food, french food, healthy, snack, chinese food, unhealthy, dinner food, indian food]
• Human concept organization exhibits cross-cutting structure. Rosch et al. (1976); Ross & Murphy (1999); Medin et al. (2005); Shafto et al. (2011)
• Each categorization system controls what kinds of generalizations (e.g. inferences) are valid.
• Do word usages exhibit similar cross-cutting?
• Xue, Chen and Palmer (2006): sense disambiguation requires vastly different features for different polysemous verbs in Chinese.
Multi View Multinomial Clustering
• There are many valid word clusterings, each capturing different aspects of syntax or topicality
• We introduce a model to explicitly capture multiple organizational systems
• Cross-cutting categorization / latent subspaces with separate, coherent clusterings
• Implement using LDA and DPMM primitives / Gibbs sampling
Contexts: and is ___, we are ___, he is ___
Words: unwilling, willing, reluctant, refusing, glad
Contexts: and are ___, which was ___, who are ___
Words: exceedingly, sincerely, logically, justly, appropriately
Words: about, because
Contexts: brand new ___, selection of ___, ___ for sale
Words: samsung, panasonic, toshiba, sony, epson
Contexts: results for ___, the latest ___, to buy ___
Words: toyota, nissan, mercedes, volvo, audi
Words: dunlop, yokohama, toyo, uniroyal, michelin
Figure 1: Example clusterings from MVM applied to Google n-gram data. Top contexts (features) for each view are shown, along with examples of word clusters. Although these particular examples are interpretable, in
Multi View Multinomial Clustering Model
[Diagram: Views 1, 2 and 3, each containing its own set of clusters (Cluster 1, Cluster 2, Cluster 3, ...)]
Data
Austin
• Documents: History of Austin, Texas, University of Texas Medical Branch, 1993 Pacific hurricane season, Rutherford B. Hayes, List of pipeline accidents, List of Austin City Limits performers, Texas in the American Civil War, 6th Cavalry Regiment (United States)
• Contexts: ___ texas homes, ___ law school, the citizens of ___, the ___ business directory, ___ police department, university in ___, ___ vacation rentals, the ___ parks and, by the ___ business journal, coming to ___, the ___ area, deals on ___ hotels
Betrayed
• Documents: Survivor: The Amazon, Personal life of Marcus Tullius Cicero, Numb3rs, Huns, Rurouni Kenshin, Liberation of Paris, The Knightly Tale of Gologras and Gawain, Territories in The Pendragon Adventure, A Storm of Swords, Connor MacLeod, Paul Atreides
• Contexts: her manner ___, being ___ by their, ___ and murdered, ___ his weakness, she ___ him, ___ the secret, ___ by her husband, a voice that ___, who felt ___, ___ to the police, ___ their country, suspected of having ___, ___ the confidence, even when ___
Cat
• Documents: South China Tiger, Hybrid (biology), List of mammals of Cameroon, Cantonese cuisine, Pound Puppies, Wonder Pets, The Wizard of Oz (1902 stage play), Mee-Ow, Animal rights, Rickrolling, Mera (comics), Taboo food and drink, Tuna, Garfield: The Movie
• Contexts: ate the ___, have a ___ and a, the ___ and the mouse, the ___ who killed, ___ toys by, ___ in the city, ___ was diagnosed, crazy ___ lady, ___ of the month, protect your ___ from, new ___ food, and bought a ___, ___ or other animal, a sick ___
[Diagram: the word d (e.g. “Cat”) receives one cluster assignment per view: c_{1,d} in View 1, c_{2,d} in View 2, c_{3,d} in View 3]
• Select a cluster assignment c_{v,d} for d in each view v (DPMM), i.e. words are assigned to clusters within each view
• Select a view v_{f,d} for each observed feature f, and generate it from the cluster that d occupies in that view (LDA), i.e. features are distributed between views
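The two steps above can be sketched as a toy generative procedure. This is only an illustration under stated assumptions, not the authors' implementation: the Chinese restaurant process stands in for the DPMM prior, a per-word Dirichlet draw stands in for the LDA-style choice of view, and every name and hyperparameter value here (`generate`, `alpha`, `beta`, `concentration`) is hypothetical.

```python
import random

def dirichlet(rng, alpha, k):
    """Symmetric Dirichlet(alpha) draw of dimension k, via normalized gammas."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [x / total for x in draws]

def crp_assign(rng, sizes, concentration):
    """Chinese restaurant process step: join cluster k with weight sizes[k],
    or open a new cluster with weight `concentration`."""
    k = rng.choices(range(len(sizes) + 1), weights=sizes + [concentration])[0]
    if k == len(sizes):
        sizes.append(0)
    sizes[k] += 1
    return k

def generate(words, vocab, n_views=3, n_features=10, seed=0,
             alpha=0.5, beta=0.5, concentration=1.0):
    rng = random.Random(seed)
    sizes = [[] for _ in range(n_views)]   # cluster sizes, one list per view
    phi = [{} for _ in range(n_views)]     # phi[v][c]: feature distribution of cluster c in view v
    assignments, features = {}, {}
    for d in words:
        # Step 1: choose a cluster c[v] for word d in every view v.
        c = [crp_assign(rng, sizes[v], concentration) for v in range(n_views)]
        theta = dirichlet(rng, alpha, n_views)  # word-specific mix over views
        drawn = []
        for _ in range(n_features):
            # Step 2: choose a view for this feature, then emit the feature
            # from the cluster that d occupies in that view.
            v = rng.choices(range(n_views), weights=theta)[0]
            if c[v] not in phi[v]:
                phi[v][c[v]] = dirichlet(rng, beta, len(vocab))
            f = rng.choices(vocab, weights=phi[v][c[v]])[0]
            drawn.append((v, f))
        assignments[d] = c
        features[d] = drawn
    return assignments, features

vocab = ["the ___ and the mouse", "crazy ___ lady", "hotels in ___", "she ___ him"]
assignments, features = generate(["cat", "austin", "betrayed"], vocab)
print(assignments)
```

Inference would run in the opposite direction (the slides mention Gibbs sampling); the sketch only shows how one word ends up with a separate cluster in each view while its features are split between views.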
Figure 2: Topics with Senses: Shows top 20% of features for each view in a 3-view MVM fit to Google n-gram context data; different views place different mass on different sets of features. Cluster groupings within each view are shown. View 1 cluster 2 and View 3 cluster 1 both contain past-tense verbs, but only overlap on a subset of syntactic features.
Node labels in the figure: 0−0, 0−10, 0−77, 1−0, 1−34, 1−94, 2−0, 2−47
Word clusters shown across Views 1, 2 and 3:
• arbitrary characteristic comparative evolutionary fundamental inadequate inferior integral mystical poetic psychological radical singular systematic
• austin betrayed charlotte conquered disappointed divorced embarked frustrated guarded hated jackson kent knocked murdered newcastle praised richmond secretly stationed stole summoned wounded
• austin baltimore charlotte dallas pittsburgh richmond
• kent liverpool manchester newcastle
• arbitrary betrayed characteristic conquered disappointed divorced embarked evolutionary examine franklin frustrated fundamental guarded hated inadequate inferior integral jackson knocked likelihood murdered mystical poetic praised proportional radical secretly singular stationed stole summoned systematic wounded
• secretly
• betrayed conquered disappointed divorced embarked frustrated guarded hated knocked murdered praised stationed stole summoned wounded
• arbitrary austin baltimore characteristic comparative dallas evolutionary franklin fundamental inadequate inferior integral jackson kent likelihood liverpool mystical newcastle pittsburgh poetic proportional psychological radical richmond singular
Context features shown:
to an ___ side of the ___ first ___ of ___ of human the little ___ of ___ from the and an ___ written by ___ way of ___ real estate in ___ of ___ may hotels ___ hotels estate in ___ city of ___ welcome to ___ was the ___ of town of ___ to ___ a the city of ___ the ___ does not private message to ___ presence of ___ posted by ___ at name of ___ message to ___ located in ___ like ___ and in an ___ in ___ the hotels in ___ going to ___ from the ___ to dsl ___ dsl degree of ___ create a ___ by ___ to by ___ on born in ___ an ___ and ___ was born ___ said that ___ high school do not ___ you are ___ who is ___ which was ___ which ___ the were ___ in we are ___ was ___ to to be ___ and the more ___ so many ___ she was ___ of the ___ were of a ___ and near the ___ is also ___ i was ___ his ___ of could be ___ been ___ and be ___ or as ___ as and was ___ and is ___ and are ___ and ___ his also ___ the a more ___ ___ some of who are ___ were not ___ the very ___ the american ___ the ___ of that the ___ must be the ___ family that was ___ that ___ are posts by ___ of being ___ of ___ have might be ___ many ___ and is an ___ in these ___ he is ___ but the ___ of be ___ to are ___ to and ___ their along the ___ a kind of ___ ___ who had ___ open this result in ___ home page
Data
• Word set: top 43.7k words ranked by frequency in Wikipedia (excluding the top 1% as stop words)
• Syntax features: contextual patterns from the combined Google Web n-gram + Google Books n-gram corpus (3.5M features)
• Document features: Wikipedia article occurrence counts (120k features)
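One plausible way to obtain contextual patterns of the “the ___ and the mouse” kind is to blank out the target word in each n-gram that contains it. The sketch below assumes that reading; the function name, the toy n-grams and their counts are all invented, and the slides do not describe the actual extraction pipeline.

```python
from collections import Counter

BLANK = "___"

def context_features(ngrams, vocabulary):
    """Map each vocabulary word to a Counter of n-gram contexts with the word blanked out.

    `ngrams` is an iterable of (ngram_string, count) pairs.
    """
    feats = {}
    for ngram, count in ngrams:
        tokens = ngram.lower().split()
        for i, tok in enumerate(tokens):
            if tok in vocabulary:
                pattern = " ".join(tokens[:i] + [BLANK] + tokens[i + 1:])
                feats.setdefault(tok, Counter())[pattern] += count
    return feats

# Invented n-gram counts for illustration only.
toy_ngrams = [
    ("the cat and the mouse", 120),
    ("crazy cat lady", 80),
    ("hotels in austin", 200),
    ("the austin area", 150),
]

feats = context_features(toy_ngrams, {"cat", "austin"})
print(feats["cat"].most_common())
```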
Intrusion Task
• Word intrusion: humor, ingenuity, delight, advertisers, astonishment
• Context intrusion: ___ is characterized, symptoms of ___, cases of ___, in cases of ___, real estate in ___
• Document intrusion: Puerto Rican cuisine, Greek cuisine, ThinkPad, Palestinian cuisine, Field ration
• “Model-based” lexical semantics: read word similarity directly from the model
• Intruders are drawn from the top terms in other clusters
• More robust than asking for numeric similarity judgements
• Less inter-rater calibration required
Chang et al. (2009)
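A single intrusion question can be assembled roughly as follows: take the top items of one cluster, add one intruder from the top items of another cluster, and shuffle. This is a sketch under assumptions; `make_intrusion_task`, the cluster ids and the extra members of cluster "B" are hypothetical, and only the five words of the word-intrusion example come from the slide.

```python
import random

def make_intrusion_task(clusters, target, k=4, rng=None):
    """Build one intrusion question.

    `clusters` maps a cluster id to its items, ranked with the top items first.
    Returns (shuffled options, intruder).
    """
    rng = rng or random.Random()
    top = clusters[target][:k]
    # Intruder candidates: top items of every other cluster that are not in the target cluster.
    candidates = [
        item
        for cid, items in clusters.items() if cid != target
        for item in items[:k] if item not in clusters[target]
    ]
    intruder = rng.choice(candidates)
    options = top + [intruder]
    rng.shuffle(options)
    return options, intruder

clusters = {
    "A": ["humor", "ingenuity", "delight", "astonishment"],
    # Members other than "advertisers" are invented for the example.
    "B": ["advertisers", "publishers", "retailers", "sponsors"],
}

options, intruder = make_intrusion_task(clusters, "A", rng=random.Random(0))
print(options, "-> intruder:", intruder)
```

A rater sees only `options`; the response counts as correct when the selected item equals `intruder`.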
Evaluation
• Amazon Mechanical Turk
• 1256 unique raters (Country=US, >96% approval)
• 5.7k unique intrusion tasks at 5x duplication: ~30k evaluations total
• 2736 rejected
• Per-user average time for
[Figure: User Comments (U1, U2, U3)]
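The “% correct” score for each model can be computed by pooling all rater responses, as in the sketch below. The response tuples are made-up toy data, not results from the study, and the record layout (`model`, `task_id`, `chosen`, `intruder`) is an assumption. As a sanity check on the slide's arithmetic, 5.7k tasks at 5x duplication is 28.5k evaluations, consistent with “~30k”.

```python
from collections import defaultdict

def precision_by_model(responses):
    """Fraction of evaluations per model in which the rater picked the true intruder.

    `responses` is an iterable of (model, task_id, chosen, intruder) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for model, _task_id, chosen, intruder in responses:
        total[model] += 1
        correct[model] += int(chosen == intruder)
    return {model: correct[model] / total[model] for model in total}

# Toy responses: one task per model, each rated by five raters (5x duplication).
toy_responses = (
    [("model_x", "t1", "advertisers", "advertisers")] * 4
    + [("model_x", "t1", "humor", "advertisers")]
    + [("model_y", "t2", "ThinkPad", "ThinkPad")] * 2
    + [("model_y", "t2", "Field ration", "ThinkPad")] * 3
)

scores = precision_by_model(toy_responses)
print(scores)
print(5700 * 5)  # evaluations implied by 5.7k tasks at 5x duplication
```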
Syntax features only (freq > 50; “common”)
[Figure (a): Syntax-only, common n-gram contexts. Panels: context intrusion and word intrusion, each reporting % correct per model configuration, plus % correct against model size (clusters) for LDA and MVM.]
Models compared: DPMM−0.1−0.1, DPMM−0.1−0.01, LDA−50M−0.1−0.1, LDA−50M−0.1−0.01, LDA−100M−0.1−0.1, LDA−100M−0.1−0.01, LDA−200M−0.1−0.1, LDA−200M−0.1−0.01, LDA−300M−0.1−0.1, LDA−300M−0.1−0.01, LDA−500M−0.1−0.1, LDA−500M−0.1−0.01, LDA−1000M−0.1−0.1, LDA−1000M−0.1−0.01, MVM−3M−0.1−0.01, MVM−5M−0.1−0.01, MVM−5M−0.1−0.005, MVM−10M−0.1−0.01, MVM−10M−0.1−0.005, MVM−20M−0.1−0.01, MVM−30M−0.1−0.01, MVM−50M−0.1−0.01, MVM−100M−0.1−0.01
Syntax features only (freq < 50; “rare”)
[Figure (b): Syntax-only, rare n-gram contexts. Panels: context intrusion and word intrusion, each reporting % correct per DPMM, LDA and MVM model configuration, plus % correct against model size (clusters) for LDA and MVM.]
“Common” syntax features + document features
[Figure (c): Syntax+Documents, common n-gram contexts. Panels: context intrusion, document intrusion and word intrusion, each reporting % correct per DPMM, LDA and MVM model configuration, plus % correct against model size (clusters) for LDA and MVM, with an Overall panel.]
Conclusion
• Introduced a latent variable model accounting for cross-cutting / multiple clustering structure in word meaning
• Large-scale human evaluation of the semantic coherence of similarity predictions
• Significantly higher precision intrusion identification than related model-based approaches
• Even for fine-grained clusterings