Computational Perspectives on Social Phenomena in On-Line Networks

Jon Kleinberg
Cornell University

Including joint work with Lars Backstrom, Justin Cheng, Cristian Danescu-Niculescu-Mizil, Lillian Lee, Jure Leskovec, Cameron Marlow, and Johan Ugander.

Two Metaphors for the Web

The Web is balanced between two metaphors. The library: knowledge, pages, hyperlinks, associations. The crowd: real-time awareness, memes, contagion.

Networks of Documents, Networks of People

Wholly new forms of encyclopedias will appear, ready made with a mesh of associative trails running through them ... There is a new profession of trail blazers, those who find delight in the task of establishing useful trails through the enormous mass of the common record.
— Vannevar Bush, As We May Think, 1945

.... radio and the printed page seemed to have only negligible effects on actual vote decisions .... When [people] were asked what had contributed to their decision, their answer was: other people.
— Elihu Katz and Paul Lazarsfeld, Personal Influence, 1955

Other People

Anti-war chain letter (Liben-Nowell and Kleinberg 2008)

Facebook meme (Adamic et al. 2012)

Tracking memes through social media

Links via blog posts [Adar et al. 2004, Gruhl et al. 2004]
Product recommendations [Leskovec et al. 2006]
Chain-letter petitions [Liben-Nowell and Kleinberg 2008, Golub-Jackson 2010]
Facebook copy-paste and photo memes [Adamic et al. 2012, Dow et al. 2013]
Quotes in news, blogs [Leskovec-Backstrom-Kleinberg 2009, Suen et al. 2013]

Duality

Wholly new forms of encyclopedias will appear, ready made with a mesh of associative trails running through them ...

Anti-war chain letter (Liben-Nowell and Kleinberg 2008)

A duality between browsing and meme dynamics: a person blazing trails through a network of documents vs. a document blazing trails through a network of people.

Core Questions Combining Content and Structure

Basic questions about the flow of information through populations:
What features of a message help predict its level of penetration?
How can network structure help understand contagion of content?
Combining content and structure: curating a user’s interactions.

Tightly connected to dynamics of news, social media, sharing.

Quoted Phrases: The 2008 Election News Cycle

Leskovec-Backstrom-Kleinberg 2009

Why Do Certain Quotes Stand Out?

Each of these quotes was a highly reported fragment of a longer piece of text.
“You can put lipstick on a pig, but it’s still a pig ... You can wrap an old fish in a piece of paper called change, but it’s still going to stink.”

Can we find a dataset with many instances of this phenomenon? Want a domain where we can control for setting as language varies.

Movie Quotes as Viral Text

Strategy: Compare language of memorable movie lines to non-memorable lines of same length, spoken by same character in same scene of same movie. [Danescu-Niculescu-Mizil, Cheng, Kleinberg, Lee 2012]

Stormtrooper: Let me see your identification.
Obi-Wan: You don’t need to see his identification.
Stormtrooper: We don’t need to see his identification.
Obi-Wan: These aren’t the droids you’re looking for.
Stormtrooper: These aren’t the droids we’re looking for.
Obi-Wan: He can go about his business.
Stormtrooper: You can go about your business.

Set-Up

Compare language of memorable movie lines to non-memorable lines of same length, spoken by same character in same scene of same movie.
Using approx. 1000 full scripts, memorability evaluated based on both IMDb and Google/Bing counts.
Present a person or algorithm with such pairs (mem, non-mem) and see how accurately they can be distinguished.

Examples:

Movie: Jackie Brown
First quote: Half a million dollars will always be missed.
Second quote: I know the type, trust me on this.

Movie: Star Trek: Nemesis
First quote: I think it’s time to try some unsafe velocities.
Second quote: No cold feet, or any other parts of our anatomy.

Movie: Ordinary People
First quote: A little advice about feelings kiddo; don’t expect it always to tickle.
Second quote: I mean there’s someone besides your mother you’ve got to forgive.

Human performance: Approx. 15,000 evaluators, movies they claim not to have seen. Modal success rate: 9/12.

Algorithmic Recognition of Memorability

In aggregate, memorable quotes are more “distinctive”: Less probable in their word choices. Compare to a base language model trained on newswire. Not just individual words, but consecutive 2- and 3-word sequences. E.g. “You had me at hello.”

But more probable in their part-of-speech composition. “You had me at hello” has same part-of-speech sequence as “I met him in Boston.”

A memorable quote: a sequence of unusual words built on a scaffolding of common part-of-speech patterns.
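
The lexical side of "distinctiveness" can be illustrated with a small language-model score. The sketch below is only an illustration under stated assumptions: it uses an add-one-smoothed bigram model (the slides do not say how the base model was smoothed), a three-sentence toy corpus standing in for newswire, and hypothetical function names. A lower average log-probability means the word choices are less expected under the base model.

```python
import math
from collections import Counter

def train_bigram_model(sentences):
    """Count unigrams and bigrams over tokenized sentences, with a start marker."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def avg_log_prob(tokens, unigrams, bigrams):
    """Mean add-one-smoothed bigram log-probability per token."""
    vocab_size = len(unigrams) + 1  # one extra slot for unseen words
    padded = ["<s>"] + tokens
    total = 0.0
    for prev, word in zip(padded, padded[1:]):
        prob = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        total += math.log(prob)
    return total / len(tokens)

# Toy stand-in for a newswire corpus (hypothetical data, not from the study).
corpus = [
    "i met him in boston".split(),
    "i met her in boston".split(),
    "she met him in paris".split(),
]
unigrams, bigrams = train_bigram_model(corpus)

common = avg_log_prob("i met him in boston".split(), unigrams, bigrams)
unusual = avg_log_prob("you had me at hello".split(), unigrams, bigrams)
print(f"common: {common:.2f}  unusual: {unusual:.2f}")
```

On this toy corpus the second line scores lower than the first even though both share the same part-of-speech sequence, which is the contrast the slide describes.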

Connective Media and Social Feedback Effects

In aggregate, memorable quotes are more “general.”
More present tense, more indefinite articles, fewer third-person pronouns.
“You’re gonna need a bigger boat,” not “You’re gonna need the bigger boat.”
Suggests a certain portability to the quote.

Accuracy: human > our best algorithms > chance prediction.
Machine learning: ∼64% via a model trained on 52 features based on distinctiveness and generality.
Machine learning: < 60% accurate with a large bag-of-words model.
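
The pairwise test behind these accuracy figures can be sketched as follows. This is a simplified illustration, not the study's 52-feature model: the single feature (indefinite minus definite articles), the function names, and the example pairs other than the "bigger boat" contrast are all assumptions made up for this sketch.

```python
def generality_score(line):
    """Toy generality feature: indefinite articles minus definite articles."""
    words = line.lower().split()
    indefinite = sum(w in ("a", "an") for w in words)
    definite = sum(w == "the" for w in words)
    return indefinite - definite

def pairwise_accuracy(pairs, score):
    """Fraction of (memorable, non_memorable) pairs ranked correctly.

    A tie counts as half a correct answer, the expected value of a coin flip.
    """
    correct = 0.0
    for memorable, non_memorable in pairs:
        m, n = score(memorable), score(non_memorable)
        if m > n:
            correct += 1.0
        elif m == n:
            correct += 0.5
    return correct / len(pairs)

# Hypothetical (memorable, non-memorable) pairs for illustration only.
pairs = [
    ("You're gonna need a bigger boat", "You're gonna need the bigger boat"),
    ("Give me a reason", "Give me the keys"),
    ("The door was open", "A door was open"),
]
accuracy = pairwise_accuracy(pairs, generality_score)
print(f"pairwise accuracy: {accuracy:.2f}")
```

Because every pair contains exactly one memorable line, chance performance on this test is 50%, which is the baseline the reported accuracies are compared against.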

Meme Mutation

Mutation of textual memes as they travel from source to source.
Used for phrase clustering in Leskovec-Backstrom-Kleinberg 2009.
More extensive analysis by Simmons-Adamic-Adar 2011.

Genetic analogy for memes: beginning of a formalization?
Fitness functions
Mutation mechanisms and “functional” elements
Population structure
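
One simple way to quantify how far a phrase has mutated is a word-level edit distance between two variants. The sketch below is an assumption for illustration and is not presented as the clustering method of the cited papers; the function name and example strings are hypothetical.

```python
def word_edit_distance(a, b):
    """Levenshtein distance counted in whole words (insert, delete, substitute)."""
    x, y = a.lower().split(), b.lower().split()
    prev = list(range(len(y) + 1))  # distance from the empty prefix of x
    for i, wx in enumerate(x, 1):
        curr = [i]
        for j, wy in enumerate(y, 1):
            cost = 0 if wx == wy else 1
            curr.append(min(prev[j] + 1,          # delete wx
                            curr[j - 1] + 1,      # insert wy
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

# A shortened variant and a one-word substitution of the same phrase.
truncated = word_edit_distance(
    "you can put lipstick on a pig",
    "you can put lipstick on a pig but it's still a pig",
)
substituted = word_edit_distance("lipstick on a pig", "lipstick on the pig")
print(truncated, substituted)
```

Variants within a small distance of each other could then be grouped as mutations of one meme, with the threshold left as a tuning choice.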

One person’s interface with the larger network

Marlow-Byron-Lento-Rosenn 2009

Decisions in Social Media

Your interface with the network plays a role in your decisions:
User-defined groups in on-line communities; participation in on-line collaborative projects; decision to use a hashtag on Twitter; decision to click on a product ad endorsed by friends; ...

Does set/structure of adopting neighbors help predict tendency to adopt? [Backstrom et al. 2006, Crandall et al. 2008, Romero et al. 2011]

Diffusion Curves

Long-standing framework: probability of adopting a behavior depends on number of network neighbors already adopting. [Bass 1969, Granovetter 1978, Schelling 1978]

[Figure: two schematic curves of probability of adoption against k, the number of friends adopting.]

Key issue: qualitative shape of the curves. Diminishing returns? Critical mass?

Diffusion Curves

Probability of joining a community when k friends are already members.

[Figure: probability plotted against k, the number of friends already adopting, in three panels.]
(a) Joining a LiveJournal group [Backstrom et al. 06]
(b) Editing a Wikipedia article [Crandall et al. 08]
(c) Purchasing a product [Leskovec et al. 06]
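
An empirical diffusion curve of this kind is, for each k, the fraction of users with k adopting friends who went on to adopt. A minimal sketch, assuming hypothetical observation records of the form (k, adopted); the function name and toy data are illustrative and do not come from the cited studies.

```python
from collections import defaultdict

def diffusion_curve(observations):
    """Map each k to the fraction of users with k adopting friends who adopted."""
    exposed = defaultdict(int)   # users observed with k adopting friends
    adopted = defaultdict(int)   # of those, how many adopted
    for k, did_adopt in observations:
        exposed[k] += 1
        if did_adopt:
            adopted[k] += 1
    return {k: adopted[k] / exposed[k] for k in sorted(exposed)}

# Toy observations: (number of adopting friends, whether the user adopted).
observations = [
    (0, False), (0, False), (0, False), (0, True),
    (1, False), (1, True),
    (2, True), (2, True), (2, False), (2, True),
]
curve = diffusion_curve(observations)

# Marginal gain from one more adopting friend; shrinking gains would indicate
# diminishing returns, growing gains a critical-mass effect.
gains = [curve[k + 1] - curve[k] for k in curve if k + 1 in curve]
print(curve, gains)
```

Such a curve describes correlation only; it does not by itself separate influence from selection.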

Prediction and Potential Influence

You’re more likely to do something when more friends are doing it. Why is that?
The issue of homophily/selection vs. influence [Cohen 77, Kandel 78, Manski 93, Aral et al. 09, Shalizi-Thomas 11]

An experiment to sort out these effects [Bakshy-Eckles-Yan-Rosenn 2012]

Structural Diversity

Dependence on number of friends: a first step toward general prediction.
Given the full pattern of connections among your friends, estimate probability of adopting a new behavior.

Structural diversity [Ugander-Backstrom-Marlow-Kleinberg]

Data from invitations to join Facebook.
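
A minimal sketch of one way to measure structural diversity, assuming it is taken to be the number of connected components in the graph induced on a person's friends who have already adopted. The function name, argument layout, and toy friendships are hypothetical.

```python
def structural_diversity(adopters, friendships):
    """Count connected components in the graph induced on `adopters`."""
    adopters = set(adopters)
    adjacency = {u: set() for u in adopters}
    for u, v in friendships:
        # Keep only edges whose endpoints are both adopting friends.
        if u in adopters and v in adopters:
            adjacency[u].add(v)
            adjacency[v].add(u)
    seen = set()
    components = 0
    for start in adopters:
        if start in seen:
            continue
        components += 1
        stack = [start]          # depth-first search from an unseen node
        seen.add(start)
        while stack:
            node = stack.pop()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
    return components

# Four adopting friends with three different connection patterns.
friends = ["a", "b", "c", "d"]
one_cluster = structural_diversity(friends, [("a", "b"), ("b", "c"), ("c", "d")])
two_pairs = structural_diversity(friends, [("a", "b"), ("c", "d")])
all_separate = structural_diversity(friends, [])
print(one_cluster, two_pairs, all_separate)
```

All three cases have the same number of adopting friends, so a model based only on that count cannot tell them apart, while the component count can.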

Structural Diversity

With four neighbors:

Final Reflections

Dear Professor Kleinberg: Reading this article gave me a hearty laugh. I am an AOL user and the thought of anyone compiling a profile of me based on my online searches delighted me. I am a senior citizen and live in senior housing. I am one of two people in the building who own and know how to use a computer for research. For this reason, I am kept busy researching topics for other residents. My Grandchildren all come visit at various times and use my computer. ... If one looked at my search history, they would conclude that I have multiple diseases and medical conditions, have interests that span the trove of human knowledge, and am of indeterminate gender. I could go on and on, but I believe you get the gist of what I am saying. ... I hope if my data is in your cache from AOL that it does not screw up your research too much. —AOL user, 23 Aug 2006

The Web is powered by feedback loops between people and information.
Deeper models for the ways in which people and information move through the system?
Richer genetic analogies for memes, and analysis of the ecosystem around them?
New computational ideas will be needed to address all these challenges.