
Automated Discovery and Analysis of Online Social Networks

Anatoliy Gruzd, PhD
Assistant Professor, School of Information Management, Dalhousie University

E-mail: [email protected] Homepage: AnatoliyGruzd.com Twitter: dalprof

Vancouver, Canada
August 10, 2010

Outline • Why Do We Want To Discover Online Social Networks?

• How Do We Collect Information About Social Networks?

• Netlytic.org - Online Tool for Social Network Discovery


Why Do We Want To Discover Online Social Networks?
• Nodes = People
• Edges/Ties = Relations

© kelleyw


Why Do We Want To Discover Online Social Networks?

• Users
– More useful recommendation systems
  • Amazon, Netflix
  • Social information filtering in location-based systems (Espinoza et al., 2001)
– Improved user experience with information systems
  • Keeping in touch with friends and colleagues (e.g., LinkedIn, Facebook)
  • New browsing capabilities for news stories (Pouliquen et al., 2007; Tanev, 2007)
– A more secure and easier way to share private content with trusted individuals
  • “Web of Trust” (Golbeck, 2008; Matsuo et al., 2004)


Why Do We Want To Discover Online Social Networks?

• Companies
– Recruiting talent
  • Different ties for different needs (Leung, 2003)
– Finding experts
  • Expertise-oriented searching using social networks (Ehrlich et al., 2007; Li et al., 2007)
– Marketing
  • Viral marketing (Domingos, 2005)
  • Building brand loyalty using customer networks (Thompson & Sinha, 2008)


Why Do We Want To Discover Online Social Networks?

• Researchers
– Ability to ask and answer deeper questions about the nature and operation of online communities
  • How and why does one online community emerge while another dies?
  • How do people agree on common practices and rules in an online community?
  • How are knowledge and information shared among group members?


Outline • Why Do We Want To Discover Online Social Networks?

• How Do We Collect Information About Social Networks?

• Netlytic.org - Online Tool for Social Network Discovery


How Do We Collect Information About Social Networks?
• Common approach: surveys or interviews
• A sample question about students’ perceived social structures (based on C. Haythornthwaite’s 1999 LEEP study protocol):

With [5] indicating a closer relationship, please indicate on a scale from [1] to [5] YOUR FRIENDSHIP RELATIONSHIP WITH EACH STUDENT IN THE CLASS

[1] - don’t know this person
[2] - just another member of class
[3] - a slight friendship
[4] - a friend
[5] - a close friend

Alice D.      [1] [2] [3] [4] [5]
…
Richard S.    [1] [2] [3] [4] [5]
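Once collected, ratings like these can be turned into a network. The following is a minimal sketch, not part of the LEEP protocol: it assumes the answers are stored in a hypothetical `responses` dictionary, that "Maria T." is an invented third student, and that a rating of [3] or higher counts as a tie (an illustrative threshold only).

```python
# Hypothetical survey answers: respondent -> {classmate: rating on the 1-5 scale}.
responses = {
    "Alice D.": {"Richard S.": 4, "Maria T.": 1},
    "Richard S.": {"Alice D.": 5, "Maria T.": 3},
    "Maria T.": {"Alice D.": 2, "Richard S.": 3},
}


def survey_to_ties(responses, threshold=3):
    """Return directed ties (rater, rated) -> rating for ratings >= threshold."""
    ties = {}
    for rater, ratings in responses.items():
        for rated, score in ratings.items():
            if rater != rated and score >= threshold:
                ties[(rater, rated)] = score
    return ties


ties = survey_to_ties(responses)
for (rater, rated), score in sorted(ties.items()):
    print(f"{rater} -> {rated} (rating {score})")
```

The ties are directed because two students may rate the same relationship differently, which is one form of the subjectivity that survey-based collection has to deal with.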

LimeSurvey (open source) – www.limesurvey.org (offers conditional questions, templates, and management features)

VENNMAKER http://www.vennmaker.com/en

Using VennMaker to Collect an Ego Network

How Do We Collect Information About Social Networks?
• Common approach: surveys or interviews
– Disadvantages
  • time-consuming
  • resource-intensive
  • sensitive questions
– Subjectivity (Dillman, 2000): respondents may
  • give partial answers
  • forget people and interactions
  • perceive events and relationships differently
• Research Goal
– Use computers to discover online social networks automatically


How Do We Collect Information About Social Networks?
Different Types of Online Social Networks
• Email networks
• Forum networks
• Blog networks
• Co-author/Co-citation networks
• Friends’ networks on MySpace, Facebook, etc.
• Networks of like-minded people on


http://www.visualcomplexity.com/vc

http://infochimps.org/datasets

Twitter Networks

Twitter network dataset of 41.7M users http://an.kaist.ac.kr/traces/WWW2010.html

API – Application Programming Interface http://www.ProgrammableWeb.com/apis

Co-authorship/co-citation networks

by Dawn Endico

(1) Search Engine APIs
• Google API – query: Gruzd *** Twidale (the two names within 1-3 words of each other)
• Google Scholar – query: author:"J Geller" author:Misra (or inauthor in Google Books)
• Amazon API

(2) Database APIs

http://academic.research.microsoft.com Interactive Ego Network

Discovering Networks in the Blogosphere http://presidentialwatch08.com


Options for Automated Discovery of Communication Networks among Blogs
Furukawa et al. (2007), Ali-Hasan & Adamic (2007):
– Blogroll links
– Citation links
– Comment links
– Trackback

Disruptive Technologies in Health Information Landscapes


Possible Sources to Collect Blog Data

Advantages
• RSS feed
• Expansive coverage
• Date filter
• Search operators: link, site, inTitle, inPostTitle, inPostAuthor

• Topic filter
• Authority-based ranking

• Date filter
• Search operators: inTitle, InUrl, Last, Inlink
• Expert-based ranking
• Good spam control

• Date filter
• Search operators: Title, Author, Tag
• Blog Trends feature

• Indexes Posts and Comments
• RSS feed

Disadvantages
• Includes not only blogs
• Limits # of requests per hour
• Provides only first 200 chars of the post

• Limited coverage
• New API is currently under construction

• RSS feed only shows 10 top results

• Commercial web API

• Commercial web API
• No search capabilities

Networks derived from Conversational Data

How Do We Collect Information About Social Networks?
Research Questions
• How can social networks be discovered automatically from text-based online interactions?
• What content-based features of online interactions can help to uncover nodes and ties between group members?


Automated Discovery of Social Networks
Emails
• Nodes = People
• Ties = “Who talks to whom”
• Tie strength = The number of messages exchanged between individuals

(Figure: example email network connecting Nick, Rick, and Dick)
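The email network definition can be sketched in a few lines. This is an illustrative sketch only: the `messages` list is a hypothetical log of (sender, recipient) pairs, and the tie is treated as undirected so that messages in both directions count toward the same tie strength.

```python
from collections import Counter

# Hypothetical message log: one (sender, recipient) pair per email.
messages = [
    ("Nick", "Rick"),
    ("Rick", "Nick"),
    ("Nick", "Dick"),
    ("Nick", "Rick"),
]


def tie_strengths(messages):
    """Count the messages exchanged between each unordered pair of people."""
    strengths = Counter()
    for sender, recipient in messages:
        if sender != recipient:  # skip self-addressed messages
            strengths[frozenset((sender, recipient))] += 1
    return strengths


strengths = tie_strengths(messages)
for pair, count in strengths.items():
    print(" - ".join(sorted(pair)), count)
```

Keeping the pairs ordered instead (a tuple rather than a `frozenset`) would give a directed "who writes to whom" network, if direction matters for the analysis.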


Automated Discovery of Social Networks
Example: Enron Corporation email dataset
• The dataset was made public by the US government during its investigation of the Enron Corporation financial collapse
• Diesner & Carley (2005) used ‘who talks to whom’ networks extracted from the Enron emails to compare employees’ communication patterns before and during the company’s collapse
• Communication networks during the collapse did not reflect Enron’s formal organizational structures; e.g., top executives formed a tight clique and interacted less often with other employees


CMU ORA software http://www.casos.cs.cmu.edu/projects/ora

Automated Discovery of Social Networks
“Many to Many” Communication
• Forum
• Mailing listserv
• Chat
• Comments

Automated Discovery of Social Networks
Approach 1: Chain Network (Reply-to)

Source: Posting header
Method: Connects a sender to the previous poster in the thread
Discovered Tie(s): Sam -> Gabriel

Posting header
FROM: Sam
PREVIOUS POSTER: Gabriel

Content
“Nick, Gina and Gabriel: I apologize for not backing this up with a good source, but I know from reading about this topic that …”

Possible Missing Connections:
• Sam -> Nick
• Sam -> Gina
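The chain (reply-to) approach can be sketched as follows. This is a minimal illustration under stated assumptions: the hypothetical `thread` list holds the posters of one thread in posting order, and the previous poster is derived from that order rather than read from a real posting header.

```python
# Hypothetical thread: posters in the order their messages appear.
thread = ["Gabriel", "Sam", "Gina", "Sam"]


def chain_network(posters):
    """Connect each sender to the poster of the immediately preceding message."""
    ties = []
    for previous, sender in zip(posters, posters[1:]):
        if sender != previous:  # ignore consecutive posts by the same person
            ties.append((sender, previous))
    return ties


chain_ties = chain_network(thread)
for sender, previous in chain_ties:
    print(f"{sender} -> {previous}")
```

Because only header information is used, anyone addressed by name inside the message body (Nick and Gina in the example above) is missed, which is the gap the name network approach tries to close.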

Automated Discovery of Social Networks
Approach 2: Name Network
This approach looks for personal names in the content of the messages to identify social connections between group members.

FROM: Ann
“Steve and Natasha, I couldn't wait to see your site. I knew it was going to [be] awesome!”

Method: Connect the sender to people mentioned in the message
Discovered Tie(s): Ann -> Steve, Ann -> Natasha

Method: Connect people whose names co-occur in the same message(s)
Discovered Tie(s): Steve – Natasha
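Both tie-building rules of the name network can be sketched for a single message. The sketch assumes the personal names have already been extracted from the message text; the function name `name_network_ties` is hypothetical.

```python
from itertools import combinations


def name_network_ties(sender, mentioned_names):
    """Return (sender ties, co-occurrence ties) for one message."""
    # Deduplicate, and never tie the sender to themselves.
    names = sorted(set(mentioned_names) - {sender})
    sender_ties = [(sender, name) for name in names]
    # Every pair of names found in the same message co-occurs.
    cooccurrence_ties = list(combinations(names, 2))
    return sender_ties, cooccurrence_ties


sender_ties, cooccurrence_ties = name_network_ties("Ann", ["Steve", "Natasha"])
print(sender_ties)
print(cooccurrence_ties)
```

Sender ties are directed (the sender addresses or mentions someone), while co-occurrence ties carry no direction, so the two lists are kept separate here.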

Automated Discovery of Social Networks
Approach 2: Name Network
• Main Communicative Functions of Personal Names (Leech, 1999)
– getting attention and identifying addressee
– maintaining and reinforcing social relationships
• Names are “one of the few textual carriers of identity” in discussions on the web (Doherty, 2004)
• Their use is crucial for the creation and maintenance of a sense of community (Ubon, 2005)


Automated Discovery of Social Networks
Approach 2: Name Network
Step 1. Automatically find all personal names in the postings
• Compare each word from the posting against a dictionary of all names collected from the US Census data
• Find names that are NOT in the name dictionary (e.g., international names, informal names and nicknames) using contextual and structural information about words such as
– Capitalization
– Context words
– Position in text
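Step 1 can be approximated with a dictionary lookup plus one simple contextual rule. This is a rough sketch, not the implementation behind the slides: `NAME_DICTIONARY` is a tiny stand-in for a census-derived name list, `CONTEXT_WORDS` is a hypothetical list of greeting words, and the only contextual cue used is a capitalized word directly after such a greeting.

```python
import re

# Tiny stand-in for a census-derived name dictionary (assumption for illustration).
NAME_DICTIONARY = {"steve", "natasha", "ann", "sam", "dustin"}
# Hypothetical context words that often precede a name.
CONTEXT_WORDS = {"hi", "hello", "dear", "thanks"}


def find_names(text):
    """Return dictionary hits plus capitalized words that follow a context word."""
    tokens = re.findall(r"[A-Za-z']+", text)
    found = []
    for i, token in enumerate(tokens):
        in_dictionary = token.lower() in NAME_DICTIONARY
        after_context = (
            i > 0
            and tokens[i - 1].lower() in CONTEXT_WORDS
            and token[0].isupper()
        )
        if (in_dictionary or after_context) and token not in found:
            found.append(token)
    return found


print(find_names("Hi Wilma, thanks for the link. Steve and Natasha liked it too."))
```

Here "Wilma" is not in the stand-in dictionary and is found only through the greeting rule. A pure dictionary lookup also produces false positives when a name doubles as an ordinary word, so real postings need more contextual checks than this sketch applies.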


Automated Discovery of Social Networks
Approach 2: Name Network
Step 1. Automatically find all personal names in the postings
• Challenges when processing online conversations
– Incorrect spelling, partial sentences, inventive punctuation
– Local language conventions
  • Acronyms, group naming conventions, group word use conventions, nicknames for people and processes


Automated Discovery of Social Networks
Approach 2: Name Network
Step 2. Connect a sender of the posting to all names discovered in the previous step

EXAMPLE
From: [email protected] (= Wilma)
Reference Chain: [email protected], [email protected]

Hi Dustin, Sam and all, I appreciate your posts from this and last week […]. I keep thinking of poor Charlie who only wanted information on “dogs“. […] Cheers, Wilma.

Discovered ties:
Wilma – Dustin
Wilma – Sam
Wilma – Charlie
Dustin – Sam – Charlie

Challenges to overcome:
– One person can have many names
– Many people can have the same name
– Names can belong to students in the class and outsiders

Automated Discovery of Social Networks
Name Network Method: Challenges
• Kurt Cobain, a lead singer for the rock band Nirvana
• chris is not a group member
• Santa Monica Public Library
• John Dewey, philosopher & educator
• mark up language

Solution:
– Name alias resolution
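One simple way to picture name alias resolution is a lookup against a roster of known group members. The sketch below is an assumption-laden illustration, not the method used in the talk: `ROSTER`, its aliases, and the example address are all hypothetical.

```python
# Hypothetical class roster: canonical member -> known aliases (names, nicknames, emails).
ROSTER = {
    "Wilma": {"wilma", "wil", "wilma@example.org"},
    "Dustin": {"dustin", "dusty"},
    "Sam": {"sam", "samantha"},
}

# Invert the roster so that every alias points to one canonical member.
ALIAS_TO_MEMBER = {
    alias: member for member, aliases in ROSTER.items() for alias in aliases
}


def resolve_names(candidates):
    """Map candidate names to roster members and drop names that match no member."""
    resolved = []
    for candidate in candidates:
        member = ALIAS_TO_MEMBER.get(candidate.lower())
        if member is not None and member not in resolved:
            resolved.append(member)
    return resolved


print(resolve_names(["Dusty", "Sam", "Kurt", "Santa", "samantha"]))
```

This handles "one person, many names" and filters out outsiders such as "Kurt" or "Santa", but it does not address the case where several members share the same name; that would need additional context, for example who has been active in the same thread.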

Evaluating Name Networks
Example: YouTube comments

(Figure: Chain Network (fewer connections) compared with Name Network (more connections))

Evaluating Name Networks
Results from Online Learners Dataset

(Figure: Chain Network compared with Name Network)

Structurally, the name network is far more similar to the self-reported network than the chain network is.

Gruzd, A. (2009). Automated Discovery of Social Networks to Study Collaborative Learning. Journal of Education for Library and Information Science 50(4): 243-253
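The slide does not state which measure was used for the structural comparison. As one possible illustration only, the overlap between two networks can be scored with the Jaccard similarity of their edge sets; the three toy networks below are hypothetical and are not data from the study.

```python
def edge_jaccard(edges_a, edges_b):
    """Jaccard similarity between two undirected edge lists (1.0 = identical)."""
    set_a = {frozenset(edge) for edge in edges_a}
    set_b = {frozenset(edge) for edge in edges_b}
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical toy networks, not data from the study.
self_reported = [("Ann", "Steve"), ("Ann", "Natasha"), ("Steve", "Natasha")]
name_net = [("Ann", "Steve"), ("Natasha", "Ann"), ("Steve", "Natasha")]
chain_net = [("Ann", "Steve")]

print(edge_jaccard(name_net, self_reported))
print(edge_jaccard(chain_net, self_reported))
```

Edge overlap ignores tie strength and direction, so it is only a first check; a fuller comparison would also look at measures such as density or centrality rankings.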

Outline • Why Do We Want To Discover Online Social Networks?

• How Do We Collect Information About Social Networks?

• Netlytic.org - Online Tool for Social Network Discovery


http://netlytic.org


Practical Part http://netlytic.org/mitacs

Netlytic.org: Name networks dataset types
• Information available: Sender known; Reply-to / Subject Line
• Dataset types: Email/threaded discussion; Blog/news; Chat/comments
• Note on slide: name co-occurrence only

Online Social Networking for Scholars
• Helping researchers to connect to their colleagues
– Academia.edu
– ResearchGate.net
– Academici.com
• Reviews @ http://SocialMediaLab.ca

References

Chin, A. and M. Chignell (2007). "Identifying Communities in Blogs: Roles for Social Network Analysis and Survey Instruments." International Journal of Web Based Communities 3(3): 343-365



Domingos, P. (2005). "Mining social networks for viral marketing." IEEE Intelligent Systems 20(1): 80-82.



Ehrlich, K., C.-Y. Lin, et al. (2007). Searching for experts in the enterprise: combining text and social network analysis. Proceedings of the 2007 international ACM conference on Supporting group work. Sanibel Island, Florida, USA, ACM.



Espinoza, F., P. Persson, et al. (2001). GeoNotes: Social and Navigational Aspects of Location-Based Information Systems. Ubicomp 2001: Ubiquitous Computing: 2-17.



Fisher, D., M. Smith, et al. (2006). You Are Who You Talk To: Detecting Roles in Usenet Newsgroups. Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS '06).



Golbeck, J. (2008). "Trust and Nuanced Profile Similarity in Online Social Networks." ACM Transactions on the Web.






Haythornthwaite, C. (2006). "Facilitating collaboration in online learning." Journal of Asynchronous Learning Networks 10(1): 7-24.



Jones, Q. (1997). Virtual Communities, Virtual Settlements And Cyber-Archaeology. Journal of Computer Mediated Communication 3(3).



Leech, G. (1999). The Distribution and Function of Vocatives in American and British English Conversation. In H. Hasselgård and S. Oksefjell (Eds.) Out of Corpora: Studies in Honour of Stig Johansson. Amsterdam/Atlanta, GA: Rodopi.



Leggatt, H. (2007, April 12). Spam Volume to Exceed Legitimate Emails in 2007. BizReport : Email Marketing. Retrieved October 30, 2008, from http://www.bizreport.com/2007/04/spam_volume_to_exceed_legitimate_emails_in_2007.html


Leung, A. (2003). "Different ties for different needs: Recruitment practices of entrepreneurial firms at different developmental phases." Human Resource Management 42(4).



Li, J., J. Tang, et al. (2007). EOS: expertise oriented search using social networks. Proceedings of the 16th international conference on World Wide Web. Banff, Alberta, Canada, ACM Press.






Matsuo, Y., H. Tomobe, et al. (2004). Finding Social Network for Trust Calculation. The 16th European Conference on Artificial Intelligence (ECAI2004) 16: 510.



McMillan, D.W. & Chavis, D.M. (1986). Sense of community: A definition and theory. Journal of Community Psychology 14(1): 6-23.



Pouliquen, B., R. Steinberger, et al. (2007). Multilingual multi-document continuously-updated social networks. Proceedings of the Workshop Multi-source Multilingual Information Extraction and Summarization (MMIES'2007) held at RANLP'2007. Borovets, Bulgaria.



Savignon, S.J. and Roithmeier, W. (2004). Computer-Mediated Communication: Texts and Strategies. Computer Assisted Language Instruction Consortium Journal 21(2): 265-290.



Swearingen, J. (2008). Four Ways Social Networking Can Build Business. Bnet.com. Retrieved from http://www.bnet.com/2403-13070_23-219914.html



Tanev, H. (2007). Unsupervised Learning of Social Networks from a Multiple-Source News Corpus. Workshop Multi-source Multilingual Information Extraction and Summarization (MMIES'2007) held at RANLP'2007. Borovets, Bulgaria.



Thompson, S. A. and R. K. Sinha (2008). "Brand Communities and New Product Adoption: The Influence and Limits of Oppositional Loyalty." Journal of Marketing 72(6): 65-80.
