Automated Discovery and Analysis of Online Social Networks
Anatoliy Gruzd, PhD
Assistant Professor, School of Information Management, Dalhousie University
E-mail: [email protected]
Homepage: AnatoliyGruzd.com
Twitter: dalprof
Online Social Networks
Vancouver, Canada, August 10, 2010
Outline
• Why Do We Want To Discover Online Social Networks?
• How Do We Collect Information About Social Networks?
• Netlytic.org: Online Tool for Social Network Discovery
Why Do We Want To Discover Online Social Networks?
• Nodes = People
• Edges/Ties = Relations
© kelleyw
Why Do We Want To Discover Online Social Networks?
• Users
  – More useful recommendation systems
    • Amazon, Netflix
    • Social information filtering in location-based systems (Espinoza et al., 2001)
  – Improved user experience with information systems
    • Keeping in touch with friends and colleagues (e.g., LinkedIn, Facebook)
    • New browsing capabilities for news stories (Pouliquen et al., 2007; Tanev, 2007)
  – A more secure and easier way to share private content with trusted individuals
    • "Web of Trust" (Golbeck, 2008; Matsuo et al., 2004)
Why Do We Want To Discover Online Social Networks?
• Companies
  – Recruiting talent
    • Different ties for different needs (Leung, 2003)
  – Finding experts
    • Expertise-oriented searching using social networks (Ehrlich et al., 2007; Li et al., 2007)
  – Marketing
    • Viral marketing (Domingos, 2005)
    • Building brand loyalty using customer networks (Thompson & Sinha, 2008)
Why Do We Want To Discover Online Social Networks?
• Researchers
  – Ability to ask and answer deeper questions about the nature and operation of online communities
    • How and why does one online community emerge while another dies?
    • How do people agree on common practices and rules in an online community?
    • How are knowledge and information shared among group members?
Outline
• Why Do We Want To Discover Online Social Networks?
• How Do We Collect Information About Social Networks?
• Netlytic.org: Online Tool for Social Network Discovery
How Do We Collect Information About Social Networks?
• Common approach: surveys or interviews
• A sample question about students' perceived social structures (based on C. Haythornthwaite's 1999 LEEP study protocol):
"With [5] indicating a closer relationship, please indicate, on a scale from [1] to [5], YOUR FRIENDSHIP RELATIONSHIP WITH EACH STUDENT IN THE CLASS."
  [1] – don't know this person
  [2] – just another member of class
  [3] – a slight friendship
  [4] – a friend
  [5] – a close friend
Alice D.     [1] [2] [3] [4] [5]
…
Richard S.   [1] [2] [3] [4] [5]
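To analyze roster answers like these as a network, each rating can become a directed, weighted tie from the respondent to the classmate. The sketch below assumes responses are stored in a hypothetical dict-of-dicts format and that a rating of [1] ("don't know this person") means no tie. Neither the storage format nor that cutoff comes from the LEEP protocol.

```python
from typing import Dict, List, Tuple

# Hypothetical storage format (not from the LEEP protocol):
# responses[respondent][classmate] = rating on the 1-5 friendship scale
Responses = Dict[str, Dict[str, int]]


def survey_to_edges(responses: Responses, min_rating: int = 2) -> List[Tuple[str, str, int]]:
    """Convert roster ratings into directed, weighted edges (respondent, classmate, rating).

    Ratings below min_rating are treated as "no tie". The default of 2 drops only
    [1] "don't know this person"; this cutoff is an assumption of the sketch.
    """
    edges = []
    for respondent, ratings in responses.items():
        for classmate, rating in ratings.items():
            if not 1 <= rating <= 5:
                raise ValueError(f"rating out of range for {respondent} -> {classmate}: {rating}")
            if classmate != respondent and rating >= min_rating:
                edges.append((respondent, classmate, rating))
    return sorted(edges)


if __name__ == "__main__":
    # Toy responses; "Bob K." is a hypothetical classmate
    responses = {
        "Alice D.": {"Richard S.": 4, "Bob K.": 1},
        "Richard S.": {"Alice D.": 3},
    }
    for edge in survey_to_edges(responses):
        print(edge)
```

Raising `min_rating` to 4 would keep only "a friend" and "a close friend" ties, which is one way to study only the stronger relationships.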
LimeSurvey (open source): www.limesurvey.org; supports conditional questions, templates and survey management
VennMaker: http://www.vennmaker.com/en
Using VennMaker to Collect an Ego Network
How Do We Collect Information About Social Networks?
• Common approach: surveys or interviews
  – Disadvantages
    • Time-consuming
    • Resource-intensive
    • Sensitive questions
  – Subjectivity (Dillman, 2000): respondents may
    • give partial answers
    • forget people and interactions
    • perceive events and relationships differently
• Research goal
  – Use computers to discover online social networks automatically
How Do We Collect Information About Social Networks?
Different Types of Online Social Networks
• Email networks
• Forum networks
• Blog networks
• Co-author/co-citation networks
• Friends' networks on MySpace, Facebook, etc.
• Networks of like-minded people on
http://www.visualcomplexity.com/vc
http://infochimps.org/datasets
Twitter Networks
Twitter network dataset of 41.7M users http://an.kaist.ac.kr/traces/WWW2010.html
API – Application Programming Interface http://www.ProgrammableWeb.com/apis
Co-authorship/co-citation networks
by Dawn Endico
(1) Search Engine APIs
• Google API: Gruzd *** Twidale (within 1-3 words)
• Google Scholar: author:"J Geller" author:Misra, or inauthor: in Google Books
• Amazon API
(2) Database APIs
• http://academic.research.microsoft.com: Interactive Ego Network
Discovering Networks in the Blogosphere http://presidentialwatch08.com
Options for Automated Discovery of Communication Networks among Blogs (Furukawa et al., 2007; Ali-Hasan & Adamic, 2007):
– Blogroll links
– Citation links
– Comment links
– Trackbacks
Disruptive Technologies in Health Information Landscapes
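The four link types above can be combined into one directed blog network in which each tie records which kinds of links connect two blogs. This is a minimal sketch, assuming the links have already been harvested into hypothetical (source_blog, target_blog, link_type) records; the crawling step and the blog names are illustrative and not taken from the cited studies.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple

# The four link types listed above
LINK_TYPES = {"blogroll", "citation", "comment", "trackback"}


def build_blog_network(
    links: Iterable[Tuple[str, str, str]],
) -> Dict[Tuple[str, str], Dict[str, int]]:
    """Aggregate typed links into directed ties: (source, target) -> {link_type: count}.

    Each input record is a hypothetical (source_blog, target_blog, link_type)
    triple produced by some earlier crawling step.
    """
    network: DefaultDict[Tuple[str, str], DefaultDict[str, int]] = defaultdict(
        lambda: defaultdict(int)
    )
    for source, target, link_type in links:
        if link_type not in LINK_TYPES:
            raise ValueError(f"unknown link type: {link_type}")
        if source != target:  # skip a blog linking to itself
            network[(source, target)][link_type] += 1
    # Return plain dicts so the result prints and compares cleanly
    return {tie: dict(counts) for tie, counts in network.items()}


if __name__ == "__main__":
    # Hypothetical harvested links
    links = [
        ("blogA", "blogB", "blogroll"),
        ("blogA", "blogB", "citation"),
        ("blogA", "blogB", "citation"),
        ("blogC", "blogA", "comment"),
    ]
    for tie, counts in build_blog_network(links).items():
        print(tie, counts)
```

Keeping per-type counts instead of a single number leaves room to weight link types differently later; for example, an analyst might treat comment links as stronger evidence of conversation than blogroll links.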
Possible Sources to Collect Blog Data
Advantages:
• RSS feed; expansive coverage; date filter; search operators: link, site, inTitle, inPostTitle, inPostAuthor
• Topic filter
• Authority-based ranking
• Date filter; search operators: inTitle, InUrl, Last, Inlink; expert-based ranking; good spam control
• Date filter; search operators: Title, Author, Tag; Blog Trends feature
• Indexes posts and comments; RSS feed
Disadvantages:
• Includes not only blogs; limits the number of requests per hour; provides only the first 200 characters of each post
• Limited coverage; new API is currently under construction
• RSS feed only shows the top 10 results
• Commercial web API
• Commercial web API; no search capabilities
Networks derived from Conversational Data
How Do We Collect Information About Social Networks?
Research Questions
• How can social networks be discovered automatically from text-based online interactions?
• What content-based features of online interactions can help uncover nodes and ties between group members?
Automated Discovery of Social Networks: Emails
• Nodes = People
• Ties = "Who talks to whom"
• Tie strength = The number of messages exchanged between individuals
(Figure: example email network of Nick, Rick and Dick)
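The three email definitions above can be turned into a small counting routine: people become nodes, any sender-recipient pair becomes a tie, and the number of messages exchanged becomes the tie strength. This is a minimal sketch, assuming each message is a hypothetical (sender, recipients) pair; treating ties as undirected and counting every listed recipient are choices made here, not rules from the slide.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def email_network(messages: Iterable[Tuple[str, List[str]]]) -> Counter:
    """Count messages exchanged between each pair of people.

    Ties are undirected: a message from Nick to Rick and one from Rick to Nick
    both add 1 to the same tie, so the count reflects messages *exchanged*.
    """
    strength: Counter = Counter()
    for sender, recipients in messages:
        for recipient in set(recipients):  # count each recipient once per message
            if recipient != sender:
                tie = tuple(sorted((sender, recipient)))
                strength[tie] += 1
    return strength


if __name__ == "__main__":
    # Hypothetical (sender, recipients) pairs for the Nick/Rick/Dick example
    messages = [
        ("Nick", ["Rick"]),
        ("Rick", ["Nick", "Dick"]),
        ("Dick", ["Nick"]),
    ]
    for (a, b), n in sorted(email_network(messages).items()):
        print(f"{a} -- {b}: {n} message(s)")
```

A directed variant would simply store `(sender, recipient)` without sorting, which keeps the information about who initiates contact.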
Automated Discovery of Social Networks
Example: Enron Corporation email dataset
• The dataset was made public by the US government during its investigation of the Enron Corporation's financial collapse
• Diesner & Carley (2005) used "who talks to whom" networks extracted from the Enron emails to compare employees' communication patterns before and during the company's collapse
• Communication networks during the collapse did not reflect Enron's formal organizational structure; e.g., top executives formed a tight clique and interacted less often with other employees
CMU ORA software http://www.casos.cs.cmu.edu/projects/ora
Automated Discovery of Social Networks: "Many-to-Many" Communication
• Forums
• Mailing lists (listservs)
• Chat
• Comments
Automated Discovery of Social Networks
Approach 1: Chain Network (Reply-to)
• Source: posting header
• Method: connects a sender to the previous poster in the thread
• Discovered tie(s): Sam -> Gabriel
Posting header
FROM: Sam
PREVIOUS POSTER: Gabriel
Content
"Nick, Gina and Gabriel: I apologize for not backing this up with a good source, but I know from reading about this topic that …"
Possible missing connections:
• Sam -> Nick
• Sam -> Gina
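The chain (reply-to) method needs only the order of postings in a thread. This is a minimal sketch, assuming each thread is a hypothetical list of sender names in chronological order. It links each poster to the immediately previous poster, so, as the "possible missing connections" above show, it cannot see people who are addressed by name in the content.

```python
from collections import Counter
from typing import List


def chain_ties(thread: List[str]) -> Counter:
    """Connect each sender to the previous poster in the thread (directed ties)."""
    ties: Counter = Counter()
    for previous, current in zip(thread, thread[1:]):
        if current != previous:  # a follow-up to one's own post is not a tie
            ties[(current, previous)] += 1
    return ties


if __name__ == "__main__":
    # Hypothetical posting order; only "Sam replies after Gabriel" is from the slide
    thread = ["Nick", "Gina", "Gabriel", "Sam"]
    print(chain_ties(thread))
```

With this order, the method finds Sam -> Gabriel but never Sam -> Nick or Sam -> Gina, even though Sam's message addresses all three.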
Automated Discovery of Social Networks
Approach 2: Name Network
This approach looks for personal names in the content of messages to identify social connections between group members.
FROM: Ann
"Steve and Natasha, I couldn't wait to see your site. I knew it was going to [be] awesome!"
Method:
• Connect the sender to people mentioned in the message
• Connect people whose names co-occur in the same message(s)
Discovered tie(s):
• Ann -> Steve
• Ann -> Natasha
• Steve – Natasha
Automated Discovery of Social Networks
Approach 2: Name Network
• Main communicative functions of personal names (Leech, 1999)
  – getting attention and identifying the addressee
  – maintaining and reinforcing social relationships
• Names are "one of the few textual carriers of identity" in discussions on the web (Doherty, 2004)
• Their use is crucial for the creation and maintenance of a sense of community (Ubon, 2005)
Automated Discovery of Social Networks
Approach 2: Name Network
Step 1. Automatically find all personal names in the postings
• Compare each word in the posting against a dictionary of all names collected from US Census data
• Find names that are NOT in the name dictionary (e.g., international names, informal names and nicknames) using contextual and structural information about words, such as
  – Capitalization
  – Context words
  – Position in text
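Step 1 can be approximated by a dictionary lookup plus a crude capitalization and position heuristic for names the dictionary misses. This is a minimal sketch, assuming a tiny hypothetical name list standing in for the US Census name files and a hypothetical list of capitalized non-names. It is not the study's full context-word model; it only illustrates the two ideas named above.

```python
import re
from typing import List, Set

# Hypothetical stand-ins: a real system would load the US Census name files
# and a much larger list of capitalized non-name words.
NAME_DICTIONARY: Set[str] = {"ann", "steve", "natasha", "dustin", "sam", "charlie", "wilma"}
NOT_NAMES: Set[str] = {"i", "hi", "cheers", "thanks"}


def find_names(text: str) -> List[str]:
    """Return candidate personal names in a posting, in order of appearance."""
    found = []
    for match in re.finditer(r"[A-Za-z']+", text):
        token = match.group()
        lower = token.lower()
        if lower in NOT_NAMES:
            continue
        # 1) Dictionary lookup (case-insensitive, so "sam" also matches)
        if lower in NAME_DICTIONARY:
            found.append(token)
            continue
        # 2) Fallback for names missing from the dictionary: a capitalized word
        #    that does not start a sentence is treated as a possible name
        before = text[:match.start()].rstrip()
        sentence_initial = before == "" or before[-1] in ".!?"
        if token[0].isupper() and not sentence_initial:
            found.append(token)
    return found


if __name__ == "__main__":
    print(find_names("Steve and Natasha, I couldn't wait to see your site."))
    print(find_names("Thanks Bobby, great post."))  # "Bobby" is not in the dictionary
```

A detector this naive will also flag capitalized non-names in mid-sentence (e.g., "Santa Monica Public Library") and dictionary names used as ordinary words, which is why filtering and alias resolution are needed afterwards.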
Automated Discovery of Social Networks
Approach 2: Name Network
Step 1. Automatically find all personal names in the postings
• Challenges when processing online conversations
  – Incorrect spelling, partial sentences, inventive punctuation
  – Local language conventions: acronyms, group naming conventions, group word-use conventions, nicknames for people and processes
Automated Discovery of Social Networks
Approach 2: Name Network
Step 2. Connect the sender of the posting to all names discovered in the previous step
Example
From: [email protected] (= Wilma)
Reference chain: [email protected], [email protected]
"Hi Dustin, Sam and all, I appreciate your posts from this and last week […]. I keep thinking of poor Charlie who only wanted information on "dogs". […] Cheers, Wilma."
Discovered ties:
• Wilma – Dustin
• Wilma – Sam
• Wilma – Charlie
• Dustin – Sam – Charlie (names co-occurring in the same message)
Challenges to overcome:
– One person can have many names
– Many people can have the same name
– Names can belong to students in the class or to outsiders
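Given the names found in Step 1, Step 2 and the co-occurrence rule from the Ann example can be sketched as follows. The sketch assumes the detected names have already been matched to group members. Dropping the sender's own name (e.g., a signature such as "Cheers, Wilma") and storing co-occurrence ties as undirected sorted pairs are choices made here.

```python
from itertools import combinations
from typing import List, Set, Tuple


def name_network_ties(
    sender: str, names: List[str]
) -> Tuple[Set[Tuple[str, str]], Set[Tuple[str, str]]]:
    """Return (sender ties, co-occurrence ties) for one posting.

    sender ties:        sender -> each person named in the posting
    co-occurrence ties: every pair of people named together (undirected, stored sorted)
    """
    # Drop the sender's own name and repeated mentions, keeping first-seen order
    addressed = list(dict.fromkeys(n for n in names if n != sender))
    sender_ties = {(sender, name) for name in addressed}
    cooccurrence_ties = {tuple(sorted(pair)) for pair in combinations(addressed, 2)}
    return sender_ties, cooccurrence_ties


if __name__ == "__main__":
    # Names detected in Wilma's message, including her own signature
    sender_ties, cooccurrence_ties = name_network_ties("Wilma", ["Dustin", "Sam", "Charlie", "Wilma"])
    print(sorted(sender_ties))
    print(sorted(cooccurrence_ties))
```

This reproduces the ties listed for the Wilma example: three sender ties plus the Dustin, Sam and Charlie co-occurrence triangle.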
Automated Discovery of Social Networks
Name Network Method: Challenges
• Kurt Cobain, the lead singer of the rock band Nirvana
• chris is not a group member
• Santa Monica Public Library
• John Dewey, philosopher & educator
• mark up language
Solution: name alias resolution
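Name alias resolution can be illustrated as a lookup from every known name variant to one canonical group member, with names that match no member discarded as outsiders. This is a minimal sketch, assuming a hypothetical hand-built alias table derived from a class roster. A plain lookup like this cannot tell apart two members who share a name; that case would need additional context.

```python
from typing import Dict, List, Optional

# Hypothetical alias table built by hand from a class roster:
# lowercase name variant -> canonical member ID
ALIASES: Dict[str, str] = {
    "nick": "Nick",
    "nicholas": "Nick",
    "gabriel": "Gabriel",
    "gabe": "Gabriel",
}


def resolve(name: str, aliases: Dict[str, str]) -> Optional[str]:
    """Map a detected name to a canonical group member, or None for outsiders."""
    return aliases.get(name.lower())


def resolve_all(names: List[str], aliases: Dict[str, str]) -> List[str]:
    """Resolve detected names, dropping outsiders and repeated mentions of one member."""
    members: List[str] = []
    for name in names:
        member = resolve(name, aliases)
        if member is not None and member not in members:
            members.append(member)
    return members


if __name__ == "__main__":
    # "Kurt", "Cobain" and "Dewey" are not group members, so they are dropped
    detected = ["Nicholas", "Kurt", "Cobain", "Gabe", "Nick", "Dewey"]
    print(resolve_all(detected, ALIASES))
```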
Evaluating Name Networks
Example: YouTube comments
(Figure: the chain network shows fewer connections; the name network shows more connections)
Evaluating Name Networks
Results from the Online Learners Dataset
(Figure: chain network and name network)
Structurally, the name network is far more similar to the self-reported network than the chain network is.
Gruzd, A. (2009). Automated Discovery of Social Networks to Study Collaborative Learning. Journal of Education for Library and Information Science 50(4): 243-253.
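One simple way to put a number on how similar a discovered network is to a self-reported one is the overlap of their tie sets (Jaccard similarity: shared ties divided by all ties). The sketch below implements that generic measure. It is not necessarily the comparison used in Gruzd (2009), and the toy networks are hypothetical.

```python
from typing import Iterable, Set, Tuple


def undirected(edges: Iterable[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Normalize edges so that (a, b) and (b, a) count as the same tie; drop self-ties."""
    return {tuple(sorted(e)) for e in edges if e[0] != e[1]}


def jaccard(edges_a: Iterable[Tuple[str, str]], edges_b: Iterable[Tuple[str, str]]) -> float:
    """|A intersect B| / |A union B| over undirected tie sets; 1.0 if both are empty."""
    a, b = undirected(edges_a), undirected(edges_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


if __name__ == "__main__":
    # Hypothetical toy networks for three group members
    self_reported = [("Ann", "Steve"), ("Ann", "Natasha"), ("Steve", "Natasha")]
    name_net = [("Ann", "Steve"), ("Natasha", "Ann")]
    chain_net = [("Steve", "Ann")]
    print(jaccard(name_net, self_reported))   # shares 2 of 3 ties
    print(jaccard(chain_net, self_reported))  # shares 1 of 3 ties
```

Edge overlap ignores tie strength and overall structure; measures that compare whole network structures are an alternative when those matter.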
Outline
• Why Do We Want To Discover Online Social Networks?
• How Do We Collect Information About Social Networks?
• Netlytic.org: Online Tool for Social Network Discovery
http://netlytic.org
Practical Part http://netlytic.org/mitacs
Netlytic.org Name networks dataset types
Sender known
Reply-to / Subject Line
Email/threaded discussion
Blog/news
Chat/comments
name co-occurrence only
Online Social Networking for Scholars
• Helping researchers connect with their colleagues
  – Academia.edu
  – ResearchGate.net
  – Academici.com
• Reviews at http://SocialMediaLab.ca
References
• Chin, A. and M. Chignell (2007). "Identifying Communities in Blogs: Roles for Social Network Analysis and Survey Instruments." International Journal of Web Based Communities 3(3): 343-365.
• Domingos, P. (2005). "Mining social networks for viral marketing." IEEE Intelligent Systems 20(1): 80-82.
• Ehrlich, K., C.-Y. Lin, et al. (2007). Searching for experts in the enterprise: combining text and social network analysis. Proceedings of the 2007 International ACM Conference on Supporting Group Work. Sanibel Island, Florida, USA, ACM.
• Espinoza, F., P. Persson, et al. (2001). GeoNotes: Social and Navigational Aspects of Location-Based Information Systems. Ubicomp 2001: Ubiquitous Computing: 2-17.
• Fisher, D., et al. (2006). You Are Who You Talk To: Detecting Roles in Usenet Newsgroups. Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS '06).
• Golbeck, J. (2008). "Trust and Nuanced Profile Similarity in Online Social Networks." ACM Transactions on the Web.
• Haythornthwaite, C. (2006). "Facilitating collaboration in online learning." Journal of Asynchronous Learning Networks 10(1): 7-24.
• Jones, Q. (1997). Virtual Communities, Virtual Settlements and Cyber-Archaeology. Journal of Computer-Mediated Communication 3(3).
• Leech, G. (1999). The Distribution and Function of Vocatives in American and British English Conversation. In H. Hasselgård and S. Oksefjell (Eds.), Out of Corpora: Studies in Honour of Stig Johansson. Amsterdam/Atlanta, GA: Rodopi.
• Leggatt, H. (2007, April 12). Spam Volume to Exceed Legitimate Emails in 2007. BizReport: Email Marketing. Retrieved October 30, 2008, from http://www.bizreport.com/2007/04/spam_volume_to_exceed_legitimate_emails_in_2007.html
References (cont.)
• Leung, A. (2003). "Different ties for different needs: Recruitment practices of entrepreneurial firms at different developmental phases." Human Resource Management 42(4).
• Li, J., J. Tang, et al. (2007). EOS: expertise oriented search using social networks. Proceedings of the 16th International Conference on World Wide Web. Banff, Alberta, Canada, ACM Press.
• Matsuo, Y., H. Tomobe, et al. (2004). Finding Social Network for Trust Calculation. Proceedings of the 16th European Conference on Artificial Intelligence (ECAI 2004) 16: 510.
• McMillan, D. W. and D. M. Chavis (1986). Sense of community: A definition and theory. Journal of Community Psychology 14(1): 6-23.
• Pouliquen, B., R. Steinberger, et al. (2007). Multilingual multi-document continuously-updated social networks. Proceedings of the Workshop Multi-source Multilingual Information Extraction and Summarization (MMIES'2007) held at RANLP'2007. Borovets, Bulgaria.
• Savignon, S. J. and W. Roithmeier (2004). Computer-Mediated Communication: Texts and Strategies. Computer Assisted Language Instruction Consortium Journal 21(2): 265-290.
• Swearingen, J. (2008). Four Ways Social Networking Can Build Business. Bnet.com. Retrieved from http://www.bnet.com/2403-13070_23-219914.html
• Tanev, H. (2007). Unsupervised Learning of Social Networks from a Multiple-Source News Corpus. Workshop Multi-source Multilingual Information Extraction and Summarization (MMIES'2007) held at RANLP'2007. Borovets, Bulgaria.
• Thompson, S. A. and R. K. Sinha (2008). "Brand Communities and New Product Adoption: The Influence and Limits of Oppositional Loyalty." Journal of Marketing 72(6): 65-80.