Andreas Weigend
Data Mining and E-Business: The Social Data Revolution
STATS 252, Stanford University, Spring 2009


Due: Tues Jun 9, 2009, 5pm

Email to:

This assignment is optional but can be used for extra credit towards your homework grade. Students who attempt this homework will also be able to present their findings to the class.

Overview: The goal of this assignment is to create a comprehensive summary of the Survey Insights worth being featured in the press. Please work together on the following questions to make a cohesive entry that is ready to publish on sites as big as the New York Times or even TechCrunch.

Questions #1, 2, 3, 4, 11:
1. How did you first hear about the swine flu outbreak?
2. What was your biggest surprise ever on a social network?
3. What's the coolest thing you have seen on the web in the last month? Why did it delight you?
4. Which blogs do you follow? Why those, and what does it mean for you to follow a blog?
11. There often is not a clear ordering between real life and online. What are examples of data you would share more readily with your online friends than with your real life friends?

Student(s) responsible for this page (maximum 3 students):
  1. Roshan Sumbaly
  2. Shakti Sinha
  3. Sylvie Bryant


Sylvie Bryant has summarized the major ideas here (although the third paragraph is mainly taken from the Survey Insights answer to question 2 written by Pary Harikrishanan, Sowmyalakshmi):
The internet is the dominant channel through which people learn about a serious matter such as the swine flu outbreak. Within this channel, however, traditional and authoritative news sources, such as the New York Times, still dominate, while blogs and social media capture a smaller share. Television and word of mouth are still significant.

The biggest surprises on social networks, with Facebook the main facilitator of all three, were how easy it is to contact friends from all periods of life, how willing people are to share personal information that they would not immediately reveal face to face, and how virally the information, opinions, and activities we share spread, allowing others to discover a different side of us. However, the majority of people maintain that their online friends tend to be the same as their real life friends. People varied in what they would share online, but parts of the survey group named hobbies, daily activities, pictures, and emotions as specific areas that they would reveal more readily online than in real life. This is partly due to the ease of sharing with a large number of people online through vehicles such as Facebook and Twitter.

Social networks have a major impact on the way we live and see our lives. They have changed people's mindset and the environment we live in. We can see how much people have changed or notice surprising facts about friends that come to light, particularly through Facebook newsfeeds. Some say that they get news faster through these networks than through the usual channels, although it can also cause annoyance when facts about close friends are revealed there rather than more personally. Some have been put off by being inundated with trivial updates, while others have found crucial career opportunities through these networks. It seems that people are still adapting their tolerance for receiving so much personal information, but they realize that the real advantage these networks offer, the ability to sustain connections, is not to be taken for granted.

The coolest things that people have seen on the internet recently mostly include innovative ways of improving our lives, such as websites that help us manage our money, search for images, search the web, analyze e-mails, and make decisions. Also popular are viral videos such as Susan Boyle’s rise to fame and TED’s “6th Sense” demonstration talk.

There are a few very popular blogs that people follow, and a long tail of many other blogs. Blog subject matter ranges from broad to very specific, and therefore from general to personal. However, there were also many people who don't follow blogs due to lack of time. Blogs mentioned two or more times are graphed below. Another 120 blogs received a single vote each and are not captured in this visualization:


First source of breaking news

By Shakti Sinha

Traditionally, the first source of news for people used to be newspapers and television. How has that changed with the growth of social networks and increasing online presence of people? A survey at Stanford provides some answers. The students of the Stats 252 class were asked the question: “How did you first hear about the swine flu outbreak?” We present an analysis of their responses.

Out of the 106 responses, 41 said that news websites on the Internet were where they first got the news. Television was a distant second, with 21 students citing it as their news source. These results are hardly surprising if we consider the many advantages that online news sites offer over television. Online news is on demand, and users can decide which news articles to read and when to read them. This makes it much more convenient for users to find only the news that is relevant to them. A second reason online news is more popular may be its much higher accessibility, through devices such as notebook computers and cell phones. In a professional environment, more people are likely to have access to online news than to television, and this trend is likely to continue with the increasing prevalence of smart phones with browsers.

The survey results clearly show the decline of the newspaper as a source of news, with only 7 students saying that it was their first source of information about the swine flu. The biggest reason I would cite is the limited choice of relevant news it gives its readers. A newspaper has to cater to the interests of all its readers, which is very difficult. Also, the publishing and distribution latency of newspapers keeps them from being a first source of news. In an interview with Ashton Kutcher, Larry King said that he missed the feel of the newspaper in his hands while reading it over a cup of coffee. There are many who will agree with him, but the results of the survey indicate that this is not enough to beat the freshness and variety of online news.

An interesting observation from the responses is that relatively few respondents (14) had first heard of the swine flu outbreak through another person, a category that includes media such as Twitter and Facebook. This is surprising, since Twitter has been observed to be one of the first places from which news begins to spread. The ability of any user to send a tweet from a cell phone enables Twitter to have the most recent information available. The survey results may stem from a combination of two factors. The first is that nearly half the class does not use Twitter at all, while over three quarters of the class follow 20 users or fewer (Insights Q5). The second may be that users often hear about events on Twitter, but then reach news sites by following links provided by the Tweeters or by searching. Since they spend much more time on the news site than on Twitter, most users might not remember what led them to it in the first place.

Summary of the responses:

[Chart of responses by category; legend entries include: internet news, another person, news (unknown), can't recall]
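
The counts quoted in the analysis above can be turned into shares of the 106 responses. The following is only a minimal sketch: the dictionary keys and the `shares` helper are illustrative names, not part of the survey, and the responses that the text does not break down are simply lumped together as "other / not stated".

```python
# Counts quoted in the text; the text does not break down the remaining responses.
TOTAL_RESPONSES = 106
counts = {
    "internet news": 41,
    "television": 21,
    "another person": 14,
    "newspaper": 7,
}


def shares(counts, total):
    """Return each category's share of all responses in percent, rounded to one decimal."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Responses not attributed to any of the four categories named in the text.
unaccounted = TOTAL_RESPONSES - sum(counts.values())

if __name__ == "__main__":
    for name, pct in shares(counts, TOTAL_RESPONSES).items():
        print(f"{name:20s} {pct:5.1f}%")
    print(f"{'other / not stated':20s} {100 * unaccounted / TOTAL_RESPONSES:5.1f}%")
```

Under these numbers, internet news accounts for a little under two fifths of the responses, roughly twice the share of television.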


Coolest thing on the internet

By Roshan Sumbaly

We put the question 'What's the coolest thing you have seen on the web in the last month?' to a group of around 150 Stanford Business and Engineering students. The responses show a strong inclination towards visualization applications. The general opinion is that a great UI adds the 'coolness' factor to an application. For example, as is evident from the tag cloud below, sites with a good UI leave a mark in the minds of users. Prominent products like the iPhone and its related applications also seem to interest the audience.

We are also going through a phase of 'Social Data Revolution'. This is evident from the massive growth of data in our lives, especially with the emergence of real time data. This has resulted in the growth of various real time tools, another favorite among the students. Twitter is the favorite here, especially with the sudden growth of startups around it. Examples of cool applications based on Twitter include

Finally, some students answered this question with regard to events which occurred in the past month. The most prominent among them was Susan Boyle's rise to fame. The power of data exchange is evident from the 10 million views of her YouTube video, making her one of the rare 'internet icons.' Other examples of real life events include the fight between CNN and Ashton Kutcher.


Blogs followed by users

Summarized by Shakti Sinha, based mostly on the analysis by Andrei, Victor Alexandru and Tronson, Andrew.

To understand other interesting aspects of online media, we analyzed the responses to the question “Which blogs do you follow?” in the survey. Predictably, most of the blogs in the results were technology blogs. This is expected when the respondents are Stanford students, since a passion for technology is common in this demographic. TechCrunch left other blogs far behind with 24 followers, the second being Engadget with 7. We can also observe a clear long tail effect in blogs, with very few having many followers, and many having few followers. This effect is explained in the next paragraph.

Following blogs is similar to shopping for music. You follow blogs according to your interests. Few of us like blogs that talk about surfing, playing poker or baking bread (areas of smaller common interest), while more of us are interested in technology or sports (areas of larger common interest). As with music, people who decide to follow blogs make two types of “purchases”: social and individualistic. The social choice is the choice made in an area of common interest (in our case, most people who took the survey followed blogs about technology, such as TechCrunch and GigaOM), while the individualistic choice is the choice made in an area of particular interest (some people followed blogs about surfing, friends and family, or Stanford FML). In this view, I think that the blog has revolutionized the architecture of the web. Until the blog, only popular websites made it on the Internet. Blogs have created the long tail in the website industry the same way the Internet did for music or books.


Examples of data shared more readily with online friends than with real life friends.

By Roshan Sumbaly

The emergence of the internet has seen many people become immersed in a virtual world, one which is very distant from the real world and helps you get away from the troubles of real life. Various recent surveys have shown that the average time spent online by a user has increased to around 7 hours daily. This is primarily because of the emergence of technologies such as mobile phones and PDAs, which enable you to be online and connected any time and any place. This has resulted in many people living two lives, viz. an online life and a real life. The rise of social networking sites (like Facebook, Orkut, Hi5, etc.), along with the ease of communication using IMs, has resulted in us having online friends. These may be friends whom we don't meet very often, or people whom we have found online through chat rooms, etc.

In a survey of a group of 150 Engineering and Business Stanford students, we raised the question 'What type of data would you be comfortable sharing with your online friends?' Around 15% of the users said that they would be reluctant to share any sort of personal data and generally prefer not to make online friends. Most of these students prefer adding only real life friends as online friends. But a staggering 85% have in the past added users whom they found interesting while browsing profiles on social networks. 8.5% of these users said that they would share just about anything private, including their emotions, embarrassing moments, etc. The rest of the users chose to maintain their privacy and said they would prefer sharing only their statuses and common media data found on public sites.