Andreas Weigend
Data Mining and E-Business: The Social Data Revolution
STATS 252, Stanford University, Spring 2009
Class time: Monday 2:15 - 5:05 pm
Class location: Gates B01

Class 3: Data

April 20, 2009
Audio recordings and transcripts:
Part 1: mp3 transcript
Part 2: mp3 transcript

Also available is the video recording of this lecture:
SCPD video recording

Economics of Data

In preparation for Reid Hoffman's guest lecture on May 11, 2009, one question to think about is an inherent problem with LinkedIn. Those searching for jobs and those eager to make contact with professionals often don't have much to offer in return. The full-time, highly paid professionals can be thought of as living in a bubble, away from the people trying to reach them. How can this asymmetry be addressed, and in what ways can stronger connections be made? What kind of currency can people looking for jobs and contacts use, and how can they obtain it?

The following blog discusses experiences, both good and bad, on LinkedIn.

Warm Up Exercise

The Question

If you had devices which recorded everything possible about you (e.g. movements, concentration levels, and environmental stimuli), how would your behavior change, if at all?

Special Cases

In answering the warm up question, we can consider four different special cases:
  1. Only you are able to view the data, which consist of life signals and health statistics
  2. The data become public after your death or perhaps only to those specified in your will
  3. The data are password protected, so you could give friends and family members access
  4. Law enforcement agencies have access

The answer to this question depends largely on who is able to access and view this large amount of data, i.e. what permissions are present? Also, is the data set indexed so that it can be queried and searched easily?


  • This availability of personal data may result in a loss of spontaneity and cause one to act more deliberately.
  • Others may be self-conscious of their actions for just the first few days after the devices have started recording and then get used to it.
  • Daily behavior could change for the better. The data could be used for purposes of introspection and self-improvement. For example, a person can see how much time she spends doing undesirable activities and then try to limit the time spent on those activities. Examples include taking long showers and using Facebook for long periods of time.
  • Norms of acceptability change over time. Professor Weigend mentioned the furor caused about 20 years ago when German passports became machine-readable. People were concerned it would greatly empower the police, but nowadays German passports have chips embedded in them, allowing personal information to be scanned, potentially by strangers. An EE Times Europe article covers the topic, and the Wikipedia entry for German passports has a section on the RFID chip with biometric certificate.
  • The ability to search all of the data by the public may lead to more meaningful queries.

Existing Devices and Platforms

Absent from the responses given in class were business applications. The following web sites and devices provide interesting services that touch on the ideas of self-surveillance and broadcasting your every movement to the world.


Fitbit is a small device that keeps track of fitness and sleep statistics including steps taken, calories burnt, and distance traveled per day. As stated on the web site, "Fitbit tracks the movements that your body makes and can tell you how long it took you to fall asleep, how many times you woke up throughout the night, and the actual time you were asleep vs. the time you were in bed." Information from the Fitbit is uploaded to their website, where you can view more detailed data and log fitness goals with your family and friends.


The Fitbit Dashboard contains food and activity logs, a weight tracker, a breakdown of activity during the day into four categories (sedentary, lightly active, fairly active, and very active), visual displays of sleep patterns and calorie burn off, and fitness updates from friends.

View a screen shot of the Fitbit Dashboard



"With Twitter, you can stay hyper-connected to your friends and always know what they're doing" (Twitter How page ). Twitter allows one to follow "tweets" of other people, short status updates sent to mobile devices and blogs of followers. For those of you unfamiliar with Twitter, here is a great video called Twitter in Plain English:


Go to News Article

iStanford, an iPhone application designed by Stanford students Kayvon Beykpour and Aaron Wasserman, allows users to access student registration services and to view Stanford sports scores, among other things. One feature brings up a campus map indicating the current locations of students who also have iPhones and are allowing the service to track them. Many of the comments on the article regarding this tracking feature are apropos of the exercise topic. BertW posted, "Interesting that 3 of the first 6 reactions here see this as a promising innovation and 3 see it as an invitation to crime. (I smell dissertation topic!!) My personal response is, way too creepy. Whether the downside is burglary, stalkers, pathological over-parenting or government surveillance, I just cannot imagine an upside worth the price." Below are some screen shots taken from Terriblyclever Design, the two students' software company that is marketing the product to universities as MobileEdu.




23andMe is a personal genomics company that sells personal DNA analysis kits. After spitting into a tube and sending it to their lab, you can discover a whole lot about your health and the traits of your ancestors. 23andMe will analyze your saliva, genotype your DNA, and upload the information online. You can also see what you have in common with other consumers and contribute to genetic research. As a sign of the growing interest in personalized medicine and indexing the human genome, Google recently invested $3.9 million in 23andMe. As Kevin Kelleher of technology journal GigaOM writes, "If Google wants to really organize the world's information, it needs to consider DNA, the most personal of data. And what 23andMe is purporting to sell is the ultimate in navel gazing."

How it works:

The Question of Anonymity

Since Professor Weigend asked us a question on how we would behave when we are being tracked for every single thing that we do, we wanted to discuss the hypothetical question of "What would happen at the opposite end of the spectrum if everything we did were kept anonymous?" How would you act if NO ONE knew what you did? You can steal, rape, murder, and no one would know that you committed those crimes.

To go more in-depth with this question, we would like to present a website with complete anonymity, 4chan, as a special case study. 4chan is an imageboard, a forum with the ability to post images, and all of the content is user-generated. 4chan is an EXTREMELY popular website with an Alexa traffic rank of 514.


What is special about 4chan? 4chan has a section on its website called "random," aka /b/. /b/ forces the users to remain anonymous. In fact, there is no registration system. Wikipedia's description: "The site's '/b/' board is by far its most popular and notorious. It is known as the 'random' board in which there are minimal rules on posted content."

Without a registration system, every forum post is anonymous. After a while, the users of the website have decided to refer to themselves as "Anonymous." Many people on this website use proxies and other tools to hide their identities even further. Collectively, as Anonymous, the users mainly create posts about:

  • Racism (racist remarks against all races, praise for Hitler)
  • Pornography (illicit pictures, bestiality)
  • Misogyny (women treated as objects, statements encouraging violence against women)
  • Gore (pictures of the dead, dying, and seriously injured)
  • Animal Abuse (images of the killing and torturing of animals)
  • Drugs (pictures of drugs and people abusing them)
  • Stalking (posting revealing pictures of attractive women on MySpace or Facebook, hacking other people's profiles)
  • Protests (statements made wearing Guy Fawkes' masks , hacking, DDOSing, and "raiding" other websites)

WARNING: Even though the website is very popular and many people have already visited the site, we highly advise people against visiting the /b/ section as you will find very graphic images that may scar you. There is even a disclaimer for /b/:


We asked "How would you act if NO ONE knew what you did?" because we believe it is similar at its core to "How would you act if EVERYONE knew what you did?": both questions are about different levels of privacy. Does the level of privacy change a person's behavior? The answer, as our case study suggests, is yes.


At the opposite end of the spectrum lies the case where one is completely non-anonymous and closely connected to a circle of friends, peers and colleagues, among whom one is conscious of maintaining a good image of oneself. Social networking sites are an apt case study for this, and LinkedIn in particular.
Among other things, one can do the following using the LinkedIn network:

  • Manage the information that's publicly available about you as a professional
  • Find and be introduced to potential clients, service providers and subject experts who come recommended
  • Create and collaborate on projects, gather data, share files and solve problems
  • Be found for business opportunities and find potential partners

LinkedIn's tagline is 'Relationships Matter'. In such a closely connected professional network, the opposite of the case presented above, people are extremely conscious of the image they project to the outside world and the information they make accessible to others. In the ongoing debate on the value of LinkedIn, Dharmesh Shah quips "...If you end up making a connection to someone with a very negative EV (like a career criminal), then at some level, you are responsible for others that might connect to that individual ...". Naturally, when the stakes are high, a person A would not want to let the world know that he knows a certain person B with a bad image, even if he does know him in real life.

The above extreme case studies indicate that people's behavior changes drastically depending on their level of anonymity and their connectedness (who has access to the information).

Data Mining Problems in E-Business

Put yourself in the shoes of Amazon's Chief Scientist or Jeff Bezos managing the e-business side of Amazon.


If you have all the data in the world about your customers, every click and purchase and page view, what are the main challenges in using this data?

1. Recommendations
  • Frame: What is scarce? Our attention. Companies need to provide information that is useful to us to get our attention, which will hopefully lead to additional revenue for them.
  • Recommendation systems generate between 25% and 50% of the revenue of e-business companies.

2. Actions of relevance to the user
  • cross-selling: recommendations based on your past buying habits
  • personalized marketing: using a potential customer's personal characteristics (even location!) to recommend relevant services and deals

3. Prediction is good, control is better
  • Prediction tells you what is going to happen.
  • Control is where the money is: you can take action based on what you think will happen.
  • Example: the coefficients for the Airbus controller (over 100 parameters) may be the most expensive numbers ever computed, in dollars per bit. Hundreds of millions of dollars were spent, and getting the coefficients wrong could lose a lot of money.
  • If you can only predict that something is happening but don't know what action to take, that is not as good as being able to influence what the person is doing.
  • Example: it is interesting to know, with error bounds, that revenue will be so many millions of dollars per year; it is more interesting to know HOW to increase revenue.
  • Control asks what we can do to make the customer spend more (A/B testing), rather than just reading customer spending reports.

Aside: People have no incentive to ever reveal their willingness to pay, but it can be uncovered via A/B testing of product prices.
  • For product X, compare Price A × unit sales at Price A with Price B × unit sales at Price B. Whichever price yields the higher revenue is the one you choose for product X. (To maximize profit rather than revenue, subtract the unit cost from each price first.)
  • Note: this pricing strategy does not involve looking at competitors' prices! In economic terms, you are essentially operating as a monopolist and selling at the profit-maximizing price.
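The price comparison in this aside can be sketched in a few lines of Python. This is a minimal illustration, not Amazon's method: the `PriceArm` class, the function names, and all the numbers are made up for the example, and it assumes visitors were randomly assigned to the two prices. It compares revenue per visitor so that arms of different sizes stay comparable.

```python
from dataclasses import dataclass


@dataclass
class PriceArm:
    """One arm of a hypothetical price experiment for a single product."""
    price: float      # price shown to visitors in this arm
    visitors: int     # visitors randomly assigned to this arm
    units_sold: int   # units bought by those visitors


def revenue_per_visitor(arm: PriceArm) -> float:
    # Normalizing by visitors keeps arms of different sizes comparable.
    return arm.price * arm.units_sold / arm.visitors


def pick_price(a: PriceArm, b: PriceArm) -> PriceArm:
    """Return the arm with the higher revenue per visitor (ties go to a)."""
    return a if revenue_per_visitor(a) >= revenue_per_visitor(b) else b


# Made-up numbers, for illustration only.
arm_a = PriceArm(price=10.0, visitors=1000, units_sold=120)
arm_b = PriceArm(price=12.0, visitors=1000, units_sold=90)
best = pick_price(arm_a, arm_b)
print(f"Chosen price: {best.price}")
```

In a real test one would also check that the difference between the arms is statistically significant before committing to a price.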

What about predicting how trends outside your control will influence behavior? Run a PHAME with an A/B test

4. Problem: Recommendations that are too narrow or out of context
  • Risk: lose credibility with the customer because ad doesn't seem relevant (and thus not useful) to them
  • Question: has anyone in the class NEVER purchased on the recommendation of Amazon? Yes. These are usually people who do not shop repeatedly for the same items or categories of items, so their recommendations always appear out of context. (Note: these people are a minority of customers.)

Aside: Types of recommendations on Amazon:
  • "People who bought this bought that": this aids people by providing a form of data-driven decision support.
  • Recommendations related to past decisions, rather than to the current decision process, are less useful.

5. Ask the customers!
  • Don't just "sniff the digital exhaust", ask people what they want explicitly
  • This strategy, "customer managed relationships" as opposed to the traditional "customer relationship management", is more in line with the social data revolution.
  • Provide customers with a pipeline for feedback, generate bidirectional communication, and establish a community for customers.

6. Cross-Platform Selling
  • If I like books about cooking, can Amazon recommend kitchen appliances to me successfully?

Problems with the Cold Start Recommendation Systems
Co-purchasing market matrix:
  • If a person is considering buying item i, the recommendation system gives weight to recommending an item j if it has been co-purchased with item i within 24 hours by any person in the past.
  • Problem with this system: items that are popular with everyone will be recommended to everyone.
    • Example: if someone buys Harry Potter together with a 64-gig SD card, this system would recommend a Harry Potter book to the next person looking at a 64-gig SD card, which is not a very helpful recommendation.
  • Instead, normalize the matrix and find the relative probability that item j is related to item i.
    • This is analogous to term frequency-inverse document frequency (TF-IDF).
    • Normalize by what is in the corpus: a term that appears often in the document but rarely in the corpus is more relevant than one that appears frequently in both. The question is how this document or item differs from the corpus.
    • Amazon's version of this is "Statistically Improbable Phrases" (SIPs).
  • Positive feedback loop: once you recommend something, people are more likely to click on it.
    • You can create tunnels from one area to another, but this is quite hard to do in practice.
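The effect of normalizing the co-purchase matrix can be illustrated with a small Python sketch. It uses lift, a common association measure, as a stand-in for the normalization described above; that choice is an assumption made for illustration and not necessarily what Amazon uses. The function name and the baskets are invented for the example.

```python
from collections import Counter
from itertools import combinations


def co_purchase_scores(baskets):
    """Return (raw pair counts, lift) for every unordered item pair.

    lift(i, j) = P(i and j) / (P(i) * P(j)). Values well above 1 suggest the
    pair is bought together more often than popularity alone would explain.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # sorted so each pair has one canonical key
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    lift = {
        (i, j): (count / n) / ((item_counts[i] / n) * (item_counts[j] / n))
        for (i, j), count in pair_counts.items()
    }
    return pair_counts, lift


# Made-up baskets: Harry Potter is popular with almost everyone.
baskets = [
    {"harry_potter", "sd_card"},
    {"harry_potter", "sd_card"},
    {"harry_potter", "sd_card", "camera"},
    {"harry_potter", "cookbook"},
    {"harry_potter", "cookbook"},
    {"harry_potter"},
    {"harry_potter"},
    {"sd_card", "camera"},
]
raw, lift = co_purchase_scores(baskets)
for pair in [("harry_potter", "sd_card"), ("camera", "sd_card")]:
    print(pair, "raw count:", raw[pair], "lift:", round(lift[pair], 2))
```

By raw count the SD card is paired with Harry Potter more often than with the camera, but after normalization the camera scores higher, because Harry Potter appears in nearly every basket.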

Example of elegant recommendation system:
  • Item by Item
  • Not people who bought x also bought y, but people who looked at x bought y eventually (within 24hrs usually)
  • This is much more useful in supporting people along the decision-making path to buy a certain product
  • Also much better since people rarely buy items together within 24 hours
  • Not hard to implement, can find in the "Collective Intelligence" book on the course reading list
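A minimal Python sketch of the "looked at x, eventually bought y" idea follows. The event format, the function name, and the sample data are assumptions made for illustration; this is not the implementation from the course reading list. For each purchase, it counts the other items the same user viewed in the preceding 24 hours.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)


def view_to_purchase_counts(events, window=WINDOW):
    """Count (viewed item -> purchased item) pairs per user within the window.

    events: iterable of (user, kind, item, timestamp), kind is "view" or "buy".
    """
    views = defaultdict(list)  # user -> [(timestamp, item)]
    buys = defaultdict(list)
    for user, kind, item, ts in events:
        (views if kind == "view" else buys)[user].append((ts, item))

    counts = defaultdict(Counter)  # viewed item -> Counter of purchased items
    for user, purchases in buys.items():
        for buy_ts, bought in purchases:
            # Distinct other items this user viewed shortly before buying.
            viewed = {
                item
                for view_ts, item in views[user]
                if item != bought and timedelta(0) <= buy_ts - view_ts <= window
            }
            for item in viewed:
                counts[item][bought] += 1
    return counts


t0 = datetime(2009, 4, 20, 9, 0)
events = [
    ("ann", "view", "camera_a", t0),
    ("ann", "view", "camera_b", t0 + timedelta(hours=1)),
    ("ann", "buy", "camera_b", t0 + timedelta(hours=2)),
    ("bob", "view", "camera_a", t0),
    ("bob", "buy", "camera_b", t0 + timedelta(hours=3)),
    ("cat", "view", "camera_a", t0),
    ("cat", "buy", "tripod", t0 + timedelta(hours=30)),  # outside the window
]
counts = view_to_purchase_counts(events)
print(counts["camera_a"].most_common(1))
```

A visitor looking at `camera_a` would then be shown the items most often bought after viewing it, which supports the decision in progress rather than a past one.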

If you are inspired to design a recommendation system: Netflix is offering a $1 million prize for a good recommendation system, although not quite the same principles are at work there as at Amazon.


One of the greatest innovations in Amazon was to discover the link between the Clicking data and the actual purchase data. Prior to this, the recommendation systems followed one of the following approaches:

  1. Find the Clicking Patterns and try to predict the customer interest and purchasing patterns based on that.
  2. Find association rules between the items actually bought. If many people bought items X and Y together, then the relationship between X and Y was established. This is the Item-based recommendation.
  3. A variant of the above approach is to recommend items based on what other people have bought. If a certain number of other people having similar profile to the user’s, who had bought X had also bought Y, then, the relationship between X and Y was established.

However the idea to merge these ideas to establish a relationship between the Clickstream data and the actual purchase data was an innovative one that was remarkable in its results.

Doubts Raised

Is there a Path-based correlation too, in the recommendation systems?
Ans: Recommendation systems usually use just two-point correlations. That is, if a visitor views X -> A -> B -> C -> Y and finally purchases Y, the correlation is established only between X and Y. This keeps the recommendation system tractable, since such systems typically need to deal with millions of objects.
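The two-point idea can be sketched as follows. The session format and function name are hypothetical, and the sketch assumes that "X" means the first item viewed in a session; the notes do not spell this out. Only the pair (first item viewed, item purchased) is recorded, and the intermediate path is discarded.

```python
from collections import Counter


def two_point_pairs(sessions):
    """Count (first item viewed, item purchased) pairs, ignoring the path between."""
    pairs = Counter()
    for path, purchased in sessions:
        if path and purchased is not None and path[0] != purchased:
            pairs[(path[0], purchased)] += 1
    return pairs


sessions = [
    (["X", "A", "B", "C"], "Y"),  # X -> A -> B -> C, then buys Y
    (["X", "D"], "Y"),
    (["A", "B"], None),           # browsing session with no purchase
]
pairs = two_point_pairs(sessions)
print(pairs)
```

Storing one pair per purchasing session, instead of every sub-path, is what keeps the table small when the catalog has millions of items.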

Google Tech Talk On Recommender Systems.

Amazon vs Netflix


Both Amazon and Netflix place a great deal of importance on the Recommendation Systems. However, there is a fundamental difference between the recommendation systems offered by them.
Netflix:
Relies chiefly on the explicit ratings that users assign to movies. These ratings are used in conjunction with the user profile to recommend appropriate movies.
Amazon:
Relies more on products and purchases than on explicit customer feedback. Data collection here is more implicit.

Other Companies Involved in Recommendation Systems

Customer Lifetime Value

“In marketing, customer lifetime value (CLV), lifetime customer value (LCV), or lifetime value (LTV) and a new concept of "customer life cycle management" is the present value of the future cash flows attributed to the customer relationship,” as stated by Wikipedia.

Customer Lifetime Value expresses each customer's value in monetary terms, which makes it extremely important in decision making, especially because resources are scarce.
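To make "present value of the future cash flows" concrete, here is a minimal Python sketch of one common simplification. The constant annual margin, constant retention rate, end-of-year cash flows, and the function name are all assumptions for illustration; the notes do not specify a formula.

```python
def customer_lifetime_value(annual_margin, retention_rate, discount_rate, years):
    """Discounted sum of expected yearly margins from one customer.

    Assumes cash arrives at the end of each year t and the customer is still
    active then with probability retention_rate ** t.
    """
    total = 0.0
    for t in range(1, years + 1):
        expected_cash_flow = annual_margin * retention_rate ** t
        total += expected_cash_flow / (1 + discount_rate) ** t
    return total


# Made-up inputs: $100 margin per year, 80% retention, 10% discount rate.
clv = customer_lifetime_value(100.0, 0.8, 0.10, years=5)
print(round(clv, 2))
```

Raising the retention rate or lowering the discount rate increases the result, which is why retention efforts figure so heavily in CLV discussions.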

Importance of CLV:
  1. Manage and Allocate Scarce Resources:
    1. Resources of a company are scarce and measurable and hence a quantitative metric as to how much each customer is valued is needed to allocate resources amongst customers.
    2. For instance, one of the scenarios suggested by the audience involved the aftermath of a cancelled flight. Passengers for the smaller alternate flight are usually chosen by their CLV.
  2. Increase the Value of the Customer for the company:
    1. By measuring the value of the customers to the company in quantitative, monetary terms, the company can focus its attention better on the customers who have a high CLV and can redress the needs of those with low CLV. This way the values of the customers to the company can be enhanced considerably.
  3. Lead Generation:
    1. Wikipedia defines this as a marketing term that refers to the creation or generation of prospective consumer interest or inquiry into a business' products or services.
    2. If the CLV is known, proper focus can be maintained during the marketing to enhance and maximize the lead generation.

Papers that describe CLV and its Impact

Oft-Neglected Costs and Values

  1. Company reputation / word of mouth:
    1. Poor treatment by a company can prove very costly in this age of democratic media, where everyone can be a content publisher.
    2. Example: “What Would Google Do?” discusses many cases where word-of-mouth publicity can be an invaluable advantage or a great misfortune.
  2. Cost of frustration while working with computers:
    1. This is best illustrated by the video.
  3. Morale of the company:
    1. The morale of employees is a critical factor in the success of any organization, but the cost of neglecting it is often overlooked.
  4. Hidden emotional costs:
    1. Certain decisions carry an emotional cost that can be very hard to capture or measure. For instance, missing a flight can lead to irrecoverable or emotionally overwhelming situations for people who urgently need to visit ailing loved ones.
  5. Cost of interruption:
    1. Mail, chat clients and smart devices, including the iPhone, ensure that we are always wired and connected. However, they also open the door to continuous interruption.
    2. In addition, numerous notifications frustrate users and distract them from their intended task.
  6. Cost of information overload
  7. Long-term costs vs. short-term benefits:
    1. There is sometimes a tradeoff between long-term costs for the company and short-term benefits. Though the interests of companies like Google are aligned with those of the user, this is not so in most other cases. In those cases, the company needs to decide between the two without underestimating the importance of either.
  8. Cost of misinformation and poor service:
    1. One cost that is often grossly underestimated is the cost of poor service and misinformation. Something as seemingly trivial as a typo in the contact number on the company website could leave a lasting impression on the company's image in the consumer's mind. As regards quality of service, in this democratic information age any reduction in quality could cost the company dearly.

To help consumers focus on what is relevant, some sites leverage the experience of the masses. Goodguide is one of them.

Goodguide: Enables consumers to write reviews, rate products, and help their peers find healthy and safe products.

How is CLV Calculated?

Traditional Method

  • Based on monitoring a customer's transaction history.
  • Utilizes data the customer involuntarily provides to the company.
  • Harvard Business School has an Excel toolkit to calculate the Lifetime Customer Value using the traditional method.

Future Method

  • Based on social data.
  • Important to understand the large influence other people play in our buying decisions.
    • Example: Blogger who gives product or company a bad review can have a significant effect.
  • Social Data – voluntarily sharing our experience in public product reviews.

Notable Trends

  • Involuntary becomes voluntary.
  • Private becomes public.

Customer Perspective

In buying decisions, studies have shown that friends' recommendations are 5-10 times more powerful than company recommendations.

Who do we trust

  1. Friends – people we know personally.
  2. Peers – people with similar social status, similar location, similar lifestyle.
  3. Experts – professionals
Trade-off: Experts know best, but we trust our friends and peers more.

Case study: HardForum is a computer hardware enthusiast community. People from all over the world who are interested in computer hardware come to the forum to talk about it. Naturally, many of them befriended each other as they spent time on the forum talking about their love for computer hardware.

Many of the people on this forum review computer hardware and also ask for advice on which computer parts to buy.

Here's an example of threads that ask for advice:

Last year, when Intel came out with a new CPU, the Q6600, one of Intel's Core 2 Quad microprocessors, hardforum enthusiasts began creating a buzz around it. It spread like wildfire. People began to ask for advice on whether or not to buy the new Q6600.

Here are some threads that talk about the Q6600 CPU:

Many people advised getting the Q6600.
Here are some of the replies people gave when asked about the quad-core CPU:
"Go Quad, good prep for the future. Imagine the poor souls who was in your situation a few years back and decided to stay single core instead of dual"
"I agree with them... go Quad. There are ppl hitting 4ghz with the G0 stepping with water. Just look at the OC database."
"to the op: Get a q6600. you wont notice a real world difference between 3ghz and 2.4ghz in most situations"
"I have a e6600 at 3.33Ghz now and there aren't much things that I can't conquer with this chip"

The Q6600 became a HUGE hit and a 5x winner of customer choice at a popular website for online computer hardware shopping. As you can see from the screenshot below, it has over 3078 ratings, and its overall rating is a perfect score of 5 out of 5.

Example reviews:

A closing thought on this case study is that a buzz among a tightly-knit group of peers can create a momentum in terms of marketing.

Leveraging Social Data - The Future

  • Ask a question and Aardvark leverages your network of friends to find someone you know and trust who is also an expert on the topic pertaining to your question.
  • When planning a vacation you provide basic information (male/female, traveling alone or with family, etc.) and the site will show you hotel reviews from people like you.

  • Leverages your DNA collected by the 23andMe program.
  • Answer basic questions on 23andWe to contribute to the study of genetics.
  • Aims to relate your answers to your DNA.
  • Has the potential to produce revolutionary scientific findings by using social data.

Leveraging Social Data - Casualties

Travel Agents

  • In recent years there has been a distinct decline due to the value that online sites provide by having real customer reviews.
  • The collective knowledge and experience is far greater than any single travel agent could possibly provide.
  • You don't really care if a travel agent (expert) says you will like a certain hotel, you care if people like you like that hotel.
  • No travel agent could possibly check out all the hotels in the world.

Customer Computed CLV

  • Customers now compute their own CLV (Company Lifetime Value).
  • Examples: Which credit card to get? Which airline mileage plan to sign up for?
  • How do customers decide?
    • What are the long term benefits (value)?
    • Do we think the company will still be around in 10+ years?

How CLV is affected by Social Data

Two General Approaches

  1. Across Time
    • Traditional Method - look at how much money the customer gave us last year, compute how much they will give us in the future.
    • This method ignores relationships and ignores friend/peer influence.
    • Often, to compensate, the company tries to create a fake relationship with the customer.
    • This method completely neglects Social Data.
  2. Across Networks
    • Correlate CLV across social networks.
    • Example: If your friends all like a product, chances are you would too.
    • Social recommendations are very important.

Why does traditional CLV calculation ignore social aspects?

  • Companies didn't have access to the data.
  • Now people voluntarily and knowingly share data about themselves and about their relationships.

The heart of any company-customer relationship is data

  • Customers give you data either voluntarily or involuntarily and hopefully you can do something useful with it.
  • You don't want to have to keep asking the customer for the same data.

Tough data management question at Amazon

  • Should we allow users to permanently erase past purchases?
  • Customer may find this desirable for privacy reasons.
  • But then what happens if the customer wants to return the item and you have no record they ever purchased it?

CLV Summary

  • Transition from customer transaction economics to customer relationship economics.
  • The relatively new idea of customers voluntarily sharing data will lead to new types of customer-company relationships.
  • Companies need the ability to understand and use the data that customers are constantly sharing in their own networks.
  • End goal: use social data to derive some action that can increase value to the customer, and thus increase the customer's CLV.

Websites: Tracking Social Capital

SocialFly - “Gives you the tools to organize your social life and keep up with more people.”

Buxfer - “Track shared expenses, split bills, and debts.”

BillMonk - “Easily split bills with roommates and friends.”

Twitpay - “Update your status to pay somebody.”

HW 1 Project Presentations

Group 12 - Thats What She said!

Members: Pierre Djian, Aravind Narayanan, Tina Marie Cardaci, Jonathan David Beekman, Emile Chamoun, Sampath Deepal Jinadasa
Contact Info:
Page Thats What She Said Photos
Mr. and Miss Facebook

The team initially toyed with the idea of creating a Mr. and Miss Facebook community where people can upload pictures and everyone in the community votes for the best ones and eventually declares a Mr. Facebook and Miss Facebook! However, the team was interested in creating data that could have more value (than just having users rank a set of pictures). In addition, this idea was more suited to a Facebook group (than a page), which led to the creation of the “That’s what she said!” brand.

“That’s what she said!”

Description: Create an organic community to share (pictures/videos/text) “That’s what she said!” jokes. Users upload pictures with funny "Thats what she said" captions, and others vote these pictures up or down.

Goal: There are other very popular “That’s what she said” pages on Facebook. What is unique about this group’s approach is that they are trying to see whether Social Data Revolution can be contained by strictly allowing users to post only a certain type of content (pictures or text or videos) and what effects it has on user interactions and building of organic communities. They have initially created a “That’s what she said Photos” page that allows users to post only pictures (No Text/Videos). They plan to subsequently introduce “That’s what she said” pages that allow only Text and only Videos. This would be followed by a page on the same theme allowing all three types of content. Does containment of content on Facebook pages work? What content is the most popular? What content results in high rate of interactions? What content receives a poor response from users? What metrics do you use to measure such activity? These are some of the questions that the group plans to find answers to at the end of the exercise.

Metrics: Each of these pages would be evaluated on the currently available Facebook metrics followed by an analysis of the effects of containment of content in the context of the Social Data Revolution.

Marketing Strategy: Marketing strategy consists of inviting friends to join and posting messages to other similar groups which focus on jokes.

Questions from the Audience:

Which of these pages will be more popular? What is your hypothesis?
The team hypothesizes that Text will be the most popular as it takes less time to write a "Thats What she said" post as compared to finding a picture/video for a line that can make you scream “That’s what she said!”. However, since jokes in the form of pictures can be more entertaining and therefore should be more popular than text, the hypothesis needs to be validated and is one of the top goals for the team in this exercise.

Group 11 - Data da Vinci

Members: Blake Cutler, Matt Jones, [Michael Weiksner?]
Contact Info : mkjones aat cs dot stanford dot edu

Data da Vinci

Website: Data Da Vinci

Description: Data da Vinci is a community where users can share data sets, then transform and analyze those data sets in interesting ways, and post the analysis. It allows users to browse the discussions for links to data sets, post links to datasets of their own, upload photos or videos of visualizations about each one.

Goal: Create an organic community that lets fans discuss and analyze some of the most ingenious and interesting patterns in data visualization. The idea is similar to having users in Flickr groups create and share interesting visualizations of pictures.

Metrics: While it was possible to gather some interesting metrics, this page discusses content that may not attract a very large number of users. Hence it may not be the best choice for this class. The group therefore decided to create a lighter-weight page (see below) that can attract more users.

Navy SEAL Pirate Hunters:

Website : Navy SEAL Pirate Hunters


Description: This brand draws inspiration from the resurgence of piracy in Somalia.

Metrics: The key metrics the group would like to use are ones that help them track the channels through which users visit their page/become fans. Was it Facebook Search? Was it feeds from friends? Was it Search Engines? This metric is important as it can prove to be an important factor in determining the focus of marketing campaigns.

Marketing Strategy: The group has not yet launched any promotion campaigns and has organically acquired a couple of fans. They would like to use their metrics to zero in on the focus of their marketing campaigns.

Mozilla Firefox (Proposed Idea)

Website: Mozilla Firefox Community on Facebook

Description: The group has access to the Mozilla Firefox page, which already has an established base of about 350,000 fans.

Marketing Strategy: The team proposed running contests for friends and family (downloads of the Firefox browser, browser skins, other products), seeing how fans react to these promotions, and iteratively figuring out the focus of the campaign.

Metrics: They would also like to add active content on the tabs/page that could enable the page creators to display free-form content (not restricted to fields that Facebook provides) and also let fans add such free-form content, which could enable the page to collect metrics for more fine-grained tracking.

Group 10 - This OR That?

Members: Burak Erdem, Tom Mapham, Pablo Paniagua, Shakti Sinha, Roshan Sumbaly
Contact Info:

Go To Page

Description: The group’s aim was to create a page that was self-sustaining and could survive on its own even after the class was over. They wanted to create an organic community that can leverage the power of the Social Data Revolution by asking comparison questions of the form "This OR That." For instance, ask people questions such as Tea or Coffee?, Linux or Windows?, Girlfriend or No Girlfriend?! While a lot of pages promoting celebrities and brands are already popular on Facebook, they hypothesize that value can be generated by comparing some of the most popular brands.

Marketing: The group originally planned to use only pictures as posts but later switched to allowing text and videos, as they are still in the process of figuring out the optimal model for content creation. The page has acquired a good number of fans. The team has been monitoring the number of active fans every day and, based on its metrics, is making good progress. They are also planning to increase interaction by creating discussions on the Facebook page and to use Twitter, Search Engine Advertising and other viral means of marketing to promote it.

Metrics: The current Facebook metrics give a rough sense of user activity and provide significant information about the demographics of users. However, they do not provide a clear measure of user engagement. For instance, total fans or page views measure the traffic of the page, but high traffic does not always mean high engagement or a successful page. The team has therefore defined other metrics to gauge the success of their page, such as the time spent on the page and the number of active contributors (readers/writers). Facebook metrics also give no information about the sources (e.g. search engines, organic, users not on Facebook) from which the page acquires traffic. The team has defined metrics that enable such measurements and can be used for brand promotion.
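As an illustration only, the kind of metrics the team describes (active readers versus writers, and traffic by source) could be computed from a simple event log. The sketch below is in Python; the log format, the field names, and the sample values are all assumptions made for this example, not data or an API provided by Facebook.

```python
from collections import Counter

# Hypothetical event log: (user, action, source). The fields and values are
# illustrative assumptions, not data exported from Facebook.
events = [
    ("ann", "view", "facebook_search"),
    ("bob", "post", "friend_feed"),
    ("ann", "comment", "friend_feed"),
    ("cat", "view", "search_engine"),
    ("dan", "view", "friend_feed"),
]

# Actions that count a user as a "writer" (an assumption for this sketch).
WRITE_ACTIONS = {"post", "comment"}


def engagement_summary(events):
    """Count readers, writers, and visits per traffic source."""
    visitors = {user for user, _, _ in events}
    writers = {user for user, action, _ in events if action in WRITE_ACTIONS}
    readers = visitors - writers  # visitors who never posted or commented
    sources = Counter(source for _, _, source in events)
    return {
        "visitors": len(visitors),
        "writers": len(writers),
        "readers": len(readers),
        "writer_share": len(writers) / len(visitors) if visitors else 0.0,
        "visits_by_source": dict(sources),
    }


summary = engagement_summary(events)
print(summary)
```

A page with many visitors but a low writer share would have high traffic and little active contribution, which is the distinction the team draws between traffic and engagement.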

Group 9 - Sleep Sheep

Members: Georgia Andrews, Constance Duong, Kevin Jue, Hoon Min Kim, Jonas Jacobson
Go to page


Description: Team "Sleep Sheep" have created a page on 'Sleep and Sleeping' that can promote healthier sleeping habits through interaction on their Facebook page. The page hosts an application where people can log how many hours they slept last night, post comments/questions relating to sleep, and generate data that can be used to learn some very interesting correlations between sleep and our daily activities.

Goal: One of the primary goals of this brand is to promote awareness of sleep in a fun, interesting way. For instance, the application will tell you the average number of hours people in the community sleep and let you do fine-grained filtering based on your networks, city, organization and possibly other factors known through a person's Facebook profile. It intends to create awareness about sleep, and fans would be informed if they weren't sleeping enough (or were sleeping too much! Wake up people!). Fans will also benefit from the discussions in the community. For instance, how many hours are the Stats252 students sleeping? Do people at start-ups sleep?! What do you do to get more sleep? Ask your friends and peers, and get some interesting and useful feedback from the community!
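The community average with filtering described above can be sketched in a few lines of Python. This is a minimal illustration, not the group's application: the log entries, the field names (network, city), and the 7 to 9 hour range used for feedback are assumptions made for the example.

```python
from statistics import mean

# Hypothetical sleep log; users, networks, cities, and hours are made up.
sleep_log = [
    {"user": "ann", "network": "Stanford", "city": "Palo Alto", "hours": 6.5},
    {"user": "bob", "network": "Stanford", "city": "Palo Alto", "hours": 7.5},
    {"user": "cat", "network": "Berkeley", "city": "Berkeley", "hours": 8.0},
    {"user": "dan", "network": "Stanford", "city": "San Francisco", "hours": 5.0},
]


def average_sleep(log, **filters):
    """Average hours slept over entries matching every filter (e.g. city=...)."""
    matching = [
        entry["hours"]
        for entry in log
        if all(entry.get(key) == value for key, value in filters.items())
    ]
    return mean(matching) if matching else None  # None when nobody matches


def sleep_feedback(hours, low=7.0, high=9.0):
    """Flag too little or too much sleep; the 7-9 hour range is an assumption."""
    if hours < low:
        return "not enough sleep"
    if hours > high:
        return "too much sleep"
    return "ok"


print(average_sleep(sleep_log))                      # whole community
print(average_sleep(sleep_log, network="Stanford"))  # one network only
print(sleep_feedback(5.0))
```

Passing filters as keyword arguments keeps the function open to the "possibly other factors" mentioned above without changing its signature.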

Marketing: Since the application was not ready in the first week, the group did not advertise much, as not having an active page could lower the number of returning users. Interestingly, the group also acquired a couple of random users who may have found the group through Facebook search. Since the application is now ready, the team is actively working on sending invitations to friends.

Suggestions from the audience:

Discuss Sleep and Dreams?
The team believes that while it is a possible direction, they would like to promote talk about sleeping hours and times as it is shorter and takes very little time for people to write about.

Happyfactor
This website regularly sends you a simple text message that asks, "How happy are you right now?" and lets you respond with a rating (1 to 10) via SMS. By comparing how happy (or not) people felt throughout the day in relation to how much they slept the previous night, one could learn interesting connections between sleep and one's happiness quotient through the day.
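One simple way to look for such a connection is a Pearson correlation between hours slept and the next day's happiness rating. The Python sketch below uses made-up numbers purely for illustration; neither the data nor the function comes from Happyfactor or the Sleep Sheep application.

```python
from math import sqrt

# Hypothetical paired observations for one person: hours slept the night
# before, and the average happiness rating (1 to 10) the following day.
hours_slept = [5.0, 6.0, 7.0, 8.0]
happiness = [3.0, 5.0, 6.0, 8.0]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


r = pearson(hours_slept, happiness)
print(round(r, 3))
```

A value near +1 would suggest that more sleep goes along with higher ratings in this sample; it would not by itself show that sleep causes the change.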

Yawnlog
Lets you track your sleep, the times you slept and the times you woke up, take notes about your slumber, and compare with friends and peers or do other interesting things with that information.

Sleep and the Quality of your day
Prof. Weigend suggested encouraging people in the community to log their day and observe correlations between one's sleeping patterns and the quality of one's day (Also possible through although only allows text interaction and is not limited to logging about sleep).

Another suggestion from Prof. Weigend was to encourage discussions on jet-lag. How do people deal with jet-lag? What makes it better? What makes it worse? The power of social data can be harnessed to provide interesting insights from real people and definitely aid in related research.

Your iPhone tracks your sleep!
One could also create applications on the mobile platform that could catch you if you were lying about your sleep on any of these communities!

Group 8 - Campus Garden Initiative (CGI)

Members: Todd Sullivan, Tayler Cox, Brad Griffith, Tristan Walker
Contact Info:
t s u l l i v n @
Go to page


CGI is a new student group dedicated to increasing sustainable, local, organic, agriculture on school campuses across the nation. They are a support network for student and community gardeners to exchange ideas and connect.



The goal is to expand student educational opportunities and awareness of sustainable food production techniques as well as of whole food nutrition.


The Campus Garden Initiative is working to establish the resources and infrastructure to lower barriers for students to get involved in sustainable food production and education. Through direct gardening experience, social and educational activities based on the food produced in the garden, and supplements to the foods served in dining halls, a network of gardens will move us toward more sustainable campuses.

Marketing strategies:

The Facebook platform is a way for CGI to attract a wide audience of people. By combining multiple platforms: Facebook, blogs, webpage, CGI hopes to diversify its audience.

CGI wants to get people to share gardening data as a cause related to sustainability. Tayler mentions two approaches to recruiting people: Facebook (online advertising) and gardening events (fliers). The first venue, Facebook, has not proven very successful so far, and CGI hopes to attract members during their gardening event this weekend (25-26 April).


Suggestions from the audience:

1) Go to Twitter, filter for all the people in California who talk about gardening, and follow those people. The same person suggested setting up an account on Mr. Tweet. This might also be a good opportunity to use Yahoo Pipes (Pipes is a sophisticated and powerful tool that allows you to aggregate, manipulate, and mash up content from around the web).

Group 7 - T-eam T-agged

Members: Gary Ho, Wu Wen, Cheewei Ng, Sumithra Jonnalagadda, Ron Chung
Contact Info
Go to page


T-eam T-agged is a group that is very enthusiastic about T-shirts. They believe that T-shirts are “the ultimate way of expressing oneself”, and that anyone can find a few matches for himself or herself.


The page is a platform that aims to help people analyze t-shirt trends so that one is better informed before shopping. They want to “highlight the best” among the thousands of choices in stores all across the web.


Users can upload pictures of t-shirts that they see on the web. They usually also write a brief description and tag the people they think the t-shirt would best express.

Marketing strategies:

Facebook is a multi-purpose platform. The group intends to build a Facebook Application that allows interaction between users: the designers could sell the t-shirts to the people that are interested in them. Alternatively, people who recommend t-shirts would also provide a direct link to the place where the shirt can be bought, so a purchase can be made right away. A secondary goal is to tailor the application to match people’s profiles so that more targeted t-shirts are presented to them.


Group 6 - Numerati

Members: Carlin Eng, Erika Crawford, Eric Ma, Katrina Hui, Daniel Aisen, Chetan Sharma
Contact Info: Facebook page.
Go to page


Numerati is an extension of the senior class website designed for the graduating class of 2009. It differs from the website in that it allows for interaction between users. They can get to know each other better, exchange ideas and discuss topics relevant to them.


Keep students informed about events that are relevant to seniors such as “the deadline to apply to graduate”, or “where to buy formal tickets”. The Facebook space is intended to be a mash-up of relevant data for people graduating in 2009.
A secondary goal is to create a space for people to return to after graduating. It is intended to serve the purpose of an alumni association.

Marketing strategies:

Most students are already on Facebook, so it would be a good opportunity for them to connect with one another while getting information relevant to them.
Metrics include the amount of interactions between users, and the extent to which users believe the content is useful. Facebook allows for fast communication protocols towards large audiences: address everyone at once, all fans are updated when something new is posted, etc.


Suggestions from the audience:

Discussion board about what people’s plans are after graduating, which is a very popular topic among students that are about to graduate.

Group 5 - The Winners of the social data revolution

Members: Chris Anderson, Andrew Tronson, Sowmya Pary, Bobby Murphy, Aditya Singh, Maya Choksi
Contact Info:
Go to page


The group is currently working on building a page and Facebook application that is interested in finding out “what you think about yourself”. The results are then shared among the community of fans.


Take your input of what you think of yourself and put you in the context of other people.

Marketing strategies:

The access to a large community at once and the ability to build applications on the platform are why this page was created on Facebook. The group intends to use the data that they get from users to produce interesting and relevant statistics. You could eventually be given the opportunity to target your audience as a user when posting your "would you rather" question.
User interactions take the form of posting "would you rather" questions. Based on the interactions, the Facebook application will be able to produce output data that might be useful to the user (e.g. statistics about gender, age, etc.). There are two reasons for users to return to the page: data that is useful to them (statistics) and the humorous nature of the questions on the page.

Suggestions from the audience:

Do you have any metrics outside the ones that Facebook provides?
How much you interact with others (i.e. how many questions you ask or how many responses you provide to incoming questions).

Group 4 - Ninkasi

Members: Eric Legrand, Catrina Benson, Mike Polcari, Yun Liu, Jay Ramamurthi, Sylvie Bryant, Jennifer J Huang
Contact Info:

Go to Page


To enable more active interaction with the community members of The Multiple Sclerosis Society, Northern California Chapter to improve their performance with regards to fund-raising etc. Interestingly, more research money is collected in the Northern California area than any other particular geographic region - however, people are not as engaged as they might be. Description from their Facebook page: "The National MS Society is a collective of passionate individuals who want to do something about MS now — to move together toward a world free of multiple sclerosis. MS stops people from moving. We exist to make sure it doesn't."


Marketing Strategy:

Poster stories and information about events (e.g. weekly walks) are put up on the page.

Sample events from their facebook page:

Join the movement:


Degree of engagement, persistent users, and the other groups and pages that users of the page also belong to are different metrics tied to developing a community. Internally, funds tracked from the page would also be a key indicator of success.

Suggestions from the audience

How do you find an MS champion in the community?
Doctors/particular techniques to deal with MS might be particularly useful for the page, as compared to a generic health site.
- Team member said that including a program/information page has been thought about, but not settled on yet.

Group 3 - SPLASH

Members: Xin Shi, Yana Qian, Yan Zhai, Jennifer Sniadecki, Jieying Zhang, Gary Chung
Go to Page

SPLASH is a non-profit program that brings hundreds of high-school students from the Bay Area to Stanford for two days of academic and non-academic classes designed and offered by volunteers, students and affiliates of the University. Their facebook page description: "An opportunity for Stanford students to reach out and share their passion, and for high school students to come for classes at Stanford!"


As this is a new initiative, the group hopes to use the facebook page platform to maintain the current participants as well as to attract more fans.

Marketing Strategy:
They have a well-defined target group, including Stanford students who could become volunteers, high-school students, their parents and educators, so the team can run targeted marketing campaigns (for example, e-mailing high-school representatives or contacting previous volunteers). Experiences of previous participants can be shared through the page, pictures of previous classes can be posted, and discussion topics can be started to encourage interaction between fans and the management team.

They are also uploading a lot of pictures from their past events, creating more buzz.

Suggestions from the audience:
How are you sharing data in a newer or more interesting way with these students about programs?
- Team members said that they post pictures of events and news about what happened at them, and start discussion topics to encourage interaction.

Group 2 - Stanford FML

Members: Victor Andrei, Tripti Assudani, Brian Dumbacher, Nam Kim, C.V. Krishna Kumar, Sai Prashanth
Contact Info:
Go to page

Based on the popular site, the page is targeted at Stanford students who can post unpleasant and funny stories about things that happened to them. It is a space where students can vent out frustration and share it with others.


The goal is to provide entertainment to users through humorous stories. A secondary goal is to measure the stress levels of Stanford students and what factors contribute to those levels.

Sample FML:


Marketing Strategy:
The group feels that the page has a good fan base and revisit rate, and they are working on improving interactions in terms of posts and comments. Since anonymous posts are currently not enabled, they can measure to what extent people are comfortable sharing potentially embarrassing posts that are linked to their Facebook profile and the levels of self-censorship that people use. The group has already trialed a Facebook ad campaign and is evaluating its benefits.

Suggestions from the audience:
Are the facebook ads working?
- Got a significant number of click-throughs and increase in fans
How is the content relevant to the social data revolution?
- Figure out how stress levels vary in the university, way of getting feedback on courses (in light of the type of generated content, this last hypothesis has not yet been proven).

Group 1 - The Notorious

Members: Chris Cinelli, Tirto Adji, Bilal Badaoui, Alexis Pribula
Go to page

The current page is on places to see before you die. Their facebook page description states: "Look at these beautiful places among the photos and add your suggestions!"


Two key observations made by the team were that people like to have certain icons/ captions on their profile pages and that data is generated on a lot of pages, but there is no way to sort the relevance of the data. Therefore, the team decided to build an application that lets users vote on content and interact in a more meaningful way. The current page aims to offer info and pictures of must-see places.

Some sample pictures from their album:

Marketing Strategy:
Virality comes from people’s posts being reflected in the feed stream, which fosters interaction. The page description includes keywords that potential fans might use while searching, such as vacation and travel. In the future, they aim to create more interaction among users, for example around users' own trips and experiences that they want to share with others.

Suggestions from the audience:
How about adding information from existing travel sites?
- Authenticity and trust of users has to be maintained - if they feel that it is a marketing ploy, they will stop visiting the page (hidden cost of .

Group 13 - Pro-Human

Members: Engin Erdogan, Na'ama Moran
Contact Info:

How many degrees of separation are there between people who have lost someone in a war? This is analogous to the idea behind six degrees of separation, which has previously proven to be a successful Facebook application.

Marketing strategy:
Some of the ideas were to contact known people as well as organizations/non-profit organizations that are trying to foster dialogue between people and to invite a celebrity to make the opening video.
Users will be able to post pictures/videos/wall posts about someone that they have lost in a war, who is exactly one degree of separation from them. Also, a Facebook application will be needed to access network data, so that graph analysis can be performed on the generated data.
At first, they plan to stick to the principles of the six degrees of separation experiment, but later other interesting metrics can be found.
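
The degrees-of-separation measurement mentioned above is, in graph terms, a shortest-path length between two people. As a minimal sketch in Python, assuming the network data were available as an adjacency mapping (the names and connections below are invented, and no Facebook API is used), a breadth-first search gives the number of hops:

```python
from collections import deque

# Hypothetical friendship graph as an adjacency mapping; names are made up.
friends = {
    "ann": {"bob"},
    "bob": {"ann", "cat"},
    "cat": {"bob", "dan"},
    "dan": {"cat"},
    "eve": set(),  # not connected to anyone
}


def degrees_of_separation(graph, start, goal):
    """Shortest number of hops from start to goal, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, distance = queue.popleft()
        for friend in graph.get(person, ()):
            if friend == goal:
                return distance + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, distance + 1))
    return None  # no path between the two people


print(degrees_of_separation(friends, "ann", "dan"))
print(degrees_of_separation(friends, "ann", "eve"))
```

Breadth-first search visits people in order of distance, so the first time the goal is reached the hop count is the smallest possible.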

Initial Wiki Contributors

Brian Dumbacher
Victor Andrei
Tayler Cox
Nam Kim
Andrew Tronson
Sai Prashanth
Sowmyalakshmi Pary H
Tripti Assudani
C.V.Krishnakumar Iyer