Andreas Weigend
Data Mining and E-Business: The Social Data Revolution
STATS 252, Stanford University, Spring 2009
Class time: Monday 2:15 - 5:05 pm
Class location: Gates B01

Initial Contributors:

Pierre Djian
Carlin Eng (ceng [AT] stanford [dot] ee dee you)
Eric Ma
Katrina Hui
Daniel Aisen
Hoon Kim
Wu, Wen
Sumithra Jonnalagadda

Class 8

The transcript for the class can be found here.

The audio recording of the class can be found here.

Targeted Ads:

There are a variety of applications for targeted ads on the web today. These various methods include:
  1. Search
  2. Content/context
  3. Behavioral (which items/pages you look at to measure interest, etc)
  4. What about social networks?? Ex: status: “I want an LCD TV.”

Search Ads:

  • Query marketplaces: targeting to the query
  • Pay-Per-Click (PPC) vs Pay-Per-Impression (PPM)
  • Massive Economy: Google, Yahoo, MSFT, SEM, SEO, Affiliates
  • Pros:
    • Great Metrics:
      • Click-Through-Rate (CTR)
      • eCPM (effective cost per thousand impressions): CTR * Cost-Per-Click (CPC) * 1000
      • Explicit intent
  • Cons:
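
The eCPM metric above can be made concrete with a small calculation. This is a minimal sketch, not from the lecture: the function name and the CTR/CPC figures are hypothetical, and eCPM is taken in its usual sense of revenue per thousand impressions, which is why the product is multiplied by 1000.

```python
def ecpm(ctr: float, cpc: float) -> float:
    """Effective revenue per 1000 impressions for a pay-per-click ad.

    ctr: click-through rate as a fraction (0.02 means 2%)
    cpc: cost per click in dollars
    """
    return ctr * cpc * 1000


# Hypothetical numbers, chosen only to illustrate the formula.
search_ad = ecpm(ctr=0.02, cpc=0.50)    # high-intent search placement
content_ad = ecpm(ctr=0.002, cpc=0.50)  # same price per click, lower CTR

print(f"search eCPM:  ${search_ad:.2f}")
print(f"content eCPM: ${content_ad:.2f}")
```

At the same price per click, a tenfold drop in CTR yields a tenfold drop in eCPM, which is the comparison the notes draw between search ads and content match ads.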

Content Match Ads

· Publisher Marketplaces: Targeting to page text
· How different is this from Search Ads?
· Danger of Misplacement?
· Multiple applicable queries in context?
· Publishers vs. Social networks
· Result: Much lower CTR and eCPM

Social Network Ads

· User Marketplaces: Targeting to the user
· Who is the user?
· Do we make the same mistakes we did with Content Match?
· Is it like television? Branded? Is it Share of Voice?
· Do we really know more about these users?
· Will they respond to ads?
· How about the creep factor?
· Given that there will be ads anyway, targeted ads should at least perform better.

Social Network Data

  • Why do they provide data? Niche Envy.
  • What data do users provide?
    • Demographic, About Me, General Interests
    • Semi-structured: movies, interests, music, heroes
    • Self-expression: embedded songs and videos, Comments, Status
    • Hand-raisers, no rankings: movies examples
  • What is taboo?
    • Private profiles, emails
  • How honest are the users? Should we care?
    • Common advertiser question
  • Why FIM?
    • 100+ Million profiles: most of our problems are related to scale

Classification of Users

· People identify themselves as sports fans, so we have positive data-points
· There are no negative data points, though. A user who does not list sports may simply not have mentioned it, so a machine-learning classifier has no reliable examples of people who AREN'T sports fans.


· Simple Item-Based Collaborative Filtering
o Are the relations between the features based on user co-occurrence useful? How robust are the social features?

CF Recommenders by Cosine Similarity

  • Take 2 Vectors: Users with Band Prefs
    • Calculate the co-occurrence of bands in user space
    • Cosine similarity: (U · V) / (||U|| ||V||)
  • Inverted Index: Rows are Bands, Columns are Users
  • Pairwise Similarity: Dot Product
    • Inner Product:
      • Number of users in common
      • Does not scale well, huge memory requirements
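
The cosine computation above can be sketched for binary band preferences. This is an illustrative sketch under stated assumptions: the band names, user IDs, and the `band_users` structure are hypothetical. Each band is represented by the set of users who list it, so the dot product is just the number of users in common and each norm is the square root of the set size.

```python
import math

# Hypothetical inverted index: band -> set of users who list that band.
band_users = {
    "Band A": {"u1", "u2", "u3"},
    "Band B": {"u1", "u2"},
    "Band C": {"u4"},
}


def cosine_similarity(users_a: set, users_b: set) -> float:
    """Cosine similarity between two binary user vectors stored as sets."""
    if not users_a or not users_b:
        return 0.0
    dot = len(users_a & users_b)  # number of users in common
    return dot / (math.sqrt(len(users_a)) * math.sqrt(len(users_b)))


# Pairwise similarity over all bands: this is the step that scales poorly,
# because the whole index has to be held in memory.
bands = sorted(band_users)
for i, a in enumerate(bands):
    for b in bands[i + 1:]:
        print(a, b, round(cosine_similarity(band_users[a], band_users[b]), 3))
```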

CF Recommenders using Hadoop

· Hadoop: Map/Reduce
o Map: Input Data -> (k1, v1)
o Sort
o Reduce: (k1, [v1]) -> (k2, v2)
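· Example with three users:
o User A: 1, 2, 3    Map: User A Data -> (1,2) (2,3) (1,3)
o User B: 1, 2, 4    Map: User B Data -> (1,2) (2,4) (1,4)
o User C: 1, 4, 5    Map: User C Data -> (1,4) (4,5) (1,5)
o Reduce (key 1): (1,[2,2,3,4,4,5]) -> (1_2,2), (1_3,1), (1_4,2), (1_5,1)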
· No inverted index, No big memory requirement
· One pass through the user data
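
The map, sort, and reduce steps can be simulated in plain Python to see how co-occurrence counts fall out of one pass over the user data. This is a sketch only: it runs in a single process rather than on Hadoop, the helper names are made up for illustration, and it uses the three example users A, B, and C from the notes.

```python
from collections import Counter, defaultdict
from itertools import combinations

# The three example users and the items they like.
user_items = {"A": [1, 2, 3], "B": [1, 2, 4], "C": [1, 4, 5]}


def map_user(items):
    """Map: emit one (item, co-occurring item) pair per unordered item pair."""
    return list(combinations(sorted(items), 2))


def sort_and_group(pairs):
    """Sort/shuffle: collect all values that share a key, as Hadoop would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: sorted(values) for key, values in grouped.items()}


def reduce_key(key, values):
    """Reduce: count how often each item co-occurs with the key item."""
    return {f"{key}_{value}": n for value, n in Counter(values).items()}


mapped = [pair for items in user_items.values() for pair in map_user(items)]
grouped = sort_and_group(mapped)

cooccurrence = {}
for key, values in grouped.items():
    cooccurrence.update(reduce_key(key, values))

print(grouped)
print(cooccurrence)
```

Each user record is processed independently in the map step, which is why no inverted index has to be held in memory.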

User Topic Modelling

· Topic Modeling: no need for multiple clusterings because users can exist in more than one concept, no need for training data
· Latent Semantic Analysis
· User’s features can be represented in fewer concepts
· Example: Fred likes Star Wars, Rush Hour, Terminator, LOTR, Waterboy
o Fred is viewed as someone who likes Action, Science Fiction and Comedy movies (and we also know how strongly he likes each)

Singular Value Decomposition (SVD)

· Find the 3 matrices U, S and V whose product approximates the original matrix
· Reduce each user and each feature to a vector of weights whose length equals the number of singular values kept

Gradient Descent SVD

· Minimize the RMSE between predicted and actual values, computed over the known values only
· Take the partial derivative of the squared error with respect to each parameter to build the weight vectors (U and V) for each singular value.
· Note: RMSE works very well when you have rating data, but less well for binary values
· Iteratively compute this for each known rating, indexed by user (i), item (j) and singular value (k), which reduces to:
o U[k][i][t] = U[k][i][t-1] + residual_error * V[k][j][t-1]
o residual_error = learning_rate * (actual rating - predicted rating)
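
The update rule can be sketched as a small stochastic gradient descent loop. This is a minimal illustration under several assumptions not stated in the notes: the ratings are made up, all singular values (factors) are trained together rather than one at a time, there is no regularization, and V is updated with the mirror image of the U rule.

```python
import math
import random


def train_svd(ratings, n_users, n_items, n_factors=2,
              learning_rate=0.01, n_epochs=500, seed=0):
    """Fit U[k][i] and V[k][j] so that sum_k U[k][i] * V[k][j] ~ rating(i, j).

    ratings: list of (user index i, item index j, actual rating) for the
    known entries only; unknown entries are never touched.
    """
    rng = random.Random(seed)
    U = [[rng.uniform(0.05, 0.15) for _ in range(n_users)] for _ in range(n_factors)]
    V = [[rng.uniform(0.05, 0.15) for _ in range(n_items)] for _ in range(n_factors)]
    for _ in range(n_epochs):
        for i, j, actual in ratings:
            predicted = sum(U[k][i] * V[k][j] for k in range(n_factors))
            residual_error = learning_rate * (actual - predicted)
            for k in range(n_factors):
                u_old, v_old = U[k][i], V[k][j]
                U[k][i] = u_old + residual_error * v_old  # rule from the notes
                V[k][j] = v_old + residual_error * u_old  # symmetric rule (assumed)
    return U, V


def rmse(ratings, U, V):
    """Root-mean-square error over the known ratings only."""
    squared = [
        (actual - sum(U[k][i] * V[k][j] for k in range(len(U)))) ** 2
        for i, j, actual in ratings
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical known ratings: (user, item, rating on a 1-5 scale).
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]

U0, V0 = train_svd(ratings, n_users=3, n_items=3, n_epochs=0)  # untrained
U, V = train_svd(ratings, n_users=3, n_items=3)

print("RMSE before training:", round(rmse(ratings, U0, V0), 3))
print("RMSE after training: ", round(rmse(ratings, U, V), 3))
```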

Ad Frontier

· Using the social graph
o Target friends of people who clicked on an ad
· Marketplace Overlap
o Economic implications of overlapping hypertargets
· Targeting Prediction
o Predict search queries using associations between queries and hypertargets
· Click Prediction
o Can we predict which hypertargets are most likely to click on an ad?
· Hadoop and beyond
o Dealing with large databases, streaming data, online learning
· Should we assume friends have same interests/preferences as each other?
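
The first idea in the list, targeting friends of people who clicked on an ad, can be sketched as a one-hop lookup in the social graph. The function name, user names, and friendship list are hypothetical, and the sketch takes no position on whether friends actually share interests, which the notes leave as an open question.

```python
def friends_of_clickers(friendships, clickers):
    """Return users who are friends with at least one clicker.

    friendships: iterable of (user, user) pairs, treated as undirected
    clickers: set of users who already clicked on the ad
    """
    neighbors = {}
    for a, b in friendships:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    targets = set()
    for user in clickers:
        targets |= neighbors.get(user, set())
    # People who already clicked are not new targets.
    return targets - set(clickers)


# Hypothetical friendship graph: ann - bob - cat - dan
friendships = [("ann", "bob"), ("bob", "cat"), ("cat", "dan")]
print(friends_of_clickers(friendships, {"bob"}))
```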

Bidirectional Communication
The costs of communication have dropped to virtually nil, and as a result bidirectional communication has exploded. That bidirectional communication is what we are exploring here, in the Social Data Revolution.

  • List vs. stream:
    • List: Having a list requires work on our end - we have to search through data.
    • Stream: Having a live stream is like having information come to us.

  • Expectations
    • Email: Expectation to be read
    • Tweet: No expectation to be read.
      • How is this communication? What role does real time play?

  • Facebook vs. Twitter

  • Facebook
    • bidirectional interaction
    • tremendous distribution
    • reciprocity
      • If I comment on your photos, status, posts, etcetera, maybe I'll receive attention in return.
    • self-recognition
      • photo tagging is a great example
        • idea of immortality or being outlived by your photos

The Social Data Revolution is all about people looking at data differently. It is about "consumer behavior" and "expectation shifting."
  • It is "a shift in the individual’s expectations towards what they can get in exchange for sharing data, sharing data about themselves, and sharing data about relationships to others."
  • Voluntarily sharing information

  • Producers/companies used to know more about the products; they were the experts.
  • Now the experts are the consumers
    • If you Google "Nokia Map Activation", the results are all consumer websites, not Nokia sites.
    • Google knows more than any company about a product by indexing, storing, and searching the web.

Using the voluntary exchange of data for product marketing and customer service:

One big thing about the Social Data Revolution is the real time element of the exchange of information.
  • mystarbucksidea.com is one example: people can share ideas about pricing, atmosphere, wifi, etcetera.
  • is a people-powered customer service website where they'll help you with it, no matter what.
  • is a site where people post their experiences in business and 1st-class seats on airlines

What is the incentive? Is contributing to the community a reward in and of itself? Or do people expect more in return?

From C-to-C to C-to-W

Shift from private sharing of information to public sharing of information: I used to email my friends the books I liked on Amazon so that they could have a look and see whether they liked them too. That was the "Customer to Customer" model. Now, with Facebook and Twitter, we are increasingly moving towards the "Customer to the World" model: if I post on Facebook or on Twitter that I have purchased a book, then all my friends and followers will be notified. What is more, that data will be saved and indexed on Twitter, so it will remain there for eternity.

One big challenge raised by this change is how to deal with wrong data that has been posted about us. It is very difficult, and there are very limited means to deal with it.

One interesting point is that in this shift from C-to-C to C-to-W we can actually see greater accuracy of information. Reid Hoffman has said that self-written LinkedIn profiles are often much more accurate than the resumes people send to potential employers, because people would have a problem if their friends or connections saw inaccurate or misleading details about them.

Real Time Data Search:
Real time data search is a new phenomenon. We don't really know its role yet. It is a chance, however, to interact with people and immediately receive feedback. It is a different type of search, most useful for information that changes quickly. Some possible applications of real time data search:
  • PHAME framework for a marketing promotion: throw an idea out there and receive an immediate response. Chance to experiment.
  • Ability to tell what is hot, popular at a location, in society, etc.
    • For example, which restaurants are popular? And what menu items?
    • Movie theater lines. How long/short are they?
  • Customer service hotlines: quickly understand when/why problems arise and address them before they become bigger issues.

At the same time, there is a trade-off: with real time data we cannot always see the broader time horizon.

With real-time data, rather than asking what data can we extract from given information, we can turn the question around to ask: "what data can you get now?" It's possible now because we interact with data quickly.

By living on the web and in the mobile space, we have an amazing opportunity to really interact on a very fast time scale. Andreas really enjoys the interaction part, which is why he went to Amazon.

With real time data, interaction, and the PHAME framework, marketers now have the opportunity to drive more than 10%-20% of total sales from search recommendations.


EtherPad is the only web-based word processor that allows people to work together in really real time. When multiple people edit the same document simultaneously, any changes are instantly reflected on everyone's screen. The result is a new and productive way to collaborate on text documents, useful for meeting notes, drafting sessions, education, team programming, and more.

Virtual Meetings

EtherPad is useful whenever multiple people with computers need to work together in real time. With EtherPad, anyone in a meeting can contribute to the notes, or watch them as they're typed. This means more efficient meetings, more useful notes, and fewer misunderstandings.

Collaborative Writings

Effective writing often means sweating the small stuff. For marketing copy, prose that goes on your company's home page, press releases, or emails sent to the board of directors, drafts are edited and finalized with the help of many people on the team. Previously, reaching agreement meant sending multiple copies of documents by email. With EtherPad, text can quickly be finalized by having all the stakeholders come to the same pad, make their edits, and collectively sign off on the resulting document. For example, EtherPad was used to edit the final draft of this very web page.

Team Coding

Two pairs of eyes looking at code simultaneously means catching more bugs and generating more new ideas. EtherPad lets programmers collaborate on code in real time for authoring, refactoring, or debugging. There is an entire practice called Pair Programming which advocates this. Traditionally, pair programming required one person to be the "driver" and another to be the "reviewer", and for both to be seated at the same physical keyboard. With EtherPad, pairs (or larger teams) can program regardless of their geographic location, and they can take turns "driving".