Andreas Weigend
Data Mining and E-Business: The Social Data Revolution
STATS 252, Stanford University, Spring 2009
Class time: Monday 2:15 - 5:05 pm
Class location: Gates B01

Note: Many people have asked what the Social Data Revolution means. Here is my attempt to explain it. Every student in the class can edit this page (use the History tab above if you want to see changes). Also, feel free to add comments to the text, and let me know if you want to discuss anything in person. Once we have converged on decent content, it will go up on -- Andreas

What is the Social Data Revolution?

The Social Data Revolution (SDR) denotes the shift in individuals' mindset toward data they knowingly and voluntarily share with a potentially large audience. It affects their notion of self, as well as their relationships, both with other individuals and with companies and organizations. Individuals are increasingly willing to create and share data (including personal data such as geolocation, medical information,…), and they increasingly expect to get some value in exchange.

Humans have always shared information with friends and members of their tribe -- where to find good food, what places to avoid. But the cost and overhead of communication were high, and before web search, the chance that someone would come across what you wanted to share was low.

Historically, the technology to move energy led to the industrial revolution and changed the way we produce things. The technology to move bits led to the information revolution and changed the way we produce knowledge. We now live in the one period of history in which people are connected globally and the direct cost to the end user has essentially dropped to zero. The ease of creating data and the feasibility of sharing it globally, with essentially universal access, have led to the social data revolution, where
  • Social means sharing.
  • Social data refers to data that is mainly created and shared by individuals: data about themselves, data about their relationships with others, data about products and services, and data about the world.

Early examples are Craigslist and the wishlists of Both enable users to communicate information to anybody who is looking for it. They differ in their approach to identity. Craigslist leverages the power of anonymity, while leverages the power of persistent identity, based on the history of the customer with the firm.

Recent examples are Twitter and Facebook. On Twitter, sending a message, or tweet, is as simple as sending an SMS text message. Twitter made this C2W, customer to world: any tweet a user sends can potentially be read by the entire world. Facebook focuses on interactions between friends, or C2C (customer to customer) in traditional terminology. It provides many ways of collecting data from its users: “tag” a friend in a photo, “comment” on what they posted, or just “like” it. These data are the basis for sophisticated models of the relationships between users, which can be used to significantly increase the relevance of what is shown to each user.

While both Twitter and Facebook are platforms where users create and share data, they differ significantly in who can access that data. In essence, whatever someone tweets is public -- anyone can look up all of @aweigend’s tweets, now and forever. Facebook, by contrast, requires the person you want to learn about to confirm that you actually are “friends” before giving you access to their data. Time and the collective creativity of users will tell how these two very different paradigms evolve.

The social data revolution is a revolution in which hundreds of millions of ordinary people around the world participate. This year, they will create more data than all of humankind had created up to last year.

How do companies respond to the opportunities and risks that SDR presents to them? Traditional marketing approaches, optimized for the single mass audience of the television era, have ceased to work in today’s C2W world of media fragmentation. Similarly, in C2C or social commerce, people trust information created by other customers more than company marketing materials when they make purchasing decisions.

More information will be created and shared at, which emerged out of courses taught in Spring 2009 by Andreas Weigend at UC Berkeley (marketing) and Stanford (data mining).