Andreas Weigend
Data Mining and E-Business: The Social Data Revolution
STATS 252, Stanford University, Spring 2009
Class time: this FINAL class is Wednesday, June 10, 12:15 - 3:15PM
Class location: Gates B01

Initial Contributors:

Jennifer Sniadecki
Murphy, Robert Cornelius
Singh, Aditya
Sharma, Chetan Veeru Bhotla
Erdem, Tevfik Burak
Cutler, Blake Robertson
Jones, Matt Knight



Class 9: WEDNESDAY, June 10, 2009 12:15 - 15:15, Gates B01


1. Instrumenting Pages
2. Protecting Privacy with Cynthia Dwork
3. Medical Data with Michael Hard
4. Mobile Advertising with Amit Goswami
5. Online Dating with Christian Wiklund

Guest Speakers:

Cynthia Dwork, Microsoft Research
Research Interests: Private data analysis, Foundations of cryptography, combating spam, complexity theory, web search, voting theory, distributed computing, interconnection networks, algorithm design and analysis

Part 1: Instrumenting pages

1215 Sniffing the digital exhaust
Andreas Weigend

At what point do you feel your privacy is threatened?
In many cases, the question of whether or not to share personal data comes down to incentives. There may not be a well-defined line between what people are willing and unwilling to share. Gmail has access to your e-mails and delivers targeted ads based on e-mail content, but offers a "free" service with unlimited space in exchange. What kind of information do you share on a daily basis? Some examples of data regularly collected about you are search history, browsing history, location (via mobile phones and Wi-Fi connections), phone logs, and phone usage patterns.

Is information sharing symmetric?
On Facebook you know just as much about any of your friends as they know about you. Is someone's ability to use your shared personal information against you offset by your ability to use it against him or her? What about identity theft? Where is the line between your online profile and your real life? How do you verify that someone is who s/he says s/he is? Answer: more data.

1230 Protecting privacy
Cynthia Dwork | Microsoft Research

Is anonymous data anonymous?
How To Break Anonymity of the Netflix Prize Dataset: "The dataset is intended to be anonymous, and all customer identifying information has been removed. We demonstrate that an attacker who knows only a little bit about an individual subscriber can easily identify this subscriber's record if it is present in the dataset, or, at the very least, identify a small set of records which include the subscriber's record." [Narayanan and Shmatikov, 2006]
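
To make the quoted claim concrete, here is a minimal sketch of a linkage attack on a toy dataset. It is a simplified illustration, not the algorithm from the paper: the records, movie titles, tolerances, and the function names match_score and candidates are all assumptions made up for this example. The idea is that an attacker who knows a few approximate (movie, rating, date) facts about a person can score every "anonymous" record by how many of those facts it is consistent with.

```python
from datetime import date

# Toy "anonymized" release: record id -> {movie title: (rating, rating date)}.
# Every record and title here is invented for illustration.
RELEASED = {
    "r1": {"Movie A": (5, date(2005, 3, 1)),
           "Movie B": (2, date(2005, 3, 9)),
           "Movie C": (4, date(2005, 6, 2))},
    "r2": {"Movie A": (4, date(2005, 3, 20)),
           "Movie D": (3, date(2005, 4, 11)),
           "Movie C": (1, date(2005, 6, 5))},
    "r3": {"Movie B": (2, date(2005, 8, 15)),
           "Movie D": (5, date(2005, 4, 12)),
           "Movie E": (4, date(2005, 1, 30))},
}


def match_score(record, aux, rating_tol=1, day_tol=14):
    """Count the attacker's observations that are consistent with a record.

    An observation matches if the record contains the same movie with a
    rating within rating_tol stars and a date within day_tol days.
    The tolerances are assumed values, not taken from the paper.
    """
    score = 0
    for title, (rating, when) in aux.items():
        if title not in record:
            continue
        rec_rating, rec_when = record[title]
        close_rating = abs(rec_rating - rating) <= rating_tol
        close_date = abs((rec_when - when).days) <= day_tol
        if close_rating and close_date:
            score += 1
    return score


def candidates(released, aux, **tolerances):
    """Return the ids of the best-matching records (empty if nothing matches)."""
    scores = {rid: match_score(rec, aux, **tolerances)
              for rid, rec in released.items()}
    best = max(scores.values())
    if best == 0:
        return []
    return sorted(rid for rid, s in scores.items() if s == best)


# Two imprecise facts about the target are enough to single out one record.
AUX_TWO_FACTS = {"Movie A": (5, date(2005, 3, 3)),
                 "Movie B": (3, date(2005, 3, 12))}

# A single vague fact only narrows the release down to a small set.
AUX_ONE_FACT = {"Movie D": (4, date(2005, 4, 10))}

if __name__ == "__main__":
    print(candidates(RELEASED, AUX_TWO_FACTS))
    print(candidates(RELEASED, AUX_ONE_FACT))
```

With two approximate facts the sketch isolates a single record; with one vague fact it returns a small candidate set, mirroring the "at the very least" case in the quotation. Removing names and account numbers does not help here, because the ratings themselves act as the identifier.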

1315 Discussion

Part 2: Instrumenting people

1350 The future of medical data
Michael Hard

Health care spending is projected to rise to about 20% of GDP by 2020, one of the highest shares among developed countries. There is incredible profit potential in finding more effective ways of delivering health care to individuals. If people can be properly instrumented and better data can be collected, then this data can be used to give people relevant health information *now*. This will enable individuals to make lifestyle adjustments that prevent or reduce long-term health consequences, rather than reacting to them as they come.
One question asked was, "How do you convey health risks and consequences to people in a way that will produce behavioral changes?" Generally, people can be slow to make lifestyle changes for health risks that may be a long way off. Additionally, statistics and probabilities tend to be the language of health risks and consequences, and they can be difficult to understand. If instead it were possible to get relevant information about current health states such as tiredness or stress levels, that might provide additional incentive to make use of available technologies.

Startups/Companies discussed:
- 23andMe
- LifeCOMM
- Google Health
- Microsoft HealthVault

1400 The future of mobile advertising
Amit Goswami | Orange-France Telecom Research
  • Intro: Orange and Orange Labs
  • Mobile advertising market reality check – US/France/UK market status
  • Data mining and Operator's role…Privacy and regulation
  • Scenario: Orange as a Data trust
  • Business & Technology case for call data mining and advertising
  • Business & Technology case for ISP data mining and advertising
  • Business & Technology case for Location data mining and advertising
  • Final thoughts.

"The advertiser is the customer, not advertisee. The idea is to make advertising easier, and more effective for advertisers."

1430 The future of dating
Christian Wiklund | CEO, Skout
- The Evolution of Dating: Offline, Online, Mobile
- Leveraging real time location: Mr Right vs Mr Right Now
- Some problems: Porn, Posers and Prostitutes
- Skout as the dating ecosystem

Interview with Christian, discussing

1500 Outlook
- The one thing that you have learned in this course

1515 THE END