Next week: Mon 2:30-5 last office hours, Tue noon funny video submission deadline, Wed noon class on privacy and data mining

  Monday June 8, 2:30 – 5:00pm : Office hours
  Tuesday June 9, noon : Deadline for the short, funny videos for the five $200 prizes
  Wednesday June 10, 12:15 – 3:05pm (Gates B01, same room). Class, focus on privacy.

I will be holding office hours this coming Monday, June 8, from 2:30 to 5:00pm (basically the usual class time) in my office in the Stats Department.

I will give out up to five cash prizes of $200 each ($1000 total) for short, funny videos you create that I will use in at least one of my talks this year.
Deadline: Tuesday June 9, noon.

Our last class will focus on privacy in the context of data mining and marketing. We are fortunate to have two great speakers who will share their insights with us:
  Cynthia Dwork (Microsoft Research) will bring some shocking examples of how easy it is to crack anonymized data when there are auxiliary data.
  Amit Goswami (Orange-FT Research) will first reflect on why our initial ideas on mobile marketing were all wrong (more than a decade ago, people were already talking about how the mobile phone would revolutionize marketing), and then outline how he sees its future.
Amit's company has also generously offered $500 for lunch.

Good luck on your finals

I watched part of the Ustream experiment from this week. The sound on Ustream was quite distorted, and it was sometimes hard to make out what people were saying. It is important to understand what video really adds, and when. While we announced it on FB and twitter, only a couple of people watched it for long enough to get anything out of it. My hope of getting metadata through public annotations in real time, instead of the private notes people take for themselves, is still in the future.

Monday Jun 1 right after class food and drinks in Sequoia Hall, five $200 prizes for funny SDR videos

We'll have John Carnahan, VP Chief Scientist, Audience Network at Fox Interactive Media (FIM) in class on Monday. He is incredibly smart and probably the best guy in the world for online ads, hypertargeting, and monetization.
His company, FIM MySpace, is sponsoring a reception right after class.

To make this a success, we need your help:
  RSVP on facebook : Please let us know by Sunday 5pm that you are coming via RSVP
    http:/ . And invite your friends who are interested.
Also, to move the discussion forward:

I will be announcing the SDR Video Contest. Here is the preview: I will give out up to five cash prizes of $200 each ($1000 total) for short, funny videos you create that I will use in at least one of my talks this year. Besides English, it would be great to have Italian, Spanish (Mexico, Argentina), Portuguese (Brazil), and of course Putonghua and Singlish :) since these are the countries I will be in.
Deadline: Tuesday June 9, noon.
You need to:

We want your creativity: interview a friend, your roommate, your prof, or your parrot, in any funny situation. Play with twitter, share a story about who you met through MrTweet, or what your friends said when you showed them your recommendations. Anything funny coming out of a dating site, or about something you bought because of social commerce that you totally didn't need / that you can't live without any more. Show the world how social discovery is so totally different from traditional marketing or lame Facebook ads. Give an example of what twitter does for you, or how it annoys the hell out of you. Show how you use Facebook, Craigslist etc to get paid/laid. Whatever. It needs to work for international marketers and executives who have never heard of twitter, and who should realize just how much they are totally out of it, in a funny way. While entirely voluntary, we hope it gives you the chance to think through data in the world we live in from yet another angle, and to express it in a new way.

Anything else? The teaching team is passionate about this course and really wants you to have a great experience. Let us know what you want! We created two weeks ago and it still is empty! This is your space, use it.

Looking forward to class on Monday and to getting to know you a bit better at the party! Please RSVP by Sunday 5pm, and let your friends know.


Email about extending HW6 (Twitter) by one week and making HW7 (Yobo) optional

Quick update: First, I want to thank Enrique Allen, Ryan Mason and Ron Chung for their amazing help with class, and especially for picking up the work Feng Zheng did not manage. FYI, Feng was relieved of his duties by the department chair last weekend.
I apologize for the problems with support – you saw my surprise when I discovered during last class that the email I sent did not reach several of you. I also apologize for the delayed feedback on your assignments. Please continue to email to reach Enrique, Ron and Ryan. They are working super hard and doing the very best anyone can in a seriously understaffed situation.

I only heard good things about the twitter hack-a-thon last night: almost 20 students came and got help from peers, including Chris Anderson, Emile Chamoun, Carlin Eng, Jeff Mellon, and Mike Polcari. I could not be there since I am at D7 in San Diego, where I talked last night to Twitter's Ev Williams about what you are doing in class.

Given that it took longer than expected to get everybody whitelisted for the twitter API, here are two updates:
We extended the deadline for HW6 to Thursday June 4, 2009, 5pm.
We made HW7 (analyzing the Yobo music DNA data) entirely optional.

I am looking forward to finishing strong together, and creating learnings that endure beyond class. I have created and ask you to please jot down any wishes you have for the rest of the quarter.

You really are doing cool stuff -- I believe no other course in the world has similar innovative problem sets.


We thought it would be interesting for students to get hands-on experience with prediction markets.
This exercise will be for extra credit.

Here is the setup:

1) The prediction market exercise starts this coming Monday, May 25, and ends on Monday, June 8.

2) Final results will be announced in class on June 10.

3) This is for extra credit.

4) You should have received an email invite for SDR.INKLINGMARKETS.COM

5) We set up two "practice" prediction market games for those who want some practice before the official start.
These two simple "games" end on Monday May 25.

Official games will be set up that are related to the Social Data Revolution. This may include predicting how your SDR assignments will perform.


Try to make the most money among all your peers. It'll be a fun and worthwhile experience.


Instructions

You should receive an invitation from Inkling to join the SDR Inkling Prediction Market. Please sign in and you should be able to start playing.

You can find the two games here:



Here is some quick information about the "practice" games:


Market 1:
Starting Price: $50
A price of $50.00 means there is currently a 50.0% chance this will occur.
Will the Ralph Lauren stock price increase (relative to its closing price on May 26, 2009) following the release of the earnings report on May 27, 2009? In other words, will the stock price at 4:00pm EST on May 27, 2009 be strictly higher than the closing price at 4:00pm EST on May 26, 2009?

Ralph Lauren's earnings report will be released at 8:00 AM EST. This prediction market will end on May 25, 2009 at midnight.

links to Ralph Lauren's stock price: google finance:

yahoo finance:


Market 2: Who will win game 4 between the Lakers and Nuggets in the NBA playoffs?

description: The LA Lakers are facing the Denver Nuggets in the Western Conference finals of the NBA. Game 4 will take place on Monday, May 25, 2009 at 6pm. Who do you think will win game 4? Our prediction market will close on Sunday, May 24, 2009 at midnight PST.

Starting Price: Lakers $50, Nuggets $50
A price of $50.00 means there is currently a 50.0% chance this will occur.
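
To make the pricing in both practice markets concrete, here is a minimal Python sketch of how a share price maps to a probability. It is not Inkling's code. It assumes that a share pays out $100 in market money if the event occurs and nothing otherwise, which is what a $50 price corresponding to a 50% chance suggests; the function names and the payout constant are made up for illustration.

```python
# Assumed payout per share if the event occurs (market dollars).
# This constant is an illustration, not a documented Inkling setting.
PAYOUT = 100.0


def implied_probability(price, payout=PAYOUT):
    """Chance of the event that the current share price implies."""
    if not 0 <= price <= payout:
        raise ValueError("price must lie between 0 and the payout")
    return price / payout


def expected_profit_per_share(price, my_probability, payout=PAYOUT):
    """Expected gain from buying one share at `price`,
    if you personally believe the event has probability `my_probability`."""
    return my_probability * payout - price


# Starting price of $50: the market says 50%.
print(implied_probability(50.0))

# If you believe the Lakers win with probability 0.75, buying at $50 looks attractive.
print(expected_profit_per_share(50.0, 0.75))
```

Under these assumptions, you would buy whenever your own estimate is above the implied probability and sell when it is below, which is how trading moves the price toward the crowd's belief.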

Email about HW4 (Delicious) and HW6 (Twitter)

delicious HW suggestions:
This should be done individually, but feel free to discuss questions about Python programming.
Reference Collective Intelligence by Toby Segaran
I could not access this book in its entirety (as an off campus, SCPD student) until I signed up for the free 10-day trial. E-mail me if you need a copy of Ch2. - Sylvie B. (

The Twitter HW is likely to take longer than the delicious HW. Because many of you have voiced concern, we are going to hold special office hours with food. However, since our teaching team is understaffed, we are offering extra credit to experienced python developers who are willing to help their classmates. We are planning to host the hack-a-thon Tuesday, May 26th from 7-9pm at Bldg 524.

If you are a developer interested in helping please respond to

Email about Class 7

Thanks for the positive feedback on my email last Friday. Sorry about the incorrect youtube link, now correct on

Class today is action packed:

Lecture on recommender systems.

Brief discussion of first insights of the survey
Just started to look at survey results. Quality of work counts, and boiling things down to the essence is key. BTW, this also applies when we ask you on the Twitter assignment to reflect – reflecting on what has changed in your mind is an important part of how people learn. BTW, while I love good visualizations, simple pie charts often don't add much; giving the percentages is often much cleaner and helps us see the bigger picture as we look at the entire page.

Introducing Twitter people recommendation homework
Will be joined by Nick Kallen from Twitter, Yu-Shan Fung from Discoverio, and Nova Spivack. See also Erik Schonfeld's post on "real time search"

The promise vs reality of the "intelligent web" (aka semantic web).
Talk by Nova Spivack, the founder and CEO of Radar Networks, a semantic Web startup that operates the personal information-organizing service Twine. He is a noted authority on the semantic Web, artificial intelligence, and is a space enthusiast.
In 1994, he co-founded EarthWeb, where he was a board member and served as Executive Vice-President for Products, Strategy and Marketing. Prior to that, he worked with computing pioneers Danny Hillis at Thinking Machines and Ray Kurzweil at Kurzweil Computer Products.
BTW, his grandfather was management guru Peter Drucker.

I also updated our wiki home page. If those of you who had emailed me about what you are willing to help with could be so kind as to enter your info on I would be sure that I didn’t miss anyone.

See you soon,

A quick email about where we are, two thirds through the quarter.

We've looked at a breadth of material from the evolution of passive data ("sniffing the digital exhaust") to actively instrumenting people and the environment. To help you understand (and experience) the challenges of social data, we assigned the Facebook Pages HW. BTW, for the ad parts, we will revisit machine learning for ads in class on June 1.

We then turned to the basics of analytics, and surveyed some of the now easily available data sources. But the main focus has to remain on how data and models influence decisions. I found the MS&E presentation on Decision Analytics very useful, and hope that you also appreciated the evolution towards engaging the crowd via prediction markets.
Itamar's talk gave us an inside scoop into how Facebook processes petabytes of data and designs experiments that model user behavior. It was fun to come up with hypotheses, and to see from Eric how the traditional model of "influencer marketing" no longer seems to apply to FB with its ultra-lightweight interactions.
This week, Reid and DJ hopefully shared their insights into analytics and visualization, and showed when edge cases and being clever are important. I was impressed by how analytics drives product development at LinkedIn and enables quick evolution of products -- fast iteration rules.
This Monday, May 18, I will discuss recommender systems. June 1 focuses on machine learning for advertising with George John, CEO of RocketFuel. In the last class, Wednesday June 10, 12:15 – 15:05 (instead of the final), we will discuss privacy with Cynthia Dwork.
The current assignments should not be too much work.
The final two problem sets will hopefully be fun as well:
  HW6 (out May 18, due May 28) asks you to design a recommender system for Twitter suggesting people you might be interested in. We will have the people behind MrTweet come to class this Monday.
  HW7 (out May 25, due June 4) concludes the quarter by giving you the music DNA data of Yobo, a company in Beijing, where copyright seems less of an issue than here.
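
For those wondering what a first cut at the HW6 idea could look like, here is a minimal "friends of friends" sketch in Python. It is only one possible starting point, not the required approach. It does not call the Twitter API: the `following` dictionary is made-up toy data, and the function name `recommend_people` is hypothetical.

```python
from collections import Counter


def recommend_people(following, user, top_n=3):
    """Suggest accounts followed by the people `user` follows,
    ranked by how many of those people follow each candidate."""
    already = following.get(user, set())
    counts = Counter()
    for friend in already:
        for candidate in following.get(friend, set()):
            # Skip the user and anyone the user already follows.
            if candidate != user and candidate not in already:
                counts[candidate] += 1
    # Highest count first; ties broken alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Toy data (hypothetical): who follows whom.
following = {
    "alice": {"bob", "carol"},
    "bob": {"carol", "dave", "erin"},
    "carol": {"dave", "alice"},
}

print(recommend_people(following, "alice"))
```

Counting shared connections ignores what people actually tweet about, so a real solution would likely combine it with content or behavior signals.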
And, last but not least, thank you for the AMAZING work you have been doing on our course wiki!! I really am impressed by what you manage to create there.
Have a great weekend, and see you Monday at 2:15!

PS: Since we are way understaffed with TAs, if anyone wants to help out including possibly insights for the blog, please do let me know. Made a page on the wiki,
You might also want to check out the short interviews done at the public school at the other side of the Bay…

Email about 2009 course to former students

STATS 252 - Spring Quarter 2009
Data Mining and Electronic Business -- The Social Data Revolution

I hope you are well and working on cool projects. I'm putting together what promises to again be an exciting Stats252 course.
  1. Please join our Facebook page "Social Data Revolution", and post on the wall something you got out of the course when you took it.
  2. Invite the smartest students you know to join that Facebook page and give them a reason to sign up for STATS 252 this Spring (Axess ID 10584).
  3. Remember everyone is welcome, so feel free to come to any class that interests you. We will have interesting speakers again. If you want to add the course schedule to your own calendar, please note that the class is one hour earlier than last year, i.e. Mondays 2:15 - 5:05, Gates B01, starting April 6th.
  4. I would also like to invite you and/or your friends to the information session / preview lecture on Monday, March 9, 2009, 2:15 – 3:05 at Skilling Auditorium.
Since last year, the creation and distribution of social data has vastly increased, but we are still facing the problems of information overload and of discovery of relevant content and interesting people. Social marketing is more important than ever.
And as always, please let me know if anything exciting is happening, and also if there is anything I can do for you. And if you can’t make it to class, I continue to put up mp3s after each class, and have high expectations for this course wiki. New is that I will also have transcripts of each class on the web, and weekly short videos related to the course (some as interviews with guest speakers). Subscribe to .

Andreas Weigend | +1 650 906-5906 |

We have for this year Reid Hoffman (CEO of LinkedIn), Jia Shen (CTO of RockYou), George John (CEO of RocketFuel, using machine learning for ad serving), Nova Spivack (CEO of Radar Networks / Twine), Mingyeow Ng (CEO of Discoverio / MrTweet) and Cynthia Dwork (brilliant researcher at Microsoft on data mining and privacy). On the more practical side, we have Nick Kallen (Twitter) and Itamar Rosenn (Facebook) available for questions and ideas you might have, given the two homeworks we are doing on those platforms. Three Stat252 alumni will also share what they have learned and done since: Linus Liang (giving reasons for his successes first with Facebook Apps, then with iPhone apps), Bo Cowgill (talking about Google's internal prediction markets), and Eric Sun (showing how information spreads on Facebook and discussing its implications for marketing).

Email about 2009 course to departments

Sent: March 2009 to friends in CS, MS&E, and GSB asking to forward to admin to forward to prospective students
STATS 252 - Spring Quarter 2009
Data Mining and Electronic Business -- The Social Data Revolution

Class time: Mondays 2:15 - 5:05 (starting April 6, 2009), Gates B01.
Registration: STATS 252, Axess ID 10584.

Extract Insights from Twitter
Put Yourself in Google's Shoes
Develop Relevance Beyond Amazon
Build Revolutionary Facebook Applications
In the last year, your location data and personal medical information have become the latest streams in the torrent, joining email, clicks, searches, social networking, and buying patterns. This course will dramatically change how you think about your river of data.

How can these new data sources make our lives easier, more effective, more interesting? How can we get better recommendations, based on our behavior and the behavior of our friends? How can reputation systems help with decisions about who to trust?

Gathering, sharing, and storing data has become trivial. But what shall we collect, and what applications can we build that users really want?

Moving beyond graph and guess, push and pray, launch and learn, and so on, this course gives you tools and strategies for successful applications. How can you optimize their virality, and spot weaknesses early? How can you entice users to interact with the app, and recommend it to their friends?

Each class is structured according to PHAME: define relevant Problems, invent Hypotheses, create Actions, design Metrics, and conduct Experiments. We also introduce a key driver to encourage users to provide critical data: Return on Personal Engagement (ROPE). Users who gain a benefit (tangible or psychological) from participating are far more likely to do so, and we discuss how to design incentives to encourage participation.

In addition to discussing applications that succeeded, we also discuss

STATS 252 is taught by Andreas Weigend (former Chief Scientist of who shares his first-hand experiences of working with some 100 companies over the last 15 years. Previous guest speakers include Jeff Hammerbacher (former head of data at Facebook), Reid Hoffman (founder of LinkedIn), Jan Pedersen (former chief scientist of Yahoo), Joshua Schachter (founder of Delicious), Paul Muret (Google Analytics), Johann Schleier-Smith and Greg Tseng (founders of Tagged).

The course is open to undergraduates and graduate students with experience in web programming. They are expected to actively engage in class discussions, to have their assumptions challenged, and to bring their diverse backgrounds to bear. After each class, the students create a detailed write-up on the course wiki.

Hands-on assignments include understanding and leveraging data and analytics on the web, defining robust variables and metrics that characterize engagement on Facebook pages, designing several contracts for a prediction market related to class and subsequently participating in its Inkling implementation, and creating a recommender system for web content based on delicious data (code supplied in python).

Further possibilities are applying geolocation to dating and creating a recommender system for Twitter.
Questions? See More questions? Email Andreas Weigend