7_Recommend+5.18

[|Andreas Weigend]
STATS 252, Stanford University, Spring 2009
Class time: Monday 2:15 - 5:05 pm
Class location: Gates B01
**Data Mining and E-Business: The Social Data Revolution**

=__Class 7__=

Agenda:
1. Recommender Systems
2. Survey Insights-HW5
3. Twitter
4. Semantic Web

Announcement:
If you didn't get the Professor’s emails (one on Friday, another on Monday), make sure you contact him.

1. Recommender Systems
Why recommendations? What's the big deal about recommendations? There is so much information out there and it's not just information. Some areas in which we might want recommendations are music, books, dates, and clothes.

A. Commonality
Commonality refers to the number of potential things you can recommend. Higher dimension commonality implies that it is much more difficult to give recommendations. Clothing, for example, has high dimension commonality and is difficult to classify. It is therefore very difficult to offer recommendations. For music or books, there are perhaps only 1M choices of each. It is therefore easier to make recommendations on these topics. One example of a super low dimension commonality space is traditional ads. There might be only 30 ads for a product at any given time.

B. Finite vs. Infinite Resources
Making recommendations also depends on whether your resources are finite or infinite. Dates are an example of a finite resource. If you match two people together, then those people are taken off the market and you can’t match them with anyone else. Books, on the other hand, are an example of infinite resources. If you recommend a book to somebody, you can recommend that same book to thousands of other people as well.

With finite resources, making recommendations can be much riskier. Since you can’t recommend somebody who has already been matched, each recommendation can have an effect on the other recommendations you are able to make. Additionally, you can also change the experience of the recommended person.
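The difference can be sketched with a toy example. The preference scores, user names, and item names below are invented for illustration, and the greedy matching is only one simple way to handle a finite pool (it is not guaranteed to be optimal). With an infinite resource every user simply gets their top-scored item; with a finite resource each match removes the item from the pool, so earlier matches constrain later ones.

```python
def recommend_infinite(scores):
    """Infinite resource: each user independently gets their highest-scored item."""
    return {user: max(items, key=items.get) for user, items in scores.items()}


def recommend_finite(scores):
    """Finite resource: greedy one-to-one matching, best-scoring pairs first.

    Once an item is matched it leaves the pool and cannot be recommended again.
    """
    pairs = sorted(
        ((score, user, item)
         for user, items in scores.items()
         for item, score in items.items()),
        reverse=True,
    )
    matched_users, taken_items, result = set(), set(), {}
    for score, user, item in pairs:
        if user not in matched_users and item not in taken_items:
            result[user] = item
            matched_users.add(user)
            taken_items.add(item)
    return result


# Hypothetical scores: both users like "x" best.
scores = {
    "ann": {"x": 0.9, "y": 0.6},
    "bob": {"x": 0.8, "y": 0.3},
}
infinite = recommend_infinite(scores)
finite = recommend_finite(scores)
print(infinite)
print(finite)
```

In the infinite case both users are pointed at the same item; in the finite case the second user has to settle for what is left, which is the risk described above.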

C. Preference Shift
Every model assumes everything else is equal when making recommendations. People’s preferences frequently shift when you add a third choice. Often something you add enhances a third dimension and as a result people change their opinions. By removing an option nobody has chosen, people may actually change their preference among the remaining options.

D. User Feedback
There is a big difference between people actually rating things vs. simply clicking or not clicking. For Netflix, 50% of what they send out is in the top 1000 movies. There, people give explicit ratings. It takes about 10-30 seconds to decide whether you like a song. It takes much longer for movies. The choice of whether to have the user work for you (explicitly rate choices) or not reflects the complexity of the item and the time investment the user must make.

One problem regarding user feedback is that you can’t really interpret negative actions. Amazon might be absolutely convinced that you really should get a certain book but you don't buy it. What does a machine learning system take away from that? We don’t really know why you haven’t bought the book. It could be that you already have 3 copies of the book at home or it could be that you're just not interested. There is no way to tell—the action is identical in both cases.

E. Imbalanced Data Sets
Another dimension is imbalanced data sets. The click-through rate on ads is about 0.1%. That means you’re 99.9% correct if you assume nobody will ever click on your ad. There is a good example from the first homework about weather predictions. Our answer to problems always depends on the metrics. Under some seemingly reasonable metrics, we actually do better by using no information at all and saying the weather tomorrow will be the same as today. The point is to come up with metrics which truly reflect what we want to do.
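To make the imbalance concrete, here is a minimal sketch with synthetic labels. The 0.1% rate comes from the notes; the sample size and the function names are illustrative assumptions. It shows why accuracy rewards a "model" that never predicts a click, while a metric such as recall exposes it.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred):
    """Fraction of actual clicks that the model predicted."""
    positives = sum(y_true)
    if positives == 0:
        return 0.0
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / positives


# Synthetic data: 10,000 ad impressions, 10 clicks (0.1% click-through rate).
impressions = 10_000
clicks = 10
y_true = [1] * clicks + [0] * (impressions - clicks)

# Baseline "model": assume nobody ever clicks.
y_pred = [0] * impressions

baseline_accuracy = accuracy(y_true, y_pred)
baseline_recall = recall(y_true, y_pred)
print(f"accuracy: {baseline_accuracy:.3f}")
print(f"recall:   {baseline_recall:.3f}")
```

The baseline scores 99.9% on accuracy and zero on recall, so which metric we pick decides whether it looks excellent or useless.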

Example of the PHAME Framework in Terms of Recommendations: News
======**Problem: How can you get user feedback?**======
Marking a news story thumbs up or thumbs down gives us very little useful information. "Mark as read" has a similar effect - what does it //mean//? If the user tags something then it's much more helpful.
======**Hypothesis: We can do a better job of recommending news stories**======
Do we show the stories in the order they were posted to the website? Or do we try to show the most relevant stories?

**Example - [|A story about the Santa Barbara wildfires]**
What would thumbs up/thumbs down mean here? It could mean that the article was well or poorly written, or it could mean "I'm not actually interested in fires, but I am interested in Santa Barbara." If we ask the user "do you want to see more stories about wildfires?" then more wildfire stories can be shown or recommended. If the user replies that they want to see less about Santa Barbara, then fewer Santa Barbara stories can be shown or recommended.

**Thumbs up/Thumbs down Recommender Systems**
One example of this kind of system can be found in the reviews on [|Courserank]: here, users are essentially thumbing a review up or down. But what can we learn from this system? Not that much, really. The user doesn't have a very good idea of what they want, and marking up or down gives very little information to the site about what to do next. The site doesn't know which aspect of the article led to a positive or negative response - the writing, the subject matter, or even the font. A similarly unhelpful system is the "more" button. Without other data it is essentially impossible to determine what to display to the user.

What if we ask the user //why// they chose their rating with some multiple choice questions? Although the user interface of where we place the bubble, etc, is important, that doesn't help with the question. The user has to understand how the feedback will help them in the future.

**Recommender Systems with Tags and Metadata**
If a user tags the article with the words that they would use to describe the article (for example in the Santa Barbara example with "wildfire", "SoCal", and "evacuation"), then they can be shown more stories with similar tags. The process can be customized if the user responds by clicking links to see "more of.." or "less of.." a certain related topic. Both of these create metadata, which gives companies much more to work with. However, "tagging fatigue" could set in, diminishing the usefulness of the feature.
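As a rough sketch of how tags could drive "more stories with similar tags", the snippet below ranks articles by the Jaccard overlap between a user's tags and each article's tags. The article titles, the extra tags, and the function names are hypothetical; only the three Santa Barbara tags come from the notes, and Jaccard similarity is just one plausible choice of similarity measure.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union of two tag sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_by_tags(user_tags, articles, top_n=2):
    """Return up to top_n article titles sharing at least one tag, best match first."""
    scored = [(jaccard(user_tags, tags), title) for title, tags in articles.items()]
    scored = [(score, title) for score, title in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:top_n]]


# Hypothetical articles and their tags.
articles = {
    "Evacuations ordered near Santa Barbara": {"wildfire", "SoCal", "evacuation"},
    "Wildfire season outlook": {"wildfire", "weather"},
    "Stanford wins tennis title": {"sports", "tennis"},
}
user_tags = {"wildfire", "SoCal", "evacuation"}

recommendations = recommend_by_tags(user_tags, articles)
print(recommendations)
```

An article with no tags in common is never recommended, which also illustrates the downside: if users stop tagging, the system has nothing to match on.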

Essentially, tagging and metadata are more useful than yes/no, up/down systems, which are still better than nothing. There is more of a time cost in tagging an article, but it generates more useful results for the user. If users understand that tagging will lead to more customized results, they are more likely to actually do it, with the idea that what they do now should affect the system in the future. Compare it to a dating site: the more accurate and abundant the information added to a profile, the higher the chance of a good match. Most involved users would see this and adapt by providing more information. It is difficult to understand the sentiment of a response - look at Twitter. A large proportion of tweets have very little sentiment attached to them. As a result, we shouldn't expect miracles from recommender systems yet.
======**Action: Two Engines**======
We can run two different recommender engines in the background, and show the users results from both. For example, one that makes recommendations based on tags and one that places more emphasis on recent articles. Both should be "deep structure", i.e. the user doesn't see the difference between where the recommendations come from.
======**Metrics: Measuring satisfaction**======
This is a hard one - how do we measure delight? Some suggestions from the class:
 * 1) The link is sent to a friend.
 * 2) Biofeedback - get people to wear a heart rate monitor and analyze that data.
 * 3) The time it takes to click on a link after the recommendation appears.
 * But more important is the time it takes for them to return to the site. If it's a few seconds then that recommendation must not have been good.
 * 4) Ask them.
 * 5) Use the built-in cameras in laptops and do some facial expression recognition.
 * But then we're watched.. All the time..
 * This could lead to some extremely targeted ads. For example, if it registers that you're watching TV by yourself, some ads for snack food or a dating site could come up.
 * A similar experiment was attempted by AT&T Bell Labs.
 * 6) All together, there's so much meaning in human gestures and even musical instruments that it's hard to attach meaning to clicks.

======**Experiment: What works best?**======
We have to find the costs of false positives and false negatives for this. Because the world now moves so fast, we don't have time to read through backlogs of information, so the cost of missing something important can approach infinity. We can run experiments with the two different recommender engines we have. Fortunately, the users won't notice most mistakes.
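One way such an experiment could be set up is sketched below, under several assumptions that are not in the notes: users are split between the two engines by hashing a user id, click-through rate stands in for "delight" (the notes point out that clicks are a weak signal), and the two rates are compared with a standard two-proportion z-test. The engine names and the counts are made up.

```python
import hashlib
import math

ENGINES = ("tags", "recency")  # hypothetical names for the two engines


def assign_engine(user_id):
    """Deterministically split users between the two engines via a hash."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return ENGINES[digest[0] % 2]


def two_proportion_z(clicks_a, shown_a, clicks_b, shown_b):
    """z statistic for the difference between two click-through rates."""
    rate_a = clicks_a / shown_a
    rate_b = clicks_b / shown_b
    pooled = (clicks_a + clicks_b) / (shown_a + shown_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / shown_a + 1 / shown_b))
    return (rate_a - rate_b) / std_err


def two_sided_p_value(z):
    """Chance of seeing a |z| at least this large if both engines were equally good."""
    return math.erfc(abs(z) / math.sqrt(2))


# Made-up counts: recommendations shown and clicked for each engine.
z = two_proportion_z(clicks_a=120, shown_a=2000, clicks_b=90, shown_b=2000)
p = two_sided_p_value(z)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Because the assignment depends only on the user id, the same user always sees the same engine, and nothing in the interface reveals which engine produced a recommendation.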

Useful Links on Recommender Systems:
[|The Recommender Industry]
[|Methods and Metrics for Cold-Start Recommendations]
[|International Journal of Electronic Commerce: Special Issue on Recommender Systems]

2. Survey Insights from Homework 5
Actual data from the class survey: [|Survey Data]

**Incentives for Email Ranking**
Responses to the question "What would it take for you to use an email ranking-by-importance service?" fell into three main categories:
 * 1) I would use this without any incentive besides the idea.
 * 2) It would have to work reliably for me to use it.
 * 3) I would only use it if it meets my specific needs or is forced upon me.

The ratio of responses closely matches the adoption categories in Geoffrey Moore's seminal work, [|Crossing the Chasm]. He describes the adoption of high-technology products as coming in several stages, characterized by the kind of people willing to use (and pay for) a product. The "chasm" of the title is the gap between the early adopters and the mainstream market. The class breakdown gives STATS252 a large percentage of "visionaries" (near 30%), who are willing to try the product without it needing to be perfect; an occasional bug is perfectly acceptable. A further third of the class was "pragmatist" in nature, only accepting the product when it works flawlessly for their everyday needs. A smaller fraction of the class are conservatives and skeptics, who will only take a product if they //have// to use it.

Useful Links:
[| Amazon "Crossing the Chasm"]
**[|An actual email ranking service!]**

**What would you be more willing to share with your online friends than your real-life friends?**
__Two key results and follow up questions:__
 * 30% did not understand the question
 * What is the difference between a "real-life" friend and an "online" friend these days? Are they useful distinctions?
 * 10% more willing to share emotions/embarrassing events online
 * Fears are easier to talk about online, so why not share them?
 * [|Click: What Millions of People Are Doing Online and Why it Matters] - users will search for fears, like the fear of public speaking, in the privacy of their homes. They may not be as likely to talk to others about these problems.[[image:http://coverart.oclc.org/ImageWebSvc/oclc/276819187_140.jpg?SearchOrder=BT,AM,IN width="140" height="211" align="right"]]

**[|Spent: Sex, Evolution and the Secrets of Consumerism]**
How open you are is determined by your DNA. If you go to a restaurant and see sea urchin sushi, do you try it? You might, but your friend might not. This is mostly determined by your genetic structure. So some of the results of our in-class survey were not even up to the respondents! We were more predetermined than we thought.

As an aside, it is interesting to note that //advertising endows the purchaser with properties//. So if I go and buy that BMW, I am not just buying a car - I am getting all the cool that comes with owning //that// car.

Feedback for Survey Structure
__Don't use pie charts to represent numerical data!__ You will get this:
[[media type="youtube" key="IQqM4MXMYpM" height="212" width="261"]]
Instead, use actual numbers and bar charts. That way, we can get a feel for the numbers and scale of a situation.

3. Twitter
Guest Speakers:
[|Yu-Shan Fung] from [|Discoverio]
[|Nova Spivack] from [|Twine]

People recommendation space: How can we discover people we might be interested in? Mr Tweet helps users cut through all the noise and easily identify the most important people and content they should be paying attention to. But what should be the object of recommendation? Should it be an individual tweet, a person, or overheard conversations? Yu-Shan of Mr Tweet (Discoverio) has some interesting hypotheses around collaborative filtering. For example, one approach would be matching Twitter users with other Twitter users who follow the same people. Another approach would be to measure similarity through semantics on top of tweets (i.e. are people talking about the same industry? The same interests?). One can think about link and hashtag analysis as well. By putting forward many such hypotheses, you will be able to create richer applications.
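The first hypothesis (matching users who follow the same people) can be sketched as a small collaborative filter. This is not Mr Tweet's actual method, which the notes do not describe; the account names, the Jaccard weighting, and the function names are all assumptions chosen for illustration.

```python
def jaccard(a, b):
    """Overlap between two follow lists: intersection size over union size."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_accounts(user, follows, top_n=3):
    """Suggest accounts followed by users whose follow lists overlap with ours.

    Each candidate account is scored by the summed similarity of the
    users who follow it.
    """
    mine = follows[user]
    scores = {}
    for other, theirs in follows.items():
        if other == user:
            continue
        similarity = jaccard(mine, theirs)
        if similarity == 0:
            continue
        for account in theirs - mine:
            if account != user:
                scores[account] = scores.get(account, 0.0) + similarity
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [account for account, _ in ranked[:top_n]]


# Hypothetical follow graph: user -> set of accounts they follow.
follows = {
    "alice": {"nasa", "nytimes", "stanford"},
    "bob": {"nasa", "nytimes", "wired"},
    "carol": {"nasa", "espn"},
    "dave": {"espn", "nba"},
}

suggestions = suggest_accounts("alice", follows)
print(suggestions)
```

This uses only the social graph. The second hypothesis, similarity based on what people tweet about, would need the text of the tweets instead.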

Measuring similarity through the content of tweets is content analysis rather than social graph analysis, and the two kinds of analysis are totally different from each other. Nick Kallen of Twitter argues that the Twitter recommendation engine should differ from Facebook's. The Facebook recommender widget is used to find friends you know in real life who you didn’t know were on Facebook. The goal of users on Twitter isn’t necessarily to find people they know in the real world, so there is less of an imperative on Twitter to find people you already know. The question Twitter should be answering is not “what algorithm should we use?” but rather “towards what ends are we recommending people and why are we trying to establish a connection between people?” In addition, Twitter separates classes of users in its recommendations. New users, Nick says, shouldn’t be recommended the same users that existing members of the service are recommended. Currently, for new users, Twitter suggests a mix of popular individuals, Twitter accounts affiliated with mainstream blogs, and celebrities.


The service tries to assist the unmotivated newcomer: someone who joins out of general curiosity about Twitter, not because they want to follow offline friends in an online space. The new user wants to have some sort of “community connectedness” when they sign onto the service, and this is why Twitter recommends recognizable personalities to this class of user.

Nick mentioned that it’d be interesting to look for people who are talking about things I'm likely to be interested in, and to show the recommended conversations. His hypothesis is “I am much more likely to make a conversion on a recommended user if we show not the essence of the user but the conversations they're participating in.”

4. Semantic Web & Twine
Guest Speaker: [|Nova Spivack] from [|Twine]

"Semantic" refers to meaning, so "Semantic Web" means "adding meaning to the Web".

History: The military started using the Semantic Web in order to do reasoning on its data, for example inferring facts such as which cities are most likely to be the target of a terrorist attack. DARPA created the DARPA Agent Markup Language (DAML), which later turned into the [|Web Ontology Language - OWL]. The World Wide Web Consortium created the "Web Ontology Working Group", which began work on November 1, 2001.

We are now in the 3rd decade of the web:
**Web 1.0** was all about building the infrastructure.
**Web 2.0** is about the social web.
**Web 3.0** is the web of meaning, where links are associated with meaning that can be interpreted by software. For example, a human can understand the different kinds of "friend" relation on Facebook (close friend, acquaintance, business colleague, etc.), but software cannot.
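A minimal sketch of what "links with meaning" could look like in code: instead of an untyped "friend" link, each link is stored as a (subject, relation, object) triple, which is the general shape used by RDF. The people, the relation names, and the `match` helper are invented for illustration and do not follow any particular vocabulary or library.

```python
# Each link carries its meaning as an explicit relation (names are hypothetical).
triples = [
    ("alice", "closeFriendOf", "bob"),
    ("alice", "colleagueOf", "carol"),
    ("alice", "acquaintanceOf", "dave"),
    ("bob", "colleagueOf", "dave"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return all triples that agree with every field that is not None."""
    return [
        (s, p, o)
        for s, p, o in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Software can now tell a colleague from a close friend.
alice_colleagues = match(triples, subject="alice", predicate="colleagueOf")
daves_links = match(triples, obj="dave")
print(alice_colleagues)
print(daves_links)
```

With a plain "friend" link all four rows would look the same to a program; with typed relations a query can ask for colleagues only.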

1. Tagging approach
[[image:tagcloud.gif width="216" height="200" align="right"]]
__Examples__: [|delicious], [|Flickr], [|Technorati]

__Pros__: this is a simple approach that doesn't require learning a new technology
__Cons__: you still need to rely on people to add tags; many people are just too lazy.

2. Statistics approach - Interpreting massive data sets using mathematical and statistical tools
__Examples__: [|Google], [|Lucene]
__Pros__: very scalable and language independent
__Cons__: this approach does not solve the problem of meaning. Algorithms can find correlations but cannot understand semantics.

3. Linguistics approach - Attempt to create software that can understand language: grammar, syntax and units of meaning
__Examples__: [|Powerset], [|Inxight]
__Pros__: the potential to make queries in natural language, e.g. "Who was George Washington?", and to extract knowledge from text, e.g. Rome is the capital city of Italy, Bill Gates is the founder of Microsoft, etc.
__Cons__: this approach is computationally expensive and hard to scale to the whole web. For example, [|Powerset] was only able to release a Wikipedia demo. Some people argue that the company was not able to scale its technology beyond a few specific sites. The company was eventually sold to Microsoft.

4. Semantic web approach
This approach is all about creating meaningful structure for the web. For example, Google cannot give us an answer for a query such as: a 2005 Ford Taurus with 3 cylinders for less than $20,000. Metaweb, which is building a web-scale database of the web, is attempting to do just that; in other words, enabling us to run web queries as if we were querying a relational database.
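To illustrate what "querying the web like a relational database" would mean, here is a sketch using Python's built-in `sqlite3` module. The table, its columns, and every row are invented for illustration (they are not real listings and not Metaweb's schema); the query values mirror the example in the notes.

```python
import sqlite3

# In-memory database standing in for structured data extracted from the web.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE listings ("
    "make TEXT, model TEXT, year INTEGER, cylinders INTEGER, price_usd INTEGER)"
)

# Invented rows, for illustration only.
conn.executemany(
    "INSERT INTO listings VALUES (?, ?, ?, ?, ?)",
    [
        ("Ford", "Taurus", 2005, 3, 14500),
        ("Ford", "Taurus", 2005, 3, 21000),
        ("Ford", "Taurus", 2004, 3, 9000),
        ("Ford", "Focus", 2005, 4, 8000),
        ("Ford", "Taurus", 2005, 6, 12000),
    ],
)

# The query from the notes: a 2005 Ford Taurus with 3 cylinders under $20,000.
rows = conn.execute(
    "SELECT make, model, year, cylinders, price_usd FROM listings "
    "WHERE make = ? AND model = ? AND year = ? AND cylinders = ? AND price_usd < ? "
    "ORDER BY price_usd",
    ("Ford", "Taurus", 2005, 3, 20000),
).fetchall()
conn.close()
print(rows)
```

A keyword search engine sees "Ford", "Taurus", and "20,000" as strings to match; the structured query treats year, cylinder count, and price as typed fields that can be compared.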

__Examples__: [|Radar Networks] (makers of Twine), [|DBpedia], [|Metaweb]
__Pros__: more precise queries, not as computationally intensive as the linguistics approach
__Cons__: still needs metadata. For example, Metaweb is generating a lot of its metadata itself. Another example is the failure of webmasters to adopt OWL and [|RDF], even though both have been endorsed by W3C. There is just not enough incentive right now for people to add semantics to their websites.

5. Artificial Intelligence approach - the ambitious goal of approximating the knowledge of a human expert
__Example__: [|Wolfram Alpha]
__Pros__: this would be amazing!!!
__Cons__: computationally expensive, very difficult to achieve with the current infrastructure and knowledge base



To summarize: the goal of the semantic web is to create an open database layer for the entire web. This will hopefully enable us to better harvest the deep meaning hidden in the web's structure, rather than relying on keyword search or link structure search (e.g. Google). It will also give rise to a new set of software applications that can programmatically understand meaning and act upon it.

Useful Links on Semantic Web:
[|Freebase] - web scale database that Metaweb is creating
[|Tim Berners-Lee] - the inventor of the World Wide Web, talking about the Semantic Web:
[[media type="youtube" key="mVFY52CH6Bc" height="344" width="425"]]

Initial Contributors:
Georgia Andrews
Tristan Walker
Na'ama Moran
Chris Anderson (chanderson@stanford.edu)
Tina Cardaci
Catrina Benson (cat@catrinabenson.com)
Jonathan Beekman
Gary Ho (gary228ho@gmail.com)