Survey_SetE

[|Andreas Weigend] STATS 252, Stanford University, Spring 2009
**Data Mining and E-Business: The Social Data Revolution**

**Email to: stats252.homework@gmail.com.**
This assignment is optional but can be used for extra credit towards your homework grade. Students who attempt this homework will also be able to present their findings to the class.

Overview: The goal of this assignment is to create a comprehensive summary of survey insights worthy of being featured in the press. Please work together on the following questions to produce a cohesive entry that is ready to publish on sites as big as the New York Times or even TechCrunch.

Questions 19, 22, 23, 24, and 25:

 * 19. How many messages per day in the following channels do you send, and how many do you receive? Give your best whole-number estimate, e.g., 50.
 * 22. What would be your one suggestion to improve email?
 * 23. Current email systems either let a message through or mark it as spam. How often do you look in your spam folder?
 * 24. What would it take for you to use a system that ranks your incoming messages in order of predicted relevance for you?
 * 25. One of the ingredients is for the sender to specify how relevant the message is for the recipient. Would you do the extra work and provide this information when sending a message?

//Fed Up With Email?//

Aside from viruses and spam, what else irks people about their email? Is it server problems, forgotten passwords, multiple accounts, or important emails getting lost? A recent survey of 100 current Stanford students may shed a little light on this question. Below is a visualization of the word frequencies in these students' responses. Organization, prioritization, and spam are the dominant themes in their suggestions for improving email.
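A word-frequency visualization like the one described above is, at its core, a count of words across free-text responses. The following is a minimal sketch, assuming each response is a plain string and using a small hypothetical stopword list; the actual preprocessing behind the class visualization is not documented here, and the sample responses are invented for illustration, not taken from the survey.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real word-cloud tool would use a larger one.
STOPWORDS = {"a", "an", "and", "the", "to", "of", "my", "i", "is", "be",
             "more", "for", "in", "it", "by"}

def word_frequencies(responses):
    """Count non-stopword tokens across a list of free-text survey responses."""
    counts = Counter()
    for text in responses:
        # Lowercase, then keep runs of letters and apostrophes as words.
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts

# Illustrative responses only, not actual survey data.
sample = [
    "Better organization of my inbox",
    "Prioritization of important messages",
    "Less spam",
    "Smarter spam filtering and organization",
]
print(word_frequencies(sample).most_common(3))
```

The resulting counts are what a word-cloud renderer would map to font sizes; the larger the count, the larger the word.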



It is no surprise that email is essential to both business and personal communication, and it is also one of the most abused systems on the Internet. According to a Q3 2007 market update from [|Radicati Group], the number of email users worldwide will grow from 1.2 billion in 2007 to 1.6 billion in 2011, an average annual growth rate of about 7 percent over those four years.

Unless you have been living under a rock, you know that spam (unwanted electronic mail sent in large numbers to as many recipients as possible) is a nuisance to every email user. The volume of spam has increased exponentially over the years and has recently stabilized at an enormous level of 100 billion messages every day worldwide (about 90% of all email), generating significant costs for every organization that runs a mail system. Spam exists because some people respond to it, generating income for the spammers who send it.

Detecting spam is difficult. A mail system has to estimate the probability that a message is spam in order to decide whether to filter it. Spam fighting is all about statistics, so it is not an error-proof process: some legitimate messages are incorrectly flagged as spam (false positives), and some spam messages go undetected (false negatives). The challenge is to keep both the false positive and the false negative rates as low as possible.
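The trade-off between false positives and false negatives comes down to where a filter draws the line on its spam probability. The sketch below assumes a filter that has already assigned each message a spam probability; the `Message` type, the scores, and the threshold values are hypothetical, and real filters compute these probabilities in their own ways.

```python
from dataclasses import dataclass

@dataclass
class Message:
    spam_probability: float  # filter's estimate that the message is spam
    is_spam: bool            # ground-truth label

def error_rates(messages, threshold):
    """Return (false_positive_rate, false_negative_rate) at a given threshold.

    A message is filtered as spam when spam_probability >= threshold.
    False positive rate: share of legitimate messages wrongly filtered.
    False negative rate: share of spam messages let through.
    """
    legit = [m for m in messages if not m.is_spam]
    spam = [m for m in messages if m.is_spam]
    fp = sum(1 for m in legit if m.spam_probability >= threshold)
    fn = sum(1 for m in spam if m.spam_probability < threshold)
    fpr = fp / len(legit) if legit else 0.0
    fnr = fn / len(spam) if spam else 0.0
    return fpr, fnr

# Hypothetical scores for illustration only.
inbox = [
    Message(0.05, False), Message(0.40, False), Message(0.70, False),
    Message(0.30, True), Message(0.80, True), Message(0.95, True),
]
for t in (0.5, 0.9):
    print(t, error_rates(inbox, t))
```

In this toy data, raising the threshold from 0.5 to 0.9 eliminates the false positive but lets one more spam message through, which illustrates why lowering one rate tends to raise the other.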

So how much do we trust our spam filters? The answer may surprise you: the jury is still out. Judging from the graph below, it seems that while some of us trust our filters completely and rarely or never check our spam folders, the rest of us do not trust current spam-fighting methods and feel compelled to check at least daily or weekly.

And for those of you who have found a way to get broadband to your rock, you may have heard of some of the advances coming to email. One of these is [|Mail Trends] from Google, which lets you analyze and visualize your email by many attributes, including time, size, and recipients. Also on the horizon is email that may make all of our address boxes obsolete: a group of Stanford scientists is studying the relationships between words to figure out whom a message is intended for and to find those recipients whether or not the sender knows their addresses ([|Stanford Project]).

What about smarter email? Can you envision opening your inbox to find your messages ranked in order of relevance to you ([|Inbox 2.0])? Would the time savings be worth it if you were required to help teach the system by providing a bit of information with every email you sent? This question was also put to the Stanford students.
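Question 25 describes one possible ingredient of such a system: a relevance hint supplied by the sender. As a minimal sketch, assuming a recipient-side model already produces a predicted relevance score between 0 and 1, the sender's hint could be blended in with a fixed weight. The `Email` type, the 0.3 weight, and the linear blend are all hypothetical choices for illustration, not a description of Inbox 2.0 or any existing product.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Email:
    subject: str
    predicted_relevance: float               # hypothetical recipient-side model score in [0, 1]
    sender_relevance: Optional[float] = None  # optional hint from the sender in [0, 1]

def rank_inbox(emails, sender_weight=0.3):
    """Sort emails by a blended relevance score, highest first.

    When the sender supplied a relevance hint, it is linearly blended with
    the recipient-side prediction; otherwise the prediction is used alone.
    """
    def score(e):
        if e.sender_relevance is None:
            return e.predicted_relevance
        return ((1 - sender_weight) * e.predicted_relevance
                + sender_weight * e.sender_relevance)
    return sorted(emails, key=score, reverse=True)

inbox = [
    Email("Weekly newsletter", 0.2),
    Email("Homework due Friday", 0.6, sender_relevance=1.0),
    Email("Lunch?", 0.7),
]
for e in rank_inbox(inbox):
    print(e.subject)
```

Here the sender's hint lifts "Homework due Friday" above "Lunch?" even though the recipient-side model scored it lower; with `sender_weight=0.0` the hint is ignored and the original order by prediction returns. The extra work the students were asked about corresponds to filling in `sender_relevance`.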

Most of the students would decline, or would have reservations about, doing any extra work before sending their messages. Their main hesitations concern the reliability and simplicity of such a system.

While email enhances productivity and our ability to communicate both professionally and personally, it also serves as a channel for advertising via spam. How we manage these two types of email is one of the factors that determine how useful email is to us. New tools claim to manage and reduce the amount of unwanted email we receive, but how much these systems improve email's usefulness is still up for debate.

Student(s) responsible for this page (maximum 3 students):
 * 1) Jennifer Sniadecki
 * 2) Jieying Zheng
 * 3) Catrina Benson

Summary (one paragraph):

More details:


 * Please include at least 1 graphical visualization
 * Supplemental links to rich meta information is also helpful