HW2_Analytics

[|Andreas Weigend] Spring 2009 STATS 252 Stanford University
**Data Mining and E-Business: The Social Data Revolution**

=Homework 2: Getting to Data on the Web=
=Google Trends (Part A), Google Analytics (Part B), and Yahoo Pipes (Part C)=
(Note: Homework 2 is to be done on an individual basis.)
Assigned: Mon Apr 13, 2009
Due: Thursday Apr 23, 2009, 5pm (all parts)
Everything you need to turn in is marked in **RED**.
Submit to: stats252.homework@gmail.com. Non-SCPD students should also bring a hard copy to class (Apr. 27).

=Part A: GOOGLE TRENDS=
With Google Insights for Search ( [] ), you can compare search volume patterns across specific regions, categories, time frames, and properties.

1) Please use Google Insights to analyze trends relating to your Facebook Page from Homework Assignment 1. **Give three interesting results you find, with explanations.**

2) Google Insights aggregates information about regions, categories, and time frames to show search trends. **If you were Google, what other factors would you visualize with search trends, and why?**

3) There are many other websites providing trend information; for example, Technorati ( [] ) can give you search volume across the blogosphere. **Please find an example website that provides insights for "live" feeds of information.**

=**Part B: GOOGLE ANALYTICS**=
__**Set up your web page, retrieve and analyze web access logs from your Leland account:**__

**Step 1**, you need to download and install the necessary software for secure file transfer:
 * Windows: SecureFX. Click [|here] to learn more and [|here] to download. You can use all default settings during installation.
 * Mac: Fetch. Click [|here] (serial number included) to learn more and [|here] to download. Using Fetch is similar to using SecureFX; follow the [|guide] and remember to change "Hostname" to elaine.stanford.edu.

**Step 2**, in SecureFX connect to **elaine.stanford.edu** and log in with your **SUNet ID** and password.

If you don’t already have a webpage, you will want to transfer one to the **WWW** folder. The opening page should be called **index.html** (a simple example: [|index.html.txt.txt]). (See picture below.)

(If you already have your own website from which you can get logs, you can skip Steps 3 and 4.)


**Step 3**, request [|here] to have your log dump generated for your Stanford web site (if you don’t do this, no log will be visible to you by default).
**Note: according to the request page, the log dump will be generated the morning after your request, so make sure you start this step early.**
**Note: if you experience problems, please write to the TA immediately. IT has recently resolved an issue in the script that processes the requests, but just in case.**

**Step 4**, now you should retrieve your web access logs from the server. It may take a day for the log dump to be generated. You can find the logs at **your_home_directory/WWW/logdumps/** and retrieve them through **SecureFX**.

If you don't know how to extract .bz2 or gzip files, you may want to try [|SecureZip] (freeware). If everything appears squeezed into one long line in the extracted file, that is because the file uses Unix line endings ([|more to read] for whoever is curious); try opening it in Microsoft Word.
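If you prefer a script to a desktop tool, the extraction and the line-ending conversion can be sketched in Python. This is only a minimal sketch: the function names are made up for illustration, and it assumes the log is UTF-8 compatible text with a `.bz2` or `.gz` file extension.

```python
import bz2
import gzip
from pathlib import Path


def read_log_text(path):
    """Return the decompressed text of a .bz2, .gz, or plain log file."""
    path = Path(path)
    if path.suffix == ".bz2":
        opener = bz2.open
    elif path.suffix == ".gz":
        opener = gzip.open
    else:
        opener = open
    # newline="" keeps the original line endings so we can convert them ourselves.
    with opener(path, "rt", encoding="utf-8", errors="replace", newline="") as handle:
        return handle.read()


def to_windows_newlines(text):
    """Convert Unix (LF) line endings to Windows (CRLF) without doubling existing CRLFs."""
    return text.replace("\r\n", "\n").replace("\n", "\r\n")
```

For example, `to_windows_newlines(read_log_text("access_log.bz2"))` (a hypothetical file name) returns text that opens as separate lines in Windows editors that do not understand Unix line endings.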

After creating your page and having your friends visit it a few times, you will need to wait another day for the logs to be refreshed.

**Step 5**, now you can analyze your web log:
**1) Comment on the format of the logs, and print out a snippet.**
**2) Formulate 3 questions whose answers would interest you.** Some example questions: What is the most popular link on a given page? How many unique IPs are there per day?
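Questions such as "how many unique IPs per day" can be answered with a short script once you know the log format. The sketch below assumes the logs follow the Apache Common Log Format; that is an assumption, so check your own log snippet and adjust the pattern if it differs. The function names and sample lines are made up for illustration, and counting requested paths is only a rough stand-in for "most popular link".

```python
import re
from collections import Counter, defaultdict

# Assumed layout: Apache Common Log Format. Adjust the pattern to your own logs.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

# Made-up sample lines (documentation-range IP addresses), for illustration only.
SAMPLE_LINES = [
    '192.0.2.1 - - [20/Apr/2009:10:00:00 -0700] "GET /index.html HTTP/1.1" 200 512',
    '192.0.2.1 - - [20/Apr/2009:10:05:00 -0700] "GET /about.html HTTP/1.1" 200 300',
    '198.51.100.7 - - [20/Apr/2009:11:00:00 -0700] "GET /index.html HTTP/1.1" 200 512',
    '192.0.2.1 - - [21/Apr/2009:09:00:00 -0700] "GET /index.html HTTP/1.1" 200 512',
]


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def unique_ips_per_day(lines):
    """Map each day (e.g. '20/Apr/2009') to the number of distinct client IPs."""
    ips_by_day = defaultdict(set)
    for line in lines:
        entry = parse_line(line)
        if entry:
            ips_by_day[entry["day"]].add(entry["ip"])
    return {day: len(ips) for day, ips in ips_by_day.items()}


def most_requested_paths(lines, top_n=3):
    """Return the top_n most requested paths with their request counts."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry:
            counts[entry["path"]] += 1
    return counts.most_common(top_n)
```

Lines that do not match the pattern are skipped silently, so it is worth comparing the number of parsed lines against the total before trusting the counts.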

**Step 6**, analyze your website using Google Analytics:

1) Follow the instructions in [|google_analytics_instructions2.pdf] to set up your Google Analytics account. **Note: don't forget step 6 in the instructions: put the code right before the tag specified there, on every page that you wish to be analyzed.**

2) In Google Analytics, click "View Reports" for your website.

3) **You will be shown a Dashboard consisting of the several diagrams below. Take screenshots, submit these plots as part of your homework write-up, and comment on each plot and on how you can use the information to improve performance (for example, if you find that a product you are selling attracts many more people from Asia than from the U.S., you may want to focus on the Asian market).**

=**Part C: YAHOO PIPES**=
__**Automatic Data Service with Yahoo Pipes**__

In this exercise, we will use Yahoo Pipes to do automatic data collection and build alerts on top of it.

**Step 1**, understanding the basic concepts
 * What is an RSS feed?

RSS is a popular method for announcing recently updated items. The data in an RSS feed is represented in [|XML] format. Many online services let you subscribe to your favorite RSS feeds to keep up with changes, such as [|igoogle], [|google reader], [|livejournal], [|newsgator], etc. Typically, you subscribe to an RSS feed in your favorite feed reader so that you can view all the content you care about in a single place. You can learn more about RSS [|here]. RSS feeds are a common data source in Yahoo Pipes.
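To make the XML structure of a feed concrete, here is a minimal sketch that reads the items of a feed with Python's standard library. It assumes an RSS 2.0 layout (`channel` containing `item` elements with `title`, `link`, and `description`); the sample feed and the function name `parse_items` are made up for illustration, and real feeds may use other versions or namespaces.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 sample, for illustration only.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First item</title>
      <link>http://example.com/1</link>
      <description>Description of the first item</description>
    </item>
    <item>
      <title>Second item</title>
      <link>http://example.com/2</link>
      <description>Description of the second item</description>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml):
    """Return a list of dicts (title, link, description), one per <item> in the feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return items
```

A feed reader does essentially this on a schedule: fetch the XML, read the items, and show the ones you have not seen yet.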

 * What is Yahoo Pipes?

“Yahoo pipes is a powerful composition tool to aggregate, manipulate, and mashup content from around the web.” - from [|Yahoo Pipes homepage]. We will show you an example in Step 2, but we highly encourage you to [|learn more] about it beforehand. Here are some very good video tutorials:
 * []
 * []


**Step 2**, understanding a real-world example

Assume you are sick of your landlord and are now looking for a new apartment. You want to find a “1-bedroom apartment in Palo Alto that asks for less than $1400/month and is cat-friendly”. So you go to [|craigslist] and search for it, with something like []. But you run into two problems: first, craigslist only lets you limit the search to the “peninsula” area, so you have to search for “palo alto” within the page; second, the search only happens when you remember to do it, and you are usually too busy to remember. Ideally, the process would be automated, and you would be alerted whenever a new listing matches your requirements.

Here is the Yahoo Pipe we created to solve the problem: [] (shown in the picture below). We can set up automatic alerts whenever the pipe output changes. You should go there, view [|the source of the pipe], and play around. If you don't understand how the source works, go back to **Step 1** and re-study some of the concepts.
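The filtering in this example (price under $1400, Palo Alto, cats) can also be sketched outside Yahoo Pipes. The code below is not the source of the pipe; it is a rough Python equivalent under stated assumptions: each listing is a dict with `title` and `description`, the price appears in the title as `$1350`, and the names `extract_price` and `matches` are made up for illustration. The keyword check is crude and would also accept a listing that says "no cats".

```python
import re

PRICE_PATTERN = re.compile(r"\$(\d[\d,]*)")
CAT_PATTERN = re.compile(r"\bcats?\b")


def extract_price(text):
    """Return the first dollar amount in the text as an int, or None if there is none."""
    match = PRICE_PATTERN.search(text)
    return int(match.group(1).replace(",", "")) if match else None


def matches(listing, max_price=1400, city="palo alto"):
    """True if the listing is under max_price, mentions the city, and mentions cats."""
    text = f'{listing["title"]} {listing["description"]}'.lower()
    price = extract_price(listing["title"])
    return (
        price is not None
        and price < max_price
        and city in text
        and CAT_PATTERN.search(text) is not None
    )
```

Applied to a list of feed items, `[item for item in items if matches(item)]` keeps only the listings worth an alert.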

After the pipe is created, you can set up an alert on it so that whenever the result changes, you are informed by email, mobile, or Yahoo Messenger.
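The idea behind such an alert is to compare the current output with what has been seen before and report only the difference. The following is a minimal sketch of that idea, not a description of how Yahoo Pipes implements alerts; the function names are hypothetical, items are assumed to be dicts with `title` and `link`, and `notify` stands in for whatever channel (email, SMS) you would use.

```python
def new_items(seen_links, current_items):
    """Return the items whose link has not been seen in an earlier run."""
    return [item for item in current_items if item["link"] not in seen_links]


def check_for_updates(seen_links, current_items, notify=print):
    """Notify about unseen items, remember their links, and return them."""
    fresh = new_items(seen_links, current_items)
    for item in fresh:
        notify(f'New listing: {item["title"]} ({item["link"]})')
    # Record the links so the same item does not trigger a second alert.
    seen_links.update(item["link"] for item in fresh)
    return fresh
```

Calling `check_for_updates` on every refresh with the same `seen_links` set produces one notification per new listing and none for listings already reported.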




**Step 3**, questions for you,

Now you should design a similar problem and implement a Yahoo Pipe to solve it. **Please publish your pipe and send the link in the homework submission, along with your problem definition.**