Sunday, December 13, 2009

Intro to Computer Science Lesson Plan: Numeric Search


In this lab we have a number puzzle challenge that can be solved with a simple program. The lab works to give students a sense of the significance of round-off error and limited numeric precision.

Here is the puzzle: a music composer who would like to experiment with musical scales different from the traditional 12-note chromatic scale. This chromatic scale is essentially the notes derived from the series of eleven perfect fifths each with ratio 3:2 (or, equivalently think of a string fretted at length 2/3). This ratio series causes a peculiar tone issue, due to a round-off error sometimes called the Pythagorean Comma. This error in unavoidable since 2/3 raised to any power can not equal an octave, which must be an integral power of 1/2. The traditional 12-note scale produces a relative error in tone over 7 octaves of around 1.36%, as can be seen by the following calculation.

>>> (3.0/2.0)**12 / (2.0/1.0)**7
1.013643264770507812

The challenge is to find an alternative "chromatic scale" with more notes (derived by a longer series of perfect fifths) that produces less relative error over some number of octaves. Determine a method to systematically search for a) the precise number of notes in the scale and b) the precise number of octaves that will make the error smaller than the 1.3% c) finally, find the smallest such error possible (given restriction of double precision arithmetic.)

Saturday, December 12, 2009

Intro to Computer Science Lesson Plan: Machine Learning


As noted I wanted to bring "interesting" CS topics to my freshman class. Here is a relatively easy lab lesson that presents a 'linear algebraic' method for movie recommendation. The main idea is that each user and movie is described by a vector of hidden attributes values. How well a user likes a given movie is just the dot product of vectors. To find these vectors we do a computation based on a simplified steepest descent to SVD. I adapted an idea from Simon Funk's blog. Here is the lab. For our class we found that "Dark Knight" and "Napoleon Dynamite" had the extreme hidden attribute values.

------------------------------------------------------------------

We can determine recommendations by computing hidden attribute values that describe the tastes of each user, and how well a movie can be described by this attribute. Let us work on computing the most significant such value, one value for each user and one value for each movie. We can begin by letting all these values be equal to 1.0 as follows.

numUsers = len(allRatings)
numMovies=30
userVal= [1.0]* numUsers
movieVal = [1.0]*numMovies

We are now ready to try to learn the hidden values by using the actual rating values (submitted in advance by each student). We will try to update them in rounds. In each update round we choose one actual rating at random. To do this choose a user and a movie at random and check that the entry in the allRatings matrix is not 0. We will compute the error by comparing the predictedRating to the actualRating stored in our dataset mldata.py.

predictedRating = movieVal[movie] * userVal[user]
error = actualRating – predictedRating

As it turns out, we can always improve the situation my changing the userVal according to the error and the movieVal as follows:

userVal[user] += error * movieVal[movie] * smallnum

We explained why this is true in lecture (steepest descent). Similarly, we can improve the movieVal according to the error and the userVal as follows:

movieVal[movie] += error * userVal[user] * smallnum

We can put the whole update routine in a function as follows where we’ve added a dampening factor so that the updates change things in a smooth way:

def update(user, movie, actualRating):
predictedRating = movieVal[movie] * userVal[user]
error = actualRating - predictedRating
userVal[user] += error * movieVal[movie] * 0.0001
movieVal[movie] += error * userVal[user] * 0.0001

Now simply call this update function over and over on random users and random movies (being sure that there is an actual rating) and (with luck) the numbers will converge quickly to the hidden values we have been seeking.

Intro To Computer Science Lesson Plan : Visualizing Chaos via Orbit Diagrams of Logistic Map and the Henon Map



Last Fall I taught a class to incoming Freshman to introduce "interesting" ideas from the field of Computer Science. I tried to create some labs that would engage the students without being too technical or too dumbed-down. One successful lab had the students take a well-known chaotic function known as the logistic equation x(n+1) = r*x(n)*(1-x(n)) and asked them to try to form a visualization. Here are some of the results of the students.

Here are project results from CS0 students: Daniel Hagerstrand, Cody Mays, and Scott Resnick



Here is basic python code that generates the graphic for the orbit diagram of the logistic equation:



from Tkinter import *
import random
root = Tk()
canvas = Canvas(width=1000, height=750, bg='white')
canvas.pack(expand=YES, fill=BOTH)
text = canvas.create_text(50,10, text="Chaos Test")

def drawcircle(canv,x,y,radius,color):
canvas.create_oval(x-radius,y-radius,x+radius, y+radius,width=0,fill=color)

# Feigenbaum bifurcation values.
# b1 = 3, b2 = 3.449490…,
# b3 = 3.544090…, b4 = 3.556441…, b5 = 3.568759…, and b6 = 3.569692

r=2.4
while r < 4:
for pix in range(50):
x = random.random()
for i in range(100):
x = r*x*(1-x)
drawcircle(canvas,x*1000,(r-2.4)*(750/1.6),1,'blue')
r+=0.005
root.mainloop()























Here is basic python code that generates the graphic for the orbit diagram of the Henon Map:



from Tkinter import *
import random
root = Tk()

def drawcircle(canv1,x,y,radius,color):
canv1.create_oval(x-radius,y-radius,x+radius, y+radius,width=0,fill=color)

canvas = Canvas(width=1400, height=800, bg='white')
canvas.pack(expand=YES, fill=BOTH)
text = canvas.create_text(100,10, text=" Orbit Diagram of the Henon Map")


def HenonMap(a,b,x,y):
return y + 1.0 - a *x*x, b * x

# Map dependent parameters
a = 1.4
b = 0.3
iterates = 100000

xtemp = 0.1
ytemp = 0.2

for pix in xrange(0,iterates):
xtemp, ytemp = HenonMap(a,b,xtemp,ytemp)
drawcircle(canvas,400*xtemp+750,1000*ytemp+400,1,'blue')

root.mainloop()




Wednesday, October 14, 2009

The Future of Social Media

This counter might illustrate the future of what is now called "social media". Although the brand-named web apps/locations are well known, in the future we may just think of social media as we think of the web and internet today - brandless infrastructue. How would such an infrastructue work? New protocols? Probably, since http is end-to-end and will not support real time conversations on any scale.

Wednesday, September 16, 2009

The Fun of Scratch

Do you want a fun way to learn the basics of computer programming?
Scratch is a great application for learning how to program in a very visual and intuitive style. Scratch lets everyone create and share simple graphic based programs. It is easy to make pretty pictures and arcade games. Below are two examples that I have created when working with middle and high school students. Just click the green flags to start the programs. Try to land the spaceship on the key on the moon by using just the arrow keys. Enjoy.

Learn more about this project
Learn more about this project

Thursday, September 10, 2009

Michael Pollan, Health Stats, and Web Search Unfulfilled


I was reading Michael Pollan's article in today's NYT where he made some very provocative claims about the healthcare debate. His main argument is that there is an "elephant in the room" and that most of the blame for the enormous costs of healthcare in the US can be placed on the fact that Americans are far more likely to be obese.

I thought his piece was an interesting "out of the box" look at the issue, and that an interested person should be able to verify his claim by simple searching the web for health statistics. I immediately thought of using wolframalpha.com which is promoted as a "computational knowledge engine" that I had imagined would give a quick, graphic access to widely available statistics. I tried several searches using keywords like, weight, obesity, BMI - all to no avail. So WolframAlpha appears to be useless for comparative health statistics- too bad. Can any major search engine handle the task? A further search using google came up with several not so satisfactory sites. One was a database the WHO Infobase that was difficult and slow. Another was NationMaster site nationmaster.com which gave chart using OECD stats.

This chart showed that US has obesity rates of over 30% with an industrialized country average of 14%. So in essence Americans are twice as likely to be obese, as they spend twice as much on healthcare. Could more (web) data show a stronger correlation? Pollan may be on the right track in his focus on this aspect of the health debate. And here come those corporate interests that he predicted would fight any changes to food industry practices.

Wednesday, August 5, 2009

Computers, Corn, and Copyright: The Cases of Moe Parr and Joel Tenenbaum


Intellectually property law, including copyright and patents, is being used (in the US) with increasing virulence to victimize individuals, at a time when our culture should be celebrating the freedom and openness that new technologies in computing and biology can bring.

This idea is illustrated in the documentary movie Food Inc., where lawyers from Monsanto are shown putting patent law to work to monopolize agriculture, and in their wake are tearing apart rural America and destroying the lives of individuals. For example the movie highlights the case of Maurice (Moe) Parr, a seed cleaner from Indiana, who was harassed and financially ruined by Monsanto legal action. The entire profession of seed cleaning seems to have been undone. Here is a discussion-board on Monsanto's support of seed cleaners.

Then there is the case of Joel Tenenbaum, a file-sharing student whose entire family was threatened and harassed by RIAA law team. The link above leads to very interesting discussion of the law suit and many related copyright cases and the unjust targeting by corporate lawyers.

No matter how the merits of law are argued, the principle of respecting people's privacy must be paramount. If the enforcement of a law means invading a family farm or confiscating a personal hard-drive, then the law needs to change. Moe and Joel represent a growing number of people that need to be supported and not victimized by sleazy corporate interests.


Watch CBS Videos Online

Thursday, July 16, 2009

What is eating your disk? -> % du | sort -n

My sweet sister (happy b-day!) read my blog and suggested the following entry. How do to tell what is eating up a hard disk drive on a Mac. Well here is my solution that uses a simple "unix-style" command in the "Terminal" utility window. One way to get to the Terminal window is to search for it by going to the "spotlight" found at upper right corner of desktop. Once your Terminal window is up and running simply type in "du | sort -n ". Translation: du lists all files with the size of file in each line. We | "say pipe" that output to the sort command. The end of the final output will be the largest files and directories in your system. Now delete those directories you no longer need.

Monday, July 13, 2009

When you just need a table of results -> Google Squared


Keeping up with different search engines and search methods is a challenge. Google has a new application called "Google squared" that will build you a table of search results. Sometimes it gives poor results, but other searches work quite nicely, particularly for topics that have obvious lists.

Here are some examples where GS produces good results:
1. reality TV shows
2. fire island accomodations
3. blues musicians
4. sports cars


Here are some examples where GS produces poor results:
1. english rugby clubs
2. united states senators
3. ohio farmers markets
4. graphic artists

Friday, July 10, 2009

Twitter Searches -> Google Reader or Blog


Recently my son put up a tweet in which he complained about poor internet service by Cincinnati Bell. He soon after received a message by a company representative pointing him to customer service support. Follow-up messages were also sent to him. So apparently people/companies are following twitter carefully. So I wanted to try it myself. Here is my solution to get easy updates:

1. go to search.twitter.com and type in a keyword to follow, such as "Cincinnati"
2. at the top right of the results you will see a link "Feed for this query"
3. add this feed to your Google Reader or to your Blog (as you will see on right, I did this by adding an rss gadget in blog layout) and then follow what people are saying on Twitter.

Update: after speaking to some high-school students it appears that Twitter is not all that popular among young people, as compared to say Facebook. Whether spam will eventually kill Twitter is still an open question. I think filtering spam from Twitter may be a harder problem than filtering webpage search results.

Computer Myth #1



Myth #1. Security and Antivirus Software Is The Best Defense Against Computer Viruses

The Facts: Security software is often out-of-date and it provides a false sense of security to users. A better approach is to be a "virus-aware" computer user by following these simple steps:
1. always update your operating system software using Windows or Mac software updates.
2. avoiding suspicious websites that do not "look right"
3. deleting suspicious emails - if you don't recognize the sender you probably don't want it.
4. when your suspicions are raised but you are just not sure, you can check and search websites like snopes.com that list well known hoaxes.

Bing versus Google


Inspired by David Pogue's recent article, I decided to check out BING. It’s easy to compare it side-by-side with GOOGLE, by going to bing-vs-google.com. It is a nice way to surf the web where you are shown search results from both Bing and Google, on a split screen. Works great on 17" screens. And here is the link to DP's positive BING review.

Update: after several rounds of bing-vs-google searches I would say that BING search results are quite comparable, and its layout and visual design is better.
Google uses more page real estate for sponsored links (bad) and Bing uses more more for related searches (good).