Sunday, April 04, 2010

The Intrepid

This is an American version of a 'patriotic education centre', as we call it in China. I was really surprised by the sheer volume of visitors on Saturday. The weather, maybe? Coincidentally (or not), this gigantic aircraft carrier is docked right in front of the Chinese consulate. Hmm...

Saturday, March 27, 2010

Math land vs. CS land

I came across this article on an NLP blog (http://nlpers.blogspot.com/2010/01/arxiv-and-nlp-ml-and-computer-science.html) that makes a humorous comparison between publishing a paper in 'math land' and in 'CS land':
-----------------------------------------------------------------------------------------
The following is a cartoon view of how (some branches of) math research gets published:
  1. Authors write a paper
  2. Authors submit paper to a journal
  3. Authors simultaneously post paper on arxiv
  4. Journal publishes (or doesn't publish) paper
We can contrast this with how life goes in CS land:
  1. Conference announces deadline
  2. One day before deadline, authors write a paper
  3. Conference publishes (or rejects) paper
--------------------------------------------------------------------------------------
For Operations Research, I would say from my own observation that it is a convex combination of Math land and CS land in this respect, with considerably more weight on the Math side.

Saturday, February 27, 2010

Machine Learning in Picasa

I was (re-)trying out the software version of Picasa yesterday and found an exciting new face recognition feature. It's not really that new, since it has existed in Picasa for more than a year already. (See here) In contrast to Facebook, where you, the user, have to pinpoint where the faces are in a picture, Picasa automatically detects the presence of a face and fits it in a rectangle of appropriate size. This technology is actually very common now in point-and-shoot digital cameras, but surprisingly, it seems to be kinda new for photo album software. So this is the first part of the new feature: face detection. It makes sure that only a human being's face is selected, not a dog's or a monkey's. These two pages [1], [2] contain a survey of some face-detection algorithms.
Now, after a face is boxed out in a picture, I'm able to enter the name of that person, and the identity is linked to the corresponding contact in my Google address book. The next part of the 'new' feature relies heavily on classification: Picasa searches through the entire photo collection on my hard drive, detects the faces (if there are any), and classifies their identities based on the training examples, i.e. the faces that I have already named. It then lists all the faces that the underlying classification algorithm considers matches and asks me to mark each result as a correct or wrong match. The procedure repeats when new photos are added to my photo library. This is a typical supervised-learning process. What algorithms do they use? Possibilities: Bayesian approaches, SVMs, hybrids, or maybe something completely novel.
An initial tryout shows that the face classification algorithm Picasa uses is quite impressive in terms of misclassification rate. With fewer than 10 training examples that I provided at the beginning for my parents and myself, Picasa subsequently suggested identities for about 300 faces that it detected in my library, and fewer than 20 were wrong (an error rate under roughly 7%). There are also some interesting observations about the misclassified faces. For instance, those for my mum all came from her siblings, indicating that they actually look alike. (Of course they do.)
The computational performance, however, is not quite up to my satisfaction. Since I have a large photo gallery, it took a really long time to perform the above tasks. Furthermore, while the tasks were running in the background, they consumed about 80% of the CPU, and my cooling fan was making a lot of noise because of all the heat generated. Apparently it's not a battery-friendly or mobile-oriented application. Nevertheless, it's a refreshing feature that Google has implemented here.
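To make the label-suggest-confirm loop above concrete, here is a toy sketch in Python. It is only an illustration under loud assumptions: I have no idea what Picasa actually runs, the 2-D "feature vectors" are made up, and the nearest-centroid rule and the names train_centroids and suggest are my own stand-ins for whatever classifier is really used.

```python
from collections import defaultdict
from math import dist


def train_centroids(labeled):
    """Average the feature vectors of each named person (hypothetical helper)."""
    groups = defaultdict(list)
    for name, vec in labeled:
        groups[name].append(vec)
    return {
        name: tuple(sum(coords) / len(vecs) for coords in zip(*vecs))
        for name, vecs in groups.items()
    }


def suggest(centroids, vec):
    """Return the name whose centroid is closest to vec (hypothetical helper)."""
    return min(centroids, key=lambda name: dist(centroids[name], vec))


# Made-up 2-D "face features"; a real system would use far richer descriptors.
labeled = [("mum", (0.9, 0.1)), ("mum", (1.1, 0.0)), ("me", (0.0, 1.0))]
centroids = train_centroids(labeled)

# A newly detected, unnamed face gets a suggested identity.
new_face = (0.8, 0.2)
guess = suggest(centroids, new_face)

# The user marks the suggestion as correct or wrong; a confirmed face
# joins the training set and the model is refreshed.
user_says_correct = True
if user_says_correct:
    labeled.append((guess, new_face))
    centroids = train_centroids(labeled)
```

The point is only the shape of the process: a handful of named faces, a suggestion for each new face, and user feedback that grows the training set.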

Saturday, February 13, 2010

Thursday, February 11, 2010

Driver License

Two months after I passed the road test, I finally received my new driver license in my mailbox.
Yes, two months. If you sent a parcel from China to the US by sea, it would have arrived by now too. So who is to blame? There are only two parties involved in this send-deliver process: the Department of Motor Vehicles and USPS. DMV claimed that they sent out my license shortly after the road test, that is, before Christmas last year. And the online delivery status shows that the letter containing my license was returned to DMV by USPS as 'undeliverable'. Hence, it had been sitting at DMV until I called last week. For a moment I wondered if I had put the address wrong, but wait, my Learner Permit was delivered successfully with exactly the same address. So are tons of junk mail every day.
Determined to find out what had gone wrong, I examined my mailbox in minute detail. Oh, here it is: the number tag on my mailbox is missing. However, my mailbox sits in a row with all the neighboring ones, and the numbers are clearly arranged in order. The one before mine is 4A, and the one after is 4C. And each apartment DOES have one mailbox. By now, anyone reading this will have concluded correctly that my mailbox is 4B, because one thing that differentiates human beings from machines is the ability to infer. (Well, technically speaking, machines nowadays can do some sort of inference as well.) And this is an absolutely EASY inference problem. You don't even need the 'fancy' Bayes' rule ...
So that leads me to the following conclusion: USPS has switched to such an advanced delivery system that they are now using machines (albeit with bugs) to do the job. Period.
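For the record, the mailbox inference above really is easy enough for a machine. Here is a tongue-in-cheek Python sketch; the function name infer_missing_label is made up, and it assumes labels look like '4A': a shared prefix followed by a single letter.

```python
def infer_missing_label(before, after):
    """Guess the unlabeled mailbox between two neighbours such as '4A' and '4C'."""
    prefix_before, letter_before = before[:-1], before[-1]
    prefix_after, letter_after = after[:-1], after[-1]
    # Only guess when both neighbours share a prefix and exactly one letter is skipped.
    if prefix_before != prefix_after or ord(letter_after) - ord(letter_before) != 2:
        return None  # pattern not recognised, so refuse to guess
    return prefix_before + chr(ord(letter_before) + 1)


missing = infer_missing_label("4A", "4C")
print(missing)
```

No Bayes' rule required, as promised.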

Monday, February 08, 2010

Saturday, November 28, 2009

Continued

Still swinging between Live Space and Blogspot. It's essentially just a choice between Microsoft and Google. Space is fancy and feature-packed but slow, while Blogspot is clean, concise, and fast. A pretty accurate reflection of the two companies.