Recently, I have been thinking about alternate ways of specifying search queries other than with text. A couple of weeks ago I came across a piece of music that I could not identify. It would be a huge win for a search engine to let me upload the piece and present me with matches, or near matches: other pieces that sound similar or share its characteristics. Some services already exist: Shazam lets a user hold a microphone up to playing music and identifies the artist and song. Some uses of search-by-sound:
- Music identification (“solved” – Shazam)
- Music personalization and recommendation (“solved” – Pandora)
- Identification of the source of a sound (e.g. a species of bird, a musical instrument, an inanimate object)
- MP3 and media file search
- Finding material that violates copyright
As a motivating example, suppose we find some really cool graphic on the web and want to know where it likely originated (e.g. art, a meme). In such a search engine, we could upload the graphic and get results containing the exact image, or images that are very similar: variations of the image (crop, resize, borders, different effects), modifications of the image (consider Obama-izing [the campaign logo] someone’s picture), and semantically similar images (different photos of the same object or person). Wouldn’t this be cool? A billion-dollar idea, right?
Well, Google apparently beat me (and millions of others I’m sure) to it with its search-by-image feature on Google Images. I uploaded a photo of myself to see what I would get. We see my school website (where the image originated), as well as several other sites that use my Gravatar. Not too bad.
On the results page, users can also provide some type of labeled data to Google. I am not exactly sure what it is used for yet, but note the text in the search bar: “Describe this image.” Upon entering my name, Google found another photo that looks almost identical to the first one — a variation.
Below are the “visually related” images that were presented to me (before I labeled my photo in the search bar):
I see Steve Jobs (I am honored), but 7 out of 16 images are women, and of the men, we look nothing alike. I know, I know, “visually related” refers to similarity in pixels between images, but I expected more. In these images, we see a lot of red and blue hues.
Let’s try something that will generate many more hits: a popular meme…
The image I uploaded was originally posted on Amazon S3 and is linked to by the above two web pages. Google does a much better job when given a URL rather than an uploaded image, for an obvious reason: a URL can be matched directly against the pages that link to it, with no image analysis required. More interestingly, the “visually similar” images show variations and modifications of the same image, based on pixel similarity.
And we also see web pages containing a copy of the image (not linked to the original S3 file):
But This Isn’t Good Enough Yet!
Google “Search-by-Image” is an awesome first step, and I look forward to seeing more as it is undoubtedly coming. For search-by-image (or search-by-multimedia) to be useful, it must also take “semantic” or conceptual knowledge into account, just like with text search. That is, if I upload a photo of myself, I should get back other photos of myself from various (hopefully authorized) sources. Or, if we upload a photo of the Eiffel Tower, we should get back integrated search results containing other images of the Eiffel Tower as well as text results with information about the Eiffel Tower, and perhaps a tourist’s video or documentary.
One may at first believe that the O RLY search used some semantic knowledge; however, all of the images share a large number of pixels and these images are likely just “visually similar” as stated. Using semantic knowledge, one may see results of other famous owls used in memes in addition to the variations and modifications of the O RLY owl.
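To make the “visually similar” (pixel similarity) idea concrete, here is a toy sketch of an average hash (aHash), one simple technique for spotting near-duplicate images. The 4×4 grids of brightness values stand in for real images, and everything here is illustrative; this is not a claim about how Google actually does it.

```python
# Toy average hash (aHash): a compact fingerprint of an image's
# brightness pattern. Near-duplicates get nearly identical hashes.

def average_hash(pixels):
    """Return a bit list: 1 where a pixel is above the mean brightness."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]

def hamming(h1, h2):
    """Number of differing bits between two hashes (lower = more similar)."""
    return sum(a != b for a, b in zip(h1, h2))

# 4x4 "images": an original, a lightly edited copy, and an unrelated image.
original = [[ 10,  10, 200, 200],
            [ 10,  10, 200, 200],
            [200, 200,  10,  10],
            [200, 200,  10,  10]]
edited   = [[ 12,  11, 190, 205],   # slight brightness tweaks (a "variation")
            [  9,  13, 198, 201],
            [205, 195,  12,   8],
            [199, 202,  11,  10]]
unrelated = [[200,  10, 200,  10],
             [ 10, 200,  10, 200],
             [200,  10, 200,  10],
             [ 10, 200,  10, 200]]

h_orig, h_edit, h_other = map(average_hash, (original, edited, unrelated))
print(hamming(h_orig, h_edit))   # → 0: the edited copy is a near-duplicate
print(hamming(h_orig, h_other))  # → 8: visually unrelated
```

Note that this only captures pixel-level structure: two different photos of the same owl would hash very differently, which is exactly why semantic knowledge is the missing piece.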
All of the data collected by such a system would also provide a hell of a corpus for image and multimedia classification. Researchers could construct classifiers for detecting spammy multimedia, knockoff multimedia (second- and third-generation grain in images, waveform distortion in audio), and pornographic content, as well as augmenting labeled and unlabeled multimedia with metadata. For example, suppose we take a picture of what I think is a rhododendron (inside joke for readers). With such a large corpus, I can upload the photo and have Google (or some other AwesomeSearch) retag the image as that of a hydrangea instead.
Uses of search-by-multimedia with semantic knowledge:
- Cross-referencing objects or people across different sites
- Product search when textual information (or QR code) is not known
- Catching criminals
- Cataloging media
- Methods for multimedia spam detection
- Geolocation without use of GPS or WiFi, and location search
- Augmentation of metadata and tagging of objects, people, etc.
- Detecting adult, inappropriate or illegal content
- Identification of actions from images, video or audio and retrieval of related information
Of course, search-by-multimedia poses the same challenges that we face in big data today:
- choosing and boosting the proper features
- collecting a significant and correctly labeled corpus
- fast processing of large datasets with new and existing machine learning algorithms
- efficient indexing and retrieval algorithms to match queries with probable results
These things are easier said than done, but a lot of fun.
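On the indexing and retrieval point: a common trick is to avoid comparing a query fingerprint against every stored item. Here is a toy banded-hash index in that spirit (the idea behind locality-sensitive hashing): split each fingerprint into bands and look up candidates by band, so near-duplicates surface even when a few bits differ. The class name and hash values are invented for illustration.

```python
# Toy candidate retrieval: index perceptual-hash bits by band so a
# lookup touches only items sharing at least one band with the query,
# instead of scanning the whole collection.

from collections import defaultdict

BANDS = 4  # split each fingerprint into this many contiguous chunks

def bands(bits):
    n = len(bits) // BANDS
    return [tuple(bits[i * n:(i + 1) * n]) for i in range(BANDS)]

class BandIndex:
    def __init__(self):
        # one hash table per band position
        self.tables = [defaultdict(set) for _ in range(BANDS)]

    def add(self, item_id, bits):
        for table, band in zip(self.tables, bands(bits)):
            table[band].add(item_id)

    def candidates(self, bits):
        """Items sharing at least one band with the query fingerprint."""
        found = set()
        for table, band in zip(self.tables, bands(bits)):
            found |= table[band]
        return found

index = BandIndex()
index.add("original",  [0,0,1,1, 0,0,1,1, 1,1,0,0, 1,1,0,0])
index.add("unrelated", [1,0,1,0, 0,1,0,1, 1,0,1,0, 0,1,0,1])

query = [0,0,1,1, 0,0,1,1, 1,1,0,0, 1,1,0,1]  # near-duplicate of "original"
print(index.candidates(query))  # → {'original'}: three bands still match
```

A full system would then rank the candidate set with an exact distance computation; the point of the index is only to keep that expensive step off the vast majority of the collection.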
Search-by-multimedia is a very interesting concept and is exciting to think about. In this age of big data and technology, anything seems possible. I look forward to the day when anything on the Internet can be found, no matter its content or medium.
To check out Google’s search-by-image, go to Google Images and click on the camera icon.