This week it was revealed that the iPhone stores users’ locations, and this immediately caused a huge firestorm of commentary among tech geeks, panic among privacy advocates, and delight among data geeks like myself. Even better/worse, it seems that the iPhone caches location traces long-term, possibly back to the date the phone was activated.
I ditched my iPhone this past December (good riddance) in favor of the Droid X (Android). I figured, on such an open source OS, Google must be doing the same thing. After surfing through Hacker News, it turns out I was right.
Compared to the iPhone though, getting the data on an Android phone is not simple.
- The data is stored in two files, cache.cell and cache.wifi in the directory /data/data/com.google.android.location/files.
- First, the user cannot browse this directory by attaching the phone to a computer. I installed an SSH daemon, QuickSSHD, to allow remote access into my phone.
- Second, even when logged in as “root”, any attempt to access this directory fails with a Permission denied error, because Google has not made the directory readable.
- Finally, for those (like myself) who are still determined to crack this nut, the phone needs to be rooted. This makes the “root” user a real superuser that has near complete control over the phone.
Once I downloaded the files to my Mac (via scp), I downloaded this handy-dandy parser from packetlss called android-locdump and converted the cache.cell and cache.wifi files into GPX files by passing the --gpx flag. You can also leave off the --gpx flag and parse the output yourself.
Then I used GPSBabel to convert the GPX files to CSV files and loaded them into R. While this was great for a static view, the lack of interactive zooming makes working with this type of data more difficult. To plot the points, I used some code from the RgoogleMaps package vignette, as adapted by Michael Malecki. [Drew Conway has developed stalkR for analyzing iPhone and iPad location data in R.]
library(RgoogleMaps)

# Load the CSV produced by GPSBabel: latitude, longitude, and a key column.
Df <- read.csv("CSV file", header=FALSE)
names(Df) <- c("Latitude", "Longitude", "Key")

# Bounding box and center of all recorded points.
bb <- qbbox(lat=range(Df$Latitude), lon=range(Df$Longitude))
m <- c(mean(Df$Latitude), mean(Df$Longitude))

# Largest zoom level that still fits the whole bounding box.
zoom <- min(MaxZoom(latrange=bb$latR, lonrange=bb$lonR))

# Fetch a grayscale static map and overlay the points in red.
Map <- GetMap.bbox(bb$lonR, bb$latR, zoom=zoom, maptype="mobile",
                   NEWMAP=TRUE, destfile="tempmap.jpg",
                   RETURNIMAGE=TRUE, GRAYSCALE=TRUE)
tmp <- PlotOnStaticMap(lat=Df$Latitude, lon=Df$Longitude,
                       cex=.7, pch=20, col="red",
                       MyMap=Map, NEWMAP=FALSE)
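Before fetching a map tile, it can help to sanity-check the parsed coordinates. The following is only a minimal base R sketch: the data frame stands in for the real CSV, and every coordinate in it is invented for illustration. It computes the same kind of bounding box and center that the snippet above derives, without needing RgoogleMaps.

```r
# Hypothetical sample standing in for the CSV produced by GPSBabel.
# These coordinates are invented; they are not taken from my phone.
Df <- data.frame(
  Latitude  = c(34.0689, 34.0700, 34.0195, 34.0522),
  Longitude = c(-118.4452, -118.4440, -118.4912, -118.2437),
  Key       = c("a", "b", "c", "d")
)

# Bounding box: the minimum and maximum of each coordinate.
lat_range <- range(Df$Latitude)
lon_range <- range(Df$Longitude)

# Center of the points: the mean of each coordinate.
center <- c(lat = mean(Df$Latitude), lon = mean(Df$Longitude))

# Flag rows whose values cannot be valid coordinates.
valid <- abs(Df$Latitude) <= 90 & abs(Df$Longitude) <= 180

print(lat_range)
print(lon_range)
print(center)
```

If any entry of `valid` is FALSE, the latitude and longitude columns were probably swapped or misparsed during conversion.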
The map clusters my activity into a few familiar categories: work, school (Math Sciences Building actually), home, and my parents’. Android also picked up a dinner outing in Santa Monica, and a trip to the Shopzilla office for the Los Angeles Hadoop User Group meetup, but little else.
What I Found
The locations in the cache.cell file come from cell tower triangulation. In addition to this imprecise measure, Android’s location tracker has several limitations:
- It seems that location is recorded infrequently. I had expected to see trails of activity corresponding to walking or driving. Instead, all of my activity is clustered in areas where I am most likely stopped (on campus, at work, at home, in Santa Monica, and at the intersection of Gayley and Wilshire, which has an excruciatingly painful wait). The iPhone location history seems to be much more complete/useful.
- According to the old Android source, only the last 50 cell locations and the last 200 WiFi locations are recorded (boring). My phone seemed to record more than 50 cell locations (approximately 200), but this is still small.
- I couldn’t even convert the cache.wifi file because it was empty. This file is apparently cleared when WiFi is disabled.
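The clustering around places where I was stopped can also be checked numerically instead of by eye. Here is a rough base R sketch, assuming a data frame of fixes with `lat` and `lon` columns (the names and all coordinates below are made up for illustration): round each fix to three decimal places, which is roughly 100 m in latitude, and count how many fixes land in each rounded cell.

```r
# Hypothetical location fixes; every coordinate here is invented.
fixes <- data.frame(
  lat = c(34.0689, 34.0691, 34.0688, 34.0192, 34.0193, 34.0522),
  lon = c(-118.4452, -118.4451, -118.4453, -118.4912, -118.4911, -118.2437)
)

# Snap each fix to a coarse grid cell by rounding to 3 decimal places,
# then build a text label identifying that cell.
cell <- paste(round(fixes$lat, 3), round(fixes$lon, 3))

# Count fixes per cell, most visited first.
counts <- sort(table(cell), decreasing = TRUE)

print(counts)
```

Cells with many fixes are likely stopping points, while a trail from walking or driving would show up as many cells with one fix each. Rounding is a crude stand-in for real clustering, since a single location that straddles a cell boundary gets split in two.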
I also found that I need to get out more.
Why Would Apple do Such a Thing?
Earlier iPhone models (up to 2010 apparently) used Skyhook for their geo-location database. Skyhook employees basically drive cars wired with WiFi sensors and GPS and do what is called “wardriving.” They drive around cities recording information about the access points they encounter and where they encounter them. When a user logs onto the web via one of those access points, Skyhook customer sites can cross-reference the access point with its physical location. As of August 2010, Apple dropped Skyhook. Why?
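To make the cross-referencing idea concrete, here is a toy base R sketch of a wardriving-style lookup. It is not Skyhook's actual method: the table, the `locate` function, the MAC addresses, and the coordinates are all invented for illustration, and the position estimate is just a plain average of the known access points in range.

```r
# Hypothetical survey database: access-point MAC address (BSSID) and the
# position where it was observed. All values are invented.
ap_db <- data.frame(
  bssid = c("00:11:22:33:44:55", "66:77:88:99:aa:bb", "cc:dd:ee:ff:00:11"),
  lat   = c(34.0689, 34.0700, 34.0195),
  lon   = c(-118.4452, -118.4440, -118.4912),
  stringsAsFactors = FALSE
)

# Estimate a device position as the unweighted average of the surveyed
# positions of the access points the device can currently see.
locate <- function(visible, db = ap_db) {
  hits <- db[db$bssid %in% visible, ]
  if (nrow(hits) == 0) return(NULL)  # no surveyed access point in range
  c(lat = mean(hits$lat), lon = mean(hits$lon))
}

est <- locate(c("00:11:22:33:44:55", "66:77:88:99:aa:bb"))
print(est)
```

A real service would presumably weight by signal strength and handle access points that move, but the core idea is the same: no GPS fix is needed once the access points themselves have known positions.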
I suspect Apple is using this data to build its own geo-location database, though there is no evidence that the files on the iPhone are actually being transmitted to Apple. If it is true that the location database is transmitted to the user’s computer, it’s possible that Apple uses this data to enable geo-location features in Safari.
The investigative side of me says that this could be useful in a missing persons case if the phone is dropped.
Android or iPhone?
Apple and Google pursued different approaches in caching users’ locations. Apple used a standard database file stored on the phone. Although this file is hidden in the phone, it seems to be transmitted to the user’s computer. The user can then open the file and see what Apple is storing about them. Heck, they could even modify it to privatize it. The iPhone updates this information very frequently, and keeps it around for a very long time. The file is there, the user knows it is there, and the user can see what is in the file. Unfortunately, this also means that people will overreact.
Google, on the other hand, hid the file deep in the filesystem such that a terminal connection is necessary to reach it, and “rooting” the phone is necessary to see its content. The user has no idea that this file exists, and cannot see what Google is storing about them. This is a bit shady. On the other hand, the information that Google is collecting is very minimal and has questionable use. Data is not updated often, and is not held on disk for very long. It is also possible to clear at least the WiFi location cache file by turning WiFi off and on.
So, what do you think about all of this?
One point that almost nobody mentioned is that Android location tracking is associated with a unique phone identifier (sent to Google):
“In the case of Google, according to new research by security analyst Samy Kamkar, an HTC Android phone collected its location every few seconds and transmitted the data to Google at least several times an hour. It also transmitted the name, location and signal strength of any nearby Wi-Fi networks, as well as a unique phone identifier.”
http://online.wsj.com/article/SB10001424052748703983704576277101723453610.html?mod=WSJEurope_hpp_LEFTTopStories
Do we have more information on that (is it the Android ID, or some other ID)?
It could be pretty disturbing.
This is very similar to what Skyhook Wireless does to create its WiFi location database, except that Skyhook uses its own devices. Google has a wide user base it can use to acquire data to improve its products. Whatever this unique identifier is, it’s similar to the UUID on the iPhone. Different device, same privacy concerns.
A natural unique ID for the Android (and the iPhone) would be the MAC address of the WiFi chip, or a hashed version of it. This is easily tied to your Gmail account, which is required to activate the device. I am sure Verizon has this information as well. To prevent that, they may hash the MAC address, but that would be done internally.
I wouldn’t characterize storing the location dump file out of reach as ‘shady’ but rather as the proper conduct.
Opinions will vary on this. The only reason I consider it shady is that it is so hidden that it is easy for the user to assume this data is not being recorded. A lot of these privacy advocates prefer that the data be easily available for the user to see (and in many cases moderate). I support Google collecting whatever they want as it improves their products, but on a personal device, I don’t think it should be hidden.
[…] your movements as much as possible because it, at its heart, is an advertising company. For it, Android is about getting advertising to eyeballs pointed at a mobile […]
Code for mobile phone tracking in a particular area in Android?