Tuesday, August 08, 2006

AOL reSearch : Research - 500k User Queries Sampled Over 3 Months

AOL just released the logs of all searches done by 500,000 of their users over the course of three months earlier this year. That means that if you happened to be randomly chosen as one of these users, everything you searched for from March to May (2006) is now public information on the internet.

...The data is "anonymized", which to AOL means that each screenname was replaced with a unique number. "It is still a research question how much information needs to be anonymized to protect users," says Abdur from AOL. Here are some examples of what you can find in the data:

Among user 545605's searches are "shore hills park mays landing nj", "frank william sindoni md", "ceramic ashtrays", "transfer money to china", and "capital gains on sale of house".... I'm leaving out the worst of it: searches for names of specific people, addresses, telephone numbers, illegal drugs, and more. There is no question that law enforcement, employers, or friends could figure out who some of these people are.... I hope others can find more examples in the data, which is up for download over here (scroll down to the 500Kusers.tgz file).
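
To see why swapping each screenname for a number does not make the data anonymous, here is a minimal Python sketch. It is not AOL's actual process: the function names, the screen names, and the "weather tomorrow" query are made up for illustration, and the other queries are the ones quoted above. The point is only that every query from one person still carries the same number, so the whole search history can be read as a single profile.

```python
from collections import defaultdict
from itertools import count


def pseudonymize(log):
    """Replace each screen name with a stable number (hypothetical scheme)."""
    ids = {}
    next_id = count(1)
    released = []
    for screenname, query in log:
        if screenname not in ids:
            ids[screenname] = next(next_id)
        # The name is gone, but the same person always gets the same number.
        released.append((ids[screenname], query))
    return released


def queries_by_user(records):
    """Group the released queries by their numeric ID."""
    grouped = defaultdict(list)
    for user_id, query in records:
        grouped[user_id].append(query)
    return dict(grouped)


# Made-up screen names; the queries for the first user come from the post above.
raw_log = [
    ("example_user_a", "shore hills park mays landing nj"),
    ("example_user_b", "weather tomorrow"),
    ("example_user_a", "ceramic ashtrays"),
    ("example_user_a", "capital gains on sale of house"),
]

released = pseudonymize(raw_log)
profiles = queries_by_user(released)

for user_id, queries in profiles.items():
    print(user_id, queries)
```

Anyone holding the released file can do the grouping step. A place name, a doctor's name, and a financial question under one number may be enough to narrow down who typed them, which is why the number alone protects very little.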

If you go to the site, someone in the comments is even thanking AOL for this information. We haven't looked at this very closely yet and haven't talked with AOL, but so far we're cringing.


For a copy of the file, go here:
http://www.gregsadetsky.com/aol-data/

AOL: Releasing Subscribers' Searches a 'Screw up'

Source:
http://www.marketingvox.com

AOL published online, and quickly took down, data on the internet search terms of more than 650,000 subscribers.

Intended as a gesture to researchers, the data that AOL released Monday covered searches entered over a three-month period, writes the Associated Press. Though AOL had substituted numbers for searchers' names, many of the queries contained information that could be used to deduce the searcher's identity. AOL admitted that releasing the data was a privacy breach and that publishing it was a mistake.

The "mistake" was first noticed by Adam D'Angelo, and news of it spread almost immediately throughout the blogosphere, as did copies of the file that AOL posted, though AOL has since removed the original.

"This was a screw up, and we're angry and upset about it," AOL spokesman Andrew Weinstein is quoted by the AP as saying. "It was an innocent enough attempt to reach out to the academic community with new research tools, but it was obviously not appropriately vetted, and if it had been, it would have been stopped in an instant."

Ari Schwartz, deputy director of the technology watchdog group Center for Democracy and Technology, is quoted as saying search engines should re-evaluate why they even retain such data.

AOL Releases Search Logs from 500,000 Users

(via ugcs.caltech.edu) – AOL just released the logs of all searches done by 500,000 of their users over the course of three months earlier this year. That means that if you happened to be randomly chosen as one of these users, everything you searched for from March to May (2006) is now public information on the internet.