On the “Anonymity” of the Facebook Dataset (Updated)

michaelzimmer.org:

(Updated below with responses to comments by Jason Kaufman, one of the lead researchers on this project)

(Another update: I’m pretty sure the “anonymous, Northeastern university” from which this dataset was derived is Harvard College. Details here)

A group of researchers has released a dataset of Facebook profile information from a cohort of college students for research purposes, which I know a lot of people will find quite valuable. (Thanks to Fred Stutzman for bringing it to my attention.)

Here is the description from the Berkman Center’s announcement:

The dataset comprises machine-readable files of virtually all the information posted on approximately 1,700 FB profiles by an entire cohort of students at an anonymous, northeastern American university. Profiles were sampled at one-year intervals, beginning in 2006. This first wave covers first-year profiles, and three additional waves of data will be added over time, one for each year of the cohort’s college career.

Though friendships outside the cohort are not part of the data, this snapshot of an entire class over its four years in college, including supplementary information about where students lived on campus, makes it possible to pose diverse questions about the relationships between social networks, online and offline.

Access to the dataset requires the submission of a research statement (which I haven’t yet done), but the codebook is publicly available.

Of course, this sounds like an AOL-search-data-release-style privacy disaster waiting to happen. Recognizing this, the researchers detail some of the steps they’ve taken to try to protect the privacy of the subjects, including:

  • All identifying information was deleted or encoded immediately after the data were downloaded.
  • The roster of student names and identification numbers is maintained on a secure local server accessible only by the authors of this study. This roster will be destroyed immediately after the last wave of data is processed.
  • The complete set of cultural taste…
