FAQ about the study
Posted on July 6th, 2009 by alessandro
We have drafted an FAQ to address likely questions about the study.
Filed under: Uncategorized
Gentlemen,
The fact that SSNs can be guessed has been known for some time and has been part of the base coding behind some high-end SSN generators. In the last 4 years I have attended at least 2 lectures detailing how to do this, and I myself have detailed how it is done to law enforcement representatives around the world.
I share this because I am wondering why now and why you simply did not rely on the data shown by other researchers.
AA: As noted in the manuscript, the SSN assignment scheme is public knowledge (p. 1). In fact, we do cite previous work in this area by other researchers: that work used those patterns to estimate when and where an SSN may have been issued (see p. 1 and [Wessmiller, 2002], [Sweeney, 2004], [EPIC, 2008], cited in the manuscript and in the appendix). Instead, our work focused on the inverse, harder, and much more consequential inference: exploiting the presumptive time and location of SSN issuance to estimate, quite reliably, SSNs. This became possible because:
- We discovered (p. 3) that the interpretation held *outside* the SSA about how Area Numbers are assigned was incorrect: contrary to a commonly held view about their assignment, the same AN is used for 9,999 consecutively assigned SSNs. Under the interpretation of the assignment scheme held outside the SSA, the SSA was believed to rotate through all of a state’s ANs for each assigned SN [Crow J, Bennett B (undated) Structure of Social Security Numbers, http://w2.eff.org/Privacy/ID_SSN_fingerprinting/ssn_structure.article]. Such a scheme would render the AN random for states with multiple ANs, and the predictions we present in this article dramatically less accurate.
- We discovered (p. 4) that the assignment of the last 4 digits is not only sequential (as indeed stated in the publicly available information about the assignment scheme), but in fact highly correlated with the applicant's date of birth, and therefore not random (note that the SSA states that "SSNs are assigned randomly by computer within the confines of the area numbers allocated to a particular state" [SSA, 2001]). This is particularly the case for SSNs assigned after the onset of the Enumeration at Birth (EAB) program (1987 onwards).
- The relationship between Area Numbers and states, while public knowledge, would not be sufficient to predict Area Numbers except in very specific cases (see p. 1): low-population states (such as WY) and certain U.S. possessions are allocated 1 AN each - implying that knowledge that an individual applied for his/her SSN in that state or possession does indeed provide almost certain knowledge of the first 3 digits of his/her SSN. However, other states are allocated *sets* of ANs. For instance, an individual applying from a zipcode within the state of New York may be assigned any of 85 possible first 3 SSN digits. Therefore, knowledge that an individual applied for his/her SSN in that state provides low odds (1 over 85) of correctly guessing his/her first 3 digits with a single random guess. Those odds do not even include the probability of correctly guessing also the Group Number.
In short, without the discovery of patterns linking SSN digits to demographic data, knowledge of the assignment scheme would not be sufficient to predict SSNs with a degree of accuracy necessary to expose them to practical risks of identification. For instance, the probability of correctly guessing the first 5 digits of the SSN of an individual born in NY in 1998, even using the knowledge that the SSN was issued within that state, would be 0.012%, and the probability of correctly guessing the entire 9 digits with fewer than 1,000 attempts would be 0.0012%. Under the algorithm highlighted in the manuscript, those probabilities are several orders of magnitude larger. See Table 6 on p. 27 of the Supporting Information.
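The baseline figures quoted above can be roughly reproduced with simple arithmetic. The sketch below is only an illustration, not the calculation used in the manuscript: it assumes a guesser who knows nothing beyond the state of issuance and picks uniformly at random among 85 Area Numbers, 99 Group Numbers (01-99), and 9,999 Serial Numbers. The function names and the 99 Group Number count are assumptions made for this example.

```python
from fractions import Fraction

# Counts for a naive guesser who knows only the state of issuance.
AREA_NUMBERS_NY = 85    # possible first-3-digit values for New York (from the text)
GROUP_NUMBERS = 99      # assumption: Group Numbers 01-99, all equally likely
SERIAL_NUMBERS = 9_999  # Serial Numbers 0001-9999

def first_five_probability(area_numbers, group_numbers=GROUP_NUMBERS):
    """Chance that one uniform random guess matches the first 5 digits."""
    return Fraction(1, area_numbers * group_numbers)

def full_ssn_probability(area_numbers, attempts,
                         group_numbers=GROUP_NUMBERS,
                         serial_numbers=SERIAL_NUMBERS):
    """Chance that `attempts` distinct uniform guesses include the full SSN."""
    candidates = area_numbers * group_numbers * serial_numbers
    return Fraction(min(attempts, candidates), candidates)

if __name__ == "__main__":
    p5 = first_five_probability(AREA_NUMBERS_NY)
    p9 = full_ssn_probability(AREA_NUMBERS_NY, attempts=1_000)
    print(f"first 5 digits, one guess:     {float(p5) * 100:.3f}%")
    print(f"all 9 digits, 1,000 attempts:  {float(p9) * 100:.4f}%")
```

Under these assumptions the two values round to the 0.012% and 0.0012% mentioned above, which is the baseline that the date-of-birth patterns improve upon by orders of magnitude.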
I am shocked, but these days the computer will be revealing many things. They will cause us embarrassment. I will not be around in 15 or 20 years.
I do not know what will happen in 15 or 20 years, the way technology is being developed. I feel bad for our people and for what will happen in Washington and Congress. They must get together and vote for the country and the people. God knows. Help?
What you’ve done here is really innovative. This is a new and interesting result that attacks the system from a different angle. Cool stuff. I imagine we will see more attacks based on these ideas.
It is true that many security professionals understood that SSNs were not secure personal identifiers and were potentially compromisable. And it is known that using the last four SSN digits as an identifier is a dangerous practice and yet that practice continues widely in spite of this knowledge.
I am curious about other applications like guessing state driver’s license numbers. Any additional comments along these lines?
AA: Thank you for your comments. Have you heard about SOUNDEX? It is related to the patterns in driver's license numbers. Credit card numbers have patterns too, but they are more complex than those in SSNs and much harder to predict from public data.
Yes I am aware of soundex…
Your comment on credit card numbers is interesting because as I think you’ll agree “much harder to predict” != unpredictable.
FYI, I was most recently the Chief Scientist of a company developing fraud prevention technologies originally based on biometrics and related image-based techniques. We developed a camera-based surveillance platform that could detect known fraudsters in real time and a system that could verify presented identity documents, also in real time. The company recently decided to lay off most of the research team and focus its efforts on a project that was deemed to be an easier sale…SSN verification.
I personally always believed this was a dumb idea, and I knew that SSNs could not be used for this purpose. Your paper proves the point better than I ever could have.
Thanks,
Peter
Sometimes it’s really that simple, isn’t it? I feel a little stupid for not thinking of this myself/earlier, though.