Sunday, February 10, 2013

Password Policy: You Are Doing It Wrong (When 2^56 Becomes 2^42)

They say the road to hell is paved with good intentions. This is often the case with non-standard password policies. About a month ago I visited my "favorite airplane company" website, and after successfully logging with my Frequent Flyer credentials, I've been redirected to an Update Password page where I've been asked to change my password according to the following criteria:

Please insert an 8 characters password
The 4 first characters need to include at least 2 letters (A-Z)
The last 4 characters must be all digits

At first sight this may seem like a good password policy, 8 characters long password, must include at least 2 A-Z letters, must include at least 4 digits. But is it really going to result in a strong password?

The answer is no, and to understand why, it is necessary to understand how brute-force attack works. Brute-force attack consists of systematically trying all possible passwords until the correct password is found. In the worst case, this would involve traversing the entire search space. Now, the more search space there is, the longer (run time) it will take the brute-force to cover it. This doesn't guarantee that a given password won't be the 1st or 2nd option in the search space, but statistically speaking, if there are more options - then there are more passwords combinations to check for.

The password policy defines the search space, depending on the password policy it can either define a global search space (e.g. 8 ASCII characters password) or an individual search space per character (e.g. 8 ASCII characters password, first character must be a digit). The latter is weaker than the former. To demonstrate this, I have developed a small Python script called alphapasswd.py that calculates the search space of a password policy given it's formation and a working sample password.
#!/usr/bin/env  python

# Copyright (C) 2013 Itzik Kotler <ik@ikotler.org>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA

import sys
import re
import math


def main():

    try:

        print 'Evaluating Password Formation: "%s" with Sample Password "%s"' % (sys.argv[1], sys.argv[2])

        formation = re.compile(sys.argv[1])

        if not formation.match(sys.argv[2]):

            print 'Sample Password "%s" does not match Password Formation "%s"' % (sys.argv[2], sys.argv[1])

            return 0

        sample_passwd = list(formation.match(sys.argv[2]).group(0))

        print "INPUT: %s" % sample_passwd

        values_per_col = []

        exponent = 0

        # Itereate each Character in Sample Password

        for col_idx in xrange(0, len(sample_passwd)):

            total_values_per_col = 0

            # Itereate 2^8 Values

            for byte in xrange(0, 255):

                old_value = sample_passwd[col_idx]

                sample_passwd[col_idx] = chr(byte)

                # GO / NO GO ?

                if formation.match(''.join(sample_passwd)):

                    total_values_per_col = total_values_per_col + 1

                sample_passwd[col_idx] = old_value

            values_per_col.append(total_values_per_col)

        for col_idx in xrange(0, len(values_per_col)):

            print "PASSWORD BYTE #%d SEARCH SPACE 2^%d (%d)" % (col_idx+1, math.ceil(math.log(values_per_col[col_idx], 2)), values_per_col[col_idx])

            exponent = exponent + math.ceil(math.log(values_per_col[col_idx], 2))

        print "EXPONENT = %d" % exponent

        print "TOTAL = %d = 2^%d" % (2**exponent, exponent)

    except IndexError as e:

        print 'Missing password formation or sample password'
        print 'e.g. %s "[a-zA-Z0-9]{4}" "abcd"' % sys.argv[0]
        print 'Usage: %s <password formation> <sample password>' % sys.argv[0]


if __name__ == "__main__":
    main()
What the script above does is calculate how many bits (as eventually the password will be stored digitally and bit is the smallest unit of measurement used for information storage in computers) are needed to represent each character in the password given the password policy, and then it sums all the bits and outputs the maximum password strength (in bits) that this password policy can yield.

Going back to our original question, now that we have alphapasswd.py, it is possible to compare between two or more password policies and see which yields a better theoretical password (remember this is not testing against common passwords or obvious mistakes, just testing the search space). The second password policy that I will be using for comparecent is very similar to the one in question, but simpler, it's an 8 ASCII characters long password with no restrictions policy. Now that we have competitors, let's start measuring their search space, starting with the airplane company password policy:
./alphapasswd.py "[a-zA-Z][a-zA-Z][a-zA-Z0-9\!\@\#\$\%\^\&\*\(\)]{2}[0-9]{4}" "abcd1234"
The output should be:
Evaluating Password Formation: "[a-zA-Z][a-zA-Z][a-zA-Z0-9\!\@\#$\%\^\&\*\(\)]{2}[0-9]{4}" with Sample Password "abcd1234"
INPUT: ['a', 'b', 'c', 'd', '1', '2', '3', '4']
PASSWORD BYTE #1 SEARCH SPACE 2^6 (52)
PASSWORD BYTE #2 SEARCH SPACE 2^6 (52)
PASSWORD BYTE #3 SEARCH SPACE 2^7 (72)
PASSWORD BYTE #4 SEARCH SPACE 2^7 (72)
PASSWORD BYTE #5 SEARCH SPACE 2^4 (10)
PASSWORD BYTE #6 SEARCH SPACE 2^4 (10)
PASSWORD BYTE #7 SEARCH SPACE 2^4 (10)
PASSWORD BYTE #8 SEARCH SPACE 2^4 (10)
EXPONENT = 42
TOTAL = 4398046511104 = 2^42
From this output it is possible to see that from an 8 characters long password, the #1, #2, #5, #6, #7, and #8 bytes have a smaller search space (generally speaking the upper bounds of an ASCII byte search space is 2^7), and as a result, the maximum search space is 2^42. Now, let's try the second password policy (i.e. 8 ASCII characters long password with no restrictions):
./alphapasswd.py "[a-zA-Z0-9\!\@\#\$\%\^\&\*\(\)]{8}" "abcd1234"
The output should be:
Evaluating Password Formation: "[a-zA-Z0-9\!\@\#$\%\^\&\*\(\)]{8}" with Sample Password "abcd1234"
INPUT: ['a', 'b', 'c', 'd', '1', '2', '3', '4']
PASSWORD BYTE #1 SEARCH SPACE 2^7 (72)
PASSWORD BYTE #2 SEARCH SPACE 2^7 (72)
PASSWORD BYTE #3 SEARCH SPACE 2^7 (72)
PASSWORD BYTE #4 SEARCH SPACE 2^7 (72)
PASSWORD BYTE #5 SEARCH SPACE 2^7 (72)
PASSWORD BYTE #6 SEARCH SPACE 2^7 (72)
PASSWORD BYTE #7 SEARCH SPACE 2^7 (72)
PASSWORD BYTE #8 SEARCH SPACE 2^7 (72)
EXPONENT = 56
TOTAL = 72057594037927936 = 2^56
From this output it is possible to see that from an 8 characters long password, all the bytes have the maximum search space possible given the upper bounds of an ASCII byte search space (i.e. 2^7), and as a result, the maximum search space is 2^56. In other words, the first password policy is 16384 (i.e. 72057594037927936/4398046511104) times weaker than the second password policy. Reviewing the first password policy again, it's obiovus that the 4 digits requirement is what limits the search space the most. If so, why did my "favorite airplane company" request it? On the same Update Password page it says (on the bottom) that:

The last four characters in your password (the digits) will serve as your secret code to identify yourself at the Telephone Service Center

And so the mystery is solved, due to a legacy IVR (Interactive Voice Response), and the fact that my "favorite airplane company" did not want to seperate between their Website and IVR credentials, they composed a password security policy that is in fact weaker than an any 8 ASCII characters long password policy. Now, come to think about it, if I only need to enter 4 digits to log-in in the IVR, how are they storing the passwords then? It can't be hashed and compared as the IVR will only accept 4 digits, while the password is 8 characters long? Oh well, I guess that's a story for another day.