Just noodling on this …
As I understand it, passwords are generally constructed from the 94 printable ASCII characters (that is, excluding the space character), which are:
- Uppercase Latin letters (26): ABCDEFGHIJKLMNOPQRSTUVWXYZ
- Lowercase Latin letters (26): abcdefghijklmnopqrstuvwxyz
- Arabic numerals (10): 0123456789
- Symbols (32): !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
However, I’ve often seen restrictions on the symbols that can be used in a password. For simplicity, I think that it is reasonable to expect no more than 10 symbols to be eligible in any particular password. Therefore each character of a password is probably one of the remaining (26+26+10+10=) 72.
And therefore, assuming that the selection of characters used in a password is random, the search space for a password of length n is 72ⁿ and the entropy (in bits) is log₂ 72ⁿ.
Thus, a 20-character password has an entropy of log₂ 72²⁰, or about 123 bits, assuming that the characters are chosen randomly.
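For anyone who wants to check the figure, here is a minimal Python sketch of the calculation. The function name and the default alphabet size of 72 are just my own choices for illustration; the only idea used is that log₂ of 72ⁿ equals n × log₂ 72.

```python
import math

def random_password_entropy_bits(length: int, alphabet_size: int = 72) -> float:
    """Entropy in bits of a password whose characters are chosen uniformly at random."""
    # log2(alphabet_size ** length) == length * log2(alphabet_size)
    return length * math.log2(alphabet_size)

# 20 characters drawn from the 72-character pool described above
print(round(random_password_entropy_bits(20)))  # 123
```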
But, of course, humans don’t usually pick characters randomly so this entropy calculation may not be very relevant in the real world.
Suppose, as suggested above, that this 20-character password is instead made up of real English words that a person knows (and can remember how to spell) delimited by either digits or symbols. Since my knowledge of combinatorics is poor, let me restrict the space to:
- 4-letter words chosen randomly from a pool of 3,000 English words that a person knows and can spell.
- Each character of each such word can be either lowercase or uppercase, increasing the pool from 3,000 to 3,000 x 2⁴ = 48,000.
- Either one of ten symbols or one of the ten digits is placed in front of each such mixed case word.
In which case (I think) the entropy is:
log₂(48000⁴ × 20⁴) ≈ 79 bits
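The same calculation as a rough Python sketch, under the assumptions listed above: four units, each consisting of one prefix character plus one 4-letter mixed-case word, which adds up to 20 characters. The constant and function names are mine, purely for illustration.

```python
import math

WORD_POOL = 3_000                  # 4-letter words the person knows (figure from above)
WORD_LENGTH = 4
CASE_VARIANTS = 2 ** WORD_LENGTH   # each letter is either lowercase or uppercase
PREFIX_CHOICES = 10 + 10           # one of ten symbols or one of ten digits
UNITS = 4                          # 4 x (1 prefix + 4 letters) = 20 characters

def passphrase_entropy_bits() -> float:
    """Entropy in bits, assuming every choice is made independently and uniformly at random."""
    choices_per_unit = WORD_POOL * CASE_VARIANTS * PREFIX_CHOICES  # 48,000 x 20
    return UNITS * math.log2(choices_per_unit)

print(round(passphrase_entropy_bits()))  # 79
```

Note that this only holds if the words, the capitalization, and the prefixes really are picked at random; a person choosing favorite words would get less.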
Let’s look at some history of strictly random attacks. In 1997 a 56-bit DES key was broken in just 96 days using a distributed system of off-the-shelf personal computers. In 1998 another distributed system did it in only 39 days. Also in 1998, the Electronic Frontier Foundation (EFF) built the DES Cracker, a machine made of custom chips at a total cost of less than $250,000, which broke a key in only 56 hours after checking just 25% of the key space.
So, what amount of entropy is sufficient? RFC 4086 cites a 1996 paper titled “Minimal Key Lengths for Symmetric Ciphers to Provide Adequate Commercial Security” and says:
“It concluded that a reasonable key length in 1995 for very high security is in the range of 75 to 90 bits and, since the cost of cryptography does not vary much with the key size, it recommends 90 bits. To update these recommendations, just add 2/3 of a bit per year for Moore’s law. This translates to a determination, in the year 2004, a reasonable key length is in the 81- to 96-bit range.”
It’s now 2022 so (if I’m doing the arithmetic correctly) the incremental entropy is now:
(2022 - 2004) x ⅔ = 12
Which implies that the entropy required today “for very high security” against a brute-force, random attack is in the range of 93 to 108 bits. (Yes, obviously key-stretching algorithms such as PBKDF2, bcrypt, scrypt, and Argon2 can increase the security conferred by any level of entropy.)
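Here is that update rule as a small Python sketch, in case anyone wants to plug in a different year. The 2004 baseline of 81 to 96 bits and the 2/3 bit per year come from the RFC 4086 passage quoted above; the function and constant names are hypothetical, and whether the rule still holds this far past 2004 is an assumption on my part.

```python
from fractions import Fraction

BASELINE_YEAR = 2004
BASELINE_RANGE_BITS = (81, 96)   # range quoted from RFC 4086 for 2004
BITS_PER_YEAR = Fraction(2, 3)   # "add 2/3 of a bit per year for Moore's law"

def recommended_range_bits(year: int) -> tuple[float, float]:
    """Extrapolate the 2004 key-length range to another year using the 2/3-bit rule."""
    extra = (year - BASELINE_YEAR) * BITS_PER_YEAR
    low, high = BASELINE_RANGE_BITS
    return float(low + extra), float(high + extra)

print(recommended_range_bits(2022))  # (93.0, 108.0)
```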
But, as I understand, attackers do NOT guess passwords randomly. I’ve read that they guess in this order:
- Entire passwords frequently found in breached password files.
- Strings of varying length consisting of all lowercase characters.
- Strings of varying length consisting of a combination of uppercase and lowercase characters.
- Strings of varying length of characters drawn from the pool of all 94 characters.
If so, this implies that we should choose passwords in which uppercase, digits, and symbols are overweighted.
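To get a feel for why the later stages cost an attacker more, here is a rough sketch comparing the brute-force search space of the three character-set stages in the list above. The password length of 8 is an arbitrary assumption of mine, and the first stage (breached password lists) is left out because it is a lookup, not a character-set search.

```python
import math

# Alphabet sizes for the three brute-force stages listed above
STAGES = [
    ("lowercase only", 26),
    ("uppercase + lowercase", 52),
    ("all 94 printable characters", 94),
]

def stage_bits(alphabet_size: int, length: int) -> float:
    """Bits of search space for strings of exactly `length` characters."""
    return length * math.log2(alphabet_size)

EXAMPLE_LENGTH = 8  # assumed length, for illustration only
for name, size in STAGES:
    print(f"{name}: {stage_bits(size, EXAMPLE_LENGTH):.1f} bits")
```

A password that falls entirely inside an early stage (say, all lowercase) is found before the attacker ever has to pay for the larger spaces.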
No doubt there are many errors in these thoughts, but perhaps they provide a framework for evaluating different strategies for selecting secure passwords.
UPDATE January 5, 2023 5:12 PM
I corrected my exponentiation error; I had mistakenly transposed the base and the exponent.