UniCode 8.0 basis for VeraCrypt

description

At present VeraCrypt is based upon the ASCII character set, an unfortunate limitation inherited from TrueCrypt. Rebasing VeraCrypt on the upcoming Unicode 8.0 standard (to be released June 2015) or even the older Unicode 7.0 standard (June 2014) would bring huge benefits to the user community:
  • The user base for VeraCrypt would no longer be limited to languages based on the Roman alphabet. This would enormously expand the user base, since a large percentage of the world's population uses languages written in non-Roman scripts (Chinese, Japanese, etc.).
  • The security strength of VeraCrypt would be greatly improved. Attackers will no longer be able to assume that the password will not contain characters from obscure languages. Brute-forcing passwords will thus become exceedingly difficult. Users will be able to create vastly stronger passwords simply by incorporating words or characters from non-Roman languages.
I therefore request that VeraCrypt be promptly transitioned from ASCII to Unicode 8.0.

Unicode versions are described here: http://en.wikipedia.org/wiki/Unicode#Versions

comments

L0ck wrote Jan 10, 2015 at 5:42 PM

Voted :)

destrukt wrote Jan 11, 2015 at 3:55 PM

Not a bad idea at all, voted :)

commenter8 wrote Jan 12, 2015 at 1:53 AM

More specifically, I suggest VeraCrypt be based on the 65,536-character "Basic Multilingual Plane" as defined here: https://en.wikipedia.org/wiki/Plane_(Unicode). Since the current ASCII base contains only 128 characters, this results in 512 times as many possibilities per character position. VeraCrypt currently recommends passphrases of at least 20 characters; that corresponds to 1.4 x 10^42 possibilities. Thus a Unicode BMP passphrase of only 9 characters (2.23 x 10^43 possibilities) is 16 times stronger than a 20-character ASCII passphrase. A 20-character Unicode BMP passphrase contains 2.14 x 10^96 possibilities, which is 1.53 x 10^54 times the strength of a 20-character ASCII passphrase.
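The figures in this comment can be checked with exact integer arithmetic. The sketch below is only an illustration. The names `keyspace`, `ASCII_SIZE`, and `BMP_SIZE` are made up for this example, and it assumes every position is chosen uniformly at random from all 65,536 BMP code points. That is an upper bound: the BMP includes surrogates and unassigned code points that cannot be typed as characters, and real passphrases are not chosen uniformly.

```python
# Keyspace arithmetic for the comment above, using exact Python integers.
# All names here are hypothetical and chosen for illustration only.
ASCII_SIZE = 128     # 7-bit ASCII code points
BMP_SIZE = 65_536    # code points in the Basic Multilingual Plane (upper bound)


def keyspace(alphabet_size: int, length: int) -> int:
    """Number of distinct strings of the given length over the alphabet."""
    return alphabet_size ** length


ascii_20 = keyspace(ASCII_SIZE, 20)  # 2**140
bmp_9 = keyspace(BMP_SIZE, 9)        # 2**144
bmp_20 = keyspace(BMP_SIZE, 20)      # 2**320

print(BMP_SIZE // ASCII_SIZE)         # 512
print(f"{ascii_20:.2e}")              # 1.39e+42
print(f"{bmp_9:.2e}")                 # 2.23e+43
print(bmp_9 // ascii_20)              # 16
print(f"{bmp_20:.2e}")                # 2.14e+96
print(f"{bmp_20 // ascii_20:.2e}")    # 1.53e+54
```

Under these assumptions the ratios quoted in the comment (512, 16, and roughly 1.53 x 10^54) come out as stated.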

commenter8 wrote Jan 12, 2015 at 10:52 AM

International Components for Unicode - http://site.icu-project.org/

ICU is a mature, widely used set of C/C++ and Java libraries providing Unicode and Globalization support for software applications. ICU is widely portable and gives applications the same results on all platforms and between C/C++ and Java software.

ICU is released under a nonrestrictive open source license that is suitable for use with both commercial software and with other open source or free software.

algreider8 wrote Jan 16, 2015 at 4:19 PM

This is an awesome idea, commenter8! Strange I never thought about it before.
I am sorry that I cannot vote +10 :D

idrassi wrote Jan 16, 2015 at 4:23 PM

Thanks for all this interesting information.
Indeed, supporting Unicode passwords will add more strength, but in practice how many users will use the full power of Unicode? In Western countries, almost none. It will most certainly apply to other parts of the world, like Asia.

From the technical point of view, my experience tells me that the only Unicode encoding that can guarantee a unique representation of the password across all platforms is UTF-8. As such, VeraCrypt would use UTF-8 encoding for the internal password across all platforms.
On the user interface side, it should be easy to acquire the password in Unicode and transform it to UTF-8. The only issue is that the internal password size limitation (currently 64 bytes) will not always translate to the size of the user password, as a single Unicode character can be represented using 2 or more UTF-8 bytes. So, some users who enter a password with full Unicode characters (like in Chinese) may be confused to have their password limited to 30 characters instead of the announced 64.
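The gap between character count and byte count can be illustrated with a short sketch. This is not VeraCrypt code. The names `MAX_PASSWORD_BYTES`, `utf8_length`, and `fits_limit` are hypothetical, and the only value taken from the comment above is the 64-byte limit. Note that most Chinese characters occupy three bytes in UTF-8, so the practical ceiling for such a password is closer to 21 characters; characters that encode to two bytes would allow 32.

```python
# Illustration only: compare character count with UTF-8 byte count.
# MAX_PASSWORD_BYTES mirrors the 64-byte limit mentioned in the comment;
# the function names are invented for this sketch.
MAX_PASSWORD_BYTES = 64


def utf8_length(password: str) -> int:
    """Number of bytes the password occupies once encoded as UTF-8."""
    return len(password.encode("utf-8"))


def fits_limit(password: str, limit: int = MAX_PASSWORD_BYTES) -> bool:
    """True if the UTF-8 encoding of the password fits in the byte limit."""
    return utf8_length(password) <= limit


samples = [
    "a" * 64,        # ASCII: 1 byte per character
    "\u00e9" * 32,   # U+00E9 (e with acute accent): 2 bytes per character
    "\u5bc6" * 21,   # U+5BC6 (a CJK character): 3 bytes per character
    "\u5bc6" * 22,   # one character too many for a 64-byte buffer
]

for s in samples:
    print(len(s), utf8_length(s), fits_limit(s))
# 64 64 True
# 32 64 True
# 21 63 True
# 22 66 False
```

A user interface could use a check like `fits_limit` to report the limit in bytes rather than promising a fixed number of characters.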

Last point: in the context of pre-boot authentication, Unicode passwords can't be supported because in this case the keyboard is in US layout. The bootloader has limited capabilities for handling international keyboards, and for that support we would need a mini-OS that implements all keyboard layouts (like in Linux). There are no plans to have such a mini-OS, so Unicode passwords can't be supported in this case.

commenter8 wrote Jan 16, 2015 at 5:30 PM

Regarding the use of Unicode characters by Western users, this can be increased as described in the issue "Guide users toward high-strength passwords" at https://veracrypt.codeplex.com/workitem/69

UTF-8 is one of the three standard Unicode encoding forms and is the most prevalent in Web applications, so it is a good choice for Unicode representation.

Regarding the internal password size limitation, this can be increased as described in the issue "The ceiling of 64 characters - break it!" at https://veracrypt.codeplex.com/workitem/71

Regarding the pre-boot authentication, it would be OK to leave that as a special exception until such time as VeraCrypt can incorporate a Linux-type mini-OS which is capable of implementing all keyboard layouts. This will obviously be a feature considered very desirable by non-Western users, and I anticipate that demand for this feature will be very strong in the future as VeraCrypt gains international popularity.

commenter8 wrote Mar 9, 2015 at 5:53 PM

During this discussion http://veracrypt.codeplex.com/discussions/584769
Mounir advises that a 32-bit bootloader per this issue https://veracrypt.codeplex.com/workitem/27
is sufficient to incorporate a Linux-type mini-OS in the bootloader which is capable of implementing Unicode and all international keyboard layouts as requested by this issue http://veracrypt.codeplex.com/workitem/62 (UniCode 8.0 basis for VeraCrypt), and that although it (the 32-bit bootloader) cannot be implemented quickly it is "definitely one of the top priorities."

idrassi wrote Nov 26, 2015 at 12:54 AM

I finally finished the implementation of support for UNICODE passwords on Windows. It required a huge rewrite of the code to support UNICODE everywhere in order to have a coherent codebase: https://veracrypt.codeplex.com/SourceControl/changeset/cd9c94ebf9492677e290a808c02ef9fd5be39225

Of course, this works only for non-system encryption, since the bootloader is still 16-bit and no UNICODE support is possible there yet.

I have uploaded an installer for 1.17-BETA that includes this. Testing is welcome since this is a big change.
https://sourceforge.net/projects/veracrypt/files/VeraCrypt%20Nightly%20Builds/

HawkTroy wrote Feb 5, 2016 at 10:12 PM

WOW...
I'm Chinese, and I haven't seen ANY Chinese platform using Unicode for passwords. I am pleasantly surprised that the developers actually implemented Unicode support.

There are some risks, though...
To input Chinese characters on a QWERTY keyboard, an input method (a relatively low-level system utility) is required. For an input method to be efficient, it has to learn to adapt to users' input behavior, collecting user data in the process. If a user types a Chinese passphrase several times, the passphrase is bound to be remembered by the input method and stored in a database. Even worse, many input methods offer to upload this database to the cloud to help synchronize and personalize input method behavior across devices.
These features are enabled by default, and it is not always possible to disable them. Even if they can be disabled, I doubt many users would make the effort to do it.

As for myself, I would rather use a longer English passphrase (which would be easy to remember by mere muscle memory) than take the risk of leaking an important passphrase through an input method.