VeraCrypt recovery bug - an analysis of what went wrong and recommendations

Topics: Feature Requests, Technical Issues
Nov 29, 2015 at 3:45 AM
Edited Nov 30, 2015 at 1:24 AM
I have just had to jump through some major hoops to get access to any of the data on my VeraCrypt system-encrypted drive after a failed Windows update. For a while I thought I'd lost all my laptop data, which would have been a serious crisis. As it is, I probably "only" have to rebuild the OS from scratch.

A bug in VeraCrypt, along with mistakes I made and weaknesses in VeraCrypt's recovery model, all contributed to the problem. I am going to write it all up here in the hopes it helps the developers, and in the hopes that someone has advice on a better way forward than what I'm looking at.

Original Issue: Windows 7 failed to boot, issuing a STOP 0x0000006B, after applying a largish set of updates.

First attempt to fix: Boot into a LinuxMint live DVD, use VeraCrypt to mount the drive and perform the bootcache fix that is standard for this problem. This process went smoothly, but it did not fix the boot issue. At this point I feared that something in the Windows updates caused the VeraCrypt driver to stop loading, so I resolved to fully decrypt my drive before proceeding any further. This exposed the first, and one of the most serious, flaws in VeraCrypt's recovery model:

Weakness #1: VeraCrypt has no method of in-place decrypting a pre-boot encrypted drive from within Linux. This isn't a technically challenging feature to add, and, frankly, I'm astonished it's not possible.

Second attempt to fix: This left the VeraCrypt recovery CD as the only method of decrypting, which is both frustrating and dangerous. Frustrating, since it was on track to take two weeks to complete and I really need my laptop. Dangerous, because it's the beginning of winter and I live in rural Nova Scotia. The odds of a power failure that outlasts my laptop's battery are high enough to make me nervous, but I...

My first mistake: ... proceed. I should have used Linux to take a dd image of the entire partition before I did anything. Big mistake. I simply started the recovery CD decryption. The decrypt process says to "Press ESC" if I need to interrupt. After running for about 4 days (1/3rd of the way through decrypting) I had to stop it, so I followed the instructions and pressed ESC. That's when I ran into the recovery bug:

Recovery bug: The drive was about 1/3rd decrypted when I pressed ESC. As soon as I pressed it, the decryption stopped, but nothing else displayed, and nothing else happened. Pressing escape again merely caused the computer to beep. I didn't know what was supposed to happen, so I waited several minutes. When nothing more happened I assumed it did what it was supposed to and shut down the laptop. I found later that something went wrong and the decryption point wasn't saved. That meant about a third of my drive was decrypted and that VeraCrypt didn't save the location where the dividing line between encrypted and unencrypted data was. I didn't find out that this occurred, though, until I tried to restart the decryption using the VeraCrypt recovery disk again. When I did restart decryption, I saw that VeraCrypt started over right from the beginning! When I saw that, I stopped it immediately. However the damage was already done, and several already decrypted megabytes are now decrypted again. I believe with Serpent that decryption and encryption are the same process, so that means they are reencrypted. My drive is now like this:
EEEEEEEEEEEEEEEEEEEDDDDDDDDDDDDDE
-------------------------------^
E is Encrypted, D is Decrypted, and the ^ marks where VeraCrypt thinks it should resume decryption, and if I let it finish the second time, it will be like this:
DDDDDDDDDDDDDDDDDDDEEEEEEEEEEEEEE
With the dividing line unknown precisely.

Weakness #2: Here is where I ran into VeraCrypt's second weakness. That is, that nothing will mount a system encryption drive that is partially decrypted. It just refuses. This is immensely frustrating, because there is now nothing that will mount this drive. Even if I fixed my original blue screen problem I can't boot it, because a third of the drive is decrypted with the pointer in the wrong spot and the VeraCrypt driver would make a mash out of the data on the back third of my drive. I can't mount it in Linux to even try and read what data I can get off it, because VeraCrypt in Linux cheerfully tells me that decryption isn't finished and it doesn't support that. This is also astonishing, because giving VeraCrypt the ability to read a partially encrypted system drive is even more trivially easy than decrypting them. And I mean that literally. The code that's in there now that throws the exception was just as much work if not actually a little more work to implement than simply supporting a partially encrypted drive would have been. Both require testing against the GetEncryptedAreaLength() result, but the code in there now requires throwing the exception and the language translation for the error message.
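
To make the point concrete, here is a minimal sketch of the read-path check being described. It is not VeraCrypt's code: the names `read_sector`, `toy_decrypt`, `enc_start` and `enc_length` are hypothetical, the XOR is only a stand-in for the real cipher, and it assumes the still-encrypted region is one contiguous byte range of the kind GetEncryptedAreaLength() would describe.

```python
SECTOR_SIZE = 512


def toy_decrypt(data: bytes, key: int = 0x5A) -> bytes:
    """Stand-in for the real cipher (NOT Serpent): XOR every byte with a key."""
    return bytes(b ^ key for b in data)


def read_sector(disk: bytes, sector: int, enc_start: int, enc_length: int) -> bytes:
    """Return the plaintext of one sector from a partially encrypted volume.

    enc_start and enc_length (in bytes) describe the region that is still
    encrypted; everything outside it is assumed to be plain already.
    """
    offset = sector * SECTOR_SIZE
    raw = disk[offset:offset + SECTOR_SIZE]
    if enc_start <= offset < enc_start + enc_length:
        return toy_decrypt(raw)  # still encrypted: decrypt on the fly
    return raw                   # already decrypted: pass through untouched


# Demo: a 4-sector "disk" whose first two sectors are still encrypted.
plain = bytes(i % 251 for i in range(4 * SECTOR_SIZE))
disk = toy_decrypt(plain[:2 * SECTOR_SIZE]) + plain[2 * SECTOR_SIZE:]
recovered = b"".join(read_sector(disk, s, 0, 2 * SECTOR_SIZE) for s in range(4))
```

The whole decision is the single range test in `read_sector`; a read-only mount built this way never has to write to the damaged volume.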

So I can't mount it. I also can't manually change the encrypted area pointer to make the volume mountable because...

Weakness #3: ...because of the project's persistent attachment to the "feature" of plausible deniability, everything and I mean every bit of configuration data for an encrypted partition is inside some sort of encrypted key scope. That means that the marker that tells VeraCrypt where the divide is between encrypted and unencrypted space is, itself, encrypted. This is a huge hindrance to recovery efforts, because there is now literally nothing I can do within VeraCrypt as written to get at my data. I can't edit the pointer, because it's inside encrypted space. I can't mount the data, because VeraCrypt refuses.

Closing the barn door...: When I realized the first decryption failed to mark its location, that's when I finally took an image of the drive. It's a bit like closing the barn door after the cows are out, but at least I can't hurt it any worse now.

Where I am now: Since VeraCrypt as written won't work to get at the data, that meant I needed to work on VeraCrypt to make it more useful for recovery. Either that or kiss my data goodbye. So I built a LinuxMint mini distro out of an 8 gig thumb drive with the live CD on it and a persistent data storage area. It's still a shadow of a useful workstation, but I don't have any other computers so I have to work with what I have. I have used this to create a little VeraCrypt build system and have patched VeraCrypt so that it works, read only at least, with partially encrypted drives. I have mounted the drive and am now salvaging files off it. I don't know where exactly to put the pointer between encrypted and unencrypted space, but it's a work in progress.

Way Forward: I still hope to salvage the OS. For that I need to remove system encryption. I'm never using the "recovery" CD's decryptor again, and recommend no one else use it either. Ever. It's slow and not even useful in a last-ditch emergency capacity. I'm hacking together a simple command-line tool to decrypt a system volume in place from inside Linux. That part's easy. The hard part will be to find the exact dividing line between encrypted and unencrypted data. This is proving difficult. The best NTFS analyzing software exists in Windows, which I can't boot, and going through filesystem structures manually is very tedious. Advice here is gratefully accepted.
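
One possible way to narrow down the dividing line without NTFS tooling is an entropy scan of the raw image. The sketch below is only a heuristic under stated assumptions: ciphertext looks close to random (near 8 bits per byte), while most decrypted filesystem data does not. Compressed or already-encrypted files will also look random, so any hit needs checking by hand. The function names, the 4096-byte chunk size and the 7.5 threshold are all illustrative choices, not anything taken from VeraCrypt.

```python
import math
import os
from collections import Counter


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    total = len(chunk)
    return -sum((n / total) * math.log2(n / total) for n in Counter(chunk).values())


def find_transitions(data: bytes, chunk_size: int = 4096, threshold: float = 7.5):
    """Return byte offsets where chunks switch between high and low entropy."""
    transitions = []
    previous = None
    for offset in range(0, len(data), chunk_size):
        high = shannon_entropy(data[offset:offset + chunk_size]) >= threshold
        if previous is not None and high != previous:
            transitions.append(offset)
        previous = high
    return transitions


# Demo on synthetic data: random bytes stand in for ciphertext,
# repetitive text stands in for decrypted filesystem content.
fake_cipher = os.urandom(8 * 4096)
fake_plain = (b"FILE0 NTFS record padding " * 2000)[:8 * 4096]
image = fake_cipher + fake_plain
print(find_transitions(image))
```

On a real image the data would be read from the image file in pieces rather than held in memory, and the reported offsets would only be candidates for the boundary.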

Recommendations to end users: If you use system encryption, then:
  1. Take a sector-clone of your VeraCrypt drive often. Buy yourself an identical drive and a cheap hardware cloner from AliExpress. They don't cost much. This isn't the first time I've run into difficulties with a TrueCrypt/VeraCrypt system-encrypted drive, I just got caught more with my pants down this time than other times. So I assure you, it is not a matter of IF something will go wrong but WHEN. This is not inherently VeraCrypt's fault - it's just an added risk when you are encrypting your whole drive in a way Microsoft doesn't care about supporting. You are always only ever one driver-not-loading away from Windows making a mash of your drive.
  2. Create, if you can, a Windows rescue CD with the VeraCrypt driver on it. You have to do this from within Windows, the systems that make these actually do it using your installed Windows to build it. You can't just download one.
  3. Even if you don't do a sector clone regularly, if (when) you run into any issues with VeraCrypt, then IMMEDIATELY take a sector image of your whole drive before doing anything within VeraCrypt to recover. So that if you start to do something and that fails, you have a baseline to go back to.
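
For anyone who would rather script the sector image than run dd by hand, here is a minimal sketch of the same idea: copy the source in fixed-size chunks and keep a SHA-256 of what was copied so the image can be verified later. The function names are made up for illustration, the demo runs on a throwaway temp file rather than a real device, and the sketch does not handle read errors (a failing drive needs a tool built for that).

```python
import hashlib
import os
import tempfile


def sha256_of(path: str, chunk_size: int = 4 * 1024 * 1024) -> str:
    """SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_device(source_path: str, image_path: str, chunk_size: int = 4 * 1024 * 1024) -> str:
    """Copy source to image chunk by chunk; return the SHA-256 of the bytes copied."""
    digest = hashlib.sha256()
    with open(source_path, "rb") as src, open(image_path, "wb") as dst:
        for chunk in iter(lambda: src.read(chunk_size), b""):
            dst.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a throwaway file. For a real drive, source_path would be the
# block device node (hypothetical example: /dev/sdX), opened as root.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "fake_drive.bin")
    img_path = os.path.join(tmp, "fake_drive.img")
    with open(src_path, "wb") as f:
        f.write(os.urandom(1024 * 1024 + 123))
    copied_hash = image_device(src_path, img_path, chunk_size=64 * 1024)
    verified = copied_hash == sha256_of(img_path)
```

Keeping the hash next to the image gives you a baseline to check against before every later recovery attempt.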
Recommendations to developers:
  1. Add in-place decryption to VeraCrypt in every version. For those of us who have to use Windows, Linux is still the first best recovery option. In fact, I would be willing to work with you to make a small Linux live system that could be customized as THE VeraCrypt recovery disk.
  2. Add support for partially encrypted system partitions to VeraCrypt in every version.
  3. Never ever assume that VeraCrypt knows better than the user what the user wants to do. Throwing an exception and stopping the user might seem to be a good idea, but really, there should be an option to override that. In this manner VeraCrypt can be made not only to work in the situations you anticipate, but also in the ones you don't.
  4. At risk of sounding like a broken record, abandon the dubious existing plausible deniability model so that VeraCrypt's headers can be done in a way that makes sense and better supports recovery. Values like the system encryption position pointer, which make no sense to encrypt, should be exposed and able to be edited with a simple sector editor. Plausible deniability doesn't work, but exposing these values has real-world recovery benefits.
Dec 2, 2015 at 1:47 AM
When I wrote the above report, I had not yet examined the raw hard drive data for the sectors that the VeraCrypt rescue CD decrypted the first time. When I did, I realized to my surprise, that the decryption wrote about 220GiB of solid 0xFF. Since this was definitely not what the data was on the drive, I can only assume a bug in the rescue CD decryption.
Dec 3, 2015 at 5:33 PM
Adding my problem with the Rescue Disk and this comment from another forum, where I also sought help:
I've learned that veracrypt is not reliable (anyway it is still unclear to me why the disk did not its job)....a rescue disk should be there to recover (here just to restore the original bootdata) and there should be no backup data on the drive itself for security reasons....(but here it was the only way to get it back)...

I mean OP has written a new bootloader and fixmbr does not overwrite the partitions and the drive might have got some errors due to 'tuneup' crap.

A potential attacker who could get hands on the drive could have been also able to recover the data...the one would just have to reverse the PW call anything else >had been on the drive...

(...)

a separate rescue disk that should hold already what is needed to restore AND a backup of the header (keys etc.) at the end of the drive? A separate disk is safer than 'at the end of the drive (feature where a backup of the header (keys etc.) are stored at the end of the drive, due to risk of certain programs damaging the boot loader at the front of the disk).
When the method you have used is the way like the veracrypt software decrypted the data 'at the end of the drive' by entering your PW then it is a security vulnerability to store them there additionally.
It is actually the job of the original bootloader, not the software.

IMHO the drive should contain NOTHING backup'ed that can be used to restore, it's the job of a separate disk one should take care and keep at another place.
I think that VeraCrypt is not yet ready to completely rely on at the moment. Although, the software is free and I'm not entitled to any support, I wish that the developers had helped with my problem more, since they know their software best. Luckily with the help of the users here, you included Kudalufi, I was able to retrieve my files.
Dec 4, 2015 at 4:33 AM
DesperateGirl wrote:
I think that VeraCrypt is not yet ready to completely rely on at the moment. Although, the software is free and I'm not entitled to any support, I wish that the developers had helped with my problem more, since they know their software best. Luckily with the help of the users here, you included Kudalufi, I was able to retrieve my files.
I am a staunch supporter of strong encryption, and (even now) an advocate for VeraCrypt. I must say, though, finding out that the rescue CD decryption wrote a solid 230GB of blank data to my drive has shaken my confidence in it.

That being said, all mass storage is imperfect, as are all operating systems. It is not a matter of if something will fail, but when. And using whole-disk encryption complicates any issues that might arise. It makes the whole system much more unforgiving of mistakes. So taking backups, regularly, is much more important.

I think VeraCrypt is quite reliable when in normal use. I trust it more than I trust the OS it is protecting - I suspect the original problem I ran into was a Windows update not playing nicely with it. So VeraCrypt didn't cause the issue - it just didn't offer much in the way of tools to help get out of extremis, and the tool it does offer may not be well tested.

I renew my recommendation (in the strongest possible way) that anyone using VeraCrypt for system encryption invest in an identical hard drive and use a drive cloner to make a full sector-by-sector copy every so often.
Dec 4, 2015 at 4:53 AM
Here is some data that may help developers track down the rescue CD decryption bug:

Cipher: Serpent
Drive size: 0xE8 E0DB 6000 bytes
Beginning of bad data (0xff): Byte 0xB1 FC54 5000

Besides being on a 4k boundary, nothing about the number stands out to me.
Dec 4, 2015 at 3:38 PM
Hi,

@DesperateGirl: I think it's a little bit harsh to call software unreliable just because there is a hardware error and the software can't behave correctly in this case. As indicated in the discussion you mentioned, your SSD drive had a hardware error and the rescue disk was unable to read data from it. How do you want the VeraCrypt Rescue Disk to behave in this case? It prints a disk error because there is a disk error... I'm open to any explanation of why you judge that VeraCrypt is unreliable in your case. Please explain.

@Kudalufi: I understand your frustration and your anger, but let's try to be objective and to analyze the situation calmly. Recommending to not use Rescue Disk is a little too much because it is a very important recovery tool and it works if the Rescue Disk is able to read / write data correctly from disk. In case where the disk itself is malfunctioning, then there is no guarantee that the process will succeed.

Going back to your report, the problem came from the fact that you didn't wait for VeraCrypt to flush decrypted data and update the volume header. You say that you waited several minutes and VeraCrypt didn't display anything during this period: that's because the Rescue Disk was either reading data from disk or writing data (we use 63KB chunks).

The read/write operations are handled by the BIOS and it is clear that in your case these operations are taking too much time. This is a clear indication that your drive is failing or starting to fail.
The fact that you hear "bip" after pressing ESC several times means that the BIOS is busy: this is normal since the Rescue Disk is blocked while waiting for the BIOS to return data or write data.

By shutting down the PC without waiting for the Rescue Disk to acknowledge that the data is flushed correctly, the encryption pointer in the header became incorrect and this in turn will cause issues when decryption is tried a second time.

Even if we change the header to make it non encrypted, this would not have solved the issue because we would not have been able to update it. If software is abruptly stopped, nothing can be updated.
Moreover, bootloader code is not multithreaded the way desktop programs are, and if we are blocked in a BIOS interrupt (in this case INT 13), we can't catch the ESC key (which remains in the console buffer).

Abruptly stopping the decryption has always been the weakest point in Rescue Disk logic since TrueCrypt days and it will always be the case. The only potential safeguard that we can add is to detect if the drive has already been decrypted partially and in this case refuse to perform a decryption that may destroy data. But even in this case, we can't know for sure the real position of data and one must manually determine it and this will be outside the scope of the Rescue Disk.

Concerning the 0xFF data you found in your drive: you said that the Rescue Disk decrypted 1/3rd of the data but the offset you gave (0xB1FC545000) is much more than that. To me this sounds like a hardware issue of some sort that is causing big parts of the disk to be set to 0xFF.
VeraCrypt decryption code has been tested over and over and I don't see any bug in the crypto side or the I/O side that would cause writing 0xFF bytes everywhere.

If there was a bug, you would end up with random looking data not constant patterns like what you see. That's why I firmly believe that you are encountering a disk error.

At this stage, one important improvement I see is to add a heuristic detection of partially decrypted system partitions to avoid a double decryption in case the volume header was not updated correctly. Adding a non-encrypted header will not help in case decryption is stopped abruptly.

Implementing decryption on Linux/MacOSX is possible but the project is lacking manpower. If anybody wants to join to work on such kind of features, they will be more than welcome. As far as I'm concerned, I do my best to prioritize between all features and requests but since most users are on Windows, things are often done on Windows first.

Conclusion: abruptly stopping the decryption process without waiting for the acknowledgment message "Decryption deferred." caused the encryption state to not be saved to the header and this in turn caused a double decrypt. Moreover, suspicious 0xFF bytes seem to indicate an issue with the storage itself (probably this also caused Windows to fail to boot with STOP 0x0000006B).
Dec 4, 2015 at 6:20 PM
Mounir,

I have been reading DesperateGirl's and Kudalufi's threads / posts during these last few days.

I cringe at the manner in which they write here. I can understand they are confused and worried about their potential data loss. However even the most idiotic computer user knows they should always backup to a separate physical disk, in case of hardware failures.

I have rarely seen such patronising, ungrateful and sneering comments written on any forum, such as those posted by this double act.

Mounir, for every member such as these, there are many who are extremely grateful for your work, especially when considering it is for free and you are totally alone in coding.

I didn't really wish to join in on this conversation, but I did not want you to think these 2 are in any way representative of your loyal and grateful user base.

Please do not waste any more of your valuable time here and accept your users' sympathy.

Thank you very much for all your hard work on VeraCrypt, you are doing great.



I am unlikely to comment further as getting past the codeplex captcha is almost impossible now. Trying to post for 20+ minutes so far...
Dec 4, 2015 at 7:41 PM
I agree with you DBkray 100%! I am so grateful that Mounir and others are picking up where TrueCrypt left off. As you said the program is free; but at the same time it does not have a protection against "Idiot User errors". Why they didn't take the time to back up their drive is amazing. Looking at DesperateGirl's and Kudalufi's threads, you would think they have put up thousands of their dollars into this -- they write as if Mounir is working solely for them. I am so very, very thankful for the work of Mounir and others.
Dec 5, 2015 at 1:03 AM
This is basically a user error for me.
1) keep data and system separate
2) If data is important, make a backup

@Kudalufi
"plausible deniability doesn't work" - Can you explain what you mean, Kudalufi ?
As idrassi told you, sacrificing pd would not even solve the issue you had.

Actually, a higher-level Linux-based rescue disk could be an improvement as well as a problem. A more complex system is usually more likely to fail...
Dec 5, 2015 at 3:29 AM
idrassi wrote:
Hi,

@Kudalufi: I understand your frustration and your anger, but let's try to be objective and to analyze the situation calmly. Recommending to not use Rescue Disk is a little too much because it is a very important recovery tool and it works if the Rescue Disk is able to read / write data correctly from disk. In case where the disk itself is malfunctioning, then there is no guarantee that the process will succeed.
Hi Idrassi! Thank you for your time in writing this.

First of all, while I was and am frustrated, I apologize if I came across as angry. This was certainly not my intent. I believed and still do there are areas in VC that could be made better, and my hope was that a case study of a real failure could be used to improve the way VC can be used in recovery cases. I made my own mistakes in this situation, which I have openly acknowledged. I wanted this to be a lessons learned for all so that a) users could avoid my mistakes, b) users can take steps to prevent issues in the future, and c) to improve VC's performance in extremis situations.
Going back to your report, the problem came from the fact that you didn't wait for VeraCrypt to flush decrypted data and update the volume header. You say that you waited several minutes and VeraCrypt didn't display anything during this period: that's because the Rescue Disk was either reading data from disk or writing data (we use 63KB chunks).
The read/write operations are handled by the BIOS and it is clear that in your case these operations are taking too much time. This is a clear indication that your drive is failing or starting to fail.
The hard drive light was working during the whole process, and it was out the whole time after I pressed ESC. I'm familiar with how a hard drive behaves when it is close to, or in the process of failing in portions and my drive is physically fine. Just to verify this, I've used dd to seek the drive to the area it was at when I terminated the decryption, and have done a half dozen read/write cycles on it with various test patterns. I have also done the same thing for the header area. There is no slow down, there are no read or write pauses, no errors. The drive is physically fine by any metric I know. The initial problem was not a physical one, nor was it in any way VeraCrypt's fault. It was an OS issue.
The fact that you hear "bip" after pressing ESC several times means that the BIOS is busy: this is normal since the Rescue Disk is blocked while waiting for the BIOS to return data or write data.
Yes, the hardware beep occurs when the 16 byte hardware keyboard buffer is full. I do not believe it was the drive that was blocking. I can't, of course, discount it entirely - you never can. Heavens, for all I know a high-velocity cosmic ray struck something. However, in most cases when BIOS is blocking on a read or write of a failing drive the hard drive light is solid on. There are also often mechanical noises of the drive performing repeated head seek resets. I have also performed subsequent testing (see below). On balance of probabilities, I assess the chances that it was a hardware failure are low.
By shutting down the PC without waiting for the Rescue Disk to acknowledge that the data is flushed correctly, the encryption pointer in the header became incorrect and this in turn will cause issues when decryption is tried a second time.
For a 64k block, I'm not sure how much longer one would have been expected to wait. The available documentation doesn't say what one should be waiting for - perhaps I could make some screenshots for the recovery CD docs? I have used the rescue CD exactly once before, several years ago, and couldn't be sure that this was not what was supposed to happen. The drive was decrypting at just over a megabyte per second before I interrupted it. Just as it had been doing for several days. The countdown stopped immediately when I did press ESC. Pressing ESC again after that did nothing until the keyboard buffer filled, at which point it just beeped, suggesting to me that nothing was checking the keyboard any more. I waited several minutes to be "sure", but there was nothing to suggest anything was amiss. So I proceeded with my plan and rebooted the computer into Linux, mounted the drive (again, not knowing that you're not supposed to be able to after it's partially decrypted), got some data off it, then restarted the rescue CD. At this point it started over from the beginning and that was the first indication something had gone wrong. I terminated this process immediately, and that time I saw what was supposed to happen when you hit ESC.
Even if we change the header to make it non encrypted, this would not have solved the issue because we would not have been able to update it. If software is abruptly stopped, nothing can be updated.
The issue with the encrypted header is this. Once this problem occurred, there was then nothing that VeraCrypt could do to mount the partition. It refused on grounds that the drive was partially decrypted. An unencrypted header would have allowed me to do two things. 1) reset the encrypted size pointer to the size of the drive so I could mount it in VeraCrypt (read only) to get data off it, and 2) reset the pointer to where it was supposed to be once I identified that location.

As it was, I personally had the skill to patch VC to work for me, but this isn't my day job any more so while this might seem easy for you, it was pretty daunting for me. And 99% of users won't be able to do that. While the number of people who can pull out wxHexEditor and intelligently use it to work the problem might not be all that huge either, I would suggest it's a much wider audience than those who can patch VC. An encrypted header prevents solving the issues you can't foresee. It's like encasing your car's engine block in plastic to keep it from rusting on the grounds that nothing should go wrong and there's no reason to go inside it. We all know something somewhere will fail sooner or later.
Abruptly stopping the decryption has always been the weakest point in Rescue Disk logic since TrueCrypt days and it will always be the case. The only potential safeguard that we can add is to detect if the drive has already been decrypted partially and in this case refuse to perform a decryption that may destroy data. But even in this case, we can't know for sure the real position of data and one must manually determine it and this will be outside the scope of the Rescue Disk.
This is true. Stopping it is a problem. But you also can't expect a person to be able to run his computer for weeks straight with no interruptions. A two week period of time where a brownout will destroy your data, I'm sorry, but having that as the only emergency tool we give people is just asking for trouble. You can't expect everyone to have access to high reliability backed up power in those situations. And with the speed that the 16 bit code in the rescue disk works at for today's drive sizes, you could be upwards of three weeks.
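
The rough arithmetic behind those durations can be sketched as follows, using the drive size posted earlier in this thread (0xE8 E0DB 6000 bytes) and the observed rate of roughly one megabyte per second. The other two rates are hypothetical, included only to show how quickly the exposure window grows or shrinks.

```python
DRIVE_BYTES = 0xE8E0DB6000  # drive size reported earlier in this thread
SECONDS_PER_DAY = 86_400


def days_to_process(total_bytes: int, bytes_per_second: float) -> float:
    """Days needed to run through total_bytes at a constant rate."""
    return total_bytes / bytes_per_second / SECONDS_PER_DAY


# 1.0 MB/s is the observed rate; 0.5 and 2.0 are hypothetical comparisons.
for rate_mb in (0.5, 1.0, 2.0):
    days = days_to_process(DRIVE_BYTES, rate_mb * 1_000_000)
    print(f"{rate_mb:.1f} MB/s -> {days:.1f} days")
```

At 1 MB/s this works out to about 11.6 days of uninterrupted running for a drive this size, and halving the rate doubles it to just over three weeks.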

Perhaps I can work with you to make a better rescue disk. If the Linux version of VC could decrypt in place, then the rescue disc could be a small Linux image with VC on it. I would be happy to work on patches to VC to make the Linux better for recovery.
Concerning the 0xFF data you found in your drive: you said that the Rescue Disk decrypted 1/3rd of the data but the offset you gave (0xB1FC545000) is much more than that. To me this sounds like a hardware issue of some sort that is causing big parts of the disk to be set to 0xFF.
It was actually closer to half of the drive that the rescue CD operated on. When I discovered the 0xFF area, I first assumed I had been wrong about where decryption stopped, but it turns out my initial number was right, just some of the drive decrypted ok. Here's what I know for sure right now:
  • Decryption works back to front, as you know, and started at the "end" of the drive, 0xE8 E0DB 6000, and worked backwards
  • For an unknown reason, decryption wrote solid 0xFF for the first 24% of the drive's decryption until byte 0xB1 FC54 5000 (offset from the front of the drive)
  • Decryption then appears to have worked properly until somewhere close to byte 0x81 28D9 C200, or until about 44% was decrypted
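
For anyone who wants to check the percentages, the offsets above can be worked through like this. The three constants are the values quoted in this thread; the helper name is just for illustration.

```python
DRIVE_END = 0xE8E0DB6000        # drive size; decryption started here and worked backwards
FF_REGION_START = 0xB1FC545000  # lowest byte of the solid 0xFF region
DECRYPT_STOP = 0x8128D9C200     # approximate point where decryption stopped


def fraction_from_end(offset: int, end: int = DRIVE_END) -> float:
    """Fraction of the drive lying between offset and the end of the drive."""
    return (end - offset) / end


ff_bytes = DRIVE_END - FF_REGION_START
print(f"0xFF region: {ff_bytes / 2**30:.1f} GiB, "
      f"{fraction_from_end(FF_REGION_START):.1%} of the drive")
print(f"Total processed: {fraction_from_end(DECRYPT_STOP):.1%} of the drive")
print(f"0xFF start on a 4 KiB boundary: {FF_REGION_START % 4096 == 0}")
```

This gives roughly 219.6 GiB (23.6% of the drive) for the 0xFF region and about 44.5% for the total span the rescue CD touched, which lines up with the figures above.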
VeraCrypt decryption code has been tested over and over and I don't see any bug in the crypto side or the I/O side that would cause writing 0xFF bytes everywhere.

If there was a bug, you would end up with random looking data not constant patterns like what you see. That's why I firmly believe that you are encountering a disk error.
Something wonky happened, but I really don't think it's hardware drive issues. I'll do some more testing, of course, but so far I see zero evidence of any hardware issues. If you're interested enough, I can send you the physical drive for a post mortem once I get a replacement. It's an interesting test case.

Does the 16 bit code see much use? Whether or not it does, I don't necessarily agree that random data is the most likely outcome of a problem. Especially if the problem involves an overflow somewhere in the 16 bit code. Getting rid of that code, or at least making other tools that render it unnecessary in more cases, can only be a good thing.
Implementing decryption on Linux/MacOSX is possible but the project is lacking manpower. If anybody wants to join to work on such kind of features, they will be more than welcome. As far as I'm concerned, I do my best to prioritize between all features and requests but since most users are on Windows, things are often done on Windows first.
I can help with some of this to be sure.
Dec 5, 2015 at 3:09 PM
Edited Dec 5, 2015 at 3:15 PM
Wait - is 0xFF what VC writes to empty space during encryption of a volume? That would be very bad regarding the plausible deniability of a hidden volume, as I explained here: https://veracrypt.codeplex.com/discussions/646758 .

Is the "permanently decrypt" tool incompatible with system drives? Otherwise a Windows PE + traveler disk should solve the problem. I was not aware that the rescue disk is that damn slow!