I’ve never had any problems burning DVDs. Not a single DVD has been logically damaged since I started using InfraRecorder, and my Plextor 716-UF has never physically messed one up. Hence I became quite sloppy when it comes to checking a newly burnt DVD. My normal workflow is to generate an ISO image up front and burn that to DVD later. Because writing the ISO image is faster than burning the actual DVD, I can delete the source files sooner and take care of other things. Burning the actual DVD is something I do while cooking, running or doing anything else that takes more than 15 minutes. After all, now that my DVD drive is connected via USB instead of Firewire, the maximum burn speed is limited to 8x. I really like(d) Firewire and I don’t care that the controller chips are more expensive than the USB ones :(
Two years ago I consolidated all my old (=unreliable) CDs containing my source code from the early 90’s up to 2007 onto one DVD. That of course failed, although the failure was not immediately apparent. According to InfraRecorder everything went according to plan. A quick look at the root directory right after burning seemed perfectly normal, so I stored the DVD in a secure place, removed the data from my computer and destroyed the old CDs. A few weeks later I tried to read some data from the DVD and to my surprise (and shock) every directory was full of files and directories with names containing every Unicode codepoint known to man.
Woops…
After the first shock wore off, I quickly found something positive: it’s not a physical defect, because the drive is still able to read the disc. So it must be a logical defect only, which should be easy to fix. I quickly stored the DVD as an ISO image on my hard drive to have something to work with. Looking at the file in my favourite hex editor (mirkes.de Tiny Hexer Medium Edition) I could see directory names, text and source files, PNG and GIF headers etc. So all, or at least some, of the data was still there, only inaccessible. What a relief.
Fix it!
A few days ago I finally managed to read up on the ISO 9660 format (ECMA-119). Armed with that knowledge and Perl, I poked around in the ISO image.
The Volume Descriptors were in perfect shape and so were the Path Tables following them. But when it came to the Directory Records, only the first few were valid. The extents (offsets from the beginning of the medium, in sectors) referencing files or subdirectories were pointing somewhere, but not at what they were supposed to point at. Oh my, manually fixing file offsets?
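The poking around was done in Perl; what follows is only a minimal Python sketch of how a single Directory Record can be decoded, using the field offsets from ECMA-119. The function name and the returned dictionary are made up for illustration.

```python
import struct

def parse_directory_record(buf: bytes, pos: int = 0):
    """Decode one ISO 9660 Directory Record starting at buf[pos].

    Returns None for a zero length byte, which marks the padding that
    fills the rest of a sector.
    """
    length = buf[pos]
    if length == 0:
        return None
    # Extent and data length are stored twice: little-endian, then big-endian.
    extent_le, = struct.unpack_from("<I", buf, pos + 2)
    extent_be, = struct.unpack_from(">I", buf, pos + 6)
    size_le, = struct.unpack_from("<I", buf, pos + 10)
    flags = buf[pos + 25]
    name_len = buf[pos + 32]
    name = buf[pos + 33:pos + 33 + name_len]
    return {
        "length": length,                      # size of this record in bytes
        "extent": extent_le,                   # first sector of the file/directory
        "consistent": extent_le == extent_be,  # cheap sanity check
        "size": size_le,                       # data length in bytes
        "is_dir": bool(flags & 0x02),          # bit 1 of the file flags
        "name": name,
    }
```

The doubled little-endian/big-endian fields are handy here: a record whose two copies of the extent disagree is almost certainly garbage rather than a real record.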
One of the few valid Directory Records pointed to an old Pascal source file. So I jumped to that position in the hex editor and tried to find that file somewhere in its vicinity. Perhaps I only needed to insert a single byte at the right position to be done? I found the file closer to the start of the image than it was supposed to be, and by an exact multiple of the sector size. The chance of that happening by coincidence is 1 in 2048, which I don’t consider to be a coincidence at all. Even more so because the difference was exactly 42 sectors. So I had found my answer: full sectors were missing, not just an arbitrary number of bytes. (Once again 42 is the answer to everything.)
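The same check can be sketched in a few lines of Python. This is a hypothetical helper, assuming the image (or a slice of it) fits in memory as a bytes object and that the first bytes of the file are known and distinctive enough to search for.

```python
SECTOR = 2048  # sector size used by ISO 9660 images

def sector_shift(image: bytes, expected_extent: int, needle: bytes):
    """Return how many whole sectors before its expected extent `needle` starts.

    Returns None if the needle is not found at or before the expected
    position, or if the displacement is not a multiple of the sector size
    (which would mean stray bytes are missing, not whole sectors).
    """
    expected = expected_extent * SECTOR
    # Last occurrence that starts at or before the expected position.
    found = image.rfind(needle, 0, expected + len(needle))
    if found < 0:
        return None
    delta = expected - found
    if delta % SECTOR:
        return None
    return delta // SECTOR
```

For a full DVD image, passing an `mmap.mmap` object instead of `bytes` should work the same way, since it also offers `rfind`.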
So I tried to find some other files by looking for them 42 sectors before their reported location, but without luck. Of course not, would have been too easy… However, I compared the Volume Space Size of the Volume Descriptor with the actual size of the ISO image. Their difference was exactly 520 sectors. Not a single byte more or less. So all I needed to do was to insert those missing 520 sectors at the right positions and another data loss would be averted!
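That comparison is easy to script. Here is a minimal Python sketch, assuming the Primary Volume Descriptor sits at sector 16 (the first sector after the system area) and using the field offsets from ECMA-119; the function name is made up.

```python
import io
import struct

SECTOR = 2048  # volume descriptors always live in 2048-byte sectors

def missing_sectors(f) -> int:
    """Declared Volume Space Size minus the actual image size, in blocks.

    `f` is a seekable binary file object, e.g. open("disc.iso", "rb").
    """
    f.seek(0, io.SEEK_END)
    actual_bytes = f.tell()
    f.seek(16 * SECTOR)
    pvd = f.read(SECTOR)
    if len(pvd) < SECTOR or pvd[0] != 1 or pvd[1:6] != b"CD001":
        raise ValueError("no Primary Volume Descriptor at sector 16")
    declared, = struct.unpack_from("<I", pvd, 80)     # Volume Space Size
    block_size, = struct.unpack_from("<H", pvd, 128)  # Logical Block Size
    if block_size == 0:
        raise ValueError("logical block size is zero")
    # A partial trailing block (if any) is ignored by the floor division.
    return declared - actual_bytes // block_size
```

A result of 0 means the image is as long as the descriptor claims; a positive number is the count of blocks that went missing somewhere.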
Sounds easy, but how to automate that without introducing new errors? Not wanting to waste more time than absolutely necessary and because fixing ISO images is not something I have to do every day, I chose to do it all manually. What a great idea! Find the missing 520 sectors in 1 824 449 sectors!
I started with the part of the image containing all the directory information, which was the first 2500 sectors. The Path Table consecutively stores all the directories and the extent where each one is located. At that extent sits a sequence of Directory Records: one for each file and sub-directory and, most importantly, one for the directory itself and one for its parent. Those last two records are, by definition, the first ones listed. And they are easy to scan for, even manually with a hex editor.
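What the eye looks for in the hex editor can also be written down as a small test. The following Python sketch assumes the ECMA-119 layout, in which the record for the directory itself has a one-byte identifier 0x00 and the record for its parent has a one-byte identifier 0x01; the function names are hypothetical.

```python
SECTOR = 2048

def looks_like_directory_start(sector: bytes) -> bool:
    """True if the sector begins with the 'self' and 'parent' records."""
    def is_special(pos: int, ident: int) -> bool:
        if pos + 34 > len(sector):
            return False
        return (
            sector[pos] >= 34                    # record length (34 without extras)
            and (sector[pos + 25] & 0x02) != 0   # directory flag is set
            and sector[pos + 32] == 1            # identifier is one byte long
            and sector[pos + 33] == ident        # 0x00 = self, 0x01 = parent
        )
    # The second record starts right after the first one.
    return is_special(0, 0x00) and is_special(sector[0], 0x01)

def directory_sectors(image: bytes):
    """List the sector numbers that look like the start of a directory."""
    return [
        n for n in range(len(image) // SECTOR)
        if looks_like_directory_start(image[n * SECTOR:(n + 1) * SECTOR])
    ]
```

Comparing that list against the extents from the Path Table shows where the two start to drift apart.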
Sorting the entries in the Path Table by extent resulted in a list I could compare with each sector. Because the extents of most directories were only separated by a single sector (containing the same directory information, but in UCS-2 encoding for Joliet), I was able to shorten the list by a lot. All I had to do was check the entries that were separated by more than one sector. Most of them were only separated by two sectors because the directory records in UCS-2 encoding took up more than one sector. Others were separated by 10 sectors according to the Path Table, while their actual size was only one sector. So I inserted the required number of empty sectors to match the actual file layout with the Path Table. I had to do this 13 times, and the inserted sectors added up to 42. Perfect. Now that the directory structure should have been fixed, I tried to load the file with 7-Zip, and it worked! I could access all directories and even a lot of the files. So I was on the right track.
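As a rough Python sketch of those steps (the actual repair was done by hand): parse the little-endian (Type L) Path Table, sort the extents, list the suspicious gaps, and pad the image with empty sectors. The record layout follows ECMA-119; the function names and the `max_step` threshold are assumptions for illustration.

```python
import struct

SECTOR = 2048

def parse_path_table(table: bytes):
    """Return (extent, parent_number, name) for each Type L Path Table record."""
    entries, pos = [], 0
    while pos + 8 <= len(table):
        name_len = table[pos]
        if name_len == 0:
            break  # padding after the last record
        extent, parent = struct.unpack_from("<IH", table, pos + 2)
        name = table[pos + 8:pos + 8 + name_len]
        entries.append((extent, parent, name))
        # Records are padded to an even length when the name length is odd.
        pos += 8 + name_len + (name_len & 1)
    return entries

def large_gaps(extents, max_step=2):
    """Pairs of neighbouring extents that are further apart than max_step.

    max_step=2 assumes one Joliet sector between two directories.
    """
    ordered = sorted(extents)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_step]

def insert_empty_sectors(image: bytes, at_sector: int, count: int) -> bytes:
    """Return a copy of the image with `count` zeroed sectors inserted."""
    cut = at_sector * SECTOR
    return image[:cut] + bytes(count * SECTOR) + image[cut:]
```

Only the pairs reported by `large_gaps` need a look in the hex editor; everything else can be skipped.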
To fix the file section I started out the same way I did with the directory information. By generating a list of all the files, sorted by extent, I had my reference to check against. I then used binary search, with text files or binary files (with easy to recognise headers) as my anchors. In the end I found six positions where I had to insert some sectors, all between sectors 200950 and 206174, so it didn’t take too long.
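The binary search can be sketched in Python as follows. It assumes a list of anchors (extent plus the first few bytes the file is known to start with) sorted by extent, and it assumes that every anchor before the first gap is in place while every anchor after it is not. Both names are hypothetical.

```python
SECTOR = 2048

def in_place(image: bytes, extent: int, signature: bytes) -> bool:
    """True if the file's known first bytes sit exactly at its extent."""
    start = extent * SECTOR
    return image[start:start + len(signature)] == signature

def first_displaced(image: bytes, anchors) -> int:
    """Index of the first anchor that is not where its extent says.

    `anchors` is a list of (extent, signature) tuples sorted by extent.
    Returns len(anchors) if every anchor is in place.
    """
    lo, hi = 0, len(anchors)
    while lo < hi:
        mid = (lo + hi) // 2
        if in_place(image, *anchors[mid]):
            lo = mid + 1   # the gap is further towards the end
        else:
            hi = mid       # the gap is at or before this anchor
    return lo
```

The missing sectors then lie somewhere between the last anchor that is in place and the first one that is not. After inserting them, the search can be repeated to find the next gap.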
Finalise
I don’t know why the DVD was damaged in the first place. But something like this happening once in 7 years is nothing that troubles me too much (now that I have my data back), because the whole debacle could have been avoided by checking the DVD and/or ISO image more thoroughly. Which nowadays I do. Mostly.
Nevertheless, it was a great opportunity to get familiar with the (outdated; hello UDF) ISO 9660 format. By storing the directory structure in both Path Tables and the actual Directory Records, it gives you two ways to find your data on the medium. This redundancy helped me tremendously, because I could use the Path Tables as reference. Without that table I would have had to traverse the damaged Directory Records. A frightening thought, to say the least. There is even an optional second Path Table as backup. And while the Primary Volume Descriptor didn’t point to one, it was located right after the first one. This is not very helpful in case of physical damage, because neighbouring sectors are then more likely to be unreadable as well. But if it were written to the end of the disc, it could be really helpful in recovering a disc. The same could be said about the Directory Records not being spread out on the disc. On the other hand, spreading them out would incur a huge penalty when reading the disc and loading the Directory Records (slow seek times of optical drives and whatnot). Finally, all the information is doubled if Joliet is used, which means even more redundancy. Writing those Path Tables and Directory Records at the end of the disc would allow for better protection when it comes to physical damage. Again with a performance hit, but a smaller one, because the information is not spread out over the disc but located together at the end. So it’s just a one-time seek to the very end of the disc. Well, so much for my thoughts on the ISO 9660 format.