TALES FROM THE FLYING DISK DOCTOR - GEORGE AND MILDRED
Copyright Dr Alan Solomon (1986-1995)

This is another story with a happy ending; a heroic tale of data lost and found, and a small business rescued from disaster. It started when Arthur Something phoned me up. "I don't suppose you can help me," he began, "A lot of people have given up on this one." That's the kind I like the best. If a lot of people have had a go, that means that there's something to work with. And if they've all declared the task impossible, the customer will be properly appreciative when I get his data back. "It's a database. I can't use it - I don't know what's wrong. It just suddenly stopped working. Hoskyns have had a go, and Honeywell. My man John says you won't be able to help if they can't, but I thought I'd phone you anyway." I asked all the usual questions, but he didn't know any of the answers. He was the sort of user who has a few instructions written down on a piece of paper, and that's all he knows. He did tell me, though, that it was a database of people's CVs, and was entirely text, which makes it easier to rescue. So I talked to John. John surprised me. John was quite rude. I couldn't think why, as I hadn't done anything to deserve this. I explained that I certainly couldn't guarantee full or even partial success, but that I would not charge unless I could help. John told me that I'd be wasting my time, and that far better brains than mine had tried and failed - I guess he meant himself. Next day, I spoke to Arthur again. He told me a bit more about the problem - it seems that the trouble had started just after John had used the computer for a bit of programming. Now I understood why John had been unpleasant to me - he was probably the cause. I asked Arthur to fetch the machine round, and I'd have a look at it. Arthur arrived, carrying his system unit. I powered it up, which gave me my first surprise; I've never seen an XT with only 128K of memory before.
I had a look round the hard disk. There was DOS, a telex system, a COBOL compiler, and an empty PFS-File sub-directory. I had a closer look at that, and found that the sub-directory had once contained the PFS-File program, but now it was deleted. There was a sub-directory called BUCHECK, that had one file called NEWFILE. And finally, there was the sub-directory called CVFILE, which should have contained a database of CVs, but now contained a file called GEORGE and a file called MILDRED. The GEORGE file was exactly the same size and date as NEWFILE, so I assumed that NEWFILE was just a copy. I had a look at GEORGE, by using the TYPE command. It started off with \CVFILE\CVFILE, but then deteriorated into a mass of hieroglyphics. MILDRED was the same, except for being about half the size. But, given the filename that was embedded in GEORGE, I felt that GEORGE must be at least part of the database, and that MILDRED probably followed on from GEORGE. So what were these hieroglyphics, and where was the text data? So, I decided that I wanted to see what a PFS-File database looks like, and asked Arthur for his copy of PFS-File. He hadn't brought it. I thought about whether I could manage without it, and decided I couldn't. I tried to UnDelete the copy on the hard disk, but that had all been overwritten long ago, so I sent Arthur off to get his copy. Arthur went off; I felt that he was a bit unwilling, which wasn't surprising as it was a very long journey, and I hadn't actually shown him anything remotely resembling his data yet. Arthur returned a few hours later clutching a disk. He hadn't brought the manual, and seemed surprised that I would need such a thing. Oh well, I thought, PFS-File is supposed to be easy to use. I put PFS-File on my hard disk (one of the cardinal rules of data recovery is, never write anything onto the unwell disk, as it might overwrite something useful). I set up a simple test database, saved it, then came out of PFS-File. 
I then used TYPE to display the database on the screen. Sure enough, it was all hieroglyphics. "Great", I said, although Arthur couldn't see what I was so pleased about. So I went into my PFS-File again, and entered A to Z into my test database, exited, and showed him how I could work out which hieroglyph corresponded to which letter. It turned out that PFS-File encodes data before storing it by simply setting the high bit, so character 70 is stored as 198, 71 as 199 and so on. Arthur still couldn't see that this was helping him, so I wrote a little Turbo program that stripped off the high bit. I ran GEORGE through the program, but killed it after a few minutes. Then I showed Arthur the output. I think that it was at that point that he really began to believe that I might be able to do something for him. I told him what my plan was. I was going to put GEORGE and MILDRED through this program, so that the database was reduced to a plain ASCII (text) file. Then I would do any necessary massaging on this file, and then feed it back into a new PFS-File database. Hopefully, nearly all the data would be there. There would be several problems in doing this, like the fact that there were little strings of bytes in the file that I couldn't yet see the purpose of, but I was fairly confident that I would be able to reverse engineer the structure of a PFS-File database, and so be able to read the file. I ran the whole of GEORGE and MILDRED through my little Turbo program, printed out the result, and gave it to Arthur to take home and decide whether this was the full database, and whether reconstructing it would be useful. Next day, Arthur turned up looking more cheerful than I had seen him before, and carrying the PFS-File manual. He said that about two-thirds of the data was on my printout, but in a very jumbled form, containing lots of garbage, and missing chunks of data.
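For readers who want to see the high-bit trick described above in concrete terms, here is a minimal sketch in modern Python. It is an illustration only, not the original Turbo program, and the function name strip_high_bit is made up for this example.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Clear bit 7 of every byte, so 198 becomes 70 and 199 becomes 71."""
    return bytes(b & 0x7F for b in data)


# The two stored values quoted in the story decode to the letters F and G.
print(strip_high_bit(bytes([198, 199])))
```

Bytes that never had the high bit set pass through unchanged, which is why the filename at the start of the file was already readable with TYPE.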
I devoured the PFS-File manual eagerly, but there was no mention of any way to feed data into a database other than through the keyboard. I broke this news to Arthur, and his face fell. "Does this mean we're sunk?", he asked. "No", I said, "There are two possibilities." The easiest is to change programs. I told him about PC-File, just as easy to use as PFS-File, but able to Import data from a text file. He looked dubious, and said he didn't fancy learning another program. "Can't you write a program that will recreate a PFS-File database?", he asked. I explained that reverse-engineering enough to read a database was very much easier than working out how to write one. When you're reading a foreign language, you don't have to worry about the exact syntax, but when you're writing, you've got to get it exactly right. I persuaded him that I could probably rescue his data, but only if I fed it into a different database program. This he accepted. "What about the jumbled form of the data, and the garbage?", he asked. "Not a problem," I said. I explained that the garbage would disappear when I had reverse-engineered the file format. The jumble might be more of a problem, but it was too soon to worry about that. But I was very worried about the missing third of the database, and Arthur said that it was the most recent data, and so the most important. "What about NEWFILE?", he said. "No, that's just a copy of GEORGE", I said. Then I thought - how do I know that? Just because it is exactly the same size and has exactly the same date and time, doesn't mean that it is the same. I had a look at NEWFILE, using Debug, and sure enough, it was different from GEORGE. "That's the missing third", I said, "Well done Arthur! This means I can get the whole database back!" Arthur looked very pleased indeed, as well he might. He told me that he'd lost his data last November, and was facing about two months of typing to get it all back in. 
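The NEWFILE lesson is easy to demonstrate: matching size and timestamp say nothing about content, and only a byte-by-byte comparison settles it. The sketch below is a hypothetical Python stand-in for the inspection done with Debug; the function name first_difference is invented for illustration.

```python
def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    if a == b:
        return None
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    # One input is a prefix of the other: they differ where the shorter ends.
    return min(len(a), len(b))


# Two inputs of identical length can still differ in content.
print(first_difference(b"same size", b"same sizf"))
```

In practice the two arguments would be the full contents of the two files, read in binary mode.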
Now, I knew all the facts about the problem, so I gave him a quote for recovering it. He accepted immediately; after actually seeing his data again, there was no way he was going to back out now. At this point, I didn't really need his help, so I showed him out, and sat down and thought. First, I copied NEWFILE, GEORGE and MILDRED onto floppy disks, calling them DISK1, DISK2 and DISK3. Then, I copied the files onto my own hard disk, calling them FILE1, FILE2 and FILE3. I poked around with the files using Debug, and found that each of them had \CVFILE\CVFILE as the first few bytes. "Funny", I thought. Then it hit me - these were Backup files, probably made when the database was sent to Hoskyns and Honeywell for them to look at. There was no trace of the full file. So I tried to use Restore to build the three files back into one, but Restore said that the files weren't Backup files. I tried a few different versions of Restore, but got the same result each time. So I loaded the files into Debug, and stripped out the first 128 bytes of each, as these bytes didn't look as if they were part of the data. Then I had a look at the structure of the file. At the beginning of the first file, there was a series of records, ten bytes each. These records were eight bytes of someone's name, then two random-looking bytes. I decided that this was the key into the rest of the database, and that the two bytes pointed to the record storing the data. After these, you could see that there was a series of records, each 128 bytes long, with the actual data in. But the records didn't follow on from each other. I found three records that made up a single person's CV, but they were in the database in reverse order. I decided that there must be a pointer in each record, that pointed to the next one in the sequence, and sure enough, I could see a two-byte pointer; each one pointed to the preceding record.
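The layout worked out above (ten-byte index entries, then 128-byte records chained by two-byte pointers) can be sketched roughly as follows. The story does not say where in each record the pointer sits, what byte order it uses, or how a chain ends, so this sketch simply assumes a little-endian pointer in the first two bytes and a pointer of zero as the end marker. Those choices, and the names read_index and follow_chain, are assumptions for illustration and not the real PFS-File format.

```python
import struct

RECORD_SIZE = 128       # from the story
INDEX_ENTRY_SIZE = 10   # 8-byte name + 2-byte pointer, from the story


def read_index(data: bytes, count: int):
    """Parse `count` index entries into (name, record_number) pairs."""
    entries = []
    for i in range(count):
        entry = data[i * INDEX_ENTRY_SIZE:(i + 1) * INDEX_ENTRY_SIZE]
        name = entry[:8]
        (pointer,) = struct.unpack("<H", entry[8:10])  # ASSUMED little-endian
        entries.append((name, pointer))
    return entries


def follow_chain(data: bytes, start: int, pointer_offset: int = 0,
                 end_marker: int = 0):
    """Collect the 128-byte records of one form by following its pointers.

    pointer_offset and end_marker are ASSUMPTIONS, not known PFS-File facts.
    """
    records, seen = [], set()
    rec = start
    while rec != end_marker and rec not in seen:  # `seen` guards against loops
        seen.add(rec)
        block = data[rec * RECORD_SIZE:(rec + 1) * RECORD_SIZE]
        if len(block) < RECORD_SIZE:
            break  # pointer runs off the end of the file
        records.append(block)
        (rec,) = struct.unpack_from("<H", block, pointer_offset)
    return records
```

Because each pointer leads to the preceding record, a chain followed this way naturally visits a form's records from the highest record number downwards, matching the reverse order seen in the file.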
So, now I knew how PFS-File found the start of a form, and how the 128-byte records linked together to make up the whole form. I tested my assumption by writing a little Turbo program called PFSSORT to trace through the database, one form at a time, and tracing the records within each form. For each form, I made it check that the name on the form was the same as the name in the index. I concatenated the three data files into one large file using COPY with the /B option, and ran my PFSSORT. It stopped after about 80 records, with a name mismatch. It had found a name in the index, and the corresponding record had a different name. This happened at record 2686, and a few moments' calculation showed that this was inside FILE2. I re-ran my PFSSORT, and saw that this was the first time that it had tried to read from the part of the database that was in FILE2. So it was clear that something had gone wrong with stitching the three files together into one. Given the raw material I was working with, that was not very surprising. I had a look at FILE2 using Debug, and could see the problem. The records each started five bytes too soon. So I chopped five bytes out of FILE2, and did the concatenation again. I re-ran PFSSORT, and it crashed again at the same place. So I worked out where in FILE2 this was, and ran my HEXEDIT program to have a look at the file. Sure enough, the name at that position was the wrong one. So I fired up Debug and searched for the name. When I found it, it was 10240 bytes away from where it should have been. I looked at the end of FILE1 and the start of FILE2, and found what I had half-expected. There were 80 duplicated records. So I chopped these out, and I re-ran my PFSSORT, and it ran until it tried to access record number 5784. This time, the problem was in FILE3, so I went into that with Debug, and hacked off the necessary number of bytes. Then I re-ran PFSSORT, and it crashed again at record 5784.
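The arithmetic behind these steps is simple but easy to get wrong by hand. A displacement of 10240 bytes at 128 bytes per record is exactly 80 records, which is why 80 duplicated records accounted for it. The Python sketch below shows that calculation, plus a helper that maps a record number to a file and offset once the pieces are concatenated. It assumes records are numbered from zero at the start of the concatenated file, which the story does not confirm. The function names and the file sizes in the usage line are made up.

```python
RECORD_SIZE = 128  # from the story


def records_in(byte_count: int) -> int:
    """Convert a byte displacement into a whole number of records."""
    quotient, remainder = divmod(byte_count, RECORD_SIZE)
    if remainder:
        raise ValueError("displacement is not a whole number of records")
    return quotient


def locate(record_number: int, file_sizes):
    """Return (file_index, offset_in_file) for a record in concatenated files.

    ASSUMES records are numbered from zero at the start of the first file.
    """
    offset = record_number * RECORD_SIZE
    for index, size in enumerate(file_sizes):
        if offset < size:
            return index, offset
        offset -= size
    raise ValueError("record lies beyond the end of the last file")


print(records_in(10240))        # the displacement quoted in the story
print(locate(5, [512, 512]))    # toy file sizes, purely illustrative
```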
So I calculated where in the file 5784 came, and had a look at what was there. The name was wrong, of course, and I found the correct name rather earlier in the file, which meant that there were four missing records. I decided that four records were not worth worrying about (and anyway, I had nothing that might give me that data). But what was important, was that unless I had the records in the right relative places, the chains wouldn't work properly. In other words, I had to insert four dummy records, but in exactly the right place. I searched back through the file, looking at the pointers on each record, until I could see the break in the chain. I inserted 512 bytes of zeros, and re-saved the file. When I reran PFSSORT, it ran fine until it reached a record where the indexed name was just one letter different from the name in the file. But up till that point, it seemed to be reading records correctly from all over the file. It's hard to explain exactly why I did what I did next. There was no good reason for going back to a course I had abandoned. I think my reasoning was that I now had a file that was pretty much repaired, and I wanted to see what it looked like. I started up PFS-File off my hard disk, but PFS-File had a quick look at drive A and died. It obviously wanted its system disk - this program was copy protected! I had Arthur's system disk, but PFS-File was vintage 1982, and I know how copy protection was usually done in those days. I fired up Debug and had a look; sure enough, they were using the same method that 123 and all the others used. It only took me about a minute to disable the copy protection, and I could put Arthur's system disk away (he actually had two system disks; he'd told me that he'd had to get a replacement, as the first one had worn out after a few years of daily use). I started up the unprotected PFS-File, and loaded in the database. It worked! 
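Going back to the four missing records: the repair was to pad the file so that every later record sat at its expected position again, and four records of 128 bytes each give the 512 bytes of zeros mentioned above. A minimal Python sketch of that padding step follows. The function name insert_dummy_records is hypothetical, and the position of the break still has to be found by inspecting the pointers, as described in the story.

```python
RECORD_SIZE = 128  # from the story


def insert_dummy_records(data: bytes, before_record: int, count: int) -> bytes:
    """Insert `count` zero-filled records in front of record `before_record`."""
    offset = before_record * RECORD_SIZE
    return data[:offset] + bytes(count * RECORD_SIZE) + data[offset:]


# Toy two-record file: padding four records adds 4 * 128 = 512 bytes.
toy = b"A" * RECORD_SIZE + b"B" * RECORD_SIZE
print(len(insert_dummy_records(toy, 1, 4)) - len(toy))
```

The dummy records carry no data; their only job is to keep the record numbers, and therefore the pointer chains, lined up.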
I had a look round some of the records, did a couple of searches, then made PFS-File do a full print of the database, to disk. This file could now be used to feed back into any other database, but that probably wouldn't be necessary. Arthur hadn't been looking forward to learning how to use another database. So when I told him that I'd recovered his entire database, without any garbage characters, without any lost chunks and without any chunks being jumbled, he was delighted. When I told him that he could use it in PFS-File, he was over the moon. "Why did they all tell me it was impossible?", he asked. There was only one fly in the ointment. I had been looking forward to a return match with John. I had all sorts of sarcastic things ready to say to him, like "You were right, it was impossible, but that doesn't apply to me", and I wanted to ask him a few penetrating questions about how the database had got into such a mess in the first place. Unfortunately, this pleasure was denied to me. John had left the company. I asked Arthur what had happened, but Arthur was a bit vague. I suppose tact is part of the stock-in-trade of a recruitment organisation. Still, just in case John is reading this tale, I'd just like to say one thing to him. Nyaaaaah.