Thursday, August 07, 2008

My Olympic Record

I like sports and I like the Olympic Games, but I don't care at all about records. The competition and the gestures make the show; the records mainly remind me of Ben Johnson, doping, etc.
I like speed, and the speed of computers can thrill me. It's a safe kind of speed, with no doping and no danger to anyone's health. We are rarely allowed to appreciate the real speed of our computers, because most of their potential is wasted or left unexploited. When surfing the internet, I can hear the fan of my computer reaching new decibel records. It's my CPU being used for a Java animation in a background window: the perfect waste! Other times, it's Acrobat Reader that sets off the fan. I don't know what it's doing. I know that Acrobat Reader is an application that does a single thing, and takes 100 MB of my hard disk to do that single thing. I understand it wastes disk space, CPU cycles and quite likely my RAM too.
The best way to appreciate the real speed of your computer is to write a program yourself. When I wrote the first version of iNMR, I optimized it for a CPU called PowerPC G3. It was a CPU I knew quite well. When I had written only the first few lines of code, I switched to a new computer, equipped with a G5 CPU (it's the machine I am using at this moment). I continued writing the code in the way I was used to, optimized for the old G3. When iNMR became public, everybody appreciated its speed. I knew, however, that it could have been much faster. With version 2, I replaced the FFT routine. The new one, optimized for the G4, was nearly twice as fast. Great! I knew, however, that there was still a lot of room for improvement. During the last couple of weeks I have been working on a new engine that will be the heart of version 3, still optimized for the G4. The reason for this new engine is that the current version of iNMR requires an amount of RAM that is twice the size of the spectrum in the foreground. This makes the program faster, but can be a problem when the foreground spectrum is a bulky 3D.
The new engine requires much less memory. I have also worked to make it as fast as possible, without compromises. I have completely changed the order in which data points are stored, and even the sign conventions. The user interface bears no trace of this revolution. The program, externally, looks exactly the same. But it's more than 3 times as fast as version 1. In gauging this performance I am not taking into account all the possible operations, but only the most frequently used ones (FFT, zero-filling, transposition, weighting, etc.; in other words, the standard flow of operations). Three times is the overall average. Some 2D drawing routines are also moderately faster. Phase correction has remained the same; I don't know how to make it faster. Simulation of spin systems could also be faster, but I haven't optimized it yet.
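To give an idea of what "changing the order in which data points are stored" can mean, here is a small sketch in Python. It is only an illustration, not code from iNMR: I am assuming, purely as an example, two common layouts for complex data (interleaved and split), and all the function names are made up. The point is that the same weighting gives the same numbers in either layout, so the user sees no difference, while the inner loops look quite different.

```python
def to_split(interleaved):
    """[re0, im0, re1, im1, ...] -> ([re0, re1, ...], [im0, im1, ...])"""
    return interleaved[0::2], interleaved[1::2]


def to_interleaved(re, im):
    """([re0, re1, ...], [im0, im1, ...]) -> [re0, im0, re1, im1, ...]"""
    out = [0.0] * (2 * len(re))
    out[0::2] = re
    out[1::2] = im
    return out


def weight_interleaved(data, window):
    # One window factor per complex point, so index i maps to window[i // 2].
    return [value * window[i // 2] for i, value in enumerate(data)]


def weight_split(re, im, window):
    # Two plain passes over contiguous arrays, one per component.
    weighted_re = [r * w for r, w in zip(re, window)]
    weighted_im = [x * w for x, w in zip(im, window)]
    return weighted_re, weighted_im


# Hypothetical data: three complex points and a three-point window.
fid = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
window = [1.0, 0.5, 0.25]

re, im = to_split(fid)
a = weight_interleaved(fid, window)
b = to_interleaved(*weight_split(re, im, window))
print(a == b)  # True: the layout changes, the result does not
```

Which layout is faster depends on the CPU and on the routine; that can only be settled by measuring.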
It's not easy for the casual user to verify this speed gain, because the existing versions are already so fast that many spectra are processed instantly. To see the difference you have to choose a large phase-sensitive 2D example.
In summary, using the same computer and OS, the new code runs more than 3 times faster. I wrote this ultra-optimized code in August 2008, but I already had all the ingredients and tools to make it back in 2005. Why didn't I? I would have saved a lot of time and the customers would have saved a little money. Let's address the last point first. They are not forced to upgrade, because version 2 is more than fast enough. Now, let me find an excuse for myself. I haven't got the perfect recipe to make a program faster. What I do is replace a piece of the program and measure the time taken by the modified version to process a spectrum. If the time has been reduced, I keep the change and go on to change another line. As you can see, this kind of optimization can only be done after the program is complete. There is also another explanation: number-crunching is only one of the many things that a program must do (and that a programmer should optimize). The main reason for writing this new engine was a different one. As I have said, there was a need for something less memory-hungry. It's only after years of experience that you can rank priorities like these.
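The measure-and-keep routine described above can be sketched in a few lines of Python. This is a minimal illustration, not code from iNMR: the helpers best_time and keep_if_faster, and the two stand-in weighting routines, are hypothetical names invented for the example. I have also added a check that the candidate gives the same result as the current routine, which is my own assumption about what "keeping a change" should require.

```python
import time


def best_time(func, data, repeats=5):
    """Return the shortest wall-clock time over several runs of func(data)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(data)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


def keep_if_faster(current, candidate, data, repeats=5):
    """Return candidate only if it matches current's output and runs faster."""
    if candidate(data) != current(data):
        return current  # a faster routine with a different result is useless
    if best_time(candidate, data, repeats) < best_time(current, data, repeats):
        return candidate
    return current


# Hypothetical stand-ins for one processing step (a constant weighting).
def weight_slow(points):
    out = []
    for i in range(len(points)):
        out.append(points[i] * 0.5)
    return out


def weight_fast(points):
    return [p * 0.5 for p in points]


spectrum = [float(i) for i in range(200_000)]
chosen = keep_if_faster(weight_slow, weight_fast, spectrum)
print("keeping:", chosen.__name__)
```

Taking the best of several runs reduces the noise from whatever else the computer is doing. Which routine wins depends on the machine, so the sketch prints the winner instead of assuming it.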
Version 3 will become available in January, after a prolonged period of testing. If you have an old computer with a G4 CPU, don't throw it away: it will become 3 times faster!!!
