Sunday, March 11, 2007

Number One

You may have noticed that I usually express negative opinions towards NMR software, while last week I praised a number of programs that have nothing to do with science. There is a natural exception: I like my own software because, whenever I feel it should be improved, I change it (and I update it quite often). The majority of NMR software, however, is not elegant. In future posts I will try to explore the possible reasons. To start with, we should ask ourselves: "Does anybody care if NMR software has the Wow factor?".
A few years ago, presentations of two new NMR programs appeared on the web. What they had in common was elegance and the desire to amaze the chemist. These two examples demonstrate that, in 2005, there were companies trying to write modern and innovative products.
The first presentation contains two videos of the "dataChord Spectrum Analyst" in action. It generates lists of NMR data already formatted for publication inside articles or patents.
The second presentation combines a web site and a pdf document to introduce the "NMRnotebook": an all-automatic processor that stores the user's spectra into a single container (hence the name).
Two years later nothing has changed, and this is extremely anomalous. All other NMR programs are updated at least every year when they are mature, and more often when they are young. It's really rare to find something still at version 1.0. Very elegant, as if they had reached perfection at the first attempt, but rare indeed.
A product introduced into the market as a "wizard" that then fails commercially is a frequently seen phenomenon. I have never had the occasion to try these two programs myself. Time will tell.

Friday, March 09, 2007

TopSpin

What I Like:
- The price. The higher, the better.
- The absence of academic discounts.
- It's much better than XWin-NMR. Somebody who has only seen these two NMR programs in his life believes that TopSpin is the best ever. I can't blame him.
- A rich collection of old stuff (DAISY, DNMR, Virtual Spectrometer, etc.), now collectively called "Structure Analysis Tools".
- I can see the spectrum as large as I like.
- The user's guide in pdf: I know that it doesn't cover all the details, but it's a pleasure to browse.
- The data browser.
- The multiple-display mode.

What I Don't Like:
- f1 is erroneously called F2 and f2 is called F1 instead!
- The window is cluttered with icons. The icons, small and uninspiring, give a non-professional look to the whole.
- I can't find the menu command to open the lock window.
- It's remorselessly modal. Each mode has its own array of meaningless icons. Many modes even have their own submodes...
- Aliased graphics bring me ten years back.
- The 2D modes are different from their 1D counterparts. The 2D phase-correction mode is not interactive. Phase correction in 2D is unreasonably overcomplicated.
- I don't like how the integral regions and the integral values are displayed.
- TopSpin eats a lot of RAM.

Tuesday, March 06, 2007

Reference Deconvolution

It all began one month ago, when Ed commented on this blog praising Reference Deconvolution and the way it is implemented by Chenomx. I had heard about the subject, but never studied it, and apparently could not, because there is no scientific library in my town. Luckily, the Chenomx web site contains an article which not only introduces the basic concepts, but also contains some useful references. The last of these:
KR Metz, MM Lam, and AG Webb. 2000. "Reference Deconvolution: A Simple and Effective Method for Resolution Enhancement in Nuclear Magnetic Resonance Spectroscopy". Concepts in Magnetic Resonance. 12:21-42.
is even freely accessible on the web and makes for accessible reading too. Even if you have never worked with complex numbers before, you can learn from it how to perform reference deconvolution.
With this minimal theoretical background, without further reading, I have experimented with Reference Deconvolution and, actually, I have not found it to be a great improvement over the traditional lorentz-to-gaussian weighting. All my spectra are well shimmed and my lineshapes are quite near to the ideal lorentzian function, but even the pictures I found in the review did not impress me so much. Here I will try to explain in even simpler terms what Reference Deconvolution is all about, assuming that the reader is familiar with the practice of weighting the FID in FT-NMR. You don't have to understand the equations; it's enough to understand the process graphically.

iNMR users perform weighting by dragging a control with the mouse. The more they drag it, the more the shapes change. Almost everybody believes that you have to know what's happening at the math level, that you need to know which functions you are using and with which parameters. I don't think so. It's enough to open your eyes to have a complete understanding of the process. When you have your own musical tastes, they will not change if you study the history of music. What matters is what you like: if you like a weighted spectrum, and if your software is well written, you can soon learn weighting. You will continue to ignore the theory, but your spectra will look much better. If you don't like weighted spectra, studying chapters of theory will change neither your attitude nor the appearance of your spectra.
Everybody will agree with me on the following example, at least. The number of points of an NMR spectrum is a power of two because the Cooley-Tukey algorithm requires it. Everybody knows this, yet only a few spectroscopists actually know how the algorithm works, and everybody agrees that knowing it is not a necessity!
Convolution is like dressing somebody who is already dressed: after the process he will wear two sets of clothes. A convoluted peak will have two shapes. A deconvoluted peak is like a naked person. Just as the physician needs an undressed patient, the spectroscopist may need an undressed multiplet, to better measure its coupling constants or, more simply, to count the number of its components. Convolution and deconvolution in the frequency domain are equivalent to multiplication and division in the time domain. For the sake of simplicity, I will not mention the domains explicitly from now on. For example: when I say that I divide spectrum A by spectrum B, I mean that I first apply an inverse FT to both spectra, then divide them, then apply a direct FT to the result. Division in the other domain is not only faster than direct deconvolution, but also more feasible: we don't even have a formula for direct deconvolution...
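Just to make the recipe concrete, here is a minimal numpy sketch of "deconvolution as a division in the other domain". The function and array names are my own invention, not taken from any particular NMR package.

```python
import numpy as np

def deconvolve(spectrum, shape):
    """Remove 'shape' from 'spectrum' (both hypothetical complex
    frequency-domain arrays of equal length) by dividing in the time domain."""
    s_t = np.fft.ifft(spectrum)   # inverse FT of both spectra
    b_t = np.fft.ifft(shape)
    return np.fft.fft(s_t / b_t)  # divide, then go back with a direct FT
```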
When you perform the traditional lorentz-to-gauss transformation, you are performing a deconvolution with a lorentzian and a convolution with a gaussian. The rationale is that you assume that the starting shape of your peaks is lorentzian. Let's suppose that this is true and you don't know it. Can you obtain the pure shape of your peak, in the form of a weighting function? The answer is yes. First step: you correct the phase of your peak. Second step: you move it to the position of the transmitter. This can't be done with common NMR software, yet it's trivial to write a program that shifts/rotates the spectrum; otherwise, move the transmitter. Third step: you amplify or de-amplify the spectrum in order to set the area to one. At this point you have lost all the information about phase, frequency and intensity. All that remains of your original peak is pure shape. Perform an inverse FT and you obtain a real function (an exponential, to be more precise) that starts from 1 and decays towards zero. Once you know this function, you can divide any other spectrum by it and undress its peaks. Even if the other spectrum contains a different number of points, falling at different positions on the time axis, you can safely calculate the missing points by interpolation. A good thing about this approach is that, knowing that the result is a real function, you don't even need to correct the phase or change the frequency (steps 1 and 2). It's enough, if the spectrum contains a single peak, to calculate the absolute value of the FID.
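In code, the extraction of the pure shape could look like the sketch below. It assumes the peak has already been phased and moved onto the transmitter, and the normalization at the end (dividing by the first point) is my own convenient choice to make the curve start from 1.

```python
import numpy as np

def pure_shape_decay(peak):
    """'peak' is a hypothetical complex frequency-domain array containing a
    single peak, already phased and centred on the transmitter."""
    peak = peak / peak.sum()          # third step: set the area to one
    decay = np.fft.ifft(peak)         # back to the time domain
    return (decay / decay[0]).real    # a real curve that starts from 1 and decays

def pure_shape_decay_from_fid(fid):
    """Shortcut for a spectrum containing a single peak: skip steps 1 and 2
    and take the magnitude of the FID directly."""
    mag = np.abs(fid)
    return mag / mag[0]
```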
You ask: if the sample is poorly shimmed, and the mathematical shape unknown, can I apply the same method to create a tailored weighting function? The answer is no. First reason: when the peak is not symmetric, the equivalent shape in the time domain is not real but complex, which means that you cannot eliminate the phase and frequency information. Second reason: even when the peak is symmetric, it extends from minus to plus infinity; if you are forced to truncate the spectrum (because of the noise, or to exclude other peaks), you are introducing unwanted components. The method outlined above does not work at all, because the distortions (caused by truncation) are heavy indeed. Reference Deconvolution works because it creates two sets of such distortions. You divide the first set by the second one, and the distortions mutually cancel. The three players are: an experimental spectrum S, an isolated peak P (the reference peak) and its ideal shape I (the ideal peak). The deconvoluted spectrum D is simply obtained:
D = S (I / P).
Because I and P share the same frequency, phase and truncation effects, all the distortions mutually cancel. You cannot, however, obtain the equivalent of I and P in the time domain. They write that Reference Deconvolution is performed in the time domain, but it's not the true, experimental time domain. You cannot apply it directly on the FID. In all the cases I have seen, you need to transform the spectrum to the frequency domain and when, later in the process, you apply an inverse FT to it, you get something quite different from the FID. I think it's more correct to say that you are working in an artificial time domain.
From the point of view of the spectroscopist, it's a pity that a generic shape can't be described by a real function in the time domain. Otherwise, reference deconvolution could be treated as an extreme variant of weighting and inserted into the usual processing workflow. It would be possible to store the calculated shape and recycle it to weight different FIDs (through interpolation, as I have written). It would be possible to mix this shape (or, better, its inverse) with other weighting functions, etc...
In the real world, you are confined to the formula: D = S (I / P). You can also rearrange it into D = S / (P / I), as I do in practice, but nothing really changes. The individual "I" and "P", in the artificial time domain, are of no use by themselves; only their ratio works.
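Here is how I would translate the recipe into a few lines of numpy. It is only a sketch of the formula above, with hypothetical arrays S, P and I defined as in the text: P holds the isolated reference region and is zero elsewhere, I holds the ideal peak at the same position and with the same truncation.

```python
import numpy as np

def reference_deconvolution(S, P, I):
    """D = S * (I / P), with the division carried out in the artificial
    time domain obtained by inverse FT of the frequency-domain arrays."""
    s_t = np.fft.ifft(S)
    p_t = np.fft.ifft(P)
    i_t = np.fft.ifft(I)
    correction = i_t / p_t             # only the ratio is meaningful by itself
    return np.fft.fft(s_t * correction)
```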
There is some freedom in the definition of P and I. I have seen, for example, that the final result is very sensitive to the selection of P. The reference peak must be an isolated singlet with enough signal-to-noise. Once you have selected a region around it, you can verify that enlarging or narrowing the region by a few points changes the level of noise in D (the final result) in an unpredictable way. In practice you need an adjustable control to test all the possible combinations in a couple of seconds, by trial and error (an FT only takes a fraction of a second on recent computers). The part of the spectrum that is not selected is normally set to zero, but I wonder if there could be any advantage in letting it fall off more gently along the edges.
As for the shape of I, the review I cited at the beginning suggests using a delta function, that is: all the area is concentrated into a single point and the rest is set to zero. In this case it is necessary to add a classic exponential or gaussian weighting (in the artificial time domain), either to I or to S, because the delta function alone amplifies the noise to intolerable levels. From what I have seen, Chenomx doesn't seem to adopt this recipe. It seems to me that they choose as "I" the lorentzian line graphically displayed by the program before the operation, which is probably truncated at the very same points as "P". In this implementation, the truncation effects on P and I are the same (first advantage) and there is much more control over the numerical value of the final linewidth (second advantage). This is why I wrote, in my previous post, that this implementation is optimized for line broadening. The delta function, instead, is better suited to achieve resolution enhancement. You can experiment for yourself, because the delta function alternative has been implemented in iNMR 2.1, and both iNMR and Chenomx's NMR Suite can be downloaded and tried.
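For the curious, the delta-function variant can be sketched along the same lines. Everything here (the parameter names, the exponential weighting applied to I) is my own assumption about a generic implementation, not a description of what iNMR or Chenomx actually do.

```python
import numpy as np

def delta_reference_deconvolution(S, P, peak_index, lb_hz, acq_time):
    """Reference deconvolution with a delta function as the ideal shape,
    plus an exponential weighting to keep the noise under control.
    peak_index: position of the reference peak; lb_hz: desired line
    broadening; acq_time: length of the artificial time axis in seconds."""
    n = len(S)
    I = np.zeros(n, dtype=complex)
    I[peak_index] = P.sum()                  # all the area in a single point
    t = np.linspace(0.0, acq_time, n)        # artificial time axis
    weight = np.exp(-np.pi * lb_hz * t)      # classic exponential weighting
    s_t = np.fft.ifft(S)
    p_t = np.fft.ifft(P)
    i_t = np.fft.ifft(I) * weight            # broaden I to tame the noise
    return np.fft.fft(s_t * i_t / p_t)
```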
Theory says you can experiment with different ideal shapes, provided they have no doublet character (the FID of a doublet contains characteristic holes, into which you don't want to fall). Chenomx allows you to include the 29Si satellites of DSS, which definitely possess a doublet character, and the algorithm still works great! The only possible explanation is that the noise has filled the holes. By a strange coincidence, the very first spectrum I experimented with, though sporting apparently suitable singlets, didn't lend itself to reference deconvolution. I tested the two programs and in both cases I got wiggles everywhere (except at the reference peak). I can send this file to the interested reader.
It's like when they reprint an LP on CD. Quite often I like the new remix, but there are also cases when the original LP sounded better. My current opinion about Reference Deconvolution is that it's only a trick of Digital Signal Processing, not an essential part of experimental NMR. Trying it doesn't hurt, though. I have also found a reference to a drastically different application of the method.

Monday, March 05, 2007

Suite

"Chenomx NMR Suite is designed to fit spectra in a computer-assisted fashion" says the manual and it says it all. Don't be mislead by the term "suite": this software (a more appropriate term, in my humble opinion) has a single task to perform. Translation: you will be measuring the single concentrations of known compounds into a mixture not by integration, but by visual fitting, and the computer will provide you with all the tools for the task. Eventually it will be you alone to judge the goodness of the fit. Obviously there is much more than this both in NMR and in metabolomics, so the name suite is misleading (it makes me think at MS Office, which is an unfitting example; the name "suite" is also used to indicate the most expensive room of an hotel, and the "NMR Suite" actually seems to be the most expensive of all NMR programs).
I have downloaded version 4.6 of the software, courtesy of Chenomx, and followed the enclosed tutorials. (Before running the software, you need to receive two e-mails back from Chenomx).
I can't comment on the correctness of visual fitting, compared to integration or to deconvolution. The mere fact that this software is commercialized is a measure of the effectiveness of the method. I have not worked with real-life examples; I was, however, convinced by what I saw. Generally speaking, integration is the best method we have to measure concentrations by NMR, followed by deconvolution in the frequency domain. In biological mixtures, if they give a nice spectrum and if you process it properly, you can use the heights of the peaks (instead of the areas) as a measure of concentration. The condition is that the shapes in your spectrum correspond to the shapes in the reference library you are using, and you fulfill this condition with reference deconvolution (pardon the pun).
Suddenly everything gets complicated, doesn't it? Let me explain the single terms. The libraries, also provided by Chenomx, are the natural complement of the suite (they are also what makes the product so expensive). Each library contains the multiplets of a large number of metabolites, at a given magnetic field. The word "multiplet" is never used; "cluster" is used instead, both to avoid forcing the user to learn spectroscopy and because sometimes they really are clusters (or singlets), not true multiplets. The library takes the pH into account, because many chemical shifts are pH sensitive. There are also unpredictable matrix effects, so the frequencies can be finely tuned by hand as well.
Reference Deconvolution is one of the tricks of Digital Signal Processing. Although more often used for resolution enhancement, here it is implemented with the opposite aim of line broadening. The purpose is to obtain nearly perfect lorentzian lineshapes, whose widths match those of the library peaks. Without matching shapes and widths, fitting the heights would be of no use, hence the central importance of reference deconvolution. This particular implementation seems optimized for line broadening; I'll explain why in the next post.
While most software interfaces puzzle the chemist with countless icons, aligned in rows, double rows and triple rows, most of which are never touched, Chenomx comes with a simple, clean and very well designed layout that should make for a pleasant working experience. The latter is unfortunately marred by the irritating slowness of the program. I have tested the Suite on an iMac G5 running at 2 GHz and it is as slow as, if not slower than, my old Centris running at 25 MHz, although on paper it should be 80 times faster. The computational needs are basic (the heaviest being the traditional one-dimensional FT), so there should be no need for a 2 GHz computer. Even a 200 MHz machine, if properly programmed, should rarely display the "wait a moment" cursor when performing these kinds of calculations. This 2 GHz model, instead, hangs often. The same machine, with other software, easily performs a whole 2D FT in less time! I mean: this whole work could be carried out with a ten-year-old computer.

They are forcing us to buy a new computer, with all the negative consequences for the environment, not to run faster, but simply because today's programmers like Java instead of other languages. I really don't see why I should care about programmers' preferences. This particular application, for example, is so specialized that cross-platform development was not so important, and even if it had been, faster alternatives were still available. Even if they weren't, other Java programs used by spectroscopists react in much less time. There is no excuse for such slowness, and it's a pity, because the user interface is so nice. Other programs make the user spend a lot of time just locating the icon (or the menu item) they need, while with the Suite you soon learn where to find everything.
After completing my tutorial, I can swiftly find in the library the compound I am looking for, either by name or by frequency. In the latter case, I normally get a list of candidates. After selecting one of them, a control appears, called the "cluster navigator". With it I can zoom, in succession, into the regions where the signals of the selected compound resonate and verify whether the metabolite is present in the mixture or not. If it is, I usually activate an automatic fit of the heights. The program remains on the safe side and usually underestimates the concentrations, so I have to refine the fit manually. As soon as I bring the cursor near the top of the peak, a new displacement control appears. Dragging it, I can change both the frequency and the height of the peak. It would be better if the cursor disappeared during dragging, because it hides the top of the curve.
The suite is actually a single executable that can launch four windows: the Profiler (the one just described), the Library Manager, the Signature Builder (each signature corresponds to a single compound in the libraries) and the Processor. It's like a modal program, with the difference that you can have the four modules running concurrently. You cannot, however, open more than one window per module. The Processor itself also comes in modes, so you have modality inside modality. You know that I consider modality bad programming style, and I am not alone. For once, there is some good reason for it to exist, because each mode is intuitively interactive. You can graphically define the DSS position (a hydrophilic equivalent of TMS), and you can graphically define the breakpoints for the spline that corrects the baseline. There is a price to pay for modality: you need a double click to enter selection mode and another double click to expand the selection...
The Processor is the weak link in the chain. I wonder why Varian has invested money in Chenomx and has not lent them its algorithms. Everything is slow, even opening a file. The phase-correction mode does not allow you to specify a pivot point, as all software does today. This feature will be added, they say, in version 5. There is a useless command called "Water Deletion" and all it does is hide the central region of the spectrum: the tails of the water peak remain in the outer regions. I'd suggest they read an article published 18 years ago in JMR (Vol. 84, page 425) by Marion et al., titled "Improved Solvent Suppression in One- and Two-Dimensional NMR Spectra by Convolution of Time-Domain Data". Convolution in the time domain is useful not because it hides the water, but because it improves the baseline and doesn't cancel nearby signals.
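The idea of that article, as I understand it, is to estimate the on-resonance solvent signal by low-pass filtering the FID and then subtract it. A minimal sketch could look like this, with a simple moving average standing in for the actual filter discussed in the paper, and with hypothetical inputs.

```python
import numpy as np

def suppress_solvent(fid, width=32):
    """Estimate the slowly varying (on-resonance solvent) component of the
    FID with a moving average of 'width' points and subtract it.
    'fid' is a hypothetical complex array; 'width' controls how much of the
    region around the carrier is removed."""
    kernel = np.ones(width) / width
    solvent = (np.convolve(fid.real, kernel, mode='same')
               + 1j * np.convolve(fid.imag, kernel, mode='same'))
    return fid - solvent
```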
I also know that many spectroscopists like the spline for baseline correction, yet I prefer the more regular polynomial. This option is not available. The spline lets you cancel humps and solvents, but I prefer a baseline correction that corrects the baseline and nothing else. The reference deconvolution works, but you should be careful not to play with it too much: the first attempt should be the definitive one.
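A polynomial baseline correction of the kind I have in mind takes only a few lines. This is a generic sketch, not what Chenomx (or any other program) actually does; the real-valued spectrum and the baseline_mask marking the signal-free points are hypothetical inputs.

```python
import numpy as np

def polynomial_baseline_correction(spectrum, baseline_mask, degree=4):
    """Fit a low-order polynomial through the points marked as baseline
    (a boolean mask) and subtract it from the whole real spectrum."""
    x = np.arange(len(spectrum))
    coeffs = np.polyfit(x[baseline_mask], spectrum[baseline_mask], degree)
    return spectrum - np.polyval(coeffs, x)
```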
The Processor also implements binning, but the analysis of the output requires external software. Other tools are prettier: I mean the "tape measure" (to measure distances) and the thumbnail of the spectrum. Under the latter, the status line constantly reports the frequency corresponding to the cursor.
If you want to test this software, I recommend reading the manual and following the tutorials exactly as they are explained; otherwise you will waste hours before discovering that the defaults are not compatible with the given example. The manual comes in two versions: embedded, with lovely icons but small type, and the customary pdf file.
The staff at Chenomx is kind and helpful and can give you all the information you need, including a live demo of the product. Their web site hosts more specific information, including copies of their published articles and papers. If you find them too difficult, the story of the company, including photographs, as published by the Alberta Venture magazine, does not require any scientific background.

Saturday, March 03, 2007

Masterpieces

During the past holidays I discovered several practical demonstrations of how good software should always be. It's difficult to comment on things that look almost perfect and whose every detail is truly amazing. If they had been related to NMR, not only would I have cited them long before, but I would have closed the blog for good. (Is it more off-topic to write about NMR during the weekend or to write about other things on an NMR blog?)
Good software shares this property with good games: you learn the manual in a matter of minutes and never have to read it again.
ChessPuzzle by Robert Silverman entertains you with 6237 selected chess puzzles and can also suggest the next move when you can't find it by yourself. The board is small, so people with visual deficiencies should halve the monitor resolution. The controls are intuitive and all of them have a keyboard shortcut, which I prefer. You can also play a normal game on the 2D diagram, which is more enjoyable than any 3D representation I have tried so far. ChessPuzzle is free.
WouldjaDraw is a simple yet powerful drawing program. It does not edit pictures: you start with an empty canvas and the basic drawing tools. It costs 29.95 USD.
Both programs require Mac OS 10.4.
The suite of Ear Training applications by Usama Minegishi and Hidemoto Katsura is almost as great. It still requires Mac OS X (I don't know which version). They are the best programs in their field that I have ever seen. The whole bundle costs 19 USD, but each single product is also available for 9 USD.
So as not to give the impression that this is a single-platform blog, I am closing the post by citing a lovely little puzzle-solving game named Mummy Maze, which runs on Windows too. The trial is free, the fun is guaranteed.