Friday, September 03, 2010

Oh, Sugar

I have good news! With a minimal simplification of my INADEQUATE filter I have been able to rescue the last cross-peak of cholesterol. It had been rejected because it was too near the diagonal. I changed the code to say: "if it's on the diagonal, it is bad; if it's just near, let's accept it". So it is possible to have the perfect INADEQUATE of cholesterol, with all the expected cross-peaks IN and everything else OUT.
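For readers who prefer code, the change of rule can be sketched in a few lines of Python. This is only an illustration, not the code of the filter. It assumes that the vertical (DQ) coordinate of a cross-peak is the sum of the two chemical shifts, so that the diagonal is the line y = 2x; the names `margin` and `resolution` and the numbers used are hypothetical.

```python
def distance_from_diagonal(x, y):
    """Horizontal distance (ppm) of the point (x, y) from the line y = 2x."""
    return abs(x - y / 2.0)

def old_rule(x, y, margin):
    """Stricter behaviour: reject anything closer to the diagonal than `margin`."""
    return distance_from_diagonal(x, y) > margin

def new_rule(x, y, resolution):
    """Relaxed behaviour: reject only what sits on the diagonal itself,
    i.e. closer than the (assumed) digital resolution."""
    return distance_from_diagonal(x, y) > resolution

# A hypothetical cross-peak 0.2 ppm away from the diagonal.
x, y = 36.2, 72.0
print(old_rule(x, y, margin=0.5))       # rejected by the strict rule
print(new_rule(x, y, resolution=0.05))  # accepted by the relaxed rule
```

The only difference between the two functions is the size of the exclusion zone, which is the whole point of the simplification.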
Yesterday I received another INADEQUATE spectrum, this time of sucrose. The S/N is high enough to make my filter unnecessary. If I play with the contour plot, all the noise disappears while the 12 carbon atoms and their 10 bonds remain. Only one spurious peak remains, at the coordinates 103.7;-22.9. I have not received the 1-D external projection, so I created it artificially. The spectral width is the same in both dimensions (instead of being doubled for the DQ axis). The consequence is that two cross-peaks fall just on the boundary and are partially folded. This is the spectrum:

I have applied the filter with the same parameters used for cholesterol (C-C coupling; linewidths in the two dimensions), while the threshold corresponds to the plot above. The result is perfect. All the cross-peaks are resolved and they are all present. Nothing else survives.

This time all the peaks are regular anti-phase doublets. The Js are generally larger than in cholesterol.
Click on the thumbnails to see the full-size pictures. There is an expansion to help count the correct number of cross-peaks.
Do you want to send another spectrum? I can clean it for free. Remember to enclose the 13-C spectrum of the same sample.

Wednesday, September 01, 2010

Interstellar Space

One of the most ancient 2-D experiments has always been more talked about than practiced. The INADEQUATE was invented 30 years ago (an era in which many chemists were still using CW-NMR) and has always been regarded as a thing of the future, like travel to the Moon. Everybody agrees it is useful, but the experimental difficulties are discouraging.
The information that can be extracted from this experiment is a formidable aid in unveiling the structure of unknown natural compounds. For a long time, however, the experiment has been nearly impossible. You needed a powerful transmitter, because the nominal 180° pulse must be a true 180° pulse over a large spectral width. You also needed a sensitive probe. A cryo-probe is the best. Today we can have both things.
I am not mentioning here the many attempts to increase the actual sensitivity with experimental tricks, because this is a blog about software. 19 years ago the S/N limit was overcome by a purely numerical method, see this paper, commercialized as CCBond (also sold under the trademark FRED). I understand that the same program is now part of the larger NMRanalyst (™). No software that I know of has ever changed the world. Having never personally used this particular product, I take for granted what the paper says while observing, at the same time, that its popularity is limited.
From the cited paper I read that the program failed to detect a few bonds in the INADEQUATE spectrum of cholesterol; the authors also explained why (the presence of second-order effects). I think this is a serious issue. You can't use such a tool with confidence. A user has no way to verify whether these second-order effects are present, so he cannot tell whether his spectrum can be handled by this particular program.
A few months ago I received a nice INADEQUATE spectrum. Guess which compound it was? Cholesterol! This is the first time I have seen this particular spectrum, so I can't tell if it was acquired correctly or not. I can see ALL the bonds with possibly one exception. I see the peaks corresponding to the bond between C20 and C22, yet it's not clear if they fall at the correct frequency or, instead, at the frequencies of C10 and C20. It is amazing, however, to see all the bonds without the aid of any special software. It is like discovering that we can go to the Moon with RyanAir.
A bitter surprise came from the observation that not all the peaks have the same shape. The very few articles I have read on the subject say that every cross-peak is an anti-phase doublet (one peak goes up, the other peak goes down). Here, instead, I can see several different deviations from this simple model. This means that none of the numerical methods previously described can be applied to this example. If I tried to improve this spectrum with them, I would lose one or more peaks, which would be a pity since I can already see all the peaks in the untreated spectrum. Maybe all the other INADEQUATE spectra ever acquired only have doublets and mine is the worst INADEQUATE ever. Maybe my spectrum is OK and the model is too simplistic. I hope some reader can resolve this fundamental doubt.
For the time being I assume that my spectrum is perfect, simply because it's the only spectrum I have got. I have devised a new numerical method to improve it. It works like a filter and can clean the spectrum. I start from the known frequencies of the carbon atoms. They are easy to get from the standard 13-C spectrum reported at the top of my pictures. Everybody knows that a genuine peak can only fall at the frequency of an existing carbon. The cross-peaks also come in horizontal pairs. The vertical position (DQ frequency) of a pair is given by the sum of the two chemical shifts. Anything that does not fall at these predictable frequencies cannot be a genuine signal. My method filters it out. The signal/noise remains the same, but the plot is much easier to read.
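The geometric constraint described above is easy to put into code. What follows is a minimal Python sketch of the constraint alone, not the filter itself: it assumes both axes are expressed in ppm with the DQ axis referenced so that y is exactly the sum of the two shifts, and the function names and tolerances (`tol_x`, `tol_y`) are hypothetical. The three shifts are those quoted for C-10, C-20 and C-22 later in this post.

```python
from itertools import combinations

def expected_positions(shifts):
    """Map each pair of carbons to the two places where its cross-peaks may fall.

    shifts: dict of label -> chemical shift in ppm (from the 1-D 13-C spectrum).
    Returns a dict (label_a, label_b) -> ((x_a, y), (x_b, y)) with y = x_a + x_b.
    """
    positions = {}
    for (label_a, x_a), (label_b, x_b) in combinations(shifts.items(), 2):
        y = x_a + x_b  # DQ frequency: sum of the two chemical shifts
        positions[(label_a, label_b)] = ((x_a, y), (x_b, y))
    return positions

def is_plausible(x, y, shifts, tol_x, tol_y):
    """True if the point (x, y) lies near a predicted position of some pair."""
    for (x_a, y_pair), (x_b, _) in expected_positions(shifts).values():
        if abs(y - y_pair) <= tol_y and min(abs(x - x_a), abs(x - x_b)) <= tol_x:
            return True
    return False

shifts = {"C10": 36.5, "C20": 35.8, "C22": 36.2}
for pair, halves in expected_positions(shifts).items():
    print(pair, halves)
```

Any point for which `is_plausible` returns False cannot be a genuine signal under this model. Deciding whether a plausible point really is a cross-peak still requires looking at its shape and at its partner on the same row.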
My method, in theory, requires 4 parameters, provided by the user:
- A threshold level. This is the lowest contour level.
- The approximate width of a generic multiplet in the X dimension.
- The width in the Y dimension.
- The C-C coupling constant.
The method is not dumb, though. If nothing is found using the user's parameters, it will automatically try with different starting values. The last 3 parameters are easy to set, either by observation or by mining the literature. The most critical parameter is the threshold. Fortunately, it's not very critical.
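To make the four parameters and the automatic retry concrete, here is a hedged Python sketch. It is not the actual implementation: the post does not say which values are changed or by how much, so scaling the two widths and the coupling by a fixed list of factors is purely an assumption, and `FilterParams`, `run_with_retries` and `detect` are hypothetical names. `detect` stands for whatever routine looks for cross-peaks with a given set of parameters.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class FilterParams:
    threshold: float  # lowest contour level
    width_x: float    # approximate multiplet width in the X dimension
    width_y: float    # width in the Y dimension
    j_cc: float       # C-C coupling constant

def run_with_retries(detect, params, scales=(1.0, 0.75, 1.25, 0.5, 1.5)):
    """Call detect(params); if nothing is found, retry with rescaled values.

    Assumption: only the widths and the coupling are varied, the threshold
    is left as the user entered it. Returns (peaks, parameters_used).
    """
    for scale in scales:
        trial = replace(
            params,
            width_x=params.width_x * scale,
            width_y=params.width_y * scale,
            j_cc=params.j_cc * scale,
        )
        peaks = detect(trial)
        if peaks:
            return peaks, trial
    return [], params

# Illustrative use with a stand-in detector that only succeeds for small J.
start = FilterParams(threshold=1620.0, width_x=1.0, width_y=1.0, j_cc=35.0)
peaks, used = run_with_retries(lambda p: ["found"] if p.j_cc < 30 else [], start)
print(peaks, used)
```

Keeping the parameters in an immutable record makes it easy to report which starting values finally worked.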
I am going to show how tolerant my method can be. The previous picture shows the optimal threshold. [Click on the thumbnails to see the full-size pictures]. When I start from this threshold value (1620), no cross-peak is lost and no false positive is created (spectrum not shown).
In the next picture the threshold has been reduced to less than half, quite far from the optimal value. The threshold is now 620.

My method yields this:

The only difference between this and the spectrum processed with the optimal threshold is the additional presence of the two cross-peaks labeled 22->20. This is good to know, because it means that the threshold is not critical. You can enter a wrong value and the result will be almost the same. This bond C20-C22 actually exists, but the cross-peak on the left falls at the wrong frequency. It makes you think that C20 is bonded to C10 instead. As you can see, the column of C-10 now contains 5 cross-peaks (5 bonds??).
C-22 falls at 36.2 ppm. C-10 falls at 36.5. C-20 falls at 35.8. It is difficult to demonstrate if two atoms are bonded when their chemical shifts are so near.
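The arithmetic behind this difficulty can be checked with a short Python sketch. It assumes, as stated earlier, that the DQ position of a pair is the sum of the two shifts; the function names and the two trial widths are hypothetical. The question it asks is whether the partner of C-20 can be told apart when the candidates are C-22 and C-10.

```python
SHIFTS = {"C10": 36.5, "C20": 35.8, "C22": 36.2}

def partner_peak(partner, other, shifts):
    """Predicted (x, y) of the cross-peak in the partner's column
    for a bond between `partner` and `other`."""
    return shifts[partner], shifts[partner] + shifts[other]

def can_distinguish(cand_a, cand_b, other, shifts, width_x, width_y):
    """True if the two candidate assignments predict peaks that differ
    by more than the multiplet width in at least one dimension."""
    x_a, y_a = partner_peak(cand_a, other, shifts)
    x_b, y_b = partner_peak(cand_b, other, shifts)
    return abs(x_a - x_b) > width_x or abs(y_a - y_b) > width_y

print(partner_peak("C22", "C20", SHIFTS))  # about (36.2, 72.0)
print(partner_peak("C10", "C20", SHIFTS))  # about (36.5, 72.3)
print(can_distinguish("C22", "C10", "C20", SHIFTS, 0.5, 0.5))
print(can_distinguish("C22", "C10", "C20", SHIFTS, 0.1, 0.1))
```

The two assignments predict peaks only about 0.3 ppm apart in both dimensions, so whenever a multiplet is wider than that they cannot be separated on position alone.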
I am very glad to see the cross peaks 5-10, 5-4, 3-4, 3-2 because they deviate from the simple doublet model. A simplistic model would never recognize these peaks, but my filter assumes a more generic model.
If I raise the threshold to 2620, the following cross-peaks disappear: 5-10, 1-10, 14-13 and 25-27. They are not many, yet the basic lesson is: it is better to underestimate the threshold than to overestimate it.
I need more examples to test this method against. I feel it is promising indeed. The algorithm is simple and fast, which are always good qualities. It is amazing how far you can go with very little math.