## Tuesday, June 24, 2008

### Factorize!

Yesterday I introduced the subject of curve-fitting. Today we'll examine the practical details.
A computer can do breathtaking things, like finding 10 parameters in less than a second. If you rely on it too much, however, you will often be disappointed. Treat it like a thoroughbred and the computer will serve you faithfully. The first principle is to split any problem into parts. Although you can build a model that accounts for phase and baseline distortions, that's the wrong strategy. Spend all the time required for professional phase and baseline correction. Then start by assuming that your spectrum contains only Lorentzian peaks.
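
To make the "only Lorentzian peaks" assumption concrete, here is a minimal sketch of such a model in Python. The names `lorentzian` and `model`, and the choice of describing each peak by position, full width at half maximum and height, are assumptions made for illustration; a real program may parameterize the line differently (by area, for instance).

```python
def lorentzian(x, x0, width, height):
    """Lorentzian line centred at x0.

    `width` is the full width at half maximum, `height` the value at x0.
    """
    half = width / 2.0
    return height * half**2 / ((x - x0) ** 2 + half**2)


def model(x, peaks):
    """Sum of Lorentzian lines; `peaks` is a list of (x0, width, height)."""
    return sum(lorentzian(x, x0, width, height) for x0, width, height in peaks)


# A hypothetical doublet: two lines, three parameters each.
doublet = [(4.9, 0.02, 10.0), (5.1, 0.02, 10.0)]
value_at_first_line = model(4.9, doublet)
```

With no phase or baseline terms in the model, every peak costs exactly three parameters, which keeps the count easy to track.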
You also have the precious weapon of Gaussian lineshapes. They are not a natural shape in NMR spectra, so use them as little as possible. There are programs able to search for the optimal mix of the two functions. You can try this option too (when the noise is moderate). Normally the result will be more than 90% Lorentzian. When that happens, stick to purely Lorentzian shapes.
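
One common way to express such a mix is a weighted sum of a Lorentzian and a Gaussian with the same width. The sketch below assumes that form; the function name `mixed_line` and the parameter `lorentz_fraction` are hypothetical, and the programs mentioned above may define their mix differently.

```python
import math


def mixed_line(x, x0, width, height, lorentz_fraction):
    """Weighted sum of a Lorentzian and a Gaussian sharing the same FWHM.

    lorentz_fraction = 1.0 gives a pure Lorentzian, 0.0 a pure Gaussian.
    """
    d2 = (x - x0) ** 2
    half = width / 2.0
    lorentz = half**2 / (d2 + half**2)
    gauss = math.exp(-4.0 * math.log(2.0) * d2 / width**2)
    return height * (lorentz_fraction * lorentz + (1.0 - lorentz_fraction) * gauss)


# Far from the centre only the Lorentzian part still contributes.
tail_lorentzian = mixed_line(15.0, 5.0, 2.0, 3.0, 1.0)
tail_gaussian = mixed_line(15.0, 5.0, 2.0, 3.0, 0.0)
```

Both components have the same height and the same width at half maximum, so the mixing fraction changes only the shape of the wings. That is also why it is hard to determine when the tails are buried in noise.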
Before the game begins, you choose the battlefield, that is, the portion(s) of the spectrum to fit. The number of experimental points must be higher than the number of parameters to fit. This can be a problem in 2D/3D spectroscopy; in the 1D case it rarely is. Pick only the central part of the peak, the part that emerges from the noise.
If the nearby peaks are comparatively weak, don't include them or their tails in the battlefield. If you include them, you also have to fit them. (Remember our motto: "Factorize!") In the picture I have highlighted the portion of the spectrum I'd sample to fit the central doublet. It's true that a Lorentzian curve has long tails, but it's not necessary to select them. If the magnetic field is not homogeneous, the tails are actually deleterious, because they deviate from the theoretical shape.
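
Choosing the battlefield amounts to keeping the points inside one or more frequency windows and checking that enough of them are left. A rough sketch follows; the helper names, the made-up frequency axis and the assumption of three parameters per peak are all illustrative.

```python
def select_region(freqs, intensities, windows):
    """Keep only the points whose frequency lies inside a (low, high) window."""
    return [
        (f, y)
        for f, y in zip(freqs, intensities)
        if any(low <= f <= high for low, high in windows)
    ]


def enough_points(n_points, n_peaks, params_per_peak=3):
    """True when there are more experimental points than parameters to fit."""
    return n_points > n_peaks * params_per_peak


freqs = [i * 0.5 for i in range(21)]   # made-up axis: 0.0, 0.5, ..., 10.0
intensities = [1.0] * len(freqs)       # placeholder values
region = select_region(freqs, intensities, [(4.0, 6.0)])

ok_for_singlet = enough_points(len(region), n_peaks=1)
ok_for_doublet = enough_points(len(region), n_peaks=2)
```

With this coarse axis the window holds only five points: enough for one Lorentzian (three parameters), not enough for a doublet (six).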
The fitting process doesn't start from a vacuum. You have to declare the number of peaks and their tentative parameters. The computer itself can do it for you, but that's the part that humans do better. The frequencies are easy to guess, both for you and the computer. If you set them with care, you can be more vague with the remaining parameters. In any case, the two operations (sensible guess and least-squares fit) can be repeated and mixed at will. After a fitting run, you can modify some parameters and try again. Along the way, you can decide that a parameter can't be optimized any further (often the frequency) and freeze the corresponding value. If the value is correct, the chances of success improve (if it's not correct... so do the chances of failure).
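
Freezing a parameter simply means hiding it from the optimizer. One way to sketch the bookkeeping, assuming each parameter is a small dictionary with a hypothetical `free` flag, is to pack only the free values into the vector that the least-squares routine adjusts and to copy them back afterwards:

```python
def pack(params):
    """Collect the values the optimizer is allowed to change."""
    return [p["value"] for p in params if p["free"]]


def unpack(params, vector):
    """Return a copy of `params` with the free values replaced from `vector`."""
    values = iter(vector)
    return [dict(p, value=next(values)) if p["free"] else dict(p) for p in params]


# Starting guess for one peak; the frequency has been frozen by the user.
guess = [
    {"name": "frequency", "value": 7.26, "free": False},
    {"name": "width", "value": 1.0, "free": True},
    {"name": "height", "value": 10.0, "free": True},
]

vector = pack(guess)                 # what the optimizer sees
updated = unpack(guess, [1.2, 9.5])  # pretend these came back from a fitting run
```

The optimizer works on two numbers instead of three, and the frozen frequency survives every run untouched until you decide to release it.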
The least-squares algorithm converges to a local minimum, often but not necessarily the nearest one. A systematic grid search could probably find the absolute minimum, but it doesn't look like a sensible strategy. Most spectroscopists extract the NMR parameters without even knowing the word deconvolution, so you should be able to furnish a starting guess that's good enough, one whose nearest minimum is also the global minimum. Everything can be done within the GUI: the definition of this starting guess is a graphic process, as simple as drawing a formula with ChemDraw.
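
A toy example shows why the starting guess matters. The synthetic spectrum below has a strong line and a weak one; a single model line of fixed width and height is slid along the axis and the sum of squared residuals is recorded at each trial position. All the numbers are invented, and the one-dimensional scan is only there to draw the landscape, not as a recommended fitting method.

```python
def lorentzian(x, x0, width, height):
    half = width / 2.0
    return height * half**2 / ((x - x0) ** 2 + half**2)


# Synthetic, noise-free spectrum: strong line at 0, weak line at 10.
freqs = [i / 10 for i in range(-200, 301)]
data = [lorentzian(f, 0.0, 1.0, 10.0) + lorentzian(f, 10.0, 1.0, 3.0) for f in freqs]


def sum_of_squares(x0):
    """Misfit of a single model line (width 1, height 5) placed at x0."""
    return sum((y - lorentzian(f, x0, 1.0, 5.0)) ** 2 for f, y in zip(freqs, data))


trial = [i / 2 for i in range(-10, 31)]  # trial positions from -5 to 15
ssq = [sum_of_squares(x0) for x0 in trial]

# Interior trial positions that beat both of their neighbours.
local_minima = [
    trial[i]
    for i in range(1, len(trial) - 1)
    if ssq[i] < ssq[i - 1] and ssq[i] < ssq[i + 1]
]
global_minimum = trial[ssq.index(min(ssq))]
```

The landscape has one valley at each line. A downhill search started near 10 settles on the weak line and stays there, even though the better answer sits at 0.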
You may think that with a suitable linear combination of many Lorentzian and Gaussian curves you can fit any mountain. That's almost true, but the computer might disagree. For example, if the experimental spectrum contains only a single asymmetric peak and the starting guess is a doublet, the intensity of one of the model curves might go to zero during the fitting process. It means that the computer has decided that the peak is unique, and nothing can change its mind. This is not a rule, though: you can often simulate a shoulder by declaring a small additional peak. If the program tends to cancel it, you can fix the parameters of the shoulder (set them manually and remove them from the fit). This is equivalent to subtracting the shoulder from the experimental peak and performing the deconvolution on the difference spectrum.
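
The equivalence can be checked on synthetic data. In this sketch the shoulder is a fixed Lorentzian that is subtracted from the spectrum, and only the height of the main line is fitted to the difference, with its position and width assumed known. Because the height enters the model linearly, its least-squares value has a closed form. The helper `fit_height` and all the numbers are made up for the illustration.

```python
def lorentzian(x, x0, width, height):
    half = width / 2.0
    return height * half**2 / ((x - x0) ** 2 + half**2)


def fit_height(freqs, ys, x0, width):
    """Least-squares height of one Lorentzian with fixed position and width."""
    shape = [lorentzian(f, x0, width, 1.0) for f in freqs]
    return sum(y * s for y, s in zip(ys, shape)) / sum(s * s for s in shape)


freqs = [i / 20 for i in range(-100, 101)]        # -5.0 ... 5.0
main = (0.0, 1.0, 10.0)                           # true main line
shoulder = (0.8, 1.0, 2.0)                        # shoulder, fixed by hand
data = [lorentzian(f, *main) + lorentzian(f, *shoulder) for f in freqs]

# Ignore the shoulder: the main line has to absorb its intensity.
height_ignoring_shoulder = fit_height(freqs, data, 0.0, 1.0)

# Fix the shoulder: subtract it and fit the difference spectrum.
difference = [y - lorentzian(f, *shoulder) for f, y in zip(freqs, data)]
height_with_fixed_shoulder = fit_height(freqs, difference, 0.0, 1.0)
```

With the shoulder fixed at its true parameters the fit returns the true height of 10; ignoring it gives a larger value. The result is of course only as good as the shoulder you set by hand.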
You may be tempted to define relations between the parameters. A trivial example is to keep the intensities of the two components of a doublet equal. Don't forget the roof effect or, more generally, quantum-mechanical effects. In such cases, instead of fitting the spectrum against a set of curves, fit it directly against a whole spin system. The problem is similar, but more specialized tools are available. I'll touch on them sometime in the future, but there are still a couple of things to say about curve-fitting. Tomorrow.
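
Here is a small sketch of what the equal-intensity constraint does, on synthetic data. Both lines of the doublet share a single height, so the model is one shape (the sum of two unit-height Lorentzians) times one number. Positions and widths are assumed known, and the helper names are hypothetical.

```python
def lorentzian(x, x0, width, height):
    half = width / 2.0
    return height * half**2 / ((x - x0) ** 2 + half**2)


def fit_shared_height(freqs, ys, x1, x2, width):
    """Least-squares height when both lines of a doublet are forced to be equal."""
    shape = [lorentzian(f, x1, width, 1.0) + lorentzian(f, x2, width, 1.0) for f in freqs]
    height = sum(y * s for y, s in zip(ys, shape)) / sum(s * s for s in shape)
    misfit = sum((y - height * s) ** 2 for y, s in zip(ys, shape))
    return height, misfit


freqs = [i / 20 for i in range(-100, 101)]  # -5.0 ... 5.0

# Ideal doublet: equal heights, so the constraint is harmless.
ideal = [lorentzian(f, -1.0, 0.5, 10.0) + lorentzian(f, 1.0, 0.5, 10.0) for f in freqs]
ideal_height, ideal_misfit = fit_shared_height(freqs, ideal, -1.0, 1.0, 0.5)

# Doublet with a roof effect: heights 10 and 8, so the constraint is wrong.
roofed = [lorentzian(f, -1.0, 0.5, 10.0) + lorentzian(f, 1.0, 0.5, 8.0) for f in freqs]
roofed_height, roofed_misfit = fit_shared_height(freqs, roofed, -1.0, 1.0, 0.5)
```

For the ideal doublet the constrained fit is exact. With the roof effect it returns a compromise between 10 and 8 and leaves a residual that no amount of iteration can remove.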