Towards automated deconvolution methods

Short description:

So we have this “magic” tool called deconvolution to separate mathematically what could not be separated physically by chromatography. By deconvolution methods I refer here generically to all methods that mathematically separate the signal of each chemical compound in a separation system where partial coelution occurs. There is an inherent problem when you want to apply deconvolution methods to signal processing in chromatography: automation. Although deconvolution methods for partially coeluting compounds have seen a lot of development, in practice the user has to make decisions every time a deconvolution technique is applied to a peak cluster. This is not a problem if you have a couple of peak clusters to deconvolve… but what about the (usual) case of having hundreds, or even thousands, of peak clusters? If the software prompts the user for a decision at every peak cluster, the method is obviously not practical.

When automating deconvolution, there is normally one critical decision that the user has to make: the number of components that are coeluting. This is a key step that critically influences the deconvolution results, and it is difficult to automate.

In our research we arrived at two different solutions. The first one uses what is called autocorrelation. Autocorrelation measures how much of the signal at point i can be explained as a function of the signal found at point i-1. The higher the autocorrelation, the less noisy the signal appears, and the more information it may contain. It happens that, as more and more components are added to the deconvolution model, the residuals of the model (i.e. what is left after modelling) change their structure. When the correct number of components is included, the residuals show a sudden decrease in their autocorrelation. This sudden decrease has been used to automate the decision on the number of compounds coeluting in HPLC-DAD chromatograms. Please refer to publication [24a] for more information.
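As a rough illustration of the statistic only (this is not the implementation from [24a]), the lag-1 autocorrelation of a residual trace could be computed as in the Python sketch below. The function name and the two synthetic residual traces are assumptions made for this example: one trace still contains a small leftover peak, as might happen when too few components are modelled, and the other contains only uncorrelated noise.

```python
import math
import random


def lag1_autocorrelation(values):
    """Lag-1 autocorrelation: how well point i is explained by point i-1."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    denom = sum(c * c for c in centered)
    if denom == 0.0:
        # A constant trace has no variance; report zero by convention.
        return 0.0
    numer = sum(centered[i] * centered[i - 1] for i in range(1, n))
    return numer / denom


# Hypothetical residuals when a component is still missing from the model:
# a small, smooth leftover peak (synthetic data, for illustration only).
structured = [0.05 * math.exp(-0.5 * ((i - 100) / 15.0) ** 2) for i in range(200)]

# Hypothetical residuals once all components are modelled:
# uncorrelated Gaussian noise (synthetic, fixed seed for repeatability).
rng = random.Random(0)
noise = [rng.gauss(0.0, 0.01) for _ in range(200)]

print("leftover peak:", lag1_autocorrelation(structured))
print("pure noise:   ", lag1_autocorrelation(noise))
```

In an automated setting, one would compute this value for the residuals of models with 1, 2, 3, … components and look for the sudden drop. How that drop is detected in practice is not specified here, so the sketch stops at the statistic itself.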

The second approach is of a different nature. It is based on a 2-fold cross-validation procedure. Basically, it splits the data into two parts (odd and even points) and uses an interpolation algorithm to predict the odd points from the even points and vice versa. As noise tends to be uncorrelated, the interpolation algorithm fails (i.e., the cross-validated residuals rise) when noise starts to be modelled, which indicates that too many components have been added to the model. In this sense, it is a true cross-validation method, as one part of the data acts as the validation set while the other acts as the calibration set. The roles are then swapped, as in any 2-fold cross-validation procedure. I will not go into the details (please refer to publication [40a] for more information). We applied this procedure to GCxGC-MS signals for food analysis. In some cases, the rise in the residuals is not observed, probably due to excessively high autocorrelation of the noise and/or overly complex signals.
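The odd/even idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of [40a]: the interpolation is plain linear interpolation between the two neighbouring points, the function names are hypothetical, and the two test profiles are synthetic. It only shows that a smooth profile is predicted well from its interleaved half, whereas a profile that has absorbed noise is not.

```python
import math
import random


def interpolate_held_out(signal, held_out_parity):
    """Predict interior points whose index has the given parity
    (0 = even, 1 = odd) by linear interpolation between their two
    neighbours, which belong to the other half of the data."""
    predictions = []
    for i in range(1, len(signal) - 1):
        if i % 2 == held_out_parity:
            predictions.append((i, 0.5 * (signal[i - 1] + signal[i + 1])))
    return predictions


def two_fold_cv_rmse(signal):
    """Root-mean-square error over both folds: odd points predicted
    from even points, then even points predicted from odd points."""
    squared_errors = []
    for parity in (0, 1):
        for i, predicted in interpolate_held_out(signal, parity):
            squared_errors.append((signal[i] - predicted) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Synthetic smooth profile (a single Gaussian peak), for illustration only.
smooth_profile = [math.exp(-0.5 * ((i - 100) / 15.0) ** 2) for i in range(200)]

# The same profile with uncorrelated noise added, mimicking a model
# that has started to describe noise (fixed seed for repeatability).
rng = random.Random(1)
noisy_profile = [v + rng.gauss(0.0, 0.05) for v in smooth_profile]

print("smooth profile CV error:", two_fold_cv_rmse(smooth_profile))
print("noisy profile CV error: ", two_fold_cv_rmse(noisy_profile))
```

Linear interpolation was chosen here only because it keeps the example short; any interpolation scheme that exploits the smoothness of the true signal would serve the same purpose.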


This project was developed at the University of Amsterdam. Several people were involved; see the author lists of the publications for details. The work on using cross-validation to assess the number of compounds was part of the thesis of Sonja Peters (my PhD student) [1p].


University of Valencia, University of Amsterdam, Unilever.


See my presentation in San Francisco (HPLC-2005) for the noise autocorrelation method. There is no presentation associated with the cross-validation method.


The software for this methodology is still under construction. We hope to offer it soon.