Choosing the right algorithm for peak detection in two-dimensional chromatography is of paramount importance, because almost all methods developed for chromatography rely on peak detection. Moreover, as the amount of data increases, automation is becoming necessary, and having a peak detection method that is both robust and easy to automate becomes even more important.
In this project I’ve been involved in the development of robust and easy-to-automate methods for peak detection in two-dimensional chromatography. In a seminal work [23a], we developed a two-step method for GCxGC peak detection. In the first step, 1D peaks are detected in the (unfolded) raw signal that comes out of the 2D chromatograph, using conventional derivative-based 1D peak detection methods [14a]. In the second step, a decision tree is applied to decide whether two or more 1D peaks should be merged into a single 2D peak cluster. The decision is based on two criteria: (i) unimodality (a 2D peak should show only one maximum) and (ii) agreement of second-dimension retention times (all merged 1D peaks should show approximately the same second-dimension retention time, within a certain tolerance). The method was first applied to GCxGC, and I’ve been extending its application to different areas (including Oil & Gas and Forensics); it was later applied to HPLCxHPLC for food analysis [26a] and to protein analysis [25a].
(Peak detection algorithm, steps 1 and 2)
(Peak detection algorithm, step 3)
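The two-step idea described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the published implementation of [23a]: the function names, the `rt2_tolerance` parameter, the greedy merging rule (a simplified stand-in for the decision tree) and the toy signal are all hypothetical. Step 1 finds 1D peaks from sign changes of the first difference; step 2 merges peaks from consecutive modulations when their second-dimension positions agree and the cluster stays unimodal.

```python
def detect_1d_peaks(signal, min_height=0.0):
    """Step 1: local maxima of the unfolded signal, found where the
    first difference changes from positive to non-positive."""
    peaks = []
    for i in range(1, len(signal) - 1):
        rising = signal[i] - signal[i - 1] > 0
        falling = signal[i + 1] - signal[i] <= 0
        if rising and falling and signal[i] >= min_height:
            peaks.append(i)
    return peaks


def merge_into_2d_peaks(signal, peaks, points_per_modulation, rt2_tolerance):
    """Step 2 (simplified): greedily group 1D peaks into 2D clusters.

    Each cluster is a list of (modulation, rt2_index, height) tuples.
    """
    clusters = []
    for idx in peaks:
        modulation, rt2 = divmod(idx, points_per_modulation)
        height = signal[idx]
        for cluster in clusters:
            last_mod, last_rt2, last_height = cluster[-1]
            adjacent = modulation == last_mod + 1
            rt2_agrees = abs(rt2 - last_rt2) <= rt2_tolerance
            # Unimodality: once the heights have started to fall,
            # they are not allowed to rise again within one cluster.
            already_falling = len(cluster) >= 2 and cluster[-1][2] < cluster[-2][2]
            unimodal = not (already_falling and height > last_height)
            if adjacent and rt2_agrees and unimodal:
                cluster.append((modulation, rt2, height))
                break
        else:
            clusters.append([(modulation, rt2, height)])
    return clusters


# Toy unfolded signal: 5 modulations of 10 points each, two compounds.
points_per_modulation = 10
signal = [0.0] * 50
for modulation, rt2, height in [(0, 3, 2.0), (1, 3, 5.0), (2, 4, 3.0),
                                (2, 7, 1.0), (3, 7, 4.0), (4, 7, 2.0)]:
    centre = modulation * points_per_modulation + rt2
    signal[centre - 1] += height / 2
    signal[centre] += height
    signal[centre + 1] += height / 2

peaks = detect_1d_peaks(signal)
clusters = merge_into_2d_peaks(signal, peaks, points_per_modulation, rt2_tolerance=1)
for cluster in clusters:
    print(cluster)
```

In this toy case the six 1D peaks are grouped into two 2D clusters of three peaks each. The first cluster tolerates a one-point shift in the second dimension because `rt2_tolerance` is set to 1.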
A second aspect of the project [29a] was to assess the performance of the watershed algorithm (a standard method for peak detection in two-dimensional chromatography). The watershed algorithm is a method “imported” from image analysis, which detects image edges by simulating a flood of water over the data. If the two-dimensional chromatography data is flipped upside down, positive peaks (i.e., mountains) turn into negative basins (i.e., valleys). In this way, the watershed algorithm is able to detect the “watershed region” related to each valley (i.e., each two-dimensional peak). In this part of the project we examined the tolerance of the watershed algorithm to disagreements in second-dimension retention times. We wanted to know how often the watershed algorithm fails when the retention times in the second dimension do not coincide exactly (something that always happens with today’s instrumentation). We found that the tolerance of the watershed algorithm depends on several parameters (mainly the modulation time and the band broadening of the first- and second-dimension peaks).
In general, you can expect a probability of failure of around 15% with this algorithm in well-behaved systems, and this figure can rise to 40% when the retention times in the second dimension become less repeatable (something quite common with today’s instrumentation). Conclusion: try not to use the watershed algorithm (it fails too often).
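The kind of failure discussed above can be illustrated with a toy example. The sketch below is an assumption-laden simplification, not the study in [29a]: instead of a full flooding implementation it uses a steepest-ascent variant (each pixel climbs to the highest of its eight neighbours, which is equivalent to a drop of water running downhill on the flipped data), and the function names, the `min_height` background cut-off and the tiny integer data set are hypothetical.

```python
def watershed_regions(data, min_height):
    """Assign every pixel above min_height to the summit it reaches by
    steepest ascent (8-connected). Returns (labels, summits); background
    pixels keep the label -1."""
    n_rows, n_cols = len(data), len(data[0])

    def uphill_neighbour(r, c):
        best, best_value = (r, c), data[r][c]
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols and data[rr][cc] > best_value:
                    best, best_value = (rr, cc), data[rr][cc]
        return best

    labels = [[-1] * n_cols for _ in range(n_rows)]
    summits = []
    for r in range(n_rows):
        for c in range(n_cols):
            if data[r][c] < min_height:
                continue  # background pixel, not part of any peak
            pos = (r, c)
            while True:
                nxt = uphill_neighbour(*pos)
                if nxt == pos:
                    break  # no higher neighbour: this is a summit
                pos = nxt
            if pos not in summits:
                summits.append(pos)
            labels[r][c] = summits.index(pos)
    return labels, summits


def toy_peak(rt2_shift):
    """One compound on a grid of 5 modulations (rows) by 7 second-dimension
    points (columns). The last modulation slice is shifted by rt2_shift."""
    data = [[0] * 7 for _ in range(5)]
    # (modulation, second-dimension centre, apex height)
    slices = [(1, 3, 4), (2, 3, 9), (3, 3 + rt2_shift, 5)]
    for modulation, centre, apex in slices:
        data[modulation][centre - 1] = apex // 2
        data[modulation][centre] = apex
        data[modulation][centre + 1] = apex // 2
    return data


_, aligned_summits = watershed_regions(toy_peak(0), min_height=1)
_, shifted_summits = watershed_regions(toy_peak(2), min_height=1)
print(len(aligned_summits), "region(s) when the retention times coincide")
print(len(shifted_summits), "region(s) when the last slice is shifted by 2 points")
```

With coincident second-dimension retention times the single compound is found as one region. With the last slice shifted by two points, that slice no longer touches a higher neighbour, becomes a summit of its own, and the compound is split into two regions. The numbers are artificial and say nothing about the failure rates quoted above; they only show the mechanism.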
This project was carried out at different institutions, and several people were involved. See the co-authorship of the presentations for more information.
See my presentation in Bruges (HTC-2010).