ACD/Labs' Poster Schedule at ENC 2006
Pacific Grove, CA, USA
April 23 - 28, 2006
ACD/Labs is offering two seminars at ENC: view the agenda and register for our Industrial or Academia seminar.
| |
| Title: | Automated Evaluation of a Chemical Structure with only 1D 1H and 2D 1H-13C HSQC |
| Authors: | Sergey S. Golotvin, Eugene Vodopianov, Rostislav Pol, Brent A. Lefebvre, and Antony J. Williams (ACD/Labs), and Timothy D. Spitzer (GlaxoSmithKline, Inc.) |
| Date: | Thursday, April 27, 2006 @ 9:00 AM - 6:00 PM |
| Abstract #: | 656 |
| Abstract: | As NMR instrumentation continues to advance, 2D experiments that once took hours can now be completed in minutes. Because a standard 1H-13C HSQC experiment now takes only 8 to 10 minutes, this data set has become practical when a high-throughput evaluation of a proposed chemical structure is needed. The additional information that this experiment provides can substantially improve the automated evaluation of a proposed chemical structure.
In this work we present a new method of automatic structure validation based on a comparison of the calculated and experimental data available in a 1D 1H NMR spectrum and a 2D 1H-13C HSQC spectrum. Following the approach developed in our previous work on 1D 1H spectra alone [1], the comparison is made by assigning the spectral signals calculated for a proposed structure to those observed in the experimental NMR spectrum. The peaks in the 2D NMR spectrum greatly increase the accuracy of this assignment process, and therefore of the structure validation, because they contain both the 1H and 13C chemical shifts of the connected nuclei. The 2D spectrum also improves the analysis of overlapping multiplets in the 1D spectrum and helps identify protons attached to heteroatoms in the 1D spectrum, since these protons do not appear at all in the 2D spectrum.
All of these factors combine to produce an automated system that can greatly outperform a system that uses 1D 1H information alone. Using dozens of real-life spectrum sets, it was possible to unambiguously identify at least 90% of the correct structures. As part of this test, incorrect structures were also matched with each spectrum set. These incorrect structures were mainly regioisomers of the correct structures, chosen to offer a challenging test of the specificity of the system. For these incorrect structures, the false positive rate was observed to be as low as 1%.
[1] Golotvin SS, Vodopianov E, Lefebvre BA, Williams AJ, Spitzer TD. Automated structure verification based on 1H NMR prediction. Magn. Reson. Chem. 2006 (in press).
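The assignment-based comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names, the tolerance values, and the brute-force search are all assumptions chosen for illustration. Each predicted (1H, 13C) cross-peak is assigned to one observed HSQC peak, and the mean scaled deviation of the best assignment serves as a match score.

```python
from itertools import permutations
from math import hypot

# Assumed tolerances (ppm) that put 1H and 13C deviations on a common scale.
H_TOL = 0.3
C_TOL = 3.0


def peak_distance(predicted, observed):
    """Scaled distance between two (1H shift, 13C shift) cross-peaks."""
    return hypot((predicted[0] - observed[0]) / H_TOL,
                 (predicted[1] - observed[1]) / C_TOL)


def best_assignment(predicted, observed):
    """Return (assignment, mean distance) for the best one-to-one assignment.

    assignment[i] is the index of the observed peak given to predicted peak i.
    The search is brute force, so it is only practical for a handful of peaks.
    """
    if len(predicted) > len(observed):
        raise ValueError("more predicted peaks than observed peaks")
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(len(observed)), len(predicted)):
        cost = sum(peak_distance(p, observed[j])
                   for p, j in zip(predicted, perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_perm, best_cost / len(predicted)


# Hypothetical shift values, invented for this example only.
observed_peaks = [(7.26, 128.5), (3.70, 55.2), (1.25, 14.1)]
candidate_a = [(1.20, 14.5), (7.30, 128.0), (3.75, 55.0)]
candidate_b = [(1.20, 14.5), (7.30, 128.0), (4.60, 70.0)]

_, score_a = best_assignment(candidate_a, observed_peaks)
_, score_b = best_assignment(candidate_b, observed_peaks)
print(score_a < score_b)  # candidate_a fits the observed peaks more closely
```

A lower score means a closer fit, so a validation step could accept a proposed structure only when its score falls below some threshold. Choosing that threshold, and handling missing or extra peaks, is outside the scope of this sketch.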
|
|
| Title: | The Effect of Structure Description Schemes on Chemical Shift Prediction by Incremental and Neural Network Approaches |
| Authors: | Yegor D. Smurnyy, Kirill A. Blinov, Brent A. Lefebvre, Antony J. Williams (ACD/Labs) |
| Date: | Thursday, April 27, 2006 @ 9:00 AM - 6:00 PM |
| Abstract #: | 655 |
| Abstract: | Typically, a chemical shift prediction algorithm has two major components: i) rules to encode a chemical structure into a set of numbers ("structural code"), and ii) the routine to calculate a chemical shift value from a numerical input. In the current study, we compare multiple algorithms with a special emphasis on the effect of the chemical structure encoding routine on the overall 13C chemical shift prediction accuracy.
Two primary methods were examined in this work: a neural network approach and an incremental scheme (a rule-based approach). The former uses a network of artificial neurons, each of which takes input signals (either from outside or from peer neurons) and, after a non-linear transformation, produces an output. In this study, we employ a multilayer network in which the neurons of the i-th layer receive all of the (i-1)-th layer outputs as inputs. The weights of the net are adjusted by a backpropagation algorithm. In the incremental scheme, the predicted shift varies linearly with the number of characteristic moieties present in a molecule. Coefficients for this method are calculated by a partial least squares regression routine.
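To make the two prediction routines concrete, here is a minimal sketch of each. The function names, the tanh transfer function, the linear output layer, and all numeric values are assumptions for illustration; training (backpropagation or partial least squares regression) is not shown.

```python
import math


def forward(layers, inputs):
    """Forward pass through a fully connected multilayer network.

    layers is a list of (weights, biases) pairs; weights[k] holds the input
    weights of neuron k. Every neuron in a layer receives all outputs of the
    previous layer. Hidden layers apply tanh; the last layer is left linear
    (an assumption) so that it can output a shift value directly.
    """
    activations = list(inputs)
    for index, (weights, biases) in enumerate(layers):
        sums = [sum(w * a for w, a in zip(row, activations)) + b
                for row, b in zip(weights, biases)]
        is_output = index == len(layers) - 1
        activations = sums if is_output else [math.tanh(s) for s in sums]
    return activations


def incremental_shift(base, increments, counts):
    """Incremental scheme: a base value plus one increment per moiety found."""
    return base + sum(increments[name] * n for name, n in counts.items())


# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output.
network = [
    ([[0.5, -0.2], [0.1, 0.3]], [0.0, 0.1]),  # hidden layer
    ([[1.0, -1.0]], [0.5]),                   # output layer
]
print(forward(network, [1.0, 2.0]))

# Hypothetical increments (ppm) for two moieties.
print(incremental_shift(10.0, {"CH3": 2.0, "OH": 5.0}, {"CH3": 2, "OH": 1}))
```

In both cases the numerical input comes from the structural code; the routines differ only in how they map that code to a shift value.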
For both of these methods, the typical experimental workflow was as follows. The whole database (more than 2 million chemical shifts) is split into smaller parts according to the central atom type (in this work, we found 6 atom types to be ideal). About 5-7% of the shifts are set aside as the "test set" and are not used for system training. These data serve to evaluate the overall performance of the algorithm after it has been trained. In the next step, a neural network is trained or incremental scheme coefficients are calculated by regression. Finally, the performance is evaluated on the test set.
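The workflow above can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout (a dict with an `atom_type` key), the 6% hold-out fraction (chosen from the 5-7% range given above), and the use of mean absolute error as the performance measure are all hypothetical choices.

```python
import random
from collections import defaultdict


def split_by_atom_type(records, test_fraction=0.06, seed=0):
    """Group shift records by central atom type and hold out a test set.

    Returns {atom_type: (training_records, test_records)}. Test records are
    never used for training.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for record in records:
        groups[record["atom_type"]].append(record)

    splits = {}
    for atom_type, group in groups.items():
        shuffled = group[:]
        rng.shuffle(shuffled)
        n_test = max(1, round(len(shuffled) * test_fraction))
        splits[atom_type] = (shuffled[n_test:], shuffled[:n_test])
    return splits


def mean_absolute_error(predicted, observed):
    """Average absolute deviation between predicted and observed shifts (ppm)."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


# Hypothetical records: 100 shifts of one atom type and 50 of another.
records = [{"atom_type": "sp3", "shift": 20.0 + i} for i in range(100)]
records += [{"atom_type": "sp2", "shift": 120.0 + i} for i in range(50)]

for atom_type, (train, test) in split_by_atom_type(records).items():
    print(atom_type, len(train), len(test))
```

A separate model (neural net or coefficient set) would then be fitted on each training part and scored with `mean_absolute_error` on the matching test part.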
This work focused on optimizing several aspects of the whole routine:
- Number of central atom types. Separate neural nets or coefficient sets can be used to calculate chemical shifts of chemically different atoms. In this work, 6 types were found to be ideal. Details of this will be shown.
- The structural code. Several approaches to this problem have been suggested - either encoding individual atoms or parts of a molecule (2-3 atoms). Details of this investigation will also be presented.
- Characteristics of the neural net/regression scheme. The most important result of our work is that we, unlike many authors, have found this to have very little effect on the prediction accuracy. We have designed several types of neural networks (differing in transfer function, training algorithm, etc.) and found the size of a net to be the only important factor. Typically, 100-300 hidden neurons are a good compromise between precision and speed of computation.
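As an illustration of what a structural code built from individual atoms might look like, the sketch below counts the elements found at each bond distance from the central atom and flattens the counts into a fixed-length vector. This is an assumed, deliberately simple scheme and not the encoding used in this work; the element list, the distance cutoff, and the function name are hypothetical.

```python
from collections import deque

# Assumed element vocabulary; atoms outside it are ignored.
ELEMENTS = ("C", "H", "O", "N")


def encode_environment(atoms, bonds, center, max_distance=2):
    """Encode the surroundings of one atom as a list of element counts.

    atoms is a list of element symbols and bonds a list of (i, j) index
    pairs. The result holds, for each bond distance 1..max_distance, the
    number of C, H, O and N atoms at that distance from the central atom.
    """
    neighbours = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    # Breadth-first search gives the bond distance of every reachable atom.
    distance = {center: 0}
    queue = deque([center])
    while queue:
        atom = queue.popleft()
        if distance[atom] == max_distance:
            continue
        for nxt in neighbours[atom]:
            if nxt not in distance:
                distance[nxt] = distance[atom] + 1
                queue.append(nxt)

    code = [0] * (len(ELEMENTS) * max_distance)
    for atom, d in distance.items():
        if d == 0 or atoms[atom] not in ELEMENTS:
            continue
        code[(d - 1) * len(ELEMENTS) + ELEMENTS.index(atoms[atom])] += 1
    return code


# Methanol: C(0) bonded to O(1) and three H; O bonded to one H.
atoms = ["C", "O", "H", "H", "H", "H"]
bonds = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5)]
print(encode_environment(atoms, bonds, center=0))
```

The resulting vector is the kind of numerical input that either a neural net or a regression scheme could consume.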
In recent years, a number of chemical shift prediction approaches have been developed, and neural nets have been particularly popular. Most of these approaches focus on sophisticated network architecture or advanced description schemes (for example, 3D conformation). In the study of neural nets and incremental schemes shown here, using the largest high-quality 13C chemical shift database available, we demonstrate that the network or regression routine is not the key to chemical shift prediction quality. Rather, a reliable method for converting a structure to a numerical representation leads to a good prediction with even a simple neural net or regression scheme.
As a result of this work, we find that a mean error of less than 2 ppm can be obtained with our approach. This compares well with database-based (HOSE code) methods and is better than most previously reported results of neural net approaches.
|
|