SMASH 2007
September 16, 2007
Le Majestic Centre de Congres
85 Place du Triangle de l'Amitie
BP 25 74400 Chamonix Mont-Blanc, France
ACD/Labs Seminar
Register for ACD/Labs' SMASH Seminar featuring several guest speakers and our own highlights on the latest techniques and enhancements in NMR software.
Among the topics to be presented are:
- The Cruel (Real) World of Automated Structure Verification and the Challenges that Remain to be Solved
- Uncovering the Truths and Dispelling the Myths Behind ACD/Structure Elucidator
- Making Your E-Lab Notebook and ACD/Labs Work for You
Poster Schedule
| Title: | Validating Automated Structure Confirmation in a Blind Study |
| Authors: | Ryan R. Sasaki, Sergey S. Golotvin, Brent A. Lefebvre, and Antony J. Williams (ACD/Labs); Randy D. Rutkowske and Timothy D. Spitzer (GlaxoSmithKline, Inc.) |
| Date: | Sunday, Sept. 16–Wednesday, Sept. 19 |
| Abstract: | The test used two data sets: one that was provided ahead of time for adjustment of processing settings (19 spectra sets), and one that was "blind" (10 spectra sets) and therefore not provided to the software or software operators. This allowed the true performance of the process to be evaluated without biasing the result towards particular data sets through customization of the software settings. The results show that a completely automated system can reduce by 85% the number of data sets a spectroscopist has to evaluate manually. Presented here will be an analysis of these results, a review of the structures that could not be automatically confirmed, and details of the algorithmic improvements that enabled this level of performance. |
| |
| Title: | NMR Chemical Shift Prediction by Atomic Increment Based Algorithms |
| Authors: | Brent A. Lefebvre, Yegor D. Smurnyy, Kirill A. Blinov, Mikhail E. Elyashberg, and Antony J. Williams (ACD/Labs) |
| Date: | Sunday, Sept. 16–Wednesday, Sept. 19 |
| Abstract: | Here, we report on a new implementation of the Atomic Increments algorithm for NMR chemical shift prediction. Predictions were performed using both linear regression with a partial least squares (PLS) algorithm and neural networks under different conditions for 13C, 1H, 15N, 19F, and 31P nuclei, and the results were compared. The focus of the work was on strategies used to encode a chemical structure into a numerical representation, a key step required by both the neural network and linear regression approaches. It was quickly discovered that a careful balance must be found. On the one hand, a detailed numerical description leads to more precise results for structures included in the training set and their structural relatives. On the other hand, a very detailed description inhibits the ability of the network to generalize, and the predictions for structures outside of the training set are very poor. The most important decisions are how many different types of atoms should be distinguished and how many spheres around the central atom should be taken into account. The best chemical shift predictions for 13C resulted when all of the atoms were divided into approximately 70 different classes according to their chemical nature, stereochemistry, and formal charge, and the information associated with atoms up to 5 bonds away was used to influence the prediction. Interactions between atoms were taken into account by using cross-increments. As it turns out, without cross-increments, neural networks performed better than the regression scheme. Remarkably, with cross-increments added (only atoms separated by one or two chemical bonds were considered), linear regression and neural networks performed similarly, and both performed much better than without cross-increments in the description of a structure. As before, when too many cross-increments were introduced, over-fitting and poor predictions resulted. 
With a training dataset of approximately 2 million individual 13C shifts, the most efficient calculations had mean prediction errors of 1.8 and 1.6 ppm for the two techniques, evaluated on an independent test dataset of 150,000 chemical shifts. We also applied the approaches developed on the 13C database to chemical shift prediction of other nuclei (1H, 15N, 19F, and 31P). The following average deviations (in ppm) were found: 0.18 and 0.17 for 1H, 18 and 16 for 15N, 5.5 and 6.5 for 19F, and 9 and 7.5 for 31P for linear regression and neural network, respectively. These results compare well to our standard HOSE code approach to prediction, and show that principles developed for carbon NMR shift prediction seem to be applicable to a wide range of nuclei. Our conclusion is that the most important factor influencing the precision of chemical shift prediction (and, probably, other structure-activity relationship studies) is the scheme used for encoding a chemical structure into a numerical input. With an appropriate incremental scheme, both the linear regression and neural network methods are highly effective and suitable for most situations that require chemical shift prediction. |
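The encoding step described in the second abstract can be illustrated with a minimal sketch. This is not the ACD/Labs implementation: the three atom classes, the two-sphere limit, and the function names `encode` and `predict_shift` are assumptions chosen for brevity (the abstract reports roughly 70 classes and up to 5 bonds for 13C), and cross-increments and the fitting step (PLS or a neural network) are left out.

```python
from collections import deque

# Hypothetical settings for illustration only; the abstract reports roughly
# 70 atom classes and spheres up to 5 bonds away for 13C.
ATOM_CLASSES = ["C", "N", "O"]
MAX_SPHERES = 2


def encode(atoms, bonds, center):
    """Count atoms of each class in each sphere around the central atom.

    atoms  -- list of class labels, one per atom
    bonds  -- list of (i, j) index pairs
    center -- index of the atom whose shift is to be predicted
    Returns a flat list: sphere 1 counts, then sphere 2 counts, and so on.
    """
    neighbours = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    features = [0] * (MAX_SPHERES * len(ATOM_CLASSES))
    distance = {center: 0}
    queue = deque([center])
    while queue:  # breadth-first search gives the bond distance of each atom
        atom = queue.popleft()
        if distance[atom] == MAX_SPHERES:
            continue  # do not look beyond the outermost sphere
        for other in neighbours[atom]:
            if other not in distance:
                distance[other] = distance[atom] + 1
                queue.append(other)
                column = ((distance[other] - 1) * len(ATOM_CLASSES)
                          + ATOM_CLASSES.index(atoms[other]))
                features[column] += 1
    return features


def predict_shift(base, increments, features):
    """Additive model: a base value plus one increment per counted atom."""
    return base + sum(w * x for w, x in zip(increments, features))


# Heavy atoms of ethanol, C-C-O, encoded around the methyl carbon (index 0).
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]
print(encode(atoms, bonds, 0))  # [1, 0, 0, 0, 0, 1]
```

Widening `ATOM_CLASSES` or raising `MAX_SPHERES` lengthens the feature vector, which is the trade-off the abstract describes: a more detailed description fits the training set more closely but generalizes less well.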
For complete event information, visit SMASH 2007.