Quick update from the field of CASP (Computer-Assisted Synthesis Planning): Thakkar / Kogej / Reymond / Engkvist / Bjerrum have a recent ChemRxiv article (DOI) in which they bring artificial intelligence to retrosynthetic analysis, a distinctly human chemical art that until now has been acquired "through years of experience and exposure to a variety of both successful and failed chemistry". According to the authors, CASP exists to "complement" this process, but we all know better.
The new work builds on earlier 2017 work by Segler / Preuss / Waller (DOI), whose key elements were a training set of 12 million chemical reactions and a Monte Carlo tree search, an algorithm otherwise best known from game-playing AI such as AlphaGo. The algorithm deconstructs a molecule as a tree: each step applies a reverse (retrosynthetic) reaction, and the resulting molecular fragments become new nodes. A filter step determines whether the reverse reaction behind each disconnection is feasible in real-life chemistry. The search continues until it ends up with a set of commercially available chemicals. The Reaxys database was used to generate the transformation rules for the filter step. Because negative results are rarely reported in organic chemistry, 100 million artificial negative reactions were added for training.
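To make the search loop a bit more concrete, here is a minimal Monte Carlo tree search sketch in Python. It is a toy and not the Segler et al. system: molecules are plain strings, and `RULES`, `INFEASIBLE` and `STOCK` are made-up stand-ins for the Reaxys-derived transformation rules, the trained feasibility filter and the catalogue of commercially available chemicals. Rollouts also pick rules uniformly at random instead of using a neural-network policy.

```python
import math
import random

# Hypothetical toy data, NOT real chemistry: molecules are plain strings.
# RULES stands in for retro-transformation rules (product -> precursor sets).
RULES = {
    "target": [("A", "B"), ("C",)],
    "A": [("a1", "a2")],
    "B": [("b1", "b2"), ("b3",)],
    "C": [("x",)],  # "x" is neither purchasable nor further reducible
}
INFEASIBLE = {("B", ("b3",))}  # stand-in for the filter: rejected disconnections
STOCK = {"a1", "a2", "b1", "b2"}  # stand-in for commercially available chemicals


def feasible(product, precursors):
    """Filter step: keep only disconnections judged feasible."""
    return (product, precursors) not in INFEASIBLE


def expansions(state):
    """States reachable by disconnecting one unsolved molecule.

    A state is the frozenset of molecules not yet in stock; the empty
    set means every leaf is purchasable, i.e. the route is solved.
    """
    if not state:
        return []
    mol = min(state)  # deterministic choice of which molecule to disconnect
    result = []
    for precursors in RULES.get(mol, []):
        if feasible(mol, precursors):
            missing = {p for p in precursors if p not in STOCK}
            result.append(frozenset((state - {mol}) | missing))
    return result


class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.untried = expansions(state)
        self.visits = 0
        self.value = 0.0

    def ucb_child(self, c=1.4):
        """Child with the best UCB1 score (average reward + exploration bonus)."""
        return max(
            self.children,
            key=lambda ch: ch.value / ch.visits
            + c * math.sqrt(math.log(self.visits) / ch.visits),
        )


def rollout(state, rng, max_depth=10):
    """Random playout: 1.0 if it reaches an all-in-stock state, else 0.0."""
    for _ in range(max_depth):
        if not state:
            return 1.0
        options = expansions(state)
        if not options:
            return 0.0  # dead end: an unsolved molecule has no feasible rule
        state = rng.choice(options)
    return 1.0 if not state else 0.0


def search(root_state, iterations=200, seed=0):
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes via UCB1.
        while not node.untried and node.children:
            node = node.ucb_child()
        # 2. Expansion: add one untried child, if any.
        if node.untried:
            child = Node(node.untried.pop(), parent=node)
            node.children.append(child)
            node = child
        # 3. Simulation, then 4. backpropagation of the reward to the root.
        reward = rollout(node.state, rng)
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return root


def best_route(root):
    """Follow the most-visited child from the root down to a leaf."""
    route, node = [root.state], root
    while node.children:
        node = max(node.children, key=lambda ch: ch.visits)
        route.append(node.state)
    return route


if __name__ == "__main__":
    for state in best_route(search(frozenset({"target"}))):
        print(sorted(state))
```

With this toy data the most-visited route should run from `['target']` via `['A', 'B']` and `['B']` to `[]`, meaning every leaf ends up in `STOCK`; the branch through `C` is explored only occasionally because its rollouts always dead-end. In the real system the rule set, the filter and the purchasable-building-block list are far larger, and the random rollout is replaced by learned policies.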
In the new Thakkar work, the reaction set is expanded with the open-source USPTO (patent) database. The entire dataset can be broken down into specific chemical transformations; nice to know there are 137,803 different nitro-to-amino conversions. The overlap in data between USPTO and Reaxys is only 7.4%. The authors also point out that patents often place key reaction data in reaction-scheme images, which cannot be machine-extracted. Also new is the inclusion of protection and deprotection reactions. One interesting finding was that the open-source USPTO dataset alone was sufficient to find reaction routes for all of the top 125 pharmaceuticals, in under 4 seconds each.
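Breaking a dataset down by transformation class and computing the overlap between two datasets is, at heart, simple set arithmetic. The Python sketch below is illustrative only: it assumes each reaction has already been reduced to a canonical, deduplicated key plus a class label, and the tiny `USPTO_TOY` and `REAXYS_TOY` lists are made up. It also assumes overlap means intersection over union; the paper may define its 7.4% differently, for example relative to one of the two datasets.

```python
from collections import Counter

# Made-up toy records: (canonical reaction key, transformation class).
USPTO_TOY = [
    ("r1", "nitro to amino"),
    ("r2", "nitro to amino"),
    ("r3", "Boc deprotection"),
    ("r4", "amide coupling"),
]
REAXYS_TOY = [
    ("r2", "nitro to amino"),
    ("r5", "Suzuki coupling"),
    ("r6", "amide coupling"),
]


def merge(*datasets):
    """Combine datasets into {reaction key: class}, keeping each key once."""
    combined = {}
    for dataset in datasets:
        combined.update(dataset)
    return combined


def class_counts(combined):
    """Number of unique reactions per transformation class."""
    return Counter(combined.values())


def overlap_percent(dataset_a, dataset_b):
    """Reactions present in both datasets, as a percentage of their union."""
    keys_a = {key for key, _ in dataset_a}
    keys_b = {key for key, _ in dataset_b}
    union = keys_a | keys_b
    if not union:
        return 0.0
    return 100.0 * len(keys_a & keys_b) / len(union)


if __name__ == "__main__":
    combined = merge(USPTO_TOY, REAXYS_TOY)
    print(class_counts(combined)["nitro to amino"])  # 2
    print(f"{overlap_percent(USPTO_TOY, REAXYS_TOY):.1f}%")  # 16.7%
```

Deduplicating on the reaction key before counting matters: a reaction reported in both sources (here `r2`) would otherwise be counted twice in the combined breakdown.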
Are there such things as retrosynthesis competitions? Yes! The 7th National SCI/RSC Retrosynthesis Competition will be held on 13 March 2020 (link). Its terms and conditions do not prohibit AI entrants.