Synthesis design by computer

23 October 2020 - AI near you

Back in 2014 this blog (link) was a little grumpy about Chematica, the company specializing in designing organic reactions with computer algorithms. A paper from lead scientist B.A. Grzybowski did not disclose much information, which was hardly surprising as it was a paper about a commercial product. The way the company handled publicity did not sit well either. A brand new paper in the journal Nature (Mikulak-Klucznik et al. DOI) did initially raise one eyebrow as well. The authors hail from the Polish Institute of Organic Chemistry and Northwestern University, but why does Grzybowski have a Gmail address? The Competing Interests section has this to say: the company Chematica has been sold to Merck and the algorithms have found their way into a software product called Synthia. You can find more info here. Grzybowski does not hold stock but is actively involved in future development.

What do we learn from the 2020 paper, which deals specifically with natural product synthesis, that was not already in the 2014 paper? The application is now based on 100,000 reaction rules and includes estimates for reaction yields and costs. Initially the software performed poorly with natural product synthesis. The authors write that a key transformation in a synthetic scheme often requires several simple synthetic steps to set the stage. As these steps actually increase complexity, routes containing them are quickly dropped. This problem was solved by finding these synthetic steps and bundling them into one transformation. Stereocenters also add complexity, especially if stereoselectivity is controlled by another, far-removed stereocenter. Other hurdles tackled: conflicting functional groups in a synthon, molecular strain (molecular mechanics calculations are included in the process) and aromatic substitutions (handled by electron-density calculations). In terms of complexity, a molecule like paclitaxel will still elude a solution.
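The bundling trick is easier to see in a toy model. The sketch below is not Chematica's algorithm; it only assumes that every retrosynthetic step changes a made-up integer complexity score and that the search drops a route as soon as a step raises that score. The names Step, search and bundle and all the numbers are invented for illustration.

```python
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    name: str
    delta: int  # hypothetical change in complexity when the step is applied


def search(start: int, steps: List[Step], goal: int) -> Optional[List[str]]:
    """Walk the steps in order; drop the route if any step raises complexity."""
    complexity = start
    route = []
    for step in steps:
        if step.delta > 0:
            return None  # pruned: this step makes the molecule more complex
        complexity += step.delta
        route.append(step.name)
    # The route only counts if it simplified the target enough.
    return route if complexity <= goal else None


def bundle(steps: List[Step], name: str) -> Step:
    """Merge several steps into one transformation with the net complexity change."""
    return Step(name, sum(s.delta for s in steps))


# Two setup steps each add complexity; the key step removes a lot of it.
setup_a = Step("setup A", +1)
setup_b = Step("setup B", +1)
key = Step("key transformation", -5)

step_by_step = search(10, [setup_a, setup_b, key], goal=7)
bundled = search(10, [bundle([setup_a, setup_b, key], "bundled key step")], goal=7)
```

Taken one step at a time the route is thrown out at the first setup step, so step_by_step is None. As a single bundled transformation the net change is a simplification, so the route survives the pruning.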

Chematica is now confident it can tackle total synthesis. To give an idea of the complexity, the article mentions that a target requires several hours of computing time (@ 500 GB RAM). In 2014 a panel of Ph.D. students judged the machine; in 2020 the panel consisted of established synthetic experts. A Turing test based on elegance (a vague concept - according to chemists there is an elegant way to assemble an IKEA bookcase and an inelegant way) resulted in 42% of the machine solutions being deemed human by the humans.

Key bonus of this article is the inclusion of three actual syntheses of complex molecules (dauricine is one of them), with the design machine-generated and the successful execution handled by humans. Additional bonuses: the supplementary info contains one of the 100,000 rules that Chematica relies on, giving a basic idea of how this stuff works. Also: the supp. info contains all the synthetic steps WITH the systematic names of reactant and product! Very welcome. How scientists in 2020 get away with just reporting the synthesis of "S2" from "S1" is an enduring mystery.

Some of the statements made still managed to raise the other eyebrow. No, AI does not mimic human reasoning, and yes, you can state that your work is "high quality", but why self-cite this statement? Interesting quote: even with grade A chemical reaction databases "a sizeable fraction of reactions suffer from manual-entry errors". That must hurt at Reaxys and SciFinder.

PS This blog does not endorse any products whatsoever.