A new entry in the field of retrosynthesis prediction: the Chinese conglomerate Tencent (60k employees) has posted a paper on ChemRxiv through its Tencent AI Lab in Washington (Chaochao Yan et al. DOI).
Traditionally, retrosynthesis is the domain of very experienced organic chemists, who deconstruct a target molecule by breaking it up into so-called synthons (fragments, if you like). The synthons are deconstructed in turn until each one can be matched with an available chemical compound, at which point a practical laboratory route to the desired compound has revealed itself. Many computational strategies have evolved over the years (see example CASP here). The Tencent AI Lab has now rolled out RetroXpert (Retrosynthesis eXpert), which sticks close to the traditional approach ("prediction like a chemist"). It too has two parts: predicting the reaction center in a target molecule (the relevant bonds in a fragmentation step) with an AI model called the Edge-enhanced Graph Attention Network (EGAT), and predicting the potential reactants with the Reactant Generation Network (RGN).
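The recursive deconstruction described above can be sketched in a few lines of Python. This is only a toy illustration, not RetroXpert's code: the lookup table `TOY_DISCONNECTIONS`, the set `AVAILABLE`, and the function names are all hypothetical, and compound names stand in for real structures. In a real system the `disconnect` step would be the two-stage model (reaction center prediction followed by reactant generation) instead of a dictionary lookup.

```python
from typing import Callable, Dict, List, Optional, Set

# Hypothetical toy data for illustration only.
AVAILABLE: Set[str] = {"phenol", "carbon dioxide", "acetic anhydride"}
TOY_DISCONNECTIONS: Dict[str, List[str]] = {
    "aspirin": ["salicylic acid", "acetic anhydride"],
    "salicylic acid": ["phenol", "carbon dioxide"],
}


def toy_disconnect(target: str) -> List[str]:
    """Stand-in for a single-step retrosynthesis model: look up precursors."""
    return TOY_DISCONNECTIONS.get(target, [])


def plan_route(
    target: str,
    disconnect: Callable[[str], List[str]],
    available: Set[str],
    max_depth: int = 5,
) -> Optional[dict]:
    """Deconstruct `target` until every branch ends in an available compound.

    Returns a nested route tree, or None if some branch cannot be resolved
    within `max_depth` steps.
    """
    if target in available:
        return {"compound": target, "precursors": []}
    if max_depth == 0:
        return None
    precursors = disconnect(target)
    if not precursors:
        return None  # no known disconnection and not purchasable
    subtrees = []
    for precursor in precursors:
        subtree = plan_route(precursor, disconnect, available, max_depth - 1)
        if subtree is None:
            return None
        subtrees.append(subtree)
    return {"compound": target, "precursors": subtrees}


def starting_materials(route: dict) -> List[str]:
    """Collect the leaves of a route tree, i.e. the compounds to buy."""
    if not route["precursors"]:
        return [route["compound"]]
    leaves: List[str] = []
    for subtree in route["precursors"]:
        leaves.extend(starting_materials(subtree))
    return leaves


route = plan_route("aspirin", toy_disconnect, AVAILABLE)
```

Calling `starting_materials(route)` on the result lists the purchasable compounds the route ends in. Note that this sketch accepts the first disconnection it is given; it makes no attempt to rank alternative routes.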
The training set was based on the USPTO-50K dataset, a collection of 50k chemical reactions extracted from US patents, with each record holding the reactants and the product as SMILES strings. The new methodology has a reported 70% success rate. The discussion mentions that if there is more than one way to synthesize a product, it is not possible to evaluate which route is best. In my view the missing metric is reaction yield (or even reactant cost); the USPTO dataset does not record yields. The article references another research group that published very similar research in January of this year (DOI), but with a lower success rate. The list of affiliated research institutions for the authors of this article is again not typical for a regular chemistry article, as it contains Google Brain and fintech company Ant Financial.
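To make the record format and the notion of a success rate concrete, here is a minimal sketch. It assumes records are written as reaction SMILES (`reactants>agents>product`, with `.` separating molecules), and it assumes a "hit" means that the recorded reactant set appears among the model's first `k` suggestions. Both the function names and this exact-match scoring are my assumptions for illustration, not the paper's stated protocol, and the strings are compared as-is without canonicalization.

```python
from typing import FrozenSet, List, Tuple


def parse_reaction(rxn: str) -> Tuple[FrozenSet[str], str]:
    """Split a reaction SMILES 'reactants>agents>product' into its parts.

    Returns the reactants as a set (order does not matter) and the product.
    """
    reactants, _agents, product = rxn.split(">")
    return frozenset(reactants.split(".")), product


def top_k_success_rate(
    predictions: List[List[FrozenSet[str]]],
    recorded: List[FrozenSet[str]],
    k: int,
) -> float:
    """Fraction of products whose recorded reactants are in the top k guesses."""
    hits = sum(truth in ranked[:k] for ranked, truth in zip(predictions, recorded))
    return hits / len(recorded)


# Acetic acid + ethanol -> ethyl acetate, as it might appear in a record.
recorded_reactants, product = parse_reaction("CC(=O)O.CCO>>CCOC(C)=O")

# Hypothetical ranked model output for that product: acetyl chloride + ethanol
# first, the recorded route second.
ranked_guesses = [
    frozenset({"CC(=O)Cl", "CCO"}),
    frozenset({"CC(=O)O", "CCO"}),
]

top1 = top_k_success_rate([ranked_guesses], [recorded_reactants], k=1)
top2 = top_k_success_rate([ranked_guesses], [recorded_reactants], k=2)
```

In this toy case the first guess is a perfectly reasonable way to make the ester, yet it counts as a miss at `k=1` because it differs from the single recorded route. With no yield or cost attached to each record, there is nothing in the data to say which of the two routes is better.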