Buchwald-Hartwig meta analysis

26 October 2020 - Data mining

This blog has a specific interest in everything to do with reaction databases, not least because I have been working on a reaction database myself for some time now. In a recent article (Fitzner et al. DOI), a Swiss research team exploited reaction databases to learn more about one specific reaction type, the Buchwald-Hartwig amination. To learn about a reaction you can of course go into a lab and run a couple of experiments, and to speed things up you can move to high-throughput screening. The Swiss team skipped the lab altogether: they selected 60 thousand aminations from several existing databases in order to unearth winning formulations. The results have been compiled into an actual cheatsheet with a top-5 of ligands for several amine classes. A second cheatsheet lists winning ligand/base combinations.
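The post doesn't say how the top-5 lists were ranked, so what follows is only a minimal Python sketch of one plausible approach: group the mined records by amine class and ligand, then rank ligands by mean reported yield. The record layout `(amine_class, ligand, yield_percent)`, the function `top_ligands`, its `min_count` parameter and the toy values are all hypothetical, not taken from Fitzner et al.

```python
from collections import defaultdict
from statistics import mean

# Toy records (amine_class, ligand, yield_percent); all values are made up.
records = [
    ("primary_aryl", "L1", 90),
    ("primary_aryl", "L1", 80),
    ("primary_aryl", "L2", 70),
    ("primary_aryl", "L3", 95),
    ("secondary_alkyl", "L2", 88),
    ("secondary_alkyl", "L4", 60),
]


def top_ligands(records, k=5, min_count=1):
    """Return the k ligands with the highest mean yield for each amine class."""
    # Collect yields per amine class, then per ligand.
    yields = defaultdict(lambda: defaultdict(list))
    for amine_class, ligand, yield_percent in records:
        yields[amine_class][ligand].append(yield_percent)

    ranking = {}
    for amine_class, by_ligand in yields.items():
        # Skip ligands with fewer than min_count reported reactions.
        scored = [
            (mean(ys), ligand)
            for ligand, ys in by_ligand.items()
            if len(ys) >= min_count
        ]
        # Highest mean yield first; ties broken alphabetically by ligand name.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        ranking[amine_class] = [ligand for _, ligand in scored[:k]]
    return ranking


print(top_ligands(records, k=2))
print(top_ligands(records, k=2, min_count=2))
```

The `min_count` threshold guards against a ligand topping the list on the strength of a single lucky yield: with `min_count=2`, only `L1` survives for the primary aryl amines in this toy set.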

The database was pulled together from Reaxys, SciFinder and the USPTO dataset, incidentally the same set used by Grzybowski in his synthesis design program (blog here). The authors note that some data cleaning was required: reactant structures were missing, the role of each compound in a reaction was not always clear, and, specifically for the amination reactions, it was difficult to separate the ligand from the metal in the catalyst. A third of all reactions were flagged as duplicates, which seems an unusually high number.
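How high a duplicate rate comes out depends heavily on what counts as "the same" reaction. A minimal Python sketch, assuming reactant and product identifiers have already been canonicalized (in practice something like canonical SMILES) and that two entries are duplicates when they share reactants, product, ligand and base, could look like this. The field names, the `reaction_key` and `deduplicate` helpers and the toy records are hypothetical; the post does not say which criteria Fitzner et al. used.

```python
# Toy reaction records from two hypothetical sources; names are placeholders.
reactions = [
    {"source": "db_A", "reactants": ["PhBr", "morpholine"],
     "product": "N-phenylmorpholine", "ligand": "L1", "base": "NaOtBu"},
    {"source": "db_B", "reactants": ["morpholine", "PhBr"],
     "product": "N-phenylmorpholine", "ligand": "L1", "base": "NaOtBu"},
    {"source": "db_B", "reactants": ["PhBr", "morpholine"],
     "product": "N-phenylmorpholine", "ligand": "L1", "base": "Cs2CO3"},
]


def reaction_key(rxn):
    """Build a hashable key; reactant order is ignored via frozenset."""
    return (
        frozenset(rxn["reactants"]),
        rxn["product"],
        rxn.get("ligand"),
        rxn.get("base"),
    )


def deduplicate(reactions):
    """Keep the first occurrence of each key and report the duplicate fraction."""
    seen = set()
    unique = []
    for rxn in reactions:
        key = reaction_key(rxn)
        if key not in seen:
            seen.add(key)
            unique.append(rxn)
    duplicate_fraction = 1 - len(unique) / len(reactions) if reactions else 0.0
    return unique, duplicate_fraction


unique, frac = deduplicate(reactions)
print(len(unique), round(frac, 2))
```

Dropping `base` from the key would merge all three toy entries into one, which illustrates how strongly a reported duplicate rate depends on this definition.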

The authors note a limitation of this type of research. The bench chemists who have been reporting these amination reactions over the years tend to favor a limited set of ligands and bases. Although the number of reactions is large, in the tens of thousands, their diversity is disappointing. The authors recommend that future experimental work increase this diversity and that, for example, poorly performing reactions also be reported in research articles.
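One way to put a number on that lack of diversity is to measure how concentrated ligand usage is. The sketch below, which is not claimed to be the measure used in the paper, reports the share of reactions covered by the `top_n` most common ligands and the "effective number" of ligands, exp(Shannon entropy), which equals the actual ligand count only when every ligand is used equally often. The function `ligand_diversity` and the toy lists are hypothetical.

```python
import math
from collections import Counter


def ligand_diversity(ligands, top_n=3):
    """Return (share of the top_n ligands, effective number of ligands)."""
    counts = Counter(ligands)
    total = sum(counts.values())
    # Fraction of all reactions that use one of the top_n most common ligands.
    top_share = sum(c for _, c in counts.most_common(top_n)) / total
    # Shannon entropy of the ligand distribution (natural log).
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    # exp(entropy) is the Hill number of order 1, the "effective" ligand count.
    return top_share, math.exp(entropy)


skewed = ["L1"] * 6 + ["L2"] * 2 + ["L3"] + ["L4"]
uniform = ["L1", "L2", "L3", "L4"]
print(ligand_diversity(skewed, top_n=2))
print(ligand_diversity(uniform, top_n=2))
```

Both toy lists contain four distinct ligands, but in the skewed one two ligands cover 80% of the reactions and the effective number drops to roughly three.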