Chemical reaction database update

22 October 2013 - Cheminformatics

There are some new initiatives to report on the topic of open-access chemical reaction databases, all sparked by a single email to the Open Babel discussion list. Here is a quick summary of some of the responses. ChemSpider is serious about setting up a reaction database; see the SlideShare presentation here. It is not exactly open-source, but at least it is a free (open-access) resource. The database resides here, and one test search with NC1=C(C=O)C=C(C=C1)Br returns 8 (!) immediate hits. ChemAxon lists 241 organic reactions with open access, but there is no download button here either. From InChI it is a small step to RInChI! The website is here. Describing a chemical reaction in a single string is certainly intriguing. A small sample set is included.
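To make the single-string idea concrete, here is a minimal sketch. It does not use RInChI; it uses the simpler reaction SMILES convention (`reactants>agents>products`, with `.` between molecules) as a stand-in. The function name `parse_reaction_smiles`, the `Reaction` type, and the esterification example are my own illustrations and do not come from any of the databases mentioned above.

```python
from typing import NamedTuple


class Reaction(NamedTuple):
    reactants: list[str]
    agents: list[str]
    products: list[str]


def parse_reaction_smiles(rxn: str) -> Reaction:
    """Split a 'reactants>agents>products' string into lists of SMILES."""
    parts = rxn.strip().split(">")
    if len(parts) != 3:
        raise ValueError(
            f"expected exactly two '>' separators, got {len(parts) - 1}"
        )
    # Within each part, '.' separates molecules; an empty part gives [].
    # Caveat: this also splits multi-fragment species such as salts.
    split = [[s for s in part.split(".") if s] for part in parts]
    return Reaction(*split)


# Illustrative example (an assumption, not taken from the CRD):
# acetic acid + ethanol, sulfuric acid as agent, giving ethyl acetate + water.
example = "CC(=O)O.OCC>OS(=O)(=O)O>CC(=O)OCC.O"
rxn = parse_reaction_smiles(example)
print(rxn.reactants, rxn.agents, rxn.products)
```

A plain string split like this checks nothing about the chemistry; validating the individual SMILES would need a toolkit such as Open Babel.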

In the meantime, progress on our own CRD database is slow. The number of reactions is a lousy 149. No matter the level of job automation, it still takes too much time to process just one reaction. The Open Babel discussion includes some complaints that the patent literature is too difficult to work with. In my experience patents work very well, partly because patents are more careful about identifying chemicals by their correct names. One more initiative worth mentioning: OSRA promises that it can scan any PDF document and extract the SMILES code of each and every chemical depicted. Now that would bring complete job automation within reach and make commercial chemical database bosses nervous. I am going to try it out for myself.

The CRD is not yet open-access (no website), but at least the raw data are available for free here (.sql file, 4 MB). Reaction number 149 is depicted below.

[Image: CRD149.PNG]