The chemical reaction database II

17 August 2012 - Cheminformatics

Having completed our market research, what should a chemical reaction database look like? An organic reaction comes with a lot of associated data: reactants, products, catalysts, temperature, solvent and so on. How much of that information should be retained?
Some novel concepts will make life easier. The DOI system makes accessing the original article a piece of cake, and in a few years' time we will also have ORCID to identify the authors.

The structure of any molecule can be described by a SMILES or InChI string, which will greatly aid substructure searching. Of course, people in the organic chemistry department are accustomed to absorbing reaction information in the form of cartoons: a picture of two molecules on the left and their reaction products on the right. But keeping a large library of image files is cumbersome, and existing reaction databases rely on .gif images that render poorly. Databases are cheap to set up and cheap to run these days, so why not store the image in the database itself? And instead of .gif, why not store the image as .svg? Firefox can display svg files as images and, as an added bonus, svg files can be manipulated like any other xml file.
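To make the svg idea a little more concrete, here is a minimal sketch in Python using only the standard library, with SQLite standing in for whatever database would actually be used. The table name `scheme`, the helper functions and the tiny placeholder drawing are all hypothetical; the point is only that an svg kept as text in a database row can be pulled out, edited with an ordinary XML parser and written back.

```python
import sqlite3
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Serialize SVG elements with a default namespace instead of an "ns0:" prefix.
ET.register_namespace("", SVG_NS)


def store_svg(conn, label, svg_text):
    """Insert an SVG document as plain text and return its row id (hypothetical helper)."""
    cur = conn.execute("INSERT INTO scheme (label, svg) VALUES (?, ?)", (label, svg_text))
    return cur.lastrowid


def restyle_labels(conn, scheme_id, font_family):
    """Load one SVG from the database, edit it as XML and write it back."""
    (svg_text,) = conn.execute(
        "SELECT svg FROM scheme WHERE id = ?", (scheme_id,)
    ).fetchone()
    root = ET.fromstring(svg_text)
    # Every <text> element in the drawing gets the new font.
    for text in root.iter(f"{{{SVG_NS}}}text"):
        text.set("font-family", font_family)
    updated = ET.tostring(root, encoding="unicode")
    conn.execute("UPDATE scheme SET svg = ? WHERE id = ?", (updated, scheme_id))
    return updated


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scheme (id INTEGER PRIMARY KEY, label TEXT, svg TEXT)")

# Hand-written placeholder "cartoon" of A + B -> C, not a real reaction scheme.
demo = (
    f'<svg xmlns="{SVG_NS}" width="200" height="60">'
    '<text x="10" y="35">A + B</text>'
    '<line x1="70" y1="30" x2="120" y2="30" stroke="black"/>'
    '<text x="130" y="35">C</text>'
    "</svg>"
)
sid = store_svg(conn, "demo scheme", demo)
print(restyle_labels(conn, sid, "sans-serif"))
```

The same pattern works for any other bulk edit (colours, line widths, scaling) without re-rendering a single image.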

Clearly the law of the handicap of a head start applies here (Wikipedia fact alert: the law was formulated in 1937 by a Dutch historian). Compared with a database like SciFinder, a new database carries no legacy investment costs, is easier to maintain and has less data to store.

There are still problems, though. Nature Chemistry supplies a full list of the molecules with each publication, together with CML files or molfiles. Converting these file types to svg should be a trivial task, but where are the CML2SVG or even InChI2SVG scripts? For some reason they have yet to be invented.
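As an illustration of how small such a converter could be, below is a hypothetical `cml_to_svg` function in plain Python. It assumes CML `<molecule>` input whose atoms carry 2D coordinates in the `x2`/`y2` attributes and whose bonds are given with `atomRefs2`. It draws double bonds as two parallel lines and every other bond as a single line, and labels non-carbon atoms in skeletal-formula style. Charges, implicit hydrogens, stereochemistry and triple bonds are ignored, so treat it as a sketch rather than a finished tool.

```python
import math
import xml.etree.ElementTree as ET


def _local(tag):
    """Drop any XML namespace: '{http://www.xml-cml.org/schema}atom' -> 'atom'."""
    return tag.rsplit("}", 1)[-1]


def cml_to_svg(cml_text, scale=40.0, margin=20.0):
    """Hypothetical CML-to-SVG converter for molecules with 2D coordinates."""
    root = ET.fromstring(cml_text)
    atoms, bonds = {}, []
    for el in root.iter():
        name = _local(el.tag)
        if name == "atom":
            # Assumes every atom carries 2D coordinates in x2/y2.
            atoms[el.get("id")] = (el.get("elementType", "C"),
                                   float(el.get("x2")), float(el.get("y2")))
        elif name == "bond":
            a, b = el.get("atomRefs2").split()
            bonds.append((a, b, el.get("order", "1")))
    if not atoms:
        raise ValueError("no atoms with 2D coordinates found")

    xs = [x for _, x, _ in atoms.values()]
    ys = [y for _, _, y in atoms.values()]
    min_x, max_y = min(xs), max(ys)

    def to_px(x, y):
        # CML's y axis points up, SVG's points down, so flip y.
        return margin + (x - min_x) * scale, margin + (max_y - y) * scale

    svg = ET.Element("svg", {
        "xmlns": "http://www.w3.org/2000/svg",
        "width": f"{2 * margin + (max(xs) - min_x) * scale:.1f}",
        "height": f"{2 * margin + (max_y - min(ys)) * scale:.1f}",
    })
    for a, b, order in bonds:
        x1, y1 = to_px(*atoms[a][1:])
        x2, y2 = to_px(*atoms[b][1:])
        if order in ("2", "D"):
            # Double bond: two lines offset 3 px either side, perpendicular to the bond.
            length = math.hypot(x2 - x1, y2 - y1) or 1.0
            nx, ny = -(y2 - y1) / length * 3, (x2 - x1) / length * 3
            offsets = [(nx, ny), (-nx, -ny)]
        else:
            offsets = [(0.0, 0.0)]
        for ox, oy in offsets:
            ET.SubElement(svg, "line", {
                "x1": f"{x1 + ox:.1f}", "y1": f"{y1 + oy:.1f}",
                "x2": f"{x2 + ox:.1f}", "y2": f"{y2 + oy:.1f}",
                "stroke": "black",
            })
    for element, x, y in atoms.values():
        if element != "C":  # skeletal-formula convention: carbons stay unlabeled
            cx, cy = to_px(x, y)
            label = ET.SubElement(svg, "text", {
                "x": f"{cx:.1f}", "y": f"{cy:.1f}",
                "text-anchor": "middle", "dominant-baseline": "central",
            })
            label.text = element
    return ET.tostring(svg, encoding="unicode")


# Placeholder C=O fragment written by hand for the example.
carbonyl = """<molecule xmlns="http://www.xml-cml.org/schema" id="m1">
  <atomArray>
    <atom id="a1" elementType="C" x2="0.0" y2="0.0"/>
    <atom id="a2" elementType="O" x2="1.3" y2="0.0"/>
  </atomArray>
  <bondArray>
    <bond id="b1" atomRefs2="a1 a2" order="2"/>
  </bondArray>
</molecule>"""
print(cml_to_svg(carbonyl))
```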

The remaining issue, then, is the actual structure of our chemical reaction database (an organic reaction database, really). The NNNS computer lab has done some research and came up with an old-fashioned relational MySQL database with three tables: one describing each molecule, one describing each reaction in terms of its DOI and reaction conditions, and one linking each molecule to a reaction together with the role it plays (reactant, product, solvent, catalyst, ligand etc.) and its molar ratio. In this way you can throw any number of reactants into the equation, together with any multi-solvent or multi-ligand system.
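The three-table layout can be sketched as follows. SQLite is used here instead of MySQL so the example runs without a server, and all table and column names, the helper functions and the example esterification data (including the placeholder DOI, temperature and ratios) are assumptions for illustration only. Molecules are matched on their SMILES string verbatim; a real system would first canonicalise SMILES or compare InChIs so that different notations of the same molecule map to one row.

```python
import sqlite3

SCHEMA = """
CREATE TABLE molecule (
    id     INTEGER PRIMARY KEY,
    smiles TEXT NOT NULL UNIQUE,
    inchi  TEXT,            -- optional standard InChI
    svg    TEXT             -- rendered structure stored in the row itself
);
CREATE TABLE reaction (
    id            INTEGER PRIMARY KEY,
    doi           TEXT NOT NULL,
    temperature_c REAL,
    conditions    TEXT
);
CREATE TABLE reaction_component (
    reaction_id INTEGER NOT NULL REFERENCES reaction(id),
    molecule_id INTEGER NOT NULL REFERENCES molecule(id),
    role        TEXT NOT NULL CHECK (role IN
                ('reactant', 'product', 'solvent', 'catalyst', 'ligand', 'reagent')),
    molar_ratio REAL,
    PRIMARY KEY (reaction_id, molecule_id, role)
);
"""


def add_reaction(conn, doi, temperature_c, components):
    """components: iterable of (smiles, role, molar_ratio) tuples (hypothetical helper)."""
    rxn_id = conn.execute(
        "INSERT INTO reaction (doi, temperature_c) VALUES (?, ?)", (doi, temperature_c)
    ).lastrowid
    for smiles, role, ratio in components:
        # Reuse an existing molecule row if the exact SMILES string is already known.
        row = conn.execute("SELECT id FROM molecule WHERE smiles = ?", (smiles,)).fetchone()
        mol_id = row[0] if row else conn.execute(
            "INSERT INTO molecule (smiles) VALUES (?)", (smiles,)
        ).lastrowid
        conn.execute("INSERT INTO reaction_component VALUES (?, ?, ?, ?)",
                     (rxn_id, mol_id, role, ratio))
    return rxn_id


def reactions_using(conn, smiles, role):
    """Return (id, doi) of every reaction where the molecule plays the given role."""
    return conn.execute("""
        SELECT r.id, r.doi FROM reaction r
        JOIN reaction_component rc ON rc.reaction_id = r.id
        JOIN molecule m ON m.id = rc.molecule_id
        WHERE m.smiles = ? AND rc.role = ?
        ORDER BY r.id""", (smiles, role)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)

# Illustrative Fischer esterification; DOI and conditions are placeholders.
rxn = add_reaction(conn, "10.0000/placeholder", 80.0, [
    ("CC(=O)O", "reactant", 1.0),      # acetic acid
    ("CCO", "reactant", 1.0),          # ethanol
    ("OS(=O)(=O)O", "catalyst", 0.05), # sulfuric acid
    ("CCOC(C)=O", "product", 1.0),     # ethyl acetate
    ("O", "product", 1.0),             # water
])
print(reactions_using(conn, "CCO", "reactant"))  # [(1, '10.0000/placeholder')]
```

Because roles live in the linking table rather than as columns on `reaction`, a reaction run in three solvents simply gets three `solvent` rows; the schema itself never has to change.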

Yes, that should definitely work.

Next episode: the quest for a venture capitalist.