Organic reaction database IV
Still missing from our database was an organised way to present reaction types. Luckily the RSC has a scheme in place to do just that, called RXNO. They even offer an XML file for download with info on all 300+ reactions in the inventory. Never mind that the XML file is of the clumsy RDF type, this saves us a lot of effort. From now on, each reaction is characterised by a reaction type ID and a Wikipedia link. The example below is a Horner-Wadsworth-Emmons reaction.
Organic reaction database III
24 August 2012 - The Chemical Reaction Database
With market research and technical specs out of the way it is time for episode one (beta) of our novel chemical reaction database (CRB). We ambitiously or naively start with reaction 00000001, selected at random from the recent literature. As a reminder, the images are displayed as .svg. Outdated browsers will not be able to handle this image type. The PubChem sketcher is doing a lot of the hard work, generating the image, the InChI and the SMILES in a single session. The entire database (2 main records!) is available as a download below. This makes the database the first open-source reaction database in the world, ever! We feel proud.
reactiondatabase_24082012.pdf available as a download
The chemical reaction database II
17 August 2012 - Cheminformatics
Having completed our market research, what should a chemical reaction database look like? An organic reaction can have a lot of data associated with it: reactants, products, catalysts, temperature, solvent etc. How much information would you like to retain?
Some novel concepts will make life easier. The DOI system makes accessing the article a piece of cake, and in a few years' time we will also have an ORCID to manage the authors.
The structure of any molecule can be described by a SMILES or InChI code. This will greatly aid substructure searching. Of course people in the organics department are accustomed to absorbing organic reaction information in the form of cartoons: a picture of two molecules on the left and their reaction products on the right. But keeping a large library of images on file is cumbersome. Existing reaction databases also rely on .gif images, which lack rendering quality. Databases are cheap these days and cheap to run, so store the image in the database itself. And instead of .gif, why not store the image as an .svg? Firefox is trained to display svg files as images and, as an added bonus, svg files can be manipulated like any xml file.
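To illustrate that last point, here is a minimal sketch of editing an svg with an ordinary XML parser from the Python standard library. The svg snippet and the function name `recolor_lines` are made up for this example; a real drawing exported from a sketcher will contain far more elements.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Keep the default svg namespace on output instead of an "ns0:" prefix.
ET.register_namespace("", SVG_NS)

# Hypothetical, hand-written stand-in for a reaction drawing.
svg_text = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="200" height="80">'
    '<line x1="10" y1="40" x2="90" y2="40" stroke="black"/>'
    '<text x="100" y="45">OH</text>'
    '</svg>'
)


def recolor_lines(svg_source: str, color: str) -> str:
    """Return the svg with the stroke of every <line> element set to `color`."""
    root = ET.fromstring(svg_source)
    for line in root.iter(f"{{{SVG_NS}}}line"):
        line.set("stroke", color)
    return ET.tostring(root, encoding="unicode")


recolored = recolor_lines(svg_text, "red")
print(recolored)
```

Because the result is still plain text, it can go straight back into a text column of the database.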
Clearly the law of the handicap of a head start applies, by the way (Wikipedia fact alert!) a 1937 invention by a Dutch historian. Compared to an incumbent like SciFinder, a new database also has zero investment costs, is easier to maintain and has less data to store.
There are still problems though. Nature Chemistry adds a full list of molecules to each publication together with a CML file or molfile. Converting these file types to svg should be a trivial task. But where are the CML2SVG or even InChI2SVG scripts? For some reason they have yet to be invented.
The remaining issue to solve then is the actual database structure for our chemical reaction database (organic reaction database really). The NNNS computer lab has been doing some research and came up with an old-fashioned relational MySQL database with three tables: one describing each molecule involved, one describing each reaction with respect to DOI and reaction conditions, and one describing each molecule involved in the reaction together with the role it plays (product, reactant, solvent, catalyst, ligand etc.) and its molar ratio. In this way you can throw any number of reactants into the equation together with any multi-solvent or multi-ligand system.
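As a rough sketch of what such a three-table layout could look like: the table names, column names, the placeholder DOI and the sample esterification below are all assumptions made for illustration, and Python's built-in sqlite3 stands in for MySQL so the example runs without a server.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL database in the text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE molecule (
    molecule_id INTEGER PRIMARY KEY,
    name        TEXT,
    smiles      TEXT NOT NULL,
    inchi       TEXT
);
CREATE TABLE reaction (
    reaction_id   INTEGER PRIMARY KEY,
    doi           TEXT NOT NULL,
    temperature_c REAL,
    time_h        REAL
);
-- Link table: which molecule takes part in which reaction, in what role.
CREATE TABLE reaction_molecule (
    reaction_id INTEGER NOT NULL REFERENCES reaction(reaction_id),
    molecule_id INTEGER NOT NULL REFERENCES molecule(molecule_id),
    role        TEXT NOT NULL,
    molar_ratio REAL,
    PRIMARY KEY (reaction_id, molecule_id, role)
);
""")

# Illustrative data only: a textbook esterification with a placeholder DOI.
conn.executemany(
    "INSERT INTO molecule (molecule_id, name, smiles) VALUES (?, ?, ?)",
    [
        (1, "acetic acid", "CC(=O)O"),
        (2, "ethanol", "CCO"),
        (3, "ethyl acetate", "CCOC(C)=O"),
        (4, "sulfuric acid", "OS(=O)(=O)O"),
    ],
)
conn.execute(
    "INSERT INTO reaction (reaction_id, doi, temperature_c, time_h) "
    "VALUES (1, '10.0000/placeholder', 80.0, 2.0)"
)
conn.executemany(
    "INSERT INTO reaction_molecule VALUES (?, ?, ?, ?)",
    [
        (1, 1, "reactant", 1.0),
        (1, 2, "reactant", 1.2),
        (1, 3, "product", 1.0),
        (1, 4, "catalyst", 0.05),
    ],
)

# List every participant of reaction 1 together with its role.
rows = conn.execute(
    """
    SELECT rm.role, m.name, rm.molar_ratio
    FROM reaction_molecule AS rm
    JOIN molecule AS m USING (molecule_id)
    WHERE rm.reaction_id = ?
    ORDER BY rm.role, m.name
    """,
    (1,),
).fetchall()

for role, name, ratio in rows:
    print(role, name, ratio)
```

The link table is what makes the design open-ended: a second solvent or an extra ligand is just one more row, not a new column.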
Yes, that should definitely work.
Next episode: the quest for a venture capitalist.
reactiondatabase.pdf available as a download
16 August 2012 - EnvironMENTAListics
We've got it! The first of no doubt many official reports on the 2012 Big Utrecht Asbestos Scare (BUAS) (pdf). See the earlier blog report here. The first surprise is no surprise at all: the asbestos levels in the affected apartments were insignificant. The asbestos varieties that have been found are amosite and chrysotile. An earlier report mentions X-ray microanalysis and SEM as analytic tools (measurements in fibers per square meter). The Dutch norm is NEN 2991 but for some reason it costs 30 euros to read it. Sorry! The international norm is ISO 14966 and also costs money. The closest public source, if you are interested, can be found here.
The presence of the asbestos variety amosite did come as a surprise because according to the report it is never used in building homes. It was used in a type of sealant for a part of the roof. The report does not offer an explanation, but in a radio interview one of the authors speculated that at one point during construction of the building (1960) a worker was looking for sealant and stumbled upon this particular amosite-containing batch. Thanks a lot, worker! 52 years down the road you caused a mass panic! Well, it goes to prove that accidents more often than not are freak accidents.
The chemical reaction database
05 August 2012 - Cheminformatics
Some time ago Derek was kind enough to present a list of public domain chemical databases here, but what about public domain chemical reaction databases? The inventory.
First up is webreactions, a searchable organic reaction database maintained by openmolecules.org, a group of chemical software developers. A search for a basic Ullmann condensation-type reaction delivers (via a Java applet) a fantastic 22 records, many of them from a database called Chemsynth. This database is maintained by InfoChem GmbH as part of chemReact, which is part of SPRESI, which appears to have started out as a 1970s USSR relic. It only considers yields over 50%, so no negative results. There is much bibliographic info, of course redundant in 2012 with the introduction of DOIs. There is an MDL number and a RXNTYPE (info).
Spresi.com itself is linked to Springer and offers a one-week trial so we can at least have a look. The Ullmann result is 844! Additional search parameters apart from substructure search are yield range and conditions range (800+ different toluene conditions?). Peculiar: temperature, solvent and catalysts are lumped together and not addressable individually.
Back to open source: the venerable orgsyn has a free-of-charge ChemDraw plugin for substructure search. Searches frequently end in MolServer error 800a03ec C:\Inetpub\wwwroot\ChemOffice\CFWTEMP\OrgSyn\orgsynTemp\Sessiondir\640967013\ReactionsStructure.cdx so clearly something goes wrong (Go for Linux NOT Windows!).
ChemSpider, associated with the Royal Society of Chemistry, also maintains a chemical reaction database. It can be searched by author, reaction type or compound. Searching for Ullmann reactions, phenol gets a hit but a substructure search on a phenol and an aryl bromide returns nothing.
Extending the search to the outer reaches of the internet, the University of Auckland promises a reaction database here, but after the registration process is done it appears you have enrolled as a student. Help? Fees? Reaccs, maintained by MDL Information Systems, offers an intriguing black screen, but how does it work?
And then there are the big commercial players: of course there is SciFinder, maintained by the American Chemical Society. Do they still charge 20 dollars per query? If Google got away with that sort of revenue they would own the planet by now and not just the US. Reaxys is owned by the evil empire Elsevier, which we boycott.
Sord.nl offers more chemical database content, built on so-called "lost chemistry" from dissertations etc. Does it include otherwise brilliant research by people forced out of a PhD by incompetent university managers?
Late entries: Chematica @ The Guardian