Chemoinformatics
(also known as ‘cheminformatics’) applies computational and information science
techniques to a range of chemical problems.
These techniques are used, for example, to model compounds and to
predict their properties.
Chemoinformatics has made it possible to simulate expensive operations
such as high-throughput screening (the rapid testing of numerous compounds for
desirable qualities). A good starting
point for those new to the field is the book ‘An Introduction to
Chemoinformatics’ by Leach and Gillet.
My project is
specifically concerned with the idea of de novo design (‘de novo’ is Latin for
‘anew’ or ‘from the beginning’). De novo design builds novel
molecules in virtual space and evaluates them against some measure of
functionality, such as similarity to a compound known to work well, or the
complementarity between the shape of the molecule and the receptor it is to
fill. The aim is to streamline the drug discovery process. By evaluating compounds
virtually before any of them are made, it may be possible to reduce the number of
synthesis operations necessary.
However, the sheer number of potential molecules that can be made renders it
impossible to explore all of chemical space for solutions, even with modern
levels of computing power. As a result, various predictive sampling processes (such
as the Monte Carlo method) have been adapted for use in this field. These can
give good results, but their simplistic approach to constructing molecules can
generate products that are impossible to make in practice,
limiting the overall usefulness of the methods.
To work around
such problems, methods have been developed within the chemoinformatics group
that use genuine reaction data from the literature to create generic rules (the
reaction vectors). These rules can be
applied to a given starting material to generate new molecules. This provides a
compromise between finding novel molecules and retaining a synthetic awareness,
as each transformation is based on a literature precedent. However, this method
has its own limitations. Some structures,
for example, can be built up using multistep reaction sequences. In many cases,
the intermediates in such sequences score worse for similarity to the target
than both the starting material and the end product. Consequently, many potentially useful
molecules are never completed because steps en route to the end product score
poorly.
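
The pruning problem described above can be illustrated with a toy example. This is only a sketch under stated assumptions: molecules are reduced to hypothetical sets of feature labels, similarity is the Tanimoto coefficient on those sets, and the "greedy" filter (stop as soon as a step scores worse than the best so far) is a stand-in for whatever selection scheme a real de novo tool uses. None of the names or feature sets come from the actual reaction vector software.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical feature sets, invented for illustration only.
target = frozenset({"ring", "amide", "fluoro", "hydroxyl"})
route = [
    ("start", frozenset({"ring", "hydroxyl", "ester"})),
    ("intermediate", frozenset({"ring", "nitro", "bromo", "ester"})),
    ("product", frozenset({"ring", "amide", "fluoro", "hydroxyl"})),
]

# Similarity of every molecule on the route to the target.
scores = {name: tanimoto(features, target) for name, features in route}


def greedy_survivors(route, target):
    """Follow the route, stopping at the first step that lowers the score."""
    kept = [route[0][0]]
    best = tanimoto(route[0][1], target)
    for name, features in route[1:]:
        score = tanimoto(features, target)
        if score < best:
            break  # the step is rejected, so later steps are never built
        kept.append(name)
        best = score
    return kept


print(scores)
print(greedy_survivors(route, target))
```

Here the intermediate is less similar to the target than the starting material, so the filter stops at the first step and the product, which matches the target exactly, is never generated.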
One solution to this
problem is to create a new rule format that represents all the reaction steps
in one operation. This requires some initial identification of those sequences,
which is where my project comes in. If we regard a chemical reaction as a
transformation of one molecule into another, it becomes possible to connect
these transformations together into a network.
Every path through the network represents a multistep synthetic
route for which each step has a known example in the literature. Reaction properties such as ease of
synthesis, cost of materials and product yield can then be added to the
network to bias the selection of routes in accordance with a particular set of
criteria.
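
A minimal sketch of the network idea follows, assuming molecules are plain string labels and each reaction carries a single additive "cost" (a hypothetical number standing in for ease of synthesis, material cost and so on). The molecule names, the costs and both function names are invented for illustration and are not taken from the project itself. The first function enumerates every multistep route between two molecules; the second uses Dijkstra's algorithm to pick the cheapest one.

```python
import heapq

# Hypothetical reaction network: molecule -> list of (product, step cost).
network = {
    "A": [("B", 2.0), ("C", 5.0)],
    "B": [("C", 1.0), ("D", 6.0)],
    "C": [("D", 2.0)],
    "D": [],
}


def all_routes(network, start, goal, path=None):
    """Yield every route from start to goal that visits no molecule twice."""
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for product, _cost in network.get(start, []):
        if product not in path:
            yield from all_routes(network, product, goal, path)


def cheapest_route(network, start, goal):
    """Dijkstra's algorithm: return (total cost, route) or None if unreachable."""
    queue = [(0.0, start, [start])]
    settled = set()
    while queue:
        cost, molecule, path = heapq.heappop(queue)
        if molecule == goal:
            return cost, path
        if molecule in settled:
            continue
        settled.add(molecule)
        for product, step_cost in network.get(molecule, []):
            if product not in settled:
                heapq.heappush(queue, (cost + step_cost, product, path + [product]))
    return None


print(list(all_routes(network, "A", "D")))
print(cheapest_route(network, "A", "D"))
```

Additive costs are used here only because they fit a shortest-path search directly. A multiplicative property such as overall yield could be handled the same way by using the negative logarithm of each step's yield as its cost.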