The search for new drugs with desirable properties has been likened by Robin Spencer (as cited) to the process of holing a golf ball from a great distance, in which several strokes (using different clubs) are required to get ever closer to the target. This is because the search space of possible drugs is simply enormous. The field that deals with counting such possibilities is known as 'enumeration', and while not all possible molecules are likely to make drugs, thus introducing constraints, the number of 'drug-like' molecules containing up to 30 'heavy' (non-H) atoms has been estimated at 10^63.

To face this challenge, pharmaceutical companies have acquired very large libraries (1–2 million compounds) of candidate substances that might produce hits in appropriate assays (screens), and which might then be developed into leads and finally into marketable substances. While many of these library molecules are proprietary, a great deal of molecular diversity is already available to those without direct access to new synthetic chemistry. Thus, Williams points out that PubChem already contains data on more than 18M molecules, ZINC lists 4.8M that are commercially available electronically (see our recent analysis of its diversity as part of a separate comparison between commercial drugs and intermediary metabolites, related to the role of carriers in cellular drug uptake), and the internet is a seriously useful resource for acquiring such molecules.

Notwithstanding the recognition that an initial hit must be modified somewhat to get to the final product, 1–2 million compounds is a lot to screen, and one may wonder if there is an easier way that requires considerably fewer assays. The method of 'evolving molecules' that does this is known as fragment-based lead discovery, and works as follows. Imagine a hypothetical binding site with an affinity of 1 nM for a molecule of the form A-B-C. If there are 100 fragments that could serve to make A, B or C individually, the number of combinations is 100 x 100 x 100 = 10^6. However, if each of those fragments alone can bind with mM affinity, and we just have to find which 3, we only have to screen the 100 in the first instance. Based on this knowledge we can then ask whether the best arrangement is A-B-C, B-C-A and so on. Clearly the assembly of chemical fragments differs considerably from that of strings of letters, but the main message is clear: by using multiple steps of evolution (as also demonstrated in a chromatographic context elsewhere), including a selection step at each round, the number of individual assays required to find or evolve a potent molecule is massively reduced. In other words, given a robust assay and the necessary cheminformatics skills, fragment-based approaches make molecular discovery accessible to all laboratory budgets. Nowadays, many laboratories narrow the search space by effecting structural studies at each round, to assess exactly where each fragment binds and thus how to join them up optimally. The total number of individual (but skilfully selected) fragments used is usually in the low 100s. Some recent reviews of this include those by Rees et al., by Erlanson, by Hajduk & Greer, by Hubbard et al., and by Jhoti et al.
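The arithmetic of that assay saving can be made concrete with a toy sketch (not real chemistry: the fragment names and binding scores below are invented stand-ins for assay data), comparing the exhaustive combinatorial screen with the fragment-first strategy:

```python
import itertools
import random

random.seed(0)

# 100 hypothetical fragments, each assigned a made-up binding score
# (a stand-in for a measured mM-range affinity; higher = tighter).
fragments = [f"F{i:02d}" for i in range(100)]
score = {f: random.random() for f in fragments}

# Exhaustive approach: assay every ordered A-B-C combination.
exhaustive_assays = len(fragments) ** 3  # 100 x 100 x 100 = 1,000,000

# Fragment-based approach: assay each fragment once, keep the best
# three binders, then test only the arrangements of those three
# joined together (A-B-C, A-C-B, B-A-C, ...).
best_three = sorted(fragments, key=score.get, reverse=True)[:3]
arrangements = list(itertools.permutations(best_three))  # 3! = 6 orderings
fragment_assays = len(fragments) + len(arrangements)     # 100 + 6 = 106

print(exhaustive_assays, fragment_assays)  # 1000000 vs 106
```

The toy model ignores everything that makes real fragment work hard (linker chemistry, cooperativity, the need for structural data), but it captures the central point: screening fragments first, then recombining only the winners, turns a million-assay problem into one of a few hundred.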

In some ways, this is the molecular equivalent of the strategy discussed in the previous blog, where I looked at the inhibition of networks at multiple sites. However, in this case the fragment-based approaches are considerably better established. They also provide an unexpected twist on the ideas of evolution, a suitable topic to focus on as we approach Darwin Day.
