That genetic differences account for a substantial part of biological variability is hardly in dispute, and the inclusion of genetics (and increasingly molecular genetics) was arguably the key contribution that created neo-Darwinism and led to the ‘modern synthesis’ of evolutionary theory.

Selective breeding programmes amply illustrate the contribution to the phenotype that can be effected by genetic variation. In a very nice summary, Hill (2005) describes an experiment at the University of Illinois (Laurie et al. 2004) that has been running since 1896. In this experiment, scientists have selected (and bred) strains of maize (corn) that are either high or low in the content of oil in their kernels (a trait of considerable agronomic importance). Over the years, the initial 5% oil has changed to 20% in the high-oil lines, and has decreased almost to zero in the low-oil strains. (A similar experiment using protein as the trait of interest gave a similar result, save that the ‘low-protein’ lines retain about 5% of protein.) Genetic analysis (of the quantitative trait loci) showed that a great many parts of the genome contributed to this variation in oil content, that the largest could account for a difference of only 0.3% in oil content, and that most accounted for just 0.1–0.2%. Given this, it is possibly unsurprising that these (small) effects were seen as additive (i.e. independent); put another way, there was negligible epistasis observed in these populations in which all other genes were also segregating.
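As a rough back-of-envelope check (mine, not part of the Hill or Laurie et al. analyses), one can ask how many loci a purely additive model would need to move the high-oil lines from 5% to 20% oil. The sketch below assumes that the quoted effect sizes are percentage points of oil and that each locus contributes once; the function name `min_additive_loci` is purely illustrative.

```python
import math

def min_additive_loci(total_shift, per_locus_effect):
    """Smallest number of loci needed if every locus adds per_locus_effect
    and effects simply sum (no epistasis)."""
    # Round first so floating-point noise cannot push the ceiling up by one.
    return math.ceil(round(total_shift / per_locus_effect, 9))

# Percentage points gained by the high-oil lines (5% -> 20%).
total_shift = 20.0 - 5.0

for effect in (0.3, 0.2, 0.1):
    print(effect, min_additive_loci(total_shift, effect))
```

Even if every locus had the largest effect reported (0.3), at least 50 would be needed; at the more typical 0.1 to 0.2 the floor rises to 75 to 150, which is consistent with ‘a great many parts of the genome’ being involved.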

This general requirement for multiple changes in metabolic pathways to produce significant phenotypic effects is also expected on theoretical grounds (for metabolic networks, an early formalism setting this out is known as ‘metabolic control analysis’ – see e.g. Fell, 1996 or a web tutorial). One might also comment that in directed molecular enzyme evolution a large increase in activity often turns out to involve multiple amino acid changes – e.g. 17 were recorded in the case of an evolved aspartate aminotransferase, and most were distant from the enzyme’s active site (Oue et al., 2005).
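The metabolic control analysis argument can be illustrated numerically. The sketch below uses a deliberately simple toy flux expression for a linear pathway, J = 1 / Σ(1/e_i), which is my assumption for illustration and is not taken from Fell (1996); the function names are likewise hypothetical. It estimates each enzyme's flux control coefficient by a small relative perturbation and checks that the coefficients sum to about 1, as the summation theorem of metabolic control analysis requires.

```python
def pathway_flux(enzymes):
    """Toy steady-state flux through a linear pathway: J = 1 / sum(1 / e_i)."""
    return 1.0 / sum(1.0 / e for e in enzymes)

def flux_control_coefficients(enzymes, rel_step=1e-6):
    """Estimate C_i = (dJ / J) / (de_i / e_i) by perturbing each enzyme in turn."""
    base = pathway_flux(enzymes)
    coeffs = []
    for i, e in enumerate(enzymes):
        perturbed = list(enzymes)
        perturbed[i] = e * (1.0 + rel_step)
        coeffs.append((pathway_flux(perturbed) - base) / base / rel_step)
    return coeffs

# Ten equally active steps, in arbitrary units (an assumed example pathway).
enzymes = [1.0] * 10
coeffs = flux_control_coefficients(enzymes)
print(sum(coeffs))  # close to 1: control is shared among all steps
print(max(coeffs))  # each step holds roughly a tenth of the control

# Doubling a single enzyme therefore changes the flux only slightly.
doubled = [2.0] + enzymes[1:]
flux_gain = pathway_flux(doubled) / pathway_flux(enzymes)
print(flux_gain)
```

In this toy pathway, doubling one enzyme out of ten raises the flux by only about 5%, which is the sense in which large phenotypic changes are expected to need changes at many steps.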

In the above cases, the genotype-phenotype relationship (Kell, 2002) was analysed without any reference to changes in the entity’s “environment”, and clearly this is inappropriate in the analysis of many classes of biological problem. A particularly interesting example (Maher, 2008) concerns the search for associations between genotype and phenotype in human populations. Maher notes that for a (complex) trait such as height, 80-90% is considered to be heritable, and that “if 29 centimetres separate the tallest 5% of a [presumably interbreeding] population from the shortest, then genetics would account for as many as 27 of them”. Unfortunately for this cosy picture, several very large genome-wide association studies (GWAS) (see also a recent Nature Insight) have turned up some 40 loci that contribute measurably to the variation in height, but together they account for only 5% of height’s heritability. Where, then, is the missing heritability referred to in the Maher paper’s title (and that of this blog)?

Maher offers six possible answers, each of which may contribute to some degree. The first is the recognition that GWAS based on single-nucleotide polymorphisms (SNPs) may be a blunt tool, since – see the molecular evolution comment above – individual SNPs may be ‘diluted out’ (or indeed amplified) by other variations within the same gene, and certainly there is functional linkage between different parts of a protein that may be uncovered by sequence-gazing (Pritchard & Dufton, 2000). Next-generation sequencing methods, whose higher throughput allows full sequencing of genes and not just of SNPs, may be expected to uncover the contribution of this phenomenon in GWAS.

A second class of explanation is related to ‘low penetrance’ variants, in which potentially 1000s of genes are recognised as contributing to the variance in height. In one sense this must be partly true, since it is unlikely that anything, including a genetic change in one of 25,000 genes, has absolutely no effect, even if it may be (statistically) immeasurably small in practical studies. Evidence for such contributions might also emerge from increased full-genome sequencing (e.g. in the ‘1000 genomes’ project).

One finding that is starting to emerge as more genomes are sequenced is the unexpected significance of copy number variation (CNV). This underpins a third explanation, since copy number variants are typically not picked up by GWAS that rely only on detecting SNPs.

A fourth explanation is based on epistasis (or synergy). Gene A might affect height by 1 cm, and gene B the same, but together they might add 5 cm. Synergy is very common in biological networks (I cited another example in a recent blog), and I am likely to return to this issue frequently in these blogs, as this kind of recognition is a key feature of Systems Biology. Although I stated above that the maize oil kernel study found little evidence for epistasis, GWAS as commonly performed (i.e. not using inbred lines with lowered background variance) actually cope poorly with epistasis. A further implicit assumption is that individual genes lead to individual effects on the phenotype, but in fact – as shown most clearly in yeast – multiple genes can contribute to particular effects (Brem & Kruglyak, 2005) and variation in the activity of pretty well any individual gene can cause multiple effects (Featherstone & Broadie, 2002) – pleiotropy on a large scale.
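The height example can be written down as a minimal sketch. The 1 cm single-gene effects and the 5 cm joint effect are the illustrative numbers used above; the 3 cm interaction term is simply what is needed to reconcile them, and the function name `height_gain_cm` is hypothetical.

```python
def height_gain_cm(has_a, has_b, effect_a=1.0, effect_b=1.0, interaction=3.0):
    """Height gain for a carrier of variant A and/or B (0 = absent, 1 = present).
    The interaction term contributes only when both variants are present."""
    return effect_a * has_a + effect_b * has_b + interaction * has_a * has_b

# What one would predict by adding up the two single-gene effects.
additive_prediction = height_gain_cm(1, 0) + height_gain_cm(0, 1)

# What a carrier of both variants actually gains in this toy model.
observed_together = height_gain_cm(1, 1)

print(additive_prediction)                      # 2.0
print(observed_together)                        # 5.0
print(observed_together - additive_prediction)  # 3.0
```

An analysis that scores each variant on its own and sums the results accounts for only 2 of the 5 cm here; the remaining 3 cm resides in the interaction and is invisible to a purely additive model.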

Although the overall heritability estimates are considered reliable, and are said to have been controlled by comparing genetically identical twins raised together with those raised in different environments, GWAS are not yet designed in such a way that they ‘know’ any of the real details of the environments in question. At all events, heritable changes do not have to occur solely by classical genetic means, and the various epigenetic mechanisms may contribute substantially to the missing heritability in ways that, again, would not be observed by performing standard GWAS. It is certainly the case that we have underestimated the likely significance of epigenetics, and various initiatives such as the NIH Roadmap Initiative on epigenomics in the USA are seeking to change that.

What if the estimates of the phenotype too were wrong? (This is Maher’s sixth mechanism.) This seems not to be such an issue, since while it may be the case that ‘diseases’ in gene-disease association studies are not homogeneous and individual syndromes said to represent a ‘disease’ may hide a multiplicity of subclasses, things like height are simple one-dimensional properties.

Clearly a number of the above explanations may be contributing, in varying degrees, to the ‘missing heritability’. However, I suspect that – as well as in the highly nonlinear behaviour of biochemical networks – the most likely class of explanation lies elsewhere. It is that we have simply not taken properly into account the different environments (social, cultural and lifestyle) to which people have been exposed (or choose to expose themselves) in accounting for the phenotypic variation observed. The diabetes and obesity (‘diabesity’) epidemic emerging in many ‘developed’ countries provides a clear example here, since (Roberts & Barnard, 2005) “As previously pointed out by Booth et al. (2000), 100% of the increase in the prevalence of Type 2 diabetes and obesity in the United States during the latter half of the 20th century must be attributed to a changing environment interacting with genes, because 0% of the human genome has changed during this time period.” Changes in an individual’s diet and the amount of physical exercise taken are arguably the two likeliest lifestyle contributors to this particular ‘changing environment’; the former is a particular focus of BBSRC’s Diet and Health Research Industry Club (DRINC) research programme.