Professor Douglas Kell's blog: news from our Chief Executive

Tag: machine learning

As so often, last week’s activities seemed to cluster around a subset of topics. We had a very useful third meeting of the Industrial Biotechnology (IB) Leadership Forum (and see its latest Newsletter), with updates on a wide range of topics. These included the Centre for Process Innovation at Wilton, home of the National Industrial Biotechnology Facility (see also the BIS Manufacturing and Materials Newsletter), activities at the European Forum for Industrial Biotechnology meeting in Edinburgh (and see video) and others that week, Industrial Biotechnology as a core part of the Knowledge-Based BioEconomy (a term that had actually originated under the UK and German EU Presidencies around 2005/6), the development of necessary skills (led by David Brown of the IChemE), and a presentation by Iain Wilcock of Seventure on the (venture capital) funding of Industrial Biotechnology. I outlined where we were with our own developments, while Jonathon Porritt, representing Forum for the Future, led a discussion of the possible (perceived and real) risks of introducing new technologies such as IB, and how we should develop relevant public engagement activities.
Continue reading: Industrial biotechnology, biochemistry and risk analysis

Today – May 7, 2009 – is the 50th Anniversary of C.P. Snow’s famous Rede Lecture published as The Two Cultures. In this, he lamented the essential lack of even a rudimentary knowledge of the natural sciences (and technology) among those trained in the arts and humanities – but the expectation by the latter that scientists should themselves be ‘cultured’ by having a rather detailed knowledge of artistic matters sensu lato. He further considered that many of the failures he then perceived in political and public life as the UK developed technologically post-war were due to exactly this kind of ignorance. His ‘test’ for scientific knowledge was whether a person might know about the Second Law of Thermodynamics, a topic also treated more lightheartedly by Flanders and Swann, but later (see e.g. the edition reprinted in 1998) he modified this (in the light of experience) to state that most non-scientists could not even describe properly the meaning of acceleration.
Continue reading: Two cultures?…or two hundred?

Much has been written about the extent of the contribution that is expected of the author of a scientific paper, and I shall not add to it here, since the initial focus is on identifying the people who have been awarded (apparent) coauthorship of an article. Last week I wrote about text mining, which might be one general approach by which to find out. As with most activities (including those involving machine learning), this is not without its hazards, specifically the problem of identifying authors uniquely. The accurate identification of authors of scientific publications is one of the most important issues facing us as we begin to develop digital analyses of citation networks, coauthorship networks, scientific productivity, bibliometrics for purposes of the Research Excellence Framework (REF), and the like.
Continue reading: What’s in a name? Guest, ghost and indeed quite imaginary authorships

Apart from the use of microbes in metal ore extraction, only in one area has ‘mining’ had much effect on modern cellular biology and that is the area of data mining. Data mining describes a suite of methods combining the intelligent storage, analysis and recognition of patterns in large data sets, for the purposes of turning data into information and then into knowledge. Data mining typically makes use of the methods of multivariate statistics and of machine learning to find these hidden patterns, which might then be exploited for intellectual or commercial benefit. It is worth noting the difference between statistics and machine learning. As summarised by Breiman, statistics starts with a hypothesis and assesses the goodness of fit of available data to that hypothesis; by contrast machine learning starts with the data and finds the hypothesis that best fits the data, using methods of cross-validation to avoid over-fitting – which is otherwise a problem. This general distinction is similar to the inductive-deductive distinction in the philosophy of scientific reasoning.
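The role of cross-validation mentioned above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the post: the synthetic data (a noisy straight line), the two candidate models (a least-squares line and a one-nearest-neighbour 'memoriser'), and the function names. The memoriser fits the training data perfectly, which is over-fitting; only the held-out error shows that the simpler line generalises better.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns a predictor function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def fit_nearest(xs, ys):
    """One-nearest-neighbour 'memoriser': returns the y of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a predictor on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def cross_val_mse(fit, xs, ys, k=5):
    """Average held-out MSE over k interleaved folds."""
    idx = list(range(len(xs)))
    scores = []
    for i in range(k):
        held_out = idx[i::k]
        held = set(held_out)
        train = [j for j in idx if j not in held]
        model = fit([xs[j] for j in train], [ys[j] for j in train])
        scores.append(mse(model, [xs[j] for j in held_out], [ys[j] for j in held_out]))
    return sum(scores) / k

# Hypothetical data: a straight line plus Gaussian noise, seeded for repeatability.
rng = random.Random(0)
xs = [i / 10 for i in range(200)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1.0) for x in xs]

results = {}
for name, fit in [("line", fit_line), ("nearest", fit_nearest)]:
    train_err = mse(fit(xs, ys), xs, ys)
    cv_err = cross_val_mse(fit, xs, ys)
    results[name] = (train_err, cv_err)
    print(f"{name:8s} training MSE = {train_err:.3f}  cross-validated MSE = {cv_err:.3f}")
```

Judged on training error alone, the memoriser looks like the better hypothesis; judged on held-out error, it does not. That gap is what cross-validation is there to expose.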
Continue reading: The miners strike again – but these are text miners…

Bee and frog numbers are in decline, and we need to know why. Thus, understanding the dynamics of various species – the study of population biology and ecology – is an important component of BBSRC science, especially where this impacts agriculture. This is in fact a classic kind of problem in systems biology, where many components may interact, we have little knowledge of the parameters or even the network topology of the system, and where the best we can usually do is to measure system variables. Since it is the parameters of the system that control and determine the time evolution of the (dependent) variables, how can we make progress? The answer is by using inference methods (including the methods of data mining and machine learning) that permit one to infer the structure and parameters simply from the measurement of such variables. This is then a data-driven or hypothesis-generating strategy (Kell & Oliver, 2004). The results of the hypothesis-generation step are then hypotheses that can be tested by making the most important inferred parameters independent variables in a subsequent experiment.
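As a toy illustration of inferring parameters purely from measured variables, the sketch below assumes a single population following discrete logistic growth. The model, the parameter values (r, K), the noise level and the function names are all hypothetical choices for this example and are not the methods of Kell & Oliver (2004). Only the population time series is given to the inference step, which recovers the growth rate and carrying capacity by a straight-line fit.

```python
import random

def simulate(r, K, n0, steps, noise_sd, rng):
    """Discrete logistic growth with small random variation in the growth rate."""
    series = [n0]
    for _ in range(steps):
        growth = r * (1 - series[-1] / K) + rng.gauss(0, noise_sd)
        series.append(series[-1] * (1 + growth))
    return series

def infer_parameters(series):
    """Estimate r and K from the measured series alone.

    Under the assumed model, per-capita growth (N[t+1] - N[t]) / N[t]
    equals r - (r / K) * N[t], which is a straight line in N[t]:
    the intercept is r and the slope is -r / K.
    """
    xs = series[:-1]
    ys = [(nxt - cur) / cur for cur, nxt in zip(series[:-1], series[1:])]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, -intercept / slope

# Hypothetical 'true' parameters, used only to generate the measurements.
rng = random.Random(1)
measured = simulate(r=0.3, K=500.0, n0=10.0, steps=40, noise_sd=0.01, rng=rng)

r_hat, K_hat = infer_parameters(measured)
print(f"inferred r = {r_hat:.3f}, inferred K = {K_hat:.1f}")
```

In this toy case the network structure is assumed to be known and only two parameters are unknown. The harder problems described above, where the topology itself is uncertain, would need correspondingly richer inference methods.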
Continue reading: Frogs, bees, parasites and stress – data driven analysis of species decline and biological dynamics