The availability of many records in digital format opens up many possibilities, not least in bibliometrics, a subject that I anticipate will be a regular feature of these blogs. For this blog we are going to look briefly at the distribution of scientific activity between individuals, as encapsulated by the question ‘if n individuals have published 1 scientific paper in a particular time period, how many individuals have published 2 papers or 10 papers or 100 papers?’

Now one might wonder whether one should expect there to be any regularities in such a (quantised) distribution, but there are. The question was posed and answered most pertinently by Alfred Lotka in 1926, and the relationship is known as Lotka’s Law. Lotka observed, from a study of papers listed in Chemical Abstracts and in Auerbach’s Geschichtstafeln der Physik, that the number of persons making n contributions is given by 1/n^a of those making a single contribution, with a equalling approximately 2. Thus for every 100 people who have published 1 paper, 25 have published 2 papers and 1 person has published 10 papers. In other words, the distribution of scientific productivity is well described by an inverse square law (a specific version of a power law, of the kind more generally referred to as a Zipf distribution). Although this is not universally true, it is a reasonable approximation and has some interesting mechanistic bases. The consequences, as recognised in Lotka’s original survey, included the fact that about 60% of authors contributed only one paper (and note that all joint papers were taken to have been written by the ‘senior’ author only). Nowadays this would be seen as a long-tail phenomenon, as popularised in Chris Anderson’s excellent book.
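To make the arithmetic of Lotka’s Law concrete, here is a minimal Python sketch. The function names and the truncation limit are my own illustrative choices, not anything taken from Lotka’s paper, and the exponent is simply assumed to be exactly 2. It reproduces the 100 : 25 : 1 example and estimates the share of authors with a single paper that an exact inverse square law would imply.

```python
import math


def lotka_expected_authors(single_paper_authors, n_papers, a=2.0):
    """Expected number of authors with exactly n_papers papers,
    given the number of authors with one paper (assumes exponent a)."""
    return single_paper_authors / n_papers ** a


def single_paper_share(a=2.0, max_papers=100_000):
    """Fraction of all authors who have exactly one paper.

    Uses a truncated sum of 1/n^a (hypothetical cut-off max_papers);
    for a = 2 the infinite sum is pi^2 / 6.
    """
    total = sum(1.0 / n ** a for n in range(1, max_papers + 1))
    return 1.0 / total


# For every 100 single-paper authors, how many have 2, 10 or 100 papers?
for n in (1, 2, 10, 100):
    print(n, lotka_expected_authors(100, n))

# Share of single-paper authors, next to the closed form 6 / pi^2.
print(round(single_paper_share(), 3), round(6 / math.pi ** 2, 3))
```

Under these assumptions roughly 61% of authors have one paper, which is where the figure of about 60% comes from. Real data sets will show exponents that differ somewhat from 2.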

A similar empirical law, known as Bradford’s Law, describes the pattern first noted by Samuel C. Bradford in 1934 (see 1985 reprinting) that “estimates the exponentially diminishing returns of extending a search for references in science journals. One formulation is that if journals in a field are sorted by number of articles into three groups, each with about one-third of all articles, then the number of journals in each group will be proportional to 1:n:n² [see Wikipedia].” Put another way, while about one third of the papers in a scientific field (however defined) may well be published in a set of m core journals, the remaining two thirds of pertinent ones will be much more widely distributed, over a further m(n + n²) journals. Although usually interpreted in a very different way (“most of the literature in a field is in a small set of core journals”), this too is in fact better seen as another long-tail phenomenon, since the final third of the papers is extremely widely spread. Indeed this focus on ‘core journals’ in a field (often in the context of library holdings) is seen as “tending to favour dominant theories and views while suppressing views other than the mainstream at a given time”. It is another example of the balkanisation of the literature, to which I have recently alluded in two open access papers. The opening gambit of the manuscript version of the latter paper read “Most scientists now manage the bulk of their information electronically, organizing their publications and citations using digital libraries”. The most perceptive of its referees began by responding “neither I, nor most people I know, use such systems” (and we modified the remark). BBSRC has long been committed to the development of tools (including e-tools) that will help biologists, including through our Tools and Resources Strategy Panel and one of our new Committees.
It is to be hoped that the increased development and exploitation of electronic tools to help deal with the flood of words and data will assist our community in increasing yet further both its adventure and its productivity.
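As a small illustration of the kind of e-tool arithmetic involved, the 1:n:n² formulation of Bradford’s Law quoted earlier can be sketched in a few lines of Python. The function name and the example numbers (10 core journals, multiplier 5) are purely hypothetical and are not drawn from any real field.

```python
def bradford_zones(core_journals, n):
    """Number of journals in each of three zones that hold roughly
    equal numbers of articles, in the ratio 1 : n : n^2."""
    return [core_journals * n ** k for k in range(3)]


# Illustrative numbers only: 10 core journals and a multiplier of 5.
zones = bradford_zones(core_journals=10, n=5)
outside_core = sum(zones[1:])  # equals m * (n + n^2)

print(zones)         # journals in the core, second and third zones
print(outside_core)  # journals carrying the other two thirds of articles
```

With these made-up values the core is 10 journals, while the remaining two thirds of the articles are scattered over 300 further journals, which is the long tail in question.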
