On distributions of scientific activity and productivity
The availability of so many records in digital form opens up many possibilities, not least in bibliometrics, a subject that I anticipate will be a regular feature of this blog. In this post we are going to look briefly at the distribution of scientific activity between individuals, as encapsulated by the question ‘if N individuals have published 1 scientific paper in a particular time period, how many individuals have published 2 papers, or 10 papers, or 100 papers?’
Now one might wonder whether one should expect any regularities in such a (quantised) distribution, but there are. The question was posed and answered most pertinently by Alfred Lotka in 1926, and the relationship is known as Lotka’s Law. Lotka observed, from a study of papers listed in Chemical Abstracts and in Auerbach’s Geschichtstafeln der Physik, that the number of persons making n contributions is about 1/n^a of the number making a single contribution, with a approximately equal to 2. Thus for every 100 people who have published 1 paper, 25 have published 2 papers and 1 person has published 10 papers. In other words, the distribution of scientific productivity is best described by an inverse square law (a specific case of a power law, of the kind more generally referred to as a Zipf distribution). Although this is not universally true, it is a reasonable approximation and has some interesting mechanistic bases. One consequence, recognised in Lotka’s original survey, is that about 60% of the authors contributed only one paper (note that all joint papers were taken to have been written by the ‘senior’ author only). Nowadays this would be seen as a long-tail phenomenon, as popularised in Chris Anderson’s excellent book.
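The arithmetic of Lotka’s Law is easy to check for oneself. What follows is only a minimal sketch in Python, assuming an exact power law with exponent a = 2; the function names are my own illustrative choices and not part of any bibliometrics library. It reproduces the 100 : 25 : 1 example and also estimates the share of authors with exactly one paper, which under an exact inverse square law tends to 6/π², or roughly 0.61.

```python
import math


def lotka_expected_authors(single_paper_authors, n, a=2.0):
    """Expected number of authors with exactly n papers under Lotka's Law.

    single_paper_authors: number of authors with exactly one paper.
    a: the exponent (assumed to be 2, the approximate value Lotka found).
    """
    return single_paper_authors / n ** a


def single_paper_share(a=2.0, max_n=100_000):
    """Fraction of all authors who have exactly one paper.

    Computed as 1 / sum(n^-a for n = 1..max_n); the sum is truncated at
    max_n, which is an arbitrary cut-off chosen for this sketch.
    """
    total = sum(1.0 / n ** a for n in range(1, max_n + 1))
    return 1.0 / total


if __name__ == "__main__":
    # For every 100 single-paper authors: 25 with 2 papers, 1 with 10 papers.
    for n in (1, 2, 10, 100):
        print(n, lotka_expected_authors(100, n))

    # Truncated estimate alongside the exact limit for a = 2, which is 6 / pi^2.
    print(round(single_paper_share(), 3), round(6 / math.pi ** 2, 3))
```

Changing the value of a shows how sensitive the tail is to the exponent: a slightly smaller a gives noticeably more highly prolific authors.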
A similar empirical law, known as Bradford’s Law, describes the pattern first noted by Samuel C. Bradford in 1934 (see the 1985 reprint) that “estimates the exponentially diminishing returns of extending a search for references in science journals. One formulation is that if journals in a field are sorted by number of articles into three groups, each with about one-third of all articles, then the number of journals in each group will be proportional to 1:n:n² [see Wikipedia].” Put another way, while about one third of the papers in a scientific field (however defined) may well be published in a set of m core journals, the other two thirds of the pertinent ones will be much more widely distributed, over m(n + n²) journals. Although usually interpreted in a very different way (“most of the literature in a field is in a small set of core journals”), this too is better seen as another long-tail phenomenon, since the final third is extremely widely scattered. Indeed this focus on ‘core journals’ in a field (often in the context of library holdings) is seen as “tending to favour dominant theories and views while suppressing views other than the mainstream at a given time”.
It is another example of the balkanisation of the literature, to which I have recently alluded in two open access papers. The opening gambit of the manuscript version of the latter paper read “Most scientists now manage the bulk of their information electronically, organizing their publications and citations using digital libraries”. The most perceptive of its referees began by responding “neither I, nor most people I know, use such systems” (and we modified the remark). BBSRC has long been committed to the development of tools (including e-tools) that will help biologists, including through our Tools and Resources Strategy Panel and one of our new Committees.
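To make the Bradford arithmetic described above concrete, here is a small Python sketch, offered only as an illustration. It assumes the three-zone formulation quoted earlier (journal counts in the ratio 1:n:n², each zone holding about one third of the articles); the function names and the values of m and n are hypothetical and are not taken from any real field.

```python
def bradford_zones(core_journals, n):
    """Journal counts in the three Bradford zones, in the ratio 1 : n : n^2.

    core_journals: number of journals (m) in the core zone.
    n: the Bradford multiplier for the field (an assumed input here).
    """
    return (core_journals, core_journals * n, core_journals * n ** 2)


def journals_outside_core(core_journals, n):
    """Journals needed to cover the two thirds of articles outside the core.

    This is m * (n + n^2), the sum of the second and third zones.
    """
    _, zone2, zone3 = bradford_zones(core_journals, n)
    return zone2 + zone3


if __name__ == "__main__":
    m, n = 5, 4  # purely illustrative values, not measured data
    print(bradford_zones(m, n))         # (5, 20, 80)
    print(journals_outside_core(m, n))  # 100
```

With these made-up numbers, 5 core journals carry one third of the articles while the remaining two thirds are spread over 100 journals, which is the long-tail reading of the law.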
It is to be hoped that the increased development and exploitation of electronic tools to help deal with the flood of words and data will assist our community in increasing yet further both its adventure and its productivity.
- Anderson, C. M. (2006) The long tail: how endless choice is creating unlimited demand. London, Random House
- Bradford, S. C. (1934) Sources of information on specific subjects. Engineering, 137, 85-86 (reprinted in J. Information Science, 10, 173-180 (1985))
- Hjørland, B. & Nicolaisen, J. (2005) Bradford’s law of scattering: Ambiguities in the concept of “subject”. LNCS, 3507, 96-106
- Huber, J. C. (2001) A new method for analyzing scientific productivity. J Am Soc Inf Sci Technol, 52, 1089-1099
- Hull, D., Pettifer, S. R. & Kell, D. B. (2008) Defrosting the digital library: bibliographic tools for the next generation web. PLoS Comput Biol, 4, e1000204. doi:10.1371/journal.pcbi.1000204
- Kell, D. B. (2008) Iron behaving badly: inappropriate iron chelation as a major contributor to the aetiology of vascular and other progressive inflammatory and degenerative diseases. Preprint: http://arxiv.org/ftp/arxiv/papers/0808/0808.1371.pdf. Peer-reviewed version in press at Genomic Medicine
- Kostoff, R. N. (2002) Overcoming specialization. Bioscience, 52, 937-941
- Lotka, A. J. (1926) The frequency distribution of scientific productivity. J Washington Acad Sci, 16, 317-323
- Nicolaisen, J. & Hjørland, B. (2007) Practical potentials of Bradford’s Law: a critical examination of the received view. J. Documentation 56, 674-692
- Weinberger, D. (2007) Everything is miscellaneous: the power of the new digital disorder. New York, Times Books