One of the biggest problems confronting science right now is how we deal with the floods of data (not least genomics data), and in particular how we visualise them. En route to the Science Foo camp (SciFoo) held at the Googleplex last weekend, I was privileged to be shown, by its curator Bonnie DeVarco and her collaborator Eileen Clegg, a wonderful exhibition of scientific visualisations (data visualisations) at MediaX at Stanford. It is also online.
Continue reading: Scientific data visualisation and #SciFoo09
The introduction to most scientific papers will probably contain something along the lines of “It is widely accepted that….”, followed by the citation of a few more or less recent reviews of the topic. Last week’s blog noted the frequency of mis-citation, and this leads, surprisingly naturally, into asking the question ‘which reviews or papers might one then cite to bolster a view of present-day knowledge on a subject, and on what basis are these chosen?’ A partial linkage between these two issues (mis-citation and choice of material to cite) comes via what Merton (1968) (with a follow-up in 1988) called the Matthew Effect, on the basis of the lines in Matthew’s Gospel (25:29) that read “For unto every one that hath shall be given, and he shall have abundance: but from him that hath not shall be taken away even that which he hath”.
Continue reading: The Matthew effect in Science – citing the most cited
Much has been written about the extent of the contribution that is expected of the author of a scientific paper, and I shall not add to it here, since the initial focus is on identifying the people who have been awarded (apparent) coauthorship of an article. Last week I wrote about text mining, which might be one general approach by which to find out. As with most activities (including those involving machine learning), this is not without its hazards, specifically those involved in the unique identification of authors. The accurate identification of authors of scientific publications is one of the most important issues facing us as we begin to develop digital analyses of citation networks, coauthorship networks, scientific productivity, bibliometrics for purposes of the Research Excellence Framework (REF), and the like.
Continue reading: What’s in a name? Guest, ghost and indeed quite imaginary authorships
The availability of many records in digital format opens up many possibilities, not least in bibliometrics, a subject that I anticipate will be a regular feature of these blogs. For this blog we are going to look briefly at the distribution of scientific activity between individuals, as encapsulated by the question ‘if n individuals have published 1 scientific paper in a particular time period, how many individuals have published 2 papers or 10 papers or 100 papers?’
Now one might wonder whether one should expect there to be any regularities in such a (quantised) distribution, but there are. The question was posed and answered most pertinently by Alfred Lotka in 1926, and the relationship is known as Lotka’s Law. Lotka observed, from a study of papers listed in Chemical Abstracts and in Auerbach’s Geschichtstafeln der Physik, that the number of persons making n contributions is about 1/n^a of the number making a single contribution, with a equalling approximately 2. Thus for every 100 people who have published 1 paper, 25 have published 2 papers and 1 person has published 10 papers. In other words, the distribution of scientific productivity is best described by an inverse square law (a specific version of a power law, of the kind more generally referred to as a Zipf distribution). Although this is not universally true, it is a reasonable approximation and has some interesting mechanistic bases. The consequences, as recognised in Lotka’s original survey, included the fact that about 60% of contributors had contributed only one paper (and note that all joint papers were taken to have been written by the ‘senior’ author only). Nowadays this would be seen as a long-tail phenomenon, as popularised in Chris Anderson’s excellent book.
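The arithmetic of Lotka’s Law is easy to check for oneself. What follows is only a minimal sketch in Python, assuming an exact inverse square law (a = 2); the function names are hypothetical, chosen for illustration, and are not taken from Lotka or from any library. It reproduces the 100 : 25 : 1 example and estimates the share of all authors who have published just one paper, which for a = 2 works out at 6/π², or roughly 61%.

```python
def lotka_expected_authors(single_paper_authors: float, n: int, a: float = 2.0) -> float:
    """Expected number of authors with exactly n papers under Lotka's Law.

    single_paper_authors: number of authors with exactly one paper.
    a: the exponent; approximately 2 in Lotka's data (an assumption here).
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return single_paper_authors / n ** a


def single_paper_share(a: float = 2.0, max_n: int = 100_000) -> float:
    """Fraction of all authors who have exactly one paper.

    This is 1 / sum(n ** -a for n >= 1), with the infinite sum truncated
    at max_n (a numerical convenience, not part of the law itself).
    """
    total = sum(n ** -a for n in range(1, max_n + 1))
    return 1.0 / total


if __name__ == "__main__":
    # For every 100 single-paper authors: how many have 1, 2 or 10 papers?
    for n in (1, 2, 10):
        print(n, lotka_expected_authors(100, n))
    # Share of authors with exactly one paper when a = 2.
    print(round(single_paper_share(), 3))
```

Changing the exponent a shows how sensitive the tail is: a slightly smaller value of a noticeably increases the expected number of very prolific authors.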
Continue reading: On distributions of scientific activity and productivity