It is not news that the online digital availability of increasing amounts of text and data is likely to change the entire epistemology of much of science. We are also well aware that in some areas, not least genomics, the sheer amounts of data are likely to break existing structures, which will provide many opportunities but will also require novel approaches and architectures. To this end, I have been leading a BBSRC delegation to the USA to visit a variety of funders, users and industries to compare thinking, so as to ensure that we are well placed to position ourselves to advantage. A full report will be produced in due course, but this blog will give a brief overview of some of the places we visited.

Our first visit was to Alex Szalay at Johns Hopkins, who has built a Graywulf computer for storing and assisting reasoning about very large (petabyte-scale) datasets. It is at least plausible that this architecture will be useful for genomics data, such as those we shall be generating at TGAC. We then visited a number of funders in DC, including the National Science Foundation, the National Center for Biotechnology Information, the Department of Energy’s Office of Science, and the Office of Science and Technology Policy in the Executive Office of the President. There was a considerable degree of consonance about the importance of these issues regarding hardware, software and training, although some funders were more advanced in their thinking than others. I found the GTL knowledgebase of considerable interest as one way forward for attacking the problems of federated systems biology data.

We then visited several health-related Institutes and funders with an interest in informatics, including the National Institute of General Medical Sciences, the Genome Informatics and Computational Biology program at the National Human Genome Research Institute, and the National Cancer Institute Center for Bioinformatics. The latter leads another groundbreaking initiative designed to link together ‘the entire cancer [research] community’ via the cancer Biomedical Informatics Grid; this may be a useful benchmark for related efforts that are likely to emerge in BBSRC’s space.

We next visited the IBM Almaden Research Center, and a series of groups at UC Berkeley and at UCSD, the latter including a tour of the robotics and informatics laboratories of the Joint Center for Structural Genomics (I note that our Chair is a member of its SAB), the California Institute for Telecommunications and Information Technology, some wonderful immersive visualisation tools, and the Department of Bioengineering, where I gave a seminar. Several of these groups have just published an outstanding paper that brings together structural and network (systems) biology (see also a video).

After the seminar I was pleased to meet Phil Bourne, Founding Editor of PLoS Computational Biology, where we recently published an overview of digital tools for improving the ease with which one exploits the scientific literature. As recently announced, PLoS have now made available access statistics for all their papers (this contrasts with BMC, who provide only the most accessed, e.g. for BMC Medical Genomics), an approach that has considerably more immediacy than does the analysis of citation statistics.

Later today (Sunday) I shall visit a bioprocessing plant, and our tour will end with a visit to Microsoft Research, about which I shall blog another time. Overall, this has been an exceptionally valuable set of visits that will help considerably in guiding our thinking as we shape our strategy for the intellectual and infrastructural challenges that lie ahead.

