Helen Edwards, Editor, Refer
“I’ve come to believe that learning how to search for and manipulate data will become the next “must-have” skill sets for academic librarians.”
Aaron Tay, Musings about librarianship, March 12 2017.
XKCD Comics, Data. Used in accordance with https://xkcd.com/license.html
Research data support offers new opportunities for librarians to become more active partners in the research process. This is the central argument of The Data Librarian’s Handbook by Robin Rice and John Southall. Much of the data used for and generated by research are now in digital format, and volumes are growing increasingly rapidly. Many funders now require grant applications to be supported by data management plans, and mandate that the data produced in the course of research are made available to other researchers as a research output in their own right. This has led to a data function becoming a necessary part of many research projects – to evaluate, select, purchase, promote, describe and preserve data inputs and outputs, and to advise, train and support data creators and users.
Unlike printed books and journals, data is utterly dependent on how it is managed. The authors comment: “in the case of digital data there is no such thing as benign neglect of the printed era, in which old books could miraculously be rediscovered after many years in dust-ridden attics. Digital information is entirely dependent on a properly functioning hardware and software environment.” This gives a new importance to ensuring that data resources are properly described and correctly attributed to their creators, that they are linked as evidence to claims based upon them in the scholarly literature, that there is a consistent mechanism for shared access, and that their preservation is managed. This is work in which librarians can play a central role, and one that complements the work of researchers.
A key underlying theme of the book is the different perspectives librarians and researchers have on data generated by research. Whereas researchers may see data in terms of their specific research objectives, librarians necessarily focus on the wider context of the broader research and scholarly record. This has several implications:
- Even if the data fails to support the research hypothesis, description and preservation can still be important. In the field of clinical research, for example, there is already a bias towards positive results in the literature – negative results are much less likely to be published. If negative results go unrecorded, other researchers remain unaware of them and experiments can be needlessly repeated.
- Researchers may not consider other uses to which their data could be put. Rice and Southall comment: “many researchers make the mistake of assuming data they are making available for secondary usage will be used by others for projects with similar objectives and methodologies as their own. This is not always the case as the data may be used by researchers in related disciplines or radically different fields, so any associated metadata that helps secondary users understand how the data was created and coded is going to be important.”
- As experienced data librarians themselves, the authors also identify the strong emotional attachment researchers may feel for their data. This can result in reluctance to let go of their data or to allow others to reuse it. These concerns often manifest themselves as sensitivity or confidentiality issues, or fears of inappropriate use. The role of the data librarian is to negotiate these issues so that potentially valuable data is not unnecessarily destroyed. The secret is to focus on specific issues for which practical solutions, such as embargoes or light-touch anonymisation techniques, can be used. The increased emphasis on replication and re-verification of research analysis, and the potential for future use, make preservation of research data increasingly important: “curating data so they remain re-usable into the future allows them to be reanalysed with new techniques or combined with other data collections to create fresh findings. A great deal of data represent information that cannot easily be collected again because of their cost or nature.”
The new focus on data management in the grant-making process, and the requirements this places on researchers, provide an opportunity for librarians to identify and support activities that are unfamiliar to researchers. An excellent example is the data management plan. In the chapter “Data management plans as a calling card”, the authors provide several case studies showing how librarians have been able to participate in the creation of grant-mandated data management plans and, by doing so, establish close links and increase their credibility with researchers. However, this opportunity is time-sensitive. Once a research group has experience of creating data management plans, and templates and good practice exist, it will naturally become more independent. This also applies to new tools for analysing and manipulating data. The authors comment: “what makes big data challenging is not absolute size or volume, but the necessity to rethink data handling and analysis and then retool.” The challenge for librarians – which the insights in this book very usefully address – is to be able to adapt their support to continuing new developments in the whole research lifecycle.
The authors devote one chapter to “Data sharing in the disciplines”. This shows both the diversity of data in existence and the different cultural approaches to it. At one extreme, astronomy is at the forefront of open data sharing and the use of big data. Galaxy Zoo (www.galaxyzoo.org), one of the leading citizen science projects, uses millions of images from the Sloan Digital Sky Survey to invite the public to classify galaxies by shape. Big data projects in the social sciences include work on large-scale survey and census data collected by government agencies, and on the potential of social media streams. However, in the arts and humanities, there are many small or unfunded research projects with no formal requirement for data management and no tradition of data sharing. This is big data’s opposite – long tail data. The authors define long tail data as: “data and activity based around small-scale projects lasting only a few years and producing small volumes of material. Often these are based on the work of an individual or small research team.” Many consider this kind of material the norm in many areas of academic research. The potential for making much greater use of it, by applying the principles of research data management, offers an exciting opportunity for librarians to make a real contribution to future research and scholarship.
The Data Librarian’s Handbook
Robin Rice and John Southall
Facet Publishing, 2016.
K & IM Refer 33 (1) Spring 2017